History log of /freebsd-10.0-release/sys/net/
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
259065 07-Dec-2013 gjb

- Copy stable/10 (r259064) to releng/10.0 as part of the
10.0-RELEASE cycle.
- Update __FreeBSD_version [1]
- Set branch name to -RC1

[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so
start releng/10.0 at '100' so the branch is started with
a value ending in zero.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

258913 04-Dec-2013 rodrigc

MFC 258591
In vnet_route_uninit(), free some memory that is allocated in vnet_route_init().

To reproduce the problem:
(1) Take a GENERIC kernel config, and add options for: VIMAGE, WITNESS,
INVARIANTS.
(2) Run this command in a loop:
jail -l -u root -c path=/ name=foo persist vnet && jexec foo ifconfig lo0 127.0.0.1/8 && jail -r foo

see: http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021280.html
http://lists.freebsd.org/pipermail/freebsd-current/2010-November/021291.html

This doesn't eliminate all the "Freed UMA keg was not empty" warning messages
on the console, but it helps.

Approved by: re (gjb)


257956 11-Nov-2013 ae

MFC r256689:
Use the same actor key for media types of the same speed.

PR: 176097

MFC r256832:
Add a note that lacp_compose_key() should be updated, when new media
types will be added.

Submitted by: melifaro

Approved by: re (hrs)


257330 29-Oct-2013 melifaro

MFC r256624:

Fix long-standing issue with incorrect radix mask calculation.

Usual symptoms are messages like
rn_delete: inconsistent annotation
rn_addmask: mask impossibly already in tree
routing daemon constantly deleting IPv6 default route
or inability to flush/delete particular prefix in ipfw table.

Changes:
* Assume 32 bytes as maximum radix key length
* Remove rn_init()
* Statically allocate rn_ones/rn_zeroes
* Make separate mask tree for each "normal" tree instead of system
global one
* Remove "optimization" on masks reusage and key zeroying
* Change rn_addmask() arguments to accept tree pointer (no users in base)

MFC changes:
* keep rn_init()
* create global mask tree, protected with mutex, for old rn_addmask
users (currently 0 in base)
* Add new rn_addmask_r() function (rn_addmask in head) with additional
argument to accept tree pointer

PR: kern/182851, kern/169206, kern/135476, kern/134531
Found by: Slawa Olhovchenkov <slw@zxy.spb.ru>
Reviewed by: glebius (previous versions)
Sponsored by: Yandex LLC
Approved by: re (glebius)


257285 28-Oct-2013 grehan

MFC r257078
Fix panic in the tap driver when a tap and vmnet interface were
created after each other e.g.

ifconfig tap0
ifconfig vmnet0
<panic>

Appears to be a cut'n'paste error from the tap code to the vmnet
code where the name string wasn't updated in the call to make_dev().

Approved by: re (glebius)


256381 12-Oct-2013 markm

Merge from project branch via main. Uninteresting commits are trimmed.

Refactor of /dev/random device. Main points include:

* Userland seeding is no longer used. This auto-seeds at boot time
on PC/Desktop setups; this may need some tweeking and intelligence
from those folks setting up embedded boxes, but the work is believed
to be minimal.

* An entropy cache is written to /entropy (even during installation)
and the kernel uses this at next boot.

* An entropy file written to /boot/entropy can be loaded by loader(8)

* Hardware sources such as rdrand are fed into Yarrow, and are no
longer available raw.

------------------------------------------------------------------------
r256240 | des | 2013-10-09 21:14:16 +0100 (Wed, 09 Oct 2013) | 4 lines

Add a RANDOM_RWFILE option and hide the entropy cache code behind it.
Rename YARROW_RNG and FORTUNA_RNG to RANDOM_YARROW and RANDOM_FORTUNA.
Add the RANDOM_* options to LINT.

------------------------------------------------------------------------
r256239 | des | 2013-10-09 21:12:59 +0100 (Wed, 09 Oct 2013) | 2 lines

Define RANDOM_PURE_RNDTEST for rndtest(4).

------------------------------------------------------------------------
r256204 | des | 2013-10-09 18:51:38 +0100 (Wed, 09 Oct 2013) | 2 lines

staticize struct random_hardware_source

------------------------------------------------------------------------
r256203 | markm | 2013-10-09 18:50:36 +0100 (Wed, 09 Oct 2013) | 2 lines

Wrap some policy-rich code in 'if NOTYET' until we can thresh out
what it really needs to do.

------------------------------------------------------------------------
r256184 | des | 2013-10-09 10:13:12 +0100 (Wed, 09 Oct 2013) | 2 lines

Re-add /dev/urandom for compatibility purposes.

------------------------------------------------------------------------
r256182 | des | 2013-10-09 10:11:14 +0100 (Wed, 09 Oct 2013) | 3 lines

Add missing include guards and move the existing ones out of the
implementation namespace.

------------------------------------------------------------------------
r256168 | markm | 2013-10-08 23:14:07 +0100 (Tue, 08 Oct 2013) | 10 lines

Fix some just-noticed problems:

o Allow this to work with "nodevice random" by fixing where the
MALLOC pool is defined.

o Fix the explicit reseed code. This was correct as submitted, but
in the project branch doesn't need to set the "seeded" bit as this
is done correctly in the "unblock" function.

o Remove some debug ifdeffing.

o Adjust comments.

------------------------------------------------------------------------
r256159 | markm | 2013-10-08 19:48:11 +0100 (Tue, 08 Oct 2013) | 6 lines

Time to eat crow for me.

I replaced the sx_* locks that Arthur used with regular mutexes;
this turned out the be the wrong thing to do as the locks need to
be sleepable. Revert this folly.

# Submitted by: Arthur Mesh <arthurmesh@gmail.com> (In original diff)

------------------------------------------------------------------------
r256138 | des | 2013-10-08 12:05:26 +0100 (Tue, 08 Oct 2013) | 10 lines

Add YARROW_RNG and FORTUNA_RNG to sys/conf/options.

Add a SYSINIT that forces a reseed during proc0 setup, which happens
fairly late in the boot process.

Add a RANDOM_DEBUG option which enables some debugging printf()s.

Add a new RANDOM_ATTACH entropy source which harvests entropy from the
get_cyclecount() delta across each call to a device attach method.

------------------------------------------------------------------------
r256135 | markm | 2013-10-08 07:54:52 +0100 (Tue, 08 Oct 2013) | 8 lines

Debugging. My attempt at EVENTHANDLER(multiuser) was a failure; use
EVENTHANDLER(mountroot) instead.

This means we can't count on /var being present, so something will
need to be done about harvesting /var/db/entropy/... .

Some policy now needs to be sorted out, and a pre-sync cache needs
to be written, but apart from that we are now ready to go.

Over to review.

------------------------------------------------------------------------
r256094 | markm | 2013-10-06 23:45:02 +0100 (Sun, 06 Oct 2013) | 8 lines

Snapshot.

Looking pretty good; this mostly works now. New code includes:

* Read cached entropy at startup, both from files and from loader(8)
preloaded entropy. Failures are soft, but announced. Untested.

* Use EVENTHANDLER to do above just before we go multiuser. Untested.

------------------------------------------------------------------------
r256088 | markm | 2013-10-06 14:01:42 +0100 (Sun, 06 Oct 2013) | 2 lines

Fix up the man page for random(4). This mainly removes no-longer-relevant
details about HW RNGs, reseeding explicitly and user-supplied
entropy.

------------------------------------------------------------------------
r256087 | markm | 2013-10-06 13:43:42 +0100 (Sun, 06 Oct 2013) | 6 lines

As userland writing to /dev/random is no more, remove the "better
than nothing" bootstrap mode.

Add SWI harvesting to the mix.

My box seeds Yarrow by itself in a few seconds! YMMV; more to follow.

------------------------------------------------------------------------
r256086 | markm | 2013-10-06 13:40:32 +0100 (Sun, 06 Oct 2013) | 11 lines

Debug run. This now works, except that the "live" sources haven't
been tested. With all sources turned on, this unlocks itself in
a couple of seconds! That is no my box, and there is no guarantee
that this will be the case everywhere.

* Cut debug prints.

* Use the same locks/mutexes all the way through.

* Be a tad more conservative about entropy estimates.

------------------------------------------------------------------------
r256084 | markm | 2013-10-06 13:35:29 +0100 (Sun, 06 Oct 2013) | 5 lines

Don't use the "real" assembler mnemonics; older compilers may not
understand them (like when building CURRENT on 9.x).

# Submitted by: Konstantin Belousov <kostikbel@gmail.com>

------------------------------------------------------------------------
r256081 | markm | 2013-10-06 10:55:28 +0100 (Sun, 06 Oct 2013) | 12 lines

SNAPSHOT.

Simplify the malloc pools; We only need one for this device.

Simplify the harvest queue.

Marginally improve the entropy pool hashing, making it a bit faster
in the process.

Connect up the hardware "live" source harvesting. This is simplistic
for now, and will need to be made rate-adaptive.

All of the above passes a compile test but needs to be debugged.

------------------------------------------------------------------------
r256042 | markm | 2013-10-04 07:55:06 +0100 (Fri, 04 Oct 2013) | 25 lines

Snapshot. This passes the build test, but has not yet been finished or debugged.

Contains:

* Refactor the hardware RNG CPU instruction sources to feed into
the software mixer. This is unfinished. The actual harvesting needs
to be sorted out. Modified by me (see below).

* Remove 'frac' parameter from random_harvest(). This was never
used and adds extra code for no good reason.

* Remove device write entropy harvesting. This provided a weak
attack vector, was not very good at bootstrapping the device. To
follow will be a replacement explicit reseed knob.

* Separate out all the RANDOM_PURE sources into separate harvest
entities. This adds some secuity in the case where more than one
is present.

* Review all the code and fix anything obviously messy or inconsistent.
Address som review concerns while I'm here, like rename the pseudo-rng
to 'dummy'.

# Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item)

------------------------------------------------------------------------
r255319 | markm | 2013-09-06 18:51:52 +0100 (Fri, 06 Sep 2013) | 4 lines

Yarrow wants entropy estimations to be conservative; the usual idea
is that if you are certain you have N bits of entropy, you declare
N/2.

------------------------------------------------------------------------
r255075 | markm | 2013-08-30 18:47:53 +0100 (Fri, 30 Aug 2013) | 4 lines

Remove short-lived idea; thread to harvest (eg) RDRAND enropy into the
usual harvest queues. It was a nifty idea, but too heavyweight.

# Submitted by: Arthur Mesh <arthurmesh@gmail.com>

------------------------------------------------------------------------
r255071 | markm | 2013-08-30 12:42:57 +0100 (Fri, 30 Aug 2013) | 4 lines

Separate out the Software RNG entropy harvesting queue and thread
into its own files.

# Submitted by: Arthur Mesh <arthurmesh@gmail.com>

------------------------------------------------------------------------
r254934 | markm | 2013-08-26 20:07:03 +0100 (Mon, 26 Aug 2013) | 2 lines

Remove the short-lived namei experiment.

------------------------------------------------------------------------
r254928 | markm | 2013-08-26 19:35:21 +0100 (Mon, 26 Aug 2013) | 2 lines

Snapshot; Do some running repairs on entropy harvesting. More needs
to follow.

------------------------------------------------------------------------
r254927 | markm | 2013-08-26 19:29:51 +0100 (Mon, 26 Aug 2013) | 15 lines

Snapshot of current work;

1) Clean up namespace; only use "Yarrow" where it is Yarrow-specific
or close enough to the Yarrow algorithm. For the rest use a neutral
name.

2) Tidy up headers; put private stuff in private places. More could
be done here.

3) Streamline the hashing/encryption; no need for a 256-bit counter;
128 bits will last for long enough.

There are bits of debug code lying around; these will be removed
at a later stage.

------------------------------------------------------------------------
r254784 | markm | 2013-08-24 14:54:56 +0100 (Sat, 24 Aug 2013) | 39 lines

1) example (partially humorous random_adaptor, that I call "EXAMPLE")
* It's not meant to be used in a real system, it's there to show how
the basics of how to create interfaces for random_adaptors. Perhaps
it should belong in a manual page

2) Move probe.c's functionality in to random_adaptors.c
* rename random_ident_hardware() to random_adaptor_choose()

3) Introduce a new way to choose (or select) random_adaptors via tunable
"rngs_want" It's a list of comma separated names of adaptors, ordered
by preferences. I.e.:
rngs_want="yarrow,rdrand"

Such setting would cause yarrow to be preferred to rdrand. If neither of
them are available (or registered), then system will default to
something reasonable (currently yarrow). If yarrow is not present, then
we fall back to the adaptor that's first on the list of registered
adaptors.

4) Introduce a way where RNGs can play a role of entropy source. This is
mostly useful for HW rngs.

The way I envision this is that every HW RNG will use this
functionality by default. Functionality to disable this is also present.
I have an example of how to use this in random_adaptor_example.c (see
modload event, and init function)

5) fix kern.random.adaptors from
kern.random.adaptors: yarrowpanicblock
to
kern.random.adaptors: yarrow,panic,block

6) add kern.random.active_adaptor to indicate currently selected
adaptor:
root@freebsd04:~ # sysctl kern.random.active_adaptor
kern.random.active_adaptor: yarrow

# Submitted by: Arthur Mesh <arthurmesh@gmail.com>

Submitted by: Dag-Erling Smørgrav <des@FreeBSD.org>, Arthur Mesh <arthurmesh@gmail.com>
Reviewed by: des@FreeBSD.org
Approved by: re (delphij)
Approved by: secteam (des,delphij)


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


256218 09-Oct-2013 glebius

There are some high performance NICs that count statistics in hardware,
and there are ifnets, that do that via counter(9). Provide a flag that
would skip cache line trashing '+=' operation in ether_input().

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Reviewed by: melifaro, adrian
Approved by: re (marius)


256008 02-Oct-2013 glebius

Clear knlist before destroying it in tap(4) and tun(4). This fixes later
crash, when a kqueue descriptor tries to dereference appropriate knotes.

Approved by: re (kib)


255926 28-Sep-2013 glebius

Fix a fallout from r241610. One enc interface must be created on startup.

Pointy hat to: glebius
Reported by: gavin
Approved by: re (gjb)


255471 11-Sep-2013 glebius

Clean up SIOCSIFDSTADDR usage from ifnet drivers. The ioctl itself is
extremely outdated, and I doubt that it was ever used for ifnet drivers.
It was used for AF_INET sockets in pre-FreeBSD time.

Approved by: re (hrs)
Sponsored by: Nginx, Inc.


255442 10-Sep-2013 des

Fix the length calculation for the final block of a sendfile(2)
transmission which could be tricked into rounding up to the nearest
page size, leaking up to a page of kernel memory. [13:11]

In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK at the socket layer rather than pass them on to the
link layer without validation or credential checks. [SA-13:12]

Prevent cross-mount hardlinks between different nullfs mounts of the
same underlying filesystem. [SA-13:13]

Security: CVE-2013-5666
Security: FreeBSD-SA-13:11.sendfile
Security: CVE-2013-5691
Security: FreeBSD-SA-13:12.ifioctl
Security: CVE-2013-5710
Security: FreeBSD-SA-13:13.nullfs
Approved by: re


255362 07-Sep-2013 markm

Bring in some behind-the-scenes development, mainly By Arthur Mesh,
the rest by me.

o Namespace cleanup; the Yarrow name is now restricted to where it
really applies; this is in anticipation of being augmented or
replaced by Fortuna in the future. Fortuna is mentioned, but behind
#if logic, and is ignorable for now.

o The harvest queue is pulled out into its own modules.

o Entropy harvesting is emproved, both by being made more conservative,
and by separating (a bit!) the sources. Available entropy crumbs are
marginally improved.

o Selection of sources is made clearer. With recent revelations,
this will receive more work in the weeks and months to come.

Submitted by: Arthur Mesh (partly) <arthurmesh@gmail.com>


255360 07-Sep-2013 davide

Don't clear the unused SI_CHEAPCLONE flag in tap_create()/tuncreate().

Reviewed by: kib


255329 06-Sep-2013 davide

Retire netisr.netisr_direct and netisr.netisr_direct_force sysctls.
These were used to control/export dispatch policy but they're not anymore.
This commit cannot be MFC'ed to 9 because old netstat(9) binary relies
on such sysctl to work. On the other hand, there's no real reason to
keep'em around in 10.


255038 29-Aug-2013 adrian

Convert the if_lagg rwlock to an rmlock.

We've been seeing lots of cache line contention (but not lock contention!)
in our workloads between the various TX and RX threads going on.

The write lock is only grabbed when configuration changes are made - which
are infrequent.

With this patch, the contention and cycles spent waiting for updates
disappear.

Sponsored by: Netflix, Inc.


254963 27-Aug-2013 alfred

Remove include opt_ofed.h since OFED is unifdef'd.

Pointed out by: glebius


254925 26-Aug-2013 jhb

Remove most of the remaining sysctl name list macros. They were only
ever intended for use in sysctl(8) and it has not used them for many
years.

Reviewed by: bde
Tested by: exp-run by bdrewery


254831 25-Aug-2013 andre

Remove unnecessary setup of the m->pkthdr.header pointer.

Sponsored by: The FreeBSD Foundation


254823 25-Aug-2013 alfred

Remove the #ifdef OFED from the 20 byte mac in struct llentry.

With this change it is now possible to build the entire infiniband
stack as modules and load it dynamically including IP over IB.


254804 24-Aug-2013 andre

Restructure the mbuf pkthdr to make it fit for upcoming capabilities and
features. The changes in particular are:

o Remove rarely used "header" pointer and replace it with a 64bit protocol/
layer specific union PH_loc for local use. Protocols can flexibly overlay
their own 8 to 64 bit fields to store information while the packet is
worked on.

o Mechanically convert IP reassembly, IGMP/MLD and ATM to use pkthdr.PH_loc
instead of pkthdr.header.

o Extend csum_flags to 64bits to allow for additional future offload
information to be carried (e.g. iSCSI, IPsec offload, and others).

o Move the RSS hash type enumerator from abusing m_flags to its own 8bit
rsstype field. Adjust accessor macros.

o Add cosqos field to store Class of Service / Quality of Service information
with the packet. It is not yet supported in any drivers but allows us to
get on par with Cisco/Juniper in routing applications (plus MPLS QoS) with
a modernized ALTQ.

o Add four 8 bit fields l[2-5]hlen to store the relative header offsets
from the start of the packet. This is important for various offload
capabilities and to relieve the drivers from having to parse the packet
and protocol headers to find out location of checksums and other
information. Header parsing in drivers is a lot of copy-paste and
unhandled corner cases which we want to avoid.

o Add another flexible 64bit union to map various additional persistent
packet information, like ether_vtag, tso_segsz and csum fields.
Depending on the csum_flags settings some fields may have different usage
making it very flexible and adaptable to future capabilities.

o Restructure the CSUM flags to better signify their outbound (down the
stack) and inbound (up the stack) use. The CSUM flags used to be a bit
chaotic and rather poorly documented leading to incorrect use in many
places. Bring clarity into their use through better naming.
Compatibility mappings are provided to preserve the API. The drivers
can be corrected one by one and MFC'd without issue.

o The size of pkthdr stays the same at 48/56bytes (32/64bit architectures).

Sponsored by: The FreeBSD Foundation


254777 24-Aug-2013 andre

Whitespace, style cleanups, and improved comments.


254774 24-Aug-2013 andre

ename PFIL_LIST_[UN]LOCK() to PFIL_HEADLIST_[UN]LOCK() to avoid
confusion with the pfil_head chain locking macros.


254773 24-Aug-2013 andre

Resolve the confusion between the head_list and the hook list.

The linked list of pfil hooks is changed to "chain" and this term
is applied consistently. The head_list remains with "list" term.

Add KASSERT to vnet_pfil_uninit().

Update and extend comments.

Reviewed by: eri (previous version)


254771 24-Aug-2013 andre

Internalize pfil_hook_get(). There are no outside consumers of
this API, it is only safe for internal use and even the pfil(9)
man page says so in the BUGS section.

Reviewed by: eri


254770 24-Aug-2013 andre

Convert one instance of pfil hook callback missed in r254769.


254769 24-Aug-2013 andre

Introduce typedef for pfil hook callback function and replace all
spelled out occurrences with it.

Reviewed by: eri


254569 20-Aug-2013 bz

After r241616 properly export ifi_baudrate_pf in the 32bit compat case.

MFC after: 3 days


254523 19-Aug-2013 andre

Add m_clrprotoflags() to clear protocol specific mbuf flags at up and
downwards layer crossings.

Consistently use it within IP, IPv6 and ethernet protocols.

Discussed with: trociny, glebius


254020 07-Aug-2013 markj

Add a missing module version declaration to if_tun(4).

PR: 181078
Submitted by: Brandon Gooch <jamesbrandongooch@gmail.com>
MFC after: 1 week


253753 28-Jul-2013 hrs

sin6 should be assigned before the loop.


253751 28-Jul-2013 hrs

- Relax the restriction on the member interfaces with LLAs. Two or more
LLAs on the member interfaces are actually harmless when the parent
interface does not have a LLA.

- Add net.link.bridge.allow_llz_overlap. This is a knob to allow LLAs on
a bridge and the member interfaces at the same time. The default is 0.

Pointed out by: ume
MFC after: 3 days


253687 26-Jul-2013 adrian

Break out the static, global LACP debug options into a per-lagg unit
sysctl tree.

* Create a net.link.lagg.X.lacp node
* Add a debug node under that for tx_test and rx_test
* Add lacp_strict_mode, defaulting to 1

tx_test and rx_test are still a bitmap of unit numbers for now.
At some point it would be nice to create child nodes of the lagg bundle
for each sub-interface, and then populate those with various knobs
and statistics.

Sponsored by: Netflix


253655 25-Jul-2013 adrian

Fix typo.

Sponsored by: Netflix


253590 24-Jul-2013 marcel

Decouple the UUID generator from network interfaces by having MAC
addresses added to the UUID generator using uuid_ether_add(). The
UUID generator keeps an arbitrary number of MAC addresses, under
the assumption that they are rarely removed (= uuid_ether_del()).
This achieves the following:
1. It brings up closer to having the network stack as a loadable
module.
2. It allows the UUID generator to filter MAC addresses for best
results (= highest chance of uniqeness).
3. MAC addresses can come from anywhere, irrespactive of whether
it's used for an interface or not.

A side-effect of the change is that when no MAC addresses have been
added, a random multicast MAC address is created once and re-used if
needed. Previusly, when a random MAC address was needed, it was
created for every call. Thus, a change in behaviour is introduced
for when no MAC addresses exist.

Obtained from: Juniper Networks, Inc.


253346 15-Jul-2013 rodrigc

PR: 168520 170096
Submitted by: adrian, zec

Fix multiple kernel panics when VIMAGE is enabled in the kernel.
These fixes are based on patches submitted by Adrian Chadd and Marko Zec.

(1) Set curthread->td_vnet to vnet0 in device_probe_and_attach() just before calling
device_attach(). This fixes multiple VIMAGE related kernel panics
when trying to attach Bluetooth or USB Ethernet devices because
curthread->td_vnet is NULL.

(2) Set curthread->td_vnet in if_detach(). This fixes kernel panics when detaching networking
interfaces, especially USB Ethernet devices.

(3) Use VNET_DOMAIN_SET() in ng_btsocket.c

(4) In ng_unref_node() set curthread->td_vnet. This fixes kernel panics
when detaching Netgraph nodes.


253314 13-Jul-2013 adrian

Bring over some link aggregation / LACP protocol improvements and debugging
additions.

* Add some new tracing events to aid in debugging.
* Add in a debugging mode to drop transmit and received frames, specifically
to test whether seeing or hearing heartbeats correctly cause LACP to
drop the port.
* Add in (and make default) a strict LACP mode, which requires the
heartbeat on a port to be heard before it's used. Sometimes vendor ports
will hang but the link layer stays up, resulting in hung traffic.
* Add logging the number of link status flaps, again to aid in debugging
badly behaving switch ports.
* Calculate the lagg interface port speed as the multiple of the
configured ports, rather than the largest.

Obtained from: Netflix
MFC after: 2 weeks


253262 12-Jul-2013 hrs

Add a leaf node CTL_NET.PF_ROUTE.0.AF.NET_RT_DUMP.0.FIB. This returns
routing table with the specified FIB number, not td->td_proc->p_fibnum.


253261 12-Jul-2013 hrs

- Drop GIF_ACCEPT_REVETHIP flag by default.
- Add IFF_MONITOR support.


253100 09-Jul-2013 ae

Correct CTASSERT condition.


253084 09-Jul-2013 ae

Migrate structs arpstat, icmpstat, mrtstat, pimstat and udpstat to PCPU
counters.


253082 09-Jul-2013 ae

Add several macros to help migrate statistics structures to PCPU counters.


253081 09-Jul-2013 ae

Prepare network statistics structures for migration to PCPU counters.
Use uint64_t as type for all fields of structures.

Changed structures: ahstat, arpstat, espstat, icmp6_ifstat, icmp6stat,
in6_ifstat, ip6stat, ipcompstat, ipipstat, ipsecstat, mrt6stat, mrtstat,
pfkeystat, pim6stat, pimstat, rip6stat, udpstat.

Discussed with: arch@


252854 05-Jul-2013 cperciva

Fix typo: minmum -> minimum.

Submitted by: @z3ndrag0n


252548 03-Jul-2013 hrs

Fix a compiler warning.

MFC after: 1 week


252511 02-Jul-2013 hrs

- Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
To configure an autoconfigured link-local address (RFC 4862), the
following rc.conf(5) configuration can be used:

ifconfig_bridge0_ipv6="inet6 auto_linklocal"

- if_bridge(4) now removes IPv6 addresses on a member interface to be
added when the parent interface or one of the existing member
interfaces has an IPv6 address. if_bridge(4) merges each link-local
scope zone which the member interfaces form respectively, so it causes
address scope violation. Removal of the IPv6 addresses prevents it.

- if_lagg(4) now removes IPv6 addresses on a member interfaces
unconditionally.

- Set reasonable flags to non-IPv6-capable interfaces. [*]

Submitted by: rpaulo [*]
MFC after: 1 week


252184 25-Jun-2013 qingli

Due to the routing related networking kernel redesign work
in FBSD 8.0, interface routes have been returened to the
applications without the RTF_GATEWAY bit. This incompatibility
has caused some issues with Zebra, Qugga and the like.
This patch provides the RTF_GATEWAY flag bit in returned interface
routes so to behave similarly to pre 8.0 systems.

Reviewed by: hrs
Verified by: mackn at opendns dot com


251859 17-Jun-2013 delphij

Return ENETDOWN instead of ENOENT when all lagg(4) links are
inactive when upper layer tries to transmit packet. This
gives better feedback and meaningful errors for applications.

MFC after: 2 weeks
Reviewed by: thompsa


251799 16-Jun-2013 hrs

Return ENETDOWN when the parent interface is down.

MFC after: 1 week


251490 07-Jun-2013 trociny

Properly set curvnet context in lagg_port_setlladdr() task handler.

Reported by: Nikos Vassiliadis <nvass gmx.com>
Submitted by: zec
Tested by: Nikos Vassiliadis <nvass gmx.com>
MFC after: 1 week


251393 04-Jun-2013 jhb

Fix build with both INET and INET6 disabled.


251296 03-Jun-2013 andre

Allow drivers to specify a maximum TSO length in bytes if they are
limited in the amount of data they can handle at once.

Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
change the limit.

The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything
less wouldn't be very useful anymore. The upper limit is still at
IP_MAXPACKET (65536 bytes). Raising it requires further auditing of
the IPv4/v6 code path's as the length field in the IP header would
overflow leading to confusion in firewalls and others packet handler on
the real size of the packet.

The placement into "struct ifnet" is a bit hackish but the best place
that was found. When the stack/driver boundary is updated it should
be handled in a better way.

Submitted by: cperciva (earlier version)
Reviewed by: cperciva
Tested by: cperciva
MFC after: 1 week (using spare struct members to preserve ABI)


251139 30-May-2013 luigi

Bring in a number of new features, mostly implemented by Michio Honda:

- the VALE switch now support up to 254 destinations per switch,
unicast or broadcast (multicast goes to all ports).

- we can attach hw interfaces and the host stack to a VALE switch,
which means we will be able to use it more or less as a native bridge
(minor tweaks still necessary).
A 'vale-ctl' program is supplied in tools/tools/netmap
to attach/detach ports the switch, and list current configuration.

- the lookup function in the VALE switch can be reassigned to
something else, similar to the pf hooks. This will enable
attaching the firewall, or other processing functions (e.g. in-kernel
openvswitch) directly on the netmap port.

The internal API used by device drivers does not change.

Userspace applications should be recompiled because we
bump NETMAP_API as we now use some fields in the struct nmreq
that were previously ignored -- otherwise, data structures
are the same.

Manpages will be committed separately.


251138 30-May-2013 luigi

clarify usage of NETMAP_BUF


250945 23-May-2013 ghelmer

While waiting for the bpf hold buffer to become idle, check
the return value from mtx_sleep() and exit bpfread() on
errors such as EINTR.

Reviewed by: jhb


250887 21-May-2013 ed

Allow certain headers to be included more easily.

Spotted by: http://hacks.owlfolio.org/header-survey/


250766 18-May-2013 melifaro

Use separate function to update mbuf checksum flags instead of
duplicating the same code in different places.

MFC after: 2 weeks


250764 18-May-2013 melifaro

Fix rte leak introduced in r248070.

MFC after: 2 weeks


250700 16-May-2013 julian

Finally change the mbuf to have its own fib field instead of stealing
4 flag bits. This was supposed to happen in 8.0, and again in 2012..

MFC after: never


250523 11-May-2013 hrs

Add IFF_MONITOR support to gre(4).

Tested by: Chip Marshall
MFC after: 1 week


250300 06-May-2013 andre

Back out r249318, r249320 and r249327 due to a heisenbug most
likely related to a race condition in the ipi_hash_lock with
the exact cause currently unknown but under investigation.


250131 01-May-2013 eadler

Correct a few sizeof()s

Submitted by: swildner@DragonFlyBSD.org
Reviewed by: alfred


250106 30-Apr-2013 luigi

remove $Id$ (whitespace change)


249925 26-Apr-2013 glebius

Add const qualifier to the dst parameter of the ifnet if_output method.


249628 18-Apr-2013 oleg

Recover missing arp_ifinit() call.

MFC after: 2 weeks


249506 15-Apr-2013 glebius

Switch lagg(4) statistics to counter(9).

The lagg(4) is often used to bond high speed links, so basic per-packet +=
on statistics cause cache misses and statistics loss.

Perfect solution would be to convert ifnet(9) to counters(9), but this
requires much more work, and unfortunately ABI change, so temporarily
patch lagg(4) manually.

We store counters in the softc, and once per second push their values
to legacy ifnet counters.

Sponsored by: Nginx, Inc.


249327 10-Apr-2013 glebius

Fix build.


249318 09-Apr-2013 andre

Change certain heavily used network related mutexes and rwlocks to
reside on their own cache line to prevent false sharing with other
nearby structures, especially for those in the .bss segment.

NB: Those mutexes and rwlocks with variables next to them that get
changed on every invocation do not benefit from their own cache line.
Actually it may be net negative because two cache misses would be
incurred in those cases.


249294 09-Apr-2013 ae

Use IP6STAT_INC/IP6STAT_DEC macros to update ip6 stats.

MFC after: 1 week


248851 28-Mar-2013 markj

Ignore interface renames instead of removing the interface from the bridge
group.

Reviewed by: rstone
Approved by: rstone (co-mentor)
Sponsored by: Sandvine Incorporated
MFC after: 1 week


248621 22-Mar-2013 glebius

Remove __FreeBSD_version ifdefs.


248490 19-Mar-2013 ae

Fix style and comments.


248324 15-Mar-2013 glebius

Use m_get/m_gethdr instead of compat macros.

Sponsored by: Nginx, Inc.


248322 15-Mar-2013 glebius

- Use m_getcl() instead of hand allocating.
- Convert panic() to KASSERT.
- Remove superfluous cleaning of mbuf fields after allocation.
- Add comment on possible use of m_get2() here.

Sponsored by: Nginx, Inc.


248207 12-Mar-2013 glebius

Functions m_getm2() and m_get2() have different order of arguments,
and that can drive someone crazy. While m_get2() is young and not
documented yet, change its order of arguments to match m_getm2().

Sorry for churn, but better now than later.


248155 11-Mar-2013 glebius

Reinitialize eh after pfil(9) processing.

PR: 176764
Submitted by: adri


248070 08-Mar-2013 melifaro

Fix long-standing issue with interface routes being unprotected:
Use RTM_PINNED flag to mark route as immutable.
Forbid deleting immutable routes without special rtrequest1_fib() flag.
Adding interface address with prefix already in route table is handled
by atomically deleting old prefix and adding interface one.

Discussed with: andre, eri
MFC after: 3 weeks


247842 05-Mar-2013 melifaro

Write lock is not required for find&compare operation.

MFC after: 2 weeks


246822 15-Feb-2013 glebius

Finish the r244185. This fixes ever growing counter of pfsync bad
length packets, which was actually harmless.

Note that peers with different version of head/ may grow this
counter, but it is harmless - all pfsync data is processed.

Reported & tested by: Anton Yuzhaninov <citrin citrin.ru>
Sponsored by: Nginx, Inc


246659 11-Feb-2013 glebius

Resolve source address selection in presense of CARP. Add a couple
of helper functions:

- carp_master() - boolean function which is true if an address
is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
and is aware of CARP.

Utilize ifa_preferred() in ifa_ifwithnet().

The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. May be we will approach this problem again later.

Reported & tested by: Anton Yuzhaninov <citrin citrin.ru>
Sponsored by: Nginx, Inc


246482 07-Feb-2013 rrs

This fixes a out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then when sending with xmit(), encounting
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue() only with drbr_peek(). We most
of the time, in putback, would not need to copy it
back since most likey the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimial case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will need a test and atomic in the put back possibly
a seperate putback_mc() in the ring buf.

Reviewed by: jhb@freebsd.org, jlv@freebsd.org


246143 31-Jan-2013 glebius

Retire struct sockaddr_inarp.

Since ARP and routing are separated, "proxy only" entries
don't have any meaning, thus we don't need additional field
in sockaddr to pass SIN_PROXY flag.

New kernel is binary compatible with old tools, since sizes
of sockaddr_inarp and sockaddr_in match, and sa_family are
filled with same value.

The structure declaration is left for compatibility with
third party software, but in tree code no longer use it.

Reviewed by: ru, andre, net@


246095 29-Jan-2013 glebius

route_output() always supplies info with RTAX_GATEWAY member that
points to a sockaddr of AF_LINK family. Assert this instead of
checking.


245924 26-Jan-2013 np

Move lle_event to if_llatbl.h

lle_event replaced arp_update_event after the ARP rewrite and ended up
in if_ether.h simply because arp_update_event used to be there too.
IPv6 neighbor discovery is going to grow lle_event support and this is a
good time to move it to if_llatbl.h.

The two in-tree consumers of this event - OFED and toecore - are not
affected.

Reviewed by: bz@


245878 24-Jan-2013 glebius

- Utilize m_get2(), accidentially fixing some signedness bugs.
- Return EMSGSIZE in both cases if uio_resid is oversized or undersized.
- No need to clear rcvif.


245834 23-Jan-2013 luigi

leftover from r245579... flags for semi transparent mode and direct
forwarding through a VALE switch


245741 21-Jan-2013 glebius

If lagg(4) can't forward a packet due to underlying port problems,
return much more meaningful ENETDOWN to the stack, instead of EBUSY.


245134 07-Jan-2013 glebius

- Add dashes before copyright notices.
- Add $FreeBSD$.
- Remove unused define.


245102 06-Jan-2013 peter

Juggle some internal symbols from our antique zlib (that originally came
in from kernel-pppd which is long gone) so that ZFS and DTRACE play nice.

This is a horrible hack to get freefall to compile, and is in dire need
of reconciliation. This antique zlib-1.04 code needs to go away.


244752 27-Dec-2012 ae

Add an ability to set net.link.stf.permit_rfc1918 from the loader.

MFC after: 2 weeks


244750 27-Dec-2012 ae

Add net.link.stf.permit_rfc1918 sysctl variable. It can be used to allow
the use of private IPv4 addresses with stf(4).

MFC after: 2 weeks


244378 18-Dec-2012 kevlo

Fix typo in comment.

Reviewed by: thompsa


244183 13-Dec-2012 glebius

Fix problem in r238990. The LLE_LINKED flag should be tested prior to
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.

Reported by: Ian FREISLICH <ianf clue.co.za>


244090 10-Dec-2012 ghelmer

Changes to resolve races in bpfread() and catchpacket() that, at worst,
cause kernel panics.

Add a flag to the bpf descriptor to indicate whether the hold buffer
is in use. In bpfread(), set the "hold buffer in use" flag before
dropping the descriptor lock during the call to bpf_uiomove().
Everywhere else the hold buffer is used or changed, wait while
the hold buffer is in use by bpfread(). Add a KASSERT in bpfread()
after re-acquiring the descriptor lock to assist uncovering any
additional hold buffer races.


243903 05-Dec-2012 hrs

- Move definition of V_deembed_scopeid to scope6_var.h.
- Deembed scope id in L3 address in in6_lltable_dump().
- Simplify scope id recovery in rtsock routines.
- Remove embedded scope id handling in ndp(8) and route(8) completely.


243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


243866 04-Dec-2012 hrs

- Fix LOR in sa6_recoverscope() in rt_msg2()[1].
- Check V_deembed_scopeid before checking if sa_family == AF_INET6.
- Fix scope id handing in route(8)[2] and ifconfig(8).

Reported by: rpaulo[1], Mateusz Guzik[1], peter[2]


243799 02-Dec-2012 melifaro

Fix bpf_if structure leak introduced in r235745.
Move all such structures to delayed-free lists and
delete all matching on interface departure event.

MFC after: 1 week


243669 29-Nov-2012 pjd

- Use more appropriate loop (do { } while()) when generating ethernet address
for bridge interface.
- If we found a collision we can break the loop - only one collision is
possible and one is exactly enough to need to renegerate.

Obtained from: WHEEL Systems
MFC after: 1 week


243624 27-Nov-2012 andre

Remove unused and unnecessary CSUM_IP_FRAGS checksumming capability.
Checksumming the IP header of fragments is no different from doing
normal IP headers.

Discussed with: yongari
MFC after: 1 week


243615 27-Nov-2012 davidxu

Pass allocated unit number to make_dev, otherwise kernel panics later while
cloning second tap.

Reviewed by: kevlo,ed


243601 27-Nov-2012 glebius

Better safe than sorry: reinitialize eh after ng_ether(4) and
if_bridge(4) processing, since mbuf may be modified there.

Submitted by: youngari


243569 26-Nov-2012 glebius

Re-initialize eh pointer after m_adj()

Submitted by: Kohji Okuno <okuno.kohji jp.panasonic.com>
Reviewed by: yongari


243208 18-Nov-2012 adrian

Fix up a compile time warning if INET6 isn't defined.


243187 17-Nov-2012 hrs

Fill sin6_scope_id in sockaddr_in6 before passing it from the kernel to
userland via routing socket or sysctl. This eliminates the following
KAME-specific sin6_scope_id handling routine from each userland utility:

sin6.sin6_scope_id = ntohs(*(u_int16_t *)&sin6.sin6_addr.s6_addr[2]);

This behavior can be controlled by net.inet6.ip6.deembed_scopeid. This is
set to 1 by default (sin6_scope_id will be filled in the kernel).

Reviewed by: bz


242673 06-Nov-2012 ghelmer

Work around a race in bpfread() by validating the hold buffer pointer
before freeing it. Otherwise, we can lose a buffer and cause a panic
in catchpacket().


242463 02-Nov-2012 ae

Remove the recently added sysctl variable net.pfil.forward.
Instead, add protocol specific mbuf flags M_IP_NEXTHOP and
M_IP6_NEXTHOP. Use them to indicate that the mbuf's chain
contains the PACKET_TAG_IPFORWARD tag. And do a tag lookup
only when this flag is set.

Suggested by: andre


242161 26-Oct-2012 glebius

o Remove last argument to ip_fragment(), and obtain all needed information
on checksums directly from mbuf flags. This simplifies code.
o Clear CSUM_IP from the mbuf in ip_fragment() if we did checksums in
hardware. Some driver may not announce CSUM_IP in theur if_hwassist,
although try to do checksums if CSUM_IP set on mbuf. Example is em(4).
o While here, consistently use CSUM_IP instead of its alias CSUM_DELAY_IP.
After this change CSUM_DELAY_IP vanishes from the stack.

Submitted by: Sebastian Kuzminsky <seb lineratesystems.com>


242079 25-Oct-2012 ae

Remove the IPFIREWALL_FORWARD kernel option and make possible to turn
on the related functionality in the runtime via the sysctl variable
net.pfil.forward. It is turned off by default.

Sponsored by: Yandex LLC
Discussed with: net@
MFC after: 2 weeks


242013 24-Oct-2012 glebius

Fix fallout from r240071. If destination interface lookup fails,
we should broadcast a packet, not try to deliver it to NULL.

Reported by: rpaulo


241913 22-Oct-2012 glebius

Switch the entire IPv4 stack to keep the IP packet header
in network byte order. Any host byte order processing is
done in local variables and host byte order values are
never[1] written to a packet.

After this change a packet processed by the stack isn't
modified at all[2] except for TTL.

After this change a network stack hacker doesn't need to
scratch his head trying to figure out what is the byte order
at the given place in the stack.

[1] One exception still remains. The raw sockets convert host
byte order before pass a packet to an application. Probably
this would remain for ages for compatibility.

[2] The ip_input() still subtructs header len from ip->ip_len,
but this is planned to be fixed soon.

Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru>
Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>


241888 22-Oct-2012 melifaro

Make PFIL use per-VNET lock instead of per-AF lock. Since most used packet
filters (ipfw and PF) use the same ruleset with the same lock for both
AF_INET and AF_INET6 there is no need in more fine-grade locking.
However, it is possible to request personal lock by specifying
PFIL_FLAG_PRIVATE_LOCK flag in pfil_head structure (see pfil.9 for
more details).

Export PFIL lock via rw_lock(9)/rm_lock(9)-like API permitting pfil consumers
to use this lock instead of own lock. This help reducing locks on main
traffic path.

pfil_assert() is currently not implemented due to absense of rm_assert().
Waiting for some kind of r234648 to be merged in HEAD.

This change is part of bigger patch reducing routing locking.

Sponsored by: Yandex LLC
Reviewed by: glebius, ae
OK'd by: silence on net@
MFC after: 3 weeks


241725 19-Oct-2012 andre

Update to previous r241688 to use __func__ instead of spelled out function
name in log(9) message.

Suggested by: glebius


241688 18-Oct-2012 andre

Use LOG_WARNING level in in_attachdomain1() instead of printf().

Submitted by: vijju.singh-at-gmail.com


241686 18-Oct-2012 andre

Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.


241677 18-Oct-2012 glebius

Utilize new macro to initialize if_baudrate().


241650 17-Oct-2012 glebius

Fix VIMAGE build.

Reported by: Nikolai Lifanov <lifanov mail.lifanov.com>
Pointy hat to: glebius


241646 17-Oct-2012 emax

provide helper if_initbaudrate() to set if_baudrate_pf and if_baudrate_pf.
again, use ixgbe(4) as an example of how to use new helper function.

Reviewed by: jhb
MFC after: 1 week


241627 17-Oct-2012 delphij

Fix build.


241619 16-Oct-2012 emax

report total number of ports for each lagg(4) interface
via net.link.lagg.X.count sysctl

MFC after: 1 week


241616 16-Oct-2012 emax

introduce concept of ifi_baudrate power factor. the idea is to work
around the problem where high speed interfaces (such as ixgbe(4))
are not able to report real ifi_baudrate. bascially, take a spare
byte from struct if_data and use it to store ifi_baudrate power
factor. in other words,

real ifi_baudrate = ifi_baudrate * 10 ^ ifi_baudrate power factor

this should be backwards compatible with old binaries. use ixgbe(4)
as an example on how drivers would set ifi_baudrate power factor

Discussed with: kib, scottl, glebius
MFC after: 1 week


241610 16-Oct-2012 glebius

Make the "struct if_clone" opaque to users of the cloning API. Users
now use function calls:

if_clone_simple()
if_clone_advanced()

to initialize a cloner, instead of macros that initialize if_clone
structure.

Discussed with: brooks, bz, 1 year ago


241394 10-Oct-2012 kevlo

Revert previous commit...

Pointyhat to: kevlo (myself)


241370 09-Oct-2012 kevlo

Prefer NULL over 0 for pointers


241245 06-Oct-2012 glebius

A step in resolving mess with byte ordering for AF_INET. After this change:

- All packets in NETISR_IP queue are in net byte order.
- ip_input() is entered in net byte order and converts packet
to host byte order right _after_ processing pfil(9) hooks.
- ip_output() is entered in host byte order and converts packet
to net byte order right _before_ processing pfil(9) hooks.
- ip_fragment() accepts and emits packet in net byte order.
- ip_forward(), ip_mloopback() use host byte order (untouched actually).
- ip_fastforward() no longer modifies packet at all (except ip_ttl).
- Swapping of byte order there and back removed from the following modules:
pf(4), ipfw(4), enc(4), if_bridge(4).
- Swapping of byte order added to ipfilter(4), based on __FreeBSD_version
- __FreeBSD_version bumped.
- pfil(9) manual page updated.

Reviewed by: ray, luigi, eri, melifaro
Tested by: glebius (LE), ray (BE)


241231 05-Oct-2012 delphij

MFV: libpcap 1.3.0.

MFC after: 4 weeks


241183 04-Oct-2012 thompsa

Remove the M_NOWAIT from bridge_rtable_init as it isn't needed. The function
return value is not even checked and could lead to a panic on a null sc_rthash.

MFC after: 2 weeks


241166 03-Oct-2012 emaste

Cast through void * to silence compiler warning

The base netmap pointer and offsets involved are provided by the kernel
side of the netmap interface and will have appropriate alignment.

Sponsored by: ADARA Networks
MFC After: 2 weeks


241130 02-Oct-2012 jhb

Rename the module for 'device enc' to "if_enc" to avoid conflicting with
the CAM "enc" peripheral (part of ses(4)). Previously the two modules
used the same name, so only one was included in a linked kernel causing
enc0 to not be created if you added IPSEC to GENERIC. The new module
name follows the pattern of other network interfaces (e.g. "if_loop").

MFC after: 1 week


241037 28-Sep-2012 glebius

The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which lead to inaccurate overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing neither mean
that. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
ALTQ queueing or buf_ring(9) queueing.
- It doesn't touch the buf_ring stats any more.
- It doesn't touch ifnet stats anymore.
- drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
- It handles br_drops itself.
- br_prod_bytes stats are dropped. Rationale: no one ever
reads them but update of a common counter on every packet
negatively affects performance due to excessive cache
invalidation.
- buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
we no longer account bytes.

o Drivers handle their stats theirselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
use drbr_stats_update(), and update ifnet stats theirselves.

o bxe(4) was the most correct driver, it didn't call
drbr_stats_update(), thus it was the only driver accurate under
moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
- drop software stats updating.
- take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
from hardware.

Reviewed by: jfv, gnn


240971 26-Sep-2012 glebius

- In the bridge_enqueue() do success/error accounting for
each fragment, not only once.
- In the GRAB_OUR_PACKETS() macro do increase if_ibytes.


240945 26-Sep-2012 emaste

Correct misspelling in debug output.


240942 25-Sep-2012 emaste

Revert part of an earlier patch attempt that snuck in with r240938.


240938 25-Sep-2012 emaste

Avoid INVARIANTS panic destroying an in-use tap(4)

The requirement (implied by the KASSERT in tap_destroy) that the tap is
closed isn't valid; destroy_dev will block in devdrn while other threads
are in d_* functions.

Note: if_tun had the same issue, addressed in SVN revisions r186391,
r186483 and r186497. The use of the condvar there appears to be
redundant with the functionality provided by destroy_dev.

Sponsored by: ADARA Networks
Reviewed by: dwhite
MFC after: 2 weeks


240932 25-Sep-2012 emaste

Remove an incorrect comment


240742 20-Sep-2012 glebius

Convert lagg(4) to use if_transmit instead of if_start.

In collaboration with: thompsa, sbruno, fabient


240736 20-Sep-2012 glebius

Utilize Jenkins hash with random seed for source nodes storage.


240723 20-Sep-2012 glebius

Add missing break.

Pointy hat to: glebius


240644 18-Sep-2012 glebius

Fix build, pass the pointy hat please.


240641 18-Sep-2012 glebius

Make ruleset anchors in pf(4) reentrant. We've got two problems here:

1) Ruleset parser uses a global variable for anchor stack.
2) When processing a wildcard anchor, matching anchors are marked.

To fix the first one:

o Allocate anchor processing stack on stack. To make this allocation
as small as possible, following measures taken:
- Maximum stack size reduced from 64 to 32.
- The struct pf_anchor_stackframe trimmed by one pointer - parent.
We can always obtain the parent via the rule pointer.
- When pf_test_rule() calls pf_get_translation(), the former lends
its stack to the latter, to avoid recursive allocation 32 entries.

The second one appeared more tricky. The code, that marks anchors was
added in OpenBSD rev. 1.516 of pf.c. According to commit log, the idea
is to enable the "quick" keyword on an anchor rule. The feature isn't
documented anywhere. The most obscure part of the 1.516 was that code
examines the "match" mark on a just processed child, which couldn't be
put here by current frame. Since this wasn't documented even in the
commit message and functionality of this is not clear to me, I decided
to drop this examination for now. The rest of 1.516 is redone in a
thread safe manner - the mark isn't put on the anchor itself, but on
current stack frame. To avoid growing stack frame, we utilize LSB
from the rule pointer, relying on kernel malloc(9) returning pointer
aligned addresses.

Discussed with: dhartmei


240640 18-Sep-2012 glebius

- Add $FreeBSD$ to allow modifications to this file.
- Move $OpenBSD$ to a more standard place.


240494 14-Sep-2012 glebius

o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5

sys/netinet/ipfw -> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with: bz, luigi


240233 08-Sep-2012 glebius

Merge the projects/pf/head branch, that was worked on for last six months,
into head. The most significant achievements in the new code:

o Fine grained locking, thus much better performance.
o Fixes to many problems in pf, that were specific to FreeBSD port.

New code doesn't have that many ifdefs and much less OpenBSDisms, thus
is more attractive to our developers.

Those interested in details, can browse through SVN log of the
projects/pf/head branch. And for reference, here is exact list of
revisions merged:

r232043, r232044, r232062, r232148, r232149, r232150, r232298, r232330,
r232332, r232340, r232386, r232390, r232391, r232605, r232655, r232656,
r232661, r232662, r232663, r232664, r232673, r232691, r233309, r233782,
r233829, r233830, r233834, r233835, r233836, r233865, r233866, r233868,
r233873, r234056, r234096, r234100, r234108, r234175, r234187, r234223,
r234271, r234272, r234282, r234307, r234309, r234382, r234384, r234456,
r234486, r234606, r234640, r234641, r234642, r234644, r234651, r235505,
r235506, r235535, r235605, r235606, r235826, r235991, r235993, r236168,
r236173, r236179, r236180, r236181, r236186, r236223, r236227, r236230,
r236252, r236254, r236298, r236299, r236300, r236301, r236397, r236398,
r236399, r236499, r236512, r236513, r236525, r236526, r236545, r236548,
r236553, r236554, r236556, r236557, r236561, r236570, r236630, r236672,
r236673, r236679, r236706, r236710, r236718, r237154, r237155, r237169,
r237314, r237363, r237364, r237368, r237369, r237376, r237440, r237442,
r237751, r237783, r237784, r237785, r237788, r237791, r238421, r238522,
r238523, r238524, r238525, r239173, r239186, r239644, r239652, r239661,
r239773, r240125, r240130, r240131, r240136, r240186, r240196, r240212.

I'd like to thank people who participated in early testing:

Tested by: Florian Smeets <flo freebsd.org>
Tested by: Chekaluk Vitaly <artemrts ukr.net>
Tested by: Ben Wilber <ben desync.com>
Tested by: Ian FREISLICH <ianf cloudseed.co.za>


240112 04-Sep-2012 melifaro

Fix the build broken by r240099.
Hide link_pfil_hook under _KERNEL macro.

MFC after: 3 weeks


240099 04-Sep-2012 melifaro

Introduce new link-layer PFIL hook V_link_pfil_hook.
Merge ether_ipfw_chk() and part of bridge_pfil() into
unified ipfw_check_frame() function called by PFIL.
This change was suggested by rwatson? @ DevSummit.

Remove ipfw headers from ether/bridge code since they are unneeded now.

Note this thange introduce some (temporary) performance penalty since
PFIL read lock has to be acquired for every link-level packet.

MFC after: 3 weeks


240086 04-Sep-2012 glebius

- Move jenkins.h to jenkins_hash.c
- Provide missing function that can do hashing of arbitrary sized buffer.
- Refetch lookup3.c and do only minimal edits to it, so that diff between
our jenkins_hash.c and lookup3.c is minimal.
- Add declarations for jenkins_hash(), jenkins_hash32() to sys/hash.h.
- Document these functions in hash(9)

Obtained from: http://burtleburtle.net/bob/c/lookup3.c


240071 03-Sep-2012 glebius

Change bridge(4) to use if_transmit for forwarding packets to underlying
interfaces instead of queueing.

Tested by: ray


239905 30-Aug-2012 glebius

In ifc_alloc_unit():
- In the !wildcard case, return ENOSPC instead of confusing EEXIST
in case if ifc->ifc_maxunit reached.
- Fix unit leak, that I've introduced in previous revision.

Submitted by: Daan Vreeken <Daan vitsch.nl>


239519 21-Aug-2012 jhb

Fix a silly grammar bogon.

Submitted by: Stephen McKay


239440 20-Aug-2012 jhb

Refine the changes made in r208212 to avoid bogus failures from
if_delmulti() when clearing the configuration for a subinterface when
the parent interface is being detached. The current code was still
triggering an assertion in if_delmulti() due to the parent interface being
partially detached. Fix this by not calling if_delmulti() at all if the
parent interface is being detached. Warn if if_delmulti() fails when the
parent is not being detached (but similar to 208212, still proceed with
tearing down the vlan state).

Tested by: ae@
MFC after: 1 month


239357 17-Aug-2012 jhb

Unexpand a couple of TAILQ_FOREACH()s.


239065 05-Aug-2012 kib

After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed. Other big dependency of vm_page.h on
vm_param.h are PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitely for the kernel code which needs it.

Suggested and reviewed by: alc
MFC after: 2 weeks


238990 02-Aug-2012 glebius

Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us safely
disestablish them.
- This allows us to simplify the arptimer() and make it
race safe.
o Consistently use ifp->if_afdata_lock to lock access to
linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
is attached to the hash.
- Use LLE_LINKED to avoid double unlinking via consequent
calls to llentry_free().
- Mark lle with LLE_DELETED via |= operation istead of =,
so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR: kern/165863
Submitted by: Andrey Zonov <andrey zonov.org>
Submitted by: Ryan Stone <rysto32 gmail.com>
Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>


238989 02-Aug-2012 glebius

The llentry_update() is used only by flowtable and the latter
always passes NULL pointer to it. Thus, code can be simplified
and function renamed to llentry_alloc() to match rtalloc().


238967 01-Aug-2012 glebius

Some more whitespace cleanup.


238945 31-Jul-2012 glebius

Some style(9) and whitespace changes.

Together with: Andrey Zonov <andrey zonov.org>


238871 28-Jul-2012 bz

Hardcode the loopback rx/tx checkum options for IPv6 to on without
checking. This allows the FreeBSD 9.1 release process to move forward.
Work around the problem that loopback connections to local addresses
not on loopback interfaces and not on interfaces w/ IPv6 checksum offloading
enabled would not work.
A proper fix to allow us to disable the "checksum offload" on loopback
for testing, measurements, ... as we allow for IPv4 needs to put in
place later.

Reported by: tuexen, Matthew Seaman (m.seaman infracaninophile.co.uk)
Reported by: Mike Andrews (mandrews bit0.com), kib, ...
PR: kern/170070
MFC after: 1 day
X-MFC after: re approval


238492 15-Jul-2012 melifaro

Permit changing MTU in 6to4 relay.

This behavior is recommended by RFC 4213 clause 3.2.

Sometimes fragmentation is the least evil.
For example, some Linux IPVS kernels forwards
ICMPv6 checksums to real servers incorrectly.

Reviewed by: hrs(previous version)
Approved by: kib(mentor)
MFC after: 1 week


238355 10-Jul-2012 emaste

Simplify error case

Submitted by: thompsa@


238346 10-Jul-2012 emaste

Plug potential mbuf leak when bridging fragments

If an error occurs when transmitting one mbuf in a chain of fragments,
free the subsequent fragments instead of leaking them.

Sponsored by: ADARA Networks


238309 09-Jul-2012 trociny

In epair_clone_destroy(), when destroying the second half, we have to
switch to its vnet before calling ether_ifdetach(). Otherwise if the
second half resides in a different vnet, if_detach() silently fails
leaving a stale pointer in V_ifnet list, and the system crashes trying
to access this pointer later.

Another solution could be not to allow to destroy epair unless both
ends are in the home vnet.

Discussed with: bz
Tested by: delphij


238298 09-Jul-2012 emaste

Restore error handling lost in r191603

This was missed in the change from IFQ_ENQUEUE to if_transmit.

Sponsored by: ADARA Networks


238183 06-Jul-2012 emaste

Implement SIOCGIFMEDIA for if_tap(4)

Appease certain if_tap(4) consumers by providing simulated Ethernet
media status.

DragonFly commit 70d9a675bf5441cc854a843ead702d08928c37f3

Obtained from: DragonFly BSD


238092 04-Jul-2012 glebius

When ip_output()/ip6_output() is supplied a struct route *ro argument,
it skips FLOWTABLE lookup. However, the non-NULL ro has dual meaning
here: it may be supplied to provide route, and it may be supplied to
store and return to caller the route that ip_output()/ip6_output()
finds. In the latter case skipping FLOWTABLE lookup is pessimisation.

The difference between struct route filled by FLOWTABLE and filled
by rtalloc() family is that the former doesn't hold a reference on
its rtentry. Reference is hold by flow entry, and it is about to
be released in future. Thus, route filled by FLOWTABLE shouldn't
be passed to RTFREE() macro.

- Introduce new flag for struct route/route_in6, that marks route
not holding a reference on rtentry.
- Introduce new macro RO_RTFREE() that cleans up a struct route
depending on its kind.
- All callers to ip_output()/ip6_output() that do supply non-NULL
but empty route should use RO_RTFREE() to free results of
lookup.
- ip_output()/ip6_output() now do FLOWTABLE lookup always when
ro->ro_rt == NULL.

Tested by: tuexen (SCTP part)


237852 30-Jun-2012 thompsa

Add the same check as vlan(4) where we ignore the ifnet departure event if the
interface is just being renamed.

PR: kern/169557
Submitted by: Mark Johnston
MFC after: 3 days


237787 29-Jun-2012 jhb

Hold GIF_LOCK() for almost all of gif_start(). It is required to be held
across in_gif_output() and in6_gif_output() anyway, and once it is held
across those it might as well be held for the entire loop. This simplifies
the code and removes the need for the custom IFF_GIF_WANTED flag (which
belonged in the softc and not as an IFF_* flag anyway).

Tested by: Vincent Hoffman vince unsane co uk


237263 19-Jun-2012 np

- Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)


236957 12-Jun-2012 rrs

Fix comment to better reflect how we are
cheating and using the csum_data. Also fix
style issues with the comments.


236955 12-Jun-2012 rrs

Note to self. Have morning coffee *before* committing things.
There is no mac_addr in the mbuf for BSD.. cheat like
we are supposed to and use the csum field since our friend
the gif tunnel itself will never use offload.


236954 12-Jun-2012 rrs

Opps forgot to commit the flag.


236951 12-Jun-2012 rrs

Allow a gif tunnel to be used with ALTq.

Reviewed by: gnn


236916 11-Jun-2012 thompsa

Fix a panic I introduced in r234487, the bridge softc pointer is set to null
early in the detach so rearrange things not to explode.

Reported by: David Roffiaen, Gustau Perez Querol
Tested by: David Roffiaen
MFC after: 3 days


236806 09-Jun-2012 melifaro

Fix typo introduced in r236559.

Pointed by: bcr
Approved by: kib(mentor)


236725 07-Jun-2012 trociny

Sort includes.

Submitted by: Daan Vreeken <pa4dan Bliksem.VEHosting.nl>
MFC after: 3 days


236724 07-Jun-2012 trociny

Add VIMAGE support to if_tap.

PR: kern/152047, kern/158686
Submitted by: Daan Vreeken <pa4dan Bliksem.VEHosting.nl>
MFC after: 1 week


236559 04-Jun-2012 melifaro

Fix panic introduced by r235745. Panic occurs after first packet traverse renamed interface.
Add several comments on locking

Found by: avg
Approved by: ae(mentor)
Tested by: avg
MFC after: 1 week


236332 30-May-2012 tuexen

Seperate SCTP checksum offloading for IPv4 and IPv6.
While there: remove some trainling whitespaces.

MFC after: 3 days
X-MFC with: 236170


236262 29-May-2012 jkim

Fix style(9) nits, reduce unnecessary type castings, etc., for bpf_setf().


236261 29-May-2012 jkim

- Save the previous filter right before we set new one.
- Reduce duplicate code and make it little easier to read.

MFC after: 2 weeks


236251 29-May-2012 jkim

Fix 32-bit shim for BIOCSETF to drop all packets buffered on the descriptor
and reset statistics as it should.

MFC after: 3 days


236231 29-May-2012 melifaro

Fix BPF_JITTER code broken by r235746.

Pointed by: jkim
Reviewed by: jkim (except locking changes)
Approved by: (mentor)
MFC after: 2 weeks


236178 28-May-2012 rea

if_lagg: allow to invoke SIOCSLAGGPORT multiple times in a row

Currently, 'ifconfig laggX down' does not remove members from this
lagg(4) interface. So, 'service netif stop laggX' followed by
'service netif start laggX' will choke, because "stop" will leave
interfaces attached to the laggX and ifconfig from the "start" will
refuse to add already-existing interfaces.

The real-world case is when I am bundling together my Ethernet and
WiFi interfaces and using multiple profiles for accessing network in
different places: system being booted up with one profile, but later
this profile being exchanged to another one, followed by 'service
netif restart' will not add WiFi interface back to the lagg: the
"stop" action from 'service netif restart' will shut down my main WiFi
interface, so wlan0 that exists in the lagg0 will be destroyed and
purged from lagg0; the "start" action will try to re-add both
interfaces, but since Ethernet one is already in lagg0, ifconfig will
refuse to add the wlan0 from WiFi interface.

Since adding the interface to the lagg(4) when it is already here
should be an idempotent action: we're really not changing anything,
so this fix doesn't change the semantics of interface addition.

Approved by: thompsa
Reviewed by: emaste
MFC after: 1 week


236170 28-May-2012 bz

It turns out that too many drivers are not only parsing the L2/3/4
headers for TSO but also for generic checksum offloading. Ideally we
would only have one common function shared amongst all drivers, and
perhaps when updating them for IPv6 we should introduce that.
Eventually we should provide the meta information along with mbufs to
avoid (re-)parsing entirely.

To not break IPv6 (checksums and offload) and to be able to MFC the
changes without risking to hurt 3rd party drivers, duplicate the v4
framework, as other OSes have done as well.

Introduce interface capability flags for TX/RX checksum offload with
IPv6, to allow independent toggling (where possible). Add CSUM_*_IPV6
flags for UDP/TCP over IPv6, and reserve further for SCTP, and IPv6
fragmentation. Define CSUM_DELAY_DATA_IPV6 as we do for legacy IP and
add an alias for CSUM_DATA_VALID_IPV6.

This pretty much brings IPv6 handling in line with IPv4.
TSO is still handled in a different way and not via if_hwassist.

Update ifconfig to allow (un)setting of the new capability flags.
Update loopback to announce the new capabilities and if_hwassist flags.

Individual driver updates will have to follow, as will SCTP.

Reported by: gallatin, dim, ..
Reviewed by: gallatin (glanced at?)
MFC after: 3 days
X-MFC with: r235961,235959,235958


236062 26-May-2012 thompsa

Turn LACP debugging from a compile time option to a sysctl, it is very handy to
be able to turn it on when negotiation to a switch misbehaves.

Submitted by: Andrew Boyer
MFC after: 3 days


235960 25-May-2012 bz

MFp4 bz_ipv6_fast:

Simple yet effective change enabling checksum "offload" on loopback
for IPv6 to avoid expensive computations.

Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems

Reviewed by: gnn (as part of the whole)
MFC After: 3 days


235747 21-May-2012 melifaro

Make most BPF ioctls() SMP-safe.

Approved by: kib(mentor)
MFC in: 4 weeks


235746 21-May-2012 melifaro

Call bpf_jitter() before acquiring BPF global lock due to malloc() being used inside bpf_jitter.

Eliminate bpf_buffer_alloc() and allocate BPF buffers on descriptor creation and BIOCSBLEN ioctl.
This permits us not to allocate buffers inside bpf_attachd() which is protected by global lock.

Approved by: kib(mentor)
MFC in: 4 weeks


235745 21-May-2012 melifaro

Fix old panic when BPF consumer attaches to destroying interface.
'flags' field is added to the end of bpf_if structure. Currently the only
flag is BPFIF_FLAG_DYING which is set on bpf detach and checked by bpf_attachd()
Problem can be easily triggered on SMP stable/[89] by the following command (sort of):
'while true; do ifconfig vlan222 create vlan 222 vlandev em0 up ; tcpdump -pi vlan222 & ; ifconfig vlan222 destroy ; done'

Fix possible use-after-free when BPF detaches itself from interface, freeing bpf_bif memory,
while interface is still UP and there can be routes via this interface.
Freeing is now delayed till ifnet_departure_event is received via eventhandler(9) api.

Convert bpfd rwlock back to mutex due lack of performance gain (currently checking if packet
matches filter is done without holding bpfd lock and we have to acquire write lock if packet matches)

Approved by: kib(mentor)
MFC in: 4 weeks


235744 21-May-2012 melifaro

Fix panic on attaching to non-existent interface (introduced by r233937, pointed by hrs@)
Fix panic on tcpdump being attached to interface being removed (introduced by r233937, pointed by hrs@ and adrian@)
Protect most of bpf_setf() by BPF global lock

Add several forgotten assertions (thanks to adrian@)

Document current locking model inside bpf.c
Document EVENTHANDLER(9) usage inside BPF.

Approved by: kib(mentor)
Tested by: gnn
MFC in: 4 weeks


235640 19-May-2012 marcel

Use the LLINDEX macro to access the link-level I/F index. This makes
it possible to work with a different type for the sdl_index field --
it only requires a recompile.

Obtained from: Juniper Networks, Inc.


235425 14-May-2012 delphij

Sync DLTs with the latest pcap version.

MFC after: 2 weeks


234946 03-May-2012 melifaro

Revert r234834 per luigi@ request.

Cleaner solution (e.g. adding another header) should be done here.

Original log:
Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
Remove ipfw/ip_fw_private.h header from non-ipfw code.

Requested by: luigi
Approved by: kib(mentor)


234936 03-May-2012 emaste

Relax restriction on direct tx to child ports

Lagg(4) restricts the type of packet that may be sent directly to a child
port, to avoid undesired output from accidental misconfiguration.
Previously only ETHERTYPE_PAE was permitted.

BPF writes to a lagg(4) child port are presumably intentional, so just
allow them, while still blocking other packets that should take the
aggregation path.

PR: kern/138620
Approved by: thompsa@


234834 30-Apr-2012 melifaro

Move several enums and structures required for L2 filtering from ip_fw_private.h to ip_fw.h.
Remove ipfw/ip_fw_private.h header from non-ipfw code.

Approved by: ae(mentor)
MFC after: 2 weeks


234572 22-Apr-2012 melifaro

Do not require radix write lock to be held while dumping route table
via sysctl(4) interface. This permits router not to stop forwarding
packets while route table is being written to user-supplied buffer.

Reported by: Pawel Tyll <ptyll@nitronet.pl>
Approved by: kib(mentor)

MFC after: 1 week


234488 20-Apr-2012 thompsa

Move the interface media check to a taskqueue, some interfaces (usb) sleep
during SIOCGIFMEDIA and we were holding locks.


234487 20-Apr-2012 thompsa

Add linkstate to bridge(4), set the link to up when at least one underlying
interface is up, otherwise the link is down.

This, among other things, allows carp to work on a bridge.

Prodded by: glebius
Tested by: Alexander Lunev


234403 18-Apr-2012 thompsa

Remove KASSERTS, they do not add any value here since the pointer is about to
be derefernced anyway.


234227 13-Apr-2012 luigi

A bit of cleanup in the names of fields of netmap-related structures.
Use the name 'ring' instead of 'queue' in all fields.
Bump NETMAP_API.


234171 12-Apr-2012 luigi

remove an unnecessary #define


234163 12-Apr-2012 thompsa

Set the proto to LAGG_PROTO_NONE before calling the detach routine so packets
are discarded, this is an issue because lacp drops the lock which may allow
network threads to access freed memory. Expand the lock coverage so the
detach/attach happen atomically.

Submitted by: Andrew Boyer (earlier version)


234098 10-Apr-2012 jhb

Add media types for 40G media that might be used with FreeBSD.

Reviewed by: bz
MFC after: 2 weeks


233946 06-Apr-2012 melifaro

Fix build broken by r233938.

Pointed by: David Wolfskill <david@catwhisker.org>
Approved by: kib (mentor)
Pointy hat to: melifaro


233938 06-Apr-2012 melifaro

- Improve performace for writer-only BPF users.

Linux and Solaris (at least OpenSolaris) has PF_PACKET socket families to send
raw ethernet frames. The only FreeBSD interface that can be used to send raw frames
is BPF. As a result, many programs like cdpd, lldpd, various dhcp stuff uses
BPF only to send data. This leads us to the situation when software like cdpd,
being run on high-traffic-volume interface significantly reduces overall performance
since we have to acquire additional locks for every packet.

Here we add sysctl that changes BPF behavior in the following way:
If program came and opens BPF socket without explicitly specifyin read filter we
assume it to be write-only and add it to special writer-only per-interface list.
This makes bpf_peers_present() return 0, so no additional overhead is introduced.
After filter is supplied, descriptor is added to original per-interface list permitting
packets to be captured.

Unfortunately, pcap_open_live() sets catch-all filter itself for the purpose of
setting snap length.

Fortunately, most programs explicitly sets (event catch-all) filter after that.
tcpdump(1) is a good example.

So a bit hackis approach is taken: we upgrade description only after second
BIOCSETF is received.

Sysctl is named net.bpf.optimize_writers and is turned off by default.

- While here, document all sysctl variables in bpf.4

Sponsored by Yandex LLC

Reviewed by: glebius (previous version)
Reviewed by: silence on -net@
Approved by: (mentor)

MFC after: 4 weeks


233937 06-Apr-2012 melifaro

- Improve BPF locking model.

Interface locks and descriptor locks are converted from mutex(9) to rwlock(9).
This greately improves performance: in most common case we need to acquire 1
reader lock instead of 2 mutexes.

- Remove filter(descriptor) (reader) lock in bpf_mtap[2]
This was suggested by glebius@. We protect filter by requesting interface
writer lock on filter change.

- Cover struct bpf_if under BPF_INTERNAL define. This permits including bpf.h
without including rwlock stuff. However, this is is temporary solution,
struct bpf_if should be made opaque for any external caller.

Found by: Dmitrij Tejblum <tejblum@yandex-team.ru>
Sponsored by: Yandex LLC

Reviewed by: glebius (previous version)
Reviewed by: silence on -net@
Approved by: (mentor)

MFC after: 3 weeks


233202 19-Mar-2012 jhb

Retire the IF_ADDR_LOCK() and IF_ADDR_UNLOCK() compat macros from HEAD.
The new [RW]LOCK macros are merged back to 8.x so should be suitable for
new code in HEAD even if it is to be MFC'd.


233113 18-Mar-2012 bz

Hide kernel option ROUTETABLES evaluations in the implementation
rather than the header file. With this also move RT_MAXFIBS and
RT_NUMFIBS into the implemantion to avoid further usage in other
code. rt_numfibs is all that should be needed.

This allows users to change the number of FIBs from 1..RT_MAXFIBS(16)
dynamically using the tunable without the need to change the kernel
config for the maximum anymore. This means that thet multi-FIB
feature is now fully available with GENERIC kernels.
The kernel option ROUTETABLES can still be used to set the default
numbers of FIBs in absence of the tunable.

Ok.ed by: julian, hrs, melifaro
MFC after: 2 weeks


232824 11-Mar-2012 luigi

- remove an extra parenthesis in a closing brace;
- add the macro NETMAP_RING_FIRST_RESERVED() which returns
the index of the first non-released buffer in the ring
(this is useful for code that retains buffers for some time
instead of processing them immediately)


232640 07-Mar-2012 thompsa

Move the vlan buffer space into the union which also fixes an unused variable
warning with !INET & !INET6.

Spotted by: pluknet


232629 06-Mar-2012 thompsa

Add the ability to set which packet layers are used for the load balance hash
calculation.


232487 04-Mar-2012 zec

Properly restore curvnet context when returning early from
ether_input_internal().

This change only affects options VIMAGE kernel builds.

PR: kern/165643
Submitted by: Vijay Singh
MFC after: 3 days


232449 03-Mar-2012 jmallett

o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with userlands
using the o32 ABI. This mostly follows nwhitehorn's lead in implementing
COMPAT_FREEBSD32 on powerpc64.
o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the
32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit
ABIs use a 64-bit time_t.
o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS
with 32-bit compatibility, then, disable some code which assumes otherwise
wrongly when built for MIPS. A more general macro to check in this case would
seem like a good idea eventually. If someone adds support for using n32
userland with n64 kernels on MIPS, then they will have to add a variety of
flags related to each piece of the ABI that can vary. That's probably the
right time to generalize further.
o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the
freebsd32 compat code. Probably this should be generalized at some point.

Reviewed by: gonzo


232315 29-Feb-2012 thompsa

Use a more appropriate default for the maximum number of addresses in the
bridge forwarding table.

PR: docs/164564
Discussed with: brueffer


232238 27-Feb-2012 luigi

A bunch of netmap fixes:

USERSPACE:
1. add support for devices with different number of rx and tx queues;

2. add better support for zero-copy operation, adding an extra field
to the netmap ring to indicate how many buffers we have already processed
but not yet released (with help from Eddie Kohler);

3. The two changes above unfortunately require an API change, so while
at it add a version field and some spares to the ioctl() argument
to help detect mismatches.

4. update the manual page for the two changes above;

5. update sample applications in tools/tools/netmap

KERNEL:

1. simplify the internal structures moving the global wait queues
to the 'struct netmap_adapter';

2. simplify the functions that map kring<->nic ring indexes

3. normalize device-specific code, helps mainteinance;

4. start exploring the impact of micro-optimizations (prefetch etc.)
in the ixgbe driver.
Use 'legacy' descriptors on the tx ring and prefetch slots gives
about 20% speedup at 900 MHz. Another 7-10% would come from removing
the explict calls to bus_dmamap* in the core (they are effectively
NOPs in this case, but it takes expensive load of the per-buffer
dma maps to figure out that they are all NULL.

Rx performance not investigated.

I am postponing the MFC so i can import a few more improvements
before merging.


232118 24-Feb-2012 thompsa

Only look for a usable MAC address for the bridge ID from ports within our
bridge, this allows us to have more than one independent bridge in the same
STP domain.

PR: kern/164369
Submitted by: Nikos Vassiliadis (earlier version)
MFC after: 2 weeks


232080 23-Feb-2012 thompsa

Add a sysctl/tunable default value for the use_flowid sysctl in r232008.


232070 23-Feb-2012 thompsa

Indicate this function decrements the timer as well as testing for expiry.


232054 23-Feb-2012 kmacy

When using flowtable llentrys can outlive the interface with which they're associated
at which the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.

Move the free pointer in to the llentry itself and update the initalization sites.

MFC after: 2 weeks


232030 23-Feb-2012 thompsa

Now that network interfaces advertise if they support linkstate notifications
we do not need to perform a media ioctl every 15 seconds.


232014 23-Feb-2012 thompsa

bstp_input() always consumes the packet so remove the mbuf handling dance
around it.

Obtained from: OpenBSD (r1.37)


232008 22-Feb-2012 thompsa

Using the flowid in the mbuf assumes the network card is giving a good hash for
the traffic flow, this may not be the case giving poor traffic distribution.
Add a sysctl which allows us to fall back to our own flow hash code.

PR: kern/164901
Submitted by: Eugene Grosbein
MFC after: 1 week


231852 17-Feb-2012 bz

Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.
Reviewed by: melifaro (basically)
MFC after: 10 days


231678 14-Feb-2012 tijl

Change some headers such that lang/gcc* ports no longer patch them.

The lang/gcc* ports patch headers where they think something is
non-standard. These patched headers override the system headers which means
you have to rebuild these ports whenever you do installworld to make sure
they contain the latest changes.


231505 11-Feb-2012 bz

Introduce a new NET_RT_IFLISTL API to query the address list. It works
on extended and extensible structs if_msghdrl and ifa_msghdrl. This
will allow us to extend both the msghdrl structs and eventually if_data
in the future without breaking the ABI.

Bump __FreeBSD_version to allow ports to more easily detect the new API.

Reviewed by: glebius, brooks
MFC after: 3 days


231504 11-Feb-2012 bz

Backout changes from r228571. Remove if_data from struct ifa_msghdr again.
While this breaks carp on HEAD temporary, it restores the upgrade path from
stable, and head before 20111215.

Reviewed by: glebius, brooks


231229 08-Feb-2012 pluknet

g/c last bit of old ipv6 prefix management.

Reviewed by: bz
Obtained from: NetBSD, net/if.h, rev 1.80


231198 08-Feb-2012 luigi

- change the buffer size from a constant to a
TUNABLE variable (hw.netmap.buf_size) so we can experiment
with values different from 2048 which may give better cache performance.

- rearrange the memory allocation code so it will be easier
to replace it with a different implementation. The current code
relies on a single large contiguous chunk of memory obtained through
contigmalloc.
The new implementation (not committed yet) uses multiple
smaller chunks which are easier to fit in a fragmented address
space.


231130 07-Feb-2012 pjd

Allow to set if_bridge(4) sysctls from /boot/loader.conf.

MFC after: 3 days


231013 05-Feb-2012 glebius

Fix typo in r231010.

Submitted by: linimon


231010 05-Feb-2012 glebius

Better comment for ifa_init(), ifa_ref(), ifa_free().


231009 05-Feb-2012 glebius

In ifa_init() initialize if_data.ifi_datalen. This would be
required after upcoming changes from bz@.

Discussed with: bz


230598 26-Jan-2012 kmacy

A flowtable entry can continue referencing an llentry indefinitely if the entry is repeatedly
referenced within its timeout window. This change clears the LLE_VALID flag when an llentry
is removed from an interface's hash table and adds an extra check to the flowtable code
for the LLE_VALID flag in llentry to avoid retaining and using a stale reference.

Reviewed by: qingli@
MFC after: 2 weeks


230510 24-Jan-2012 bz

Replace random ARIN direct assignment legacy IPs with proper RFC 5735
TEST-NET1 block for use in documentation and example code addresses.

MFC after: 3 days


230108 14-Jan-2012 eadler

- Fix trivial typo

Approved by: nwhitehorn
MFC after: 3 days


230026 12-Jan-2012 rwatson

Clarify throughout the vlan(4) code the difference between a "tag" (the
802.1q-defined 16-bit VID, CFI, and PCP field in host by order) and a
VLAN ID (VID). Tags go in packets. VIDs identify VLANs.

No functional change is intended, so this should be safe to MFC. Further
cleanup with functional changes will be committed separately (for example,
renaming vlan_tag/vlan_tag_p, which modify the KPI and KBI).

Reviewed by: bz
Sponsored by: ADARA Networks, Inc.
MFC after: 3 days


229898 10-Jan-2012 lstewart

Consumers of bpfdetach() expect it to remove all bpf_if structs from the
bpf_iflist list which reference the specified ifnet. The existing implementation
only removes the first matching bpf_if found in the list, effectively leaking
list entries if an ifnet has been bpfattach()ed multiple times with different
DLTs.

Fix the leak by performing the detach logic in a loop, stopping when all bpf_if
structs referencing the specified ifnet have been detached and removed from the
bpf_iflist list.

Whilst here, also:

- Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never
exist in the list with a NULL ifnet pointer.

- Except when INVARIANTS is in the kernel config, silently ignore the case where
no bpf_if referencing the specified ifnet is found, as it is harmless and does
not require user attention.

Reviewed by: csjp
MFC after: 1 week


229873 09-Jan-2012 jhb

Convert the per-interface address list lock from a mutex to a reader/writer
lock.

Reviewed by: bz


229814 08-Jan-2012 glebius

Copy ifa->if_data to ifam->ifam_data. This was forgotten in r228571.

Submitted by: bz


229810 08-Jan-2012 glebius

Move arprequest() declaration to if_ether.h.


229698 06-Jan-2012 glebius

Since r228571 CARP is no longer an interface.


229621 05-Jan-2012 jhb

Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by: bz
MFC after: 2 weeks


229614 05-Jan-2012 jhb

Add new variants of the IF_ADDR_*LOCK*() macros used for protecting
interface address lists that distinguish read locks from write locks.
To preserve the KPI, the previous operations are mapped to the write
lock macros. The lock is still kept as a mutex for now.

Reviewed by: bz
MFC after: 2 weeks


229587 05-Jan-2012 rwatson

Refine last comment.

Submitted by: joeld
Sponsored by: ADARA Networks, Inc.
MFC after: 3 days


229586 05-Jan-2012 rwatson

Add comment to the VLAN code about its integration with VIMAGE: we see what
the code is doing, we recognise the legitimacy of its goal, but we're not
quite sure it's going about it the right way. More pondering is clearly
required.

Sponsored by: ADARA Networks, Inc.
Discussed with: bz
MFC after: 3 days


229073 31-Dec-2011 lstewart

Revert r228986 until it can be reworked to avoid panicing the kernel when the
same interface is attached multiple times with different DLTs, as is done in
net80211 for example.

Reported by: adrian


228986 30-Dec-2011 lstewart

- Introduce the net.bpf.tscfg sysctl tree and associated code so as to make one
aspect of time stamp configuration per interface rather than per BPF
descriptor. Prior to this, the order in which BPF devices were opened and the
per descriptor time stamp configuration settings could cause non-deterministic
and unintended behaviour with respect to time stamping. With the new scheme, a
BPF attached interface's tscfg sysctl entry can be set to "default", "none",
"fast", "normal" or "external". Setting "default" means use the system default
option (set with the net.bpf.tscfg.default sysctl), "none" means do not
generate time stamps for tapped packets, "fast" means generate time stamps for
tapped packets using a hz granularity system clock read, "normal" means
generate time stamps for tapped packets using a full timecounter granularity
system clock read and "external" (currently unimplemented) means use the time
stamp provided with the packet from an underlying source.

- Utilise the recently introduced sysclock_getsnapshot() and
sysclock_snap2bintime() KPIs to ensure the system clock is only read once per
packet, regardless of the number of BPF descriptors and time stamp formats
requested. Use the per BPF attached interface time stamp configuration to
control if sysclock_getsnapshot() is called and whether the system clock read
is fast or normal. The per BPF descriptor time stamp configuration is then
used to control how the system clock snapshot is converted to a bintime by
sysclock_snap2bintime().

- Remove all FAST related BPF descriptor flag variants. Performing a "fast"
read of the system clock is now controlled per BPF attached interface using
the net.bpf.tscfg sysctl tree.

- Update the bpf.4 man page.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

In collaboration with: Julien Ridoux (jridoux at unimelb edu au)


228967 29-Dec-2011 yongari

Update if_obytes and if_omcast after successful transmit.
While I'm here update if_oerrors if parent interface of vlan is not
up and running. Previously it updated collision counter and it was
confusing to interprete it.

PR: kern/163478
Reviewed by: glebius, jhb
Tested by: Joe Holden < lists <> rewt dot org dot uk >


228768 21-Dec-2011 glebius

Provide ABI compatibility shim to enable configuring of addresses
with ifconfig(8) prior to r228571.

Requested by: brooks


228736 20-Dec-2011 glebius

Restore a feature that was present in 5.x and 6.x, and was cleared in
7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption, while it is running its bulk update.

However, reimplement the feature in more elegant manner, that is
partially inspired by newer OpenBSD:

- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several
conditions, for now these are:
- interface goes down
- carp(4) has problems with ip_output() or ip6_output()
- pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but
is actual value added to advskew. The adjustment values for
particular error conditions are also configurable, and their
defaults are maximum advskew value, so a single failure bumps
demotion to maximum. This is for POLA compatibility, and should
satisfy most users.
- Demotion factor is a writable sysctl, so user can do
foot shooting, if he desires to.


228571 16-Dec-2011 glebius

A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implemenation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured exactly on the interfaces, they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics has been changed too: now one doesn't need
to clone carpXX interface, he/she should directly configure a vhid
on a Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(8)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]


228532 15-Dec-2011 glebius

Simplify rtrequest(RTM_ADD): ifa can't be NULL after rt_getifa_fib().


228380 09-Dec-2011 brooks

Remove the unused if_free_type() function.

X-MFC after: never


228276 05-Dec-2011 luigi

1. Fix the handling of link reset while in netmap more.
A link reset now is completely transparent for the netmap client:
even if the NIC resets its own ring (e.g. restarting from 0),
the client will not see any change in the current rx/tx positions,
because the driver will keep track of the offset between the two.

2. make the device-specific code more uniform across different drivers
There were some inconsistencies in the implementation of the netmap
support routines, now drivers have been aligned to a common
code structure.

3. import netmap support for ixgbe . This is implemented as a very
small patch for ixgbe.c (233 lines, 11 chunks, mostly comments:
in total the patch has only 54 lines of new code) , as most of
the code is in an external file sys/dev/netmap/ixgbe_netmap.h ,
following some initial comments from Jack Vogel about making
changes less intrusive.
(Note, i have emailed Jack multiple times asking if he had
comments on this structure of the code; i got no reply so
i assume he is fine with it).

Support for other drivers (em, lem, re, igb) will come later.

"ixgbe" is now the reference driver for netmap support. Both the
external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific
patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should
serve as a reference for other device drivers.

Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap,
the sender does 14.88 Mpps at 1050 Mhz and 14.2 Mpps at 900 MHz
on an i7-860 with 4 cores and 82599 card. Haven't tried yet more
aggressive optimizations such as adding 'prefetch' instructions
in the time-critical parts of the code.


228132 29-Nov-2011 lstewart

Revert r227778 in preparation for committing reworked patches in its place.


228089 28-Nov-2011 jhb

Change the if_vlan driver to use if_transmit for forwarding packets to the
parent interface. This avoids the overhead of queueing a packet to an IFQ
only to immediately dequeue it again.

Suggested by: np
Reviewed by: brooks
MFC after: 1 month


228071 28-Nov-2011 glebius

- Use generic alloc_unr(9) allocator for if_clone, instead
of hand-made.
- When registering new cloner, check whether a cloner with
same name already exist.
- When allocating unit, also check with help of ifunit()
whether such interface already exist or not. [1]

PR: kern/162789 [1]


227832 22-Nov-2011 glebius

Improve logging:
- don't hardcode function name
- use LOG_DEBUG for such a debug message
- print error value


227778 21-Nov-2011 lstewart

- When feed-forward clock support is compiled in, change the BPF header to
contain both a regular timestamp obtained from the system clock and the
current feed-forward ffcounter value. This enables new possibilities including
comparison of timekeeping performance and timestamp correction during post
processing.

- Add the net.bpf.ffclock_tstamp sysctl to provide a choice between timestamping
packets using the feedback or feed-forward system clock.

Committed on behalf of Julien Ridoux and Darryl Veitch from the University of
Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward
Clock Synchronization Algorithms" project.

For more information, see http://www.synclab.org/radclock/

Submitted by: Julien Ridoux (jridoux at unimelb edu au)


227614 17-Nov-2011 luigi

Bring in support for netmap, a framework for very efficient packet
I/O from userspace, capable of line rate at 10G, see

http://info.iet.unipi.it/~luigi/netmap/

At this time I am bringing in only the generic code (sys/dev/netmap/
plus two headers under sys/net/), and some sample applications in
tools/tools/netmap. There is also a manpage in share/man/man4 [1]

In order to make use of the framework you need to build a kernel
with "device netmap", and patch individual drivers with the code
that you can find in

sys/dev/netmap/head.diff

The file will go away as the relevant pieces are committed to
the various device drivers, which should happen in a few days
after talking to the driver maintainers.

Netmap support is available at the moment for Intel 10G and 1G
cards (ixgbe, em/lem/igb), and for the Realtek 1G card ("re").
I have partial patches for "bge" and am starting to work on "cxgbe".
Hopefully changes are trivial enough so interested third parties
can submit their patches. Interested people can contact me
for advice on how to add netmap support to specific devices.

CREDITS:
Netmap has been developed by Luigi Rizzo and other collaborators
at the Universita` di Pisa, and supported by EU project CHANGE
(http://www.change-project.eu/)
The code is distributed under a BSD Copyright.

[1] In my opinion is a bad idea to have all manpage in one directory.
We should place kernel documentation in the same dir that contains
the code, which would make it much simpler to keep doc and code
in sync, reduce the clutter in share/man/ and incidentally is
the policy used for all of userspace code.
Makefiles and doc tools can be trivially adjusted to find the
manpages in the relevant subdirs.


227503 14-Nov-2011 rmh

Remove a few bits of FreeBSD 2.x compatibility code.

Approved by: kib (mentor)


227459 11-Nov-2011 brooks

In r191367 the need for if_free_type() was removed and a new member
if_alloctype was used to store the origional interface type. Take
advantage of this change by removing all existing uses of if_free_type()
in favor of if_free().

MFC after: 1 Month


227309 07-Nov-2011 ed

Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.

The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.


227293 07-Nov-2011 ed

Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.

This means that their use is restricted to a single C file.


227061 03-Nov-2011 mlaier

Fix a use-after-free/redzone issue in the routing code.

Reported by (repeatedly): Mike Tancsa
Prodded by (repeatedly): bz
Forgotten by (repeatedly): mlaier
MFC after: 2 weeks


226830 27-Oct-2011 glebius

Add macro IF_DEQUEUE_ALL(ifq, m), that takes the entire mbuf chain off
the queue. It can be utilized in queue processing to avoid multiple
locking/unlocking.


226710 25-Oct-2011 qingli

The host-id/interface-id can have a specific value and is properly
masked out when adding a prefix route through the "route" command.
However, when deleting the route, simply changing the command keyword
from "add" to "delete" does not work. The failoure is observed in
both IPv4 and IPv6 route insertion. The patch makes the route command
behavior consistent between the "add" and the "delete" operation.

MFC after: 1 week


226610 21-Oct-2011 ed

Add missing #includes.

According to POSIX, these two header files should be able to be included
by themselves, not depending on other headers. The <net/if.h> header
uses struct sockaddr when __BSD_VISIBLE=1, while <netinet/tcp.h> uses
integer datatypes (u_int32_t, u_short, etc).

MFC after: 2 months


226500 18-Oct-2011 ed

Get rid of D_PSEUDO.

It seems the D_PSEUDO flag was meant to allow make_dev() to return NULL.
Nowadays we have a different interface for that; make_dev_p(). There's
no need to keep it there.

While there, remove an unneeded D_NEEDMINOR from the gpio driver.

Discussed with: gonzo@ (gpio)


225837 28-Sep-2011 bz

Pass the fibnum where we need filtering of the message on the
rtsock allowing routing daemons to filter routing updates on an
rtsock per FIB.

Adjust raw_input() and split it into wrapper and a new function
taking an optional callback argument even though we only have one
consumer [1] to keep the hackish flags local to rtsock.c.

PR: kern/134931
Submitted by: multiple (see PR)
Suggested by: rwatson [1]
Reviewed by: rwatson
MFC after: 3 days


225698 20-Sep-2011 kmacy

Make KBI changes required for future MFCing of inpcb rtentry / llentry caching.

Reviewed by: rwatson, bz
Approved by: re (kib)


225617 16-Sep-2011 kmacy

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


225380 04-Sep-2011 thompsa

On the first loop for generating a bridge MAC address use the local
hostid, this gives a good chance of keeping the same address over
reboots. This is intended to help IPV6 and similar which generate
their addresses from the mac.

PR: kern/160300
Submitted by: mdodd
Approved by: re (kib)


225209 27-Aug-2011 bz

When adding IPv6 fwd support to ipfw in r225044 these two files were
not committed. Initialize next_hop6 to align with the IPv4 code.

PR: bin/117214
MFC after: 3 weeks
X-MFC with: r225044
Approved by: re (kib)


225177 25-Aug-2011 attilio

Fix a deficiency in the selinfo interface:
If a selinfo object is recorded (via selrecord()) and then it is
quickly destroyed, with the waiters missing the opportunity to awake,
at the next iteration they will find the selinfo object destroyed,
causing a PF#.

That happens because the selinfo interface has no way to drain the
waiters before to destroy the registered selinfo object. Also this
race is quite rare to get in practice, because it would require a
selrecord(), a poll request by another thread and a quick destruction
of the selrecord()'ed selinfo object.

Fix this by adding the seldrain() routine which should be called
before to destroy the selinfo objects (in order to avoid such case),
and fix the present cases where it might have already been called.
Sometimes, the context is safe enough to prevent this type of race,
like it happens in device drivers which installs selinfo objects on
poll callbacks. There, the destruction of the selinfo object happens
at driver detach time, when all the filedescriptors should be already
closed, thus there cannot be a race.
For this case, mfi(4) device driver can be set as an example, as it
implements a full correct logic for preventing this from happening.

Sponsored by: Sandvine Incorporated
Reported by: rstone
Tested by: pluknet
Reviewed by: jhb, kib
Approved by: re (bz)
MFC after: 3 weeks


225163 25-Aug-2011 qingli

When the RADIX_MPATH kernel option is enabled, the RADIX_MPATH code tries
to find the first route node of an ECMP chain before executing the route
command. If the system has a default route, and the specific route argument
to the command does not exist in the routing table, then the default route
would be reached. The current code does not verify the reached node matches
the given route argument, therefore erroneous removed the entry. This patch
fixes that bug.

Approved by: re
MFC after: 3 days


224703 08-Aug-2011 kevlo

In rtinit1(), before rtrequest1_fib() is called, info.rti_flags is
initialized by flags (function argument) or-ed with ifa->ifa_flags.
If both NIC has a loopback route to itself, so IFA_RTSELF is set on ifa(s).
As IFA_RTSELF is defined by RTF_HOST, rtrequest1_fib() is called with
RTF_HOST flag even if netmask is not NULL. Consequently, netmask is set
to zero in rtrequest1_fib(), and request to add network route is changed
under hands to request to add host route.

Tested by: Andrew Boyer <aboyer at averesystems.com>
Submitted by: Svatopluk Kraus <onwahe at gmail dot com>
Approved by: re (hrs)


224571 01-Aug-2011 pluknet

Add missing MODULE_VERSION() definition to protect against duplicating
module loads.

PR: kern/159345
Reported by: Eugene Grosbein <egrosbein att rdtc ru>
Tested by: Eugene Grosbein <egrosbein att rdtc ru>
Approved by: re (kib)
MFC after: 1 week


224151 17-Jul-2011 bz

Add spares to the network stack for FreeBSD-9:
- TCP keep* timers
- TCP UTO (adjust from what was there already)
- netmap
- route caching
- user cookie (temporary to allow for the real fix)

Slightly re-shuffle struct ifnet moving fields out of the middle
of spares and to better align.

Discussed with: rwatson (slightly earlier version)


224044 14-Jul-2011 mp

Clear the filter memory area before using it. Leaving it uninitialized may
leak previous kernel stack contents through a malicioius BPF filter.

PR: kern/158880
Submitted by: Guy Harris
Obtained from: OpenBSD
MFC after: 1 week


223862 08-Jul-2011 zec

Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address. This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after: 3 days


223846 07-Jul-2011 thompsa

Grab the rlock before checking if our interface is enabled, it could be
possible to hit a dead pointer when changing interfaces.

PR: kern/156978
Submitted by: Andrew Boyer
MFC after: 1 week


223741 03-Jul-2011 bz

Tag mbufs of all incoming frames or packets with the interface's FIB
setting (either default or if supported as set by SIOCSIFFIB, e.g.
from ifconfig).

Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
Reviewed by: julian
MFC after: 2 weeks


223739 03-Jul-2011 bz

Remove extra white space to comply with style for the rest of the struct.

MFC after: 2 weeks


223735 03-Jul-2011 bz

Add infrastructure to allow all frames/packets received on an interface
to be assigned to a non-default FIB instance.

You may need to recompile world or ports due to the change of struct ifnet.

Submitted by: cjsp
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
(original versions)
Reviewed by: julian
Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru)
MFC after: 2 weeks
X-MFC: use spare in struct ifnet


223625 28-Jun-2011 pluknet

Update ifc_len field of struct ifconf passed for the ioctl SIOCGIFCONF32
(i.e. under COMPAT_FREEBSD32) in case ifconf() returned success to match
the native SIOCGIFCONF behavior.

PR: kern/158369
Reported by: Paul Procacci <pprocacci att gmail com>
MFC after: 1 week


223359 21-Jun-2011 bz

Garbage collect never used global, sysctl, externs.

MFC after: 1 week


223334 20-Jun-2011 bz

Leave an extra comment about flowtable and IPv6 support rectifying a
previous comment.

MFC after: 1 week


223223 18-Jun-2011 bz

gre(4) was using a field in the softc to detect possible recursion.
On MP systems this is not a usable solution anymore and could easily
lead to false positives triggering enough logging that even using
the console was no longer usable (multiple parallel ping -f can do).

Switch to the suggested solution of using mbuf tags to carry per
packet state between gre_output() invocations. Contrary to the
proposed solution modelled after gif(4) only allocate one mbuf tag
per packet rather than per packet and per gre_output() pass through.

As the sysctl to control the possible valid (gre in gre) nestings does
no sanity checks, make sure to always allocate space in the mbuf tag
for at least one, and at most 255 possible gre interfaces to detect
loops in addition to the counter.

Submitted by: Cristian KLEIN (cristi net.utcluj.ro) (original version)
PR: kern/114714
Reviewed by: Cristian KLEIN (cristi net.utcluj.ro)
Reviewed bu: Wooseog Choi (ben_choi hotmail.com)
Sponsored by: Sandvine Incorporated
MFC after: 1 week


223078 14-Jun-2011 luigi

Grab one of the ifcap bits for netmap, and enable printing in ifconfig.

Document the fact that we might want an IFCAP_CANTCHANGE mask,
even though the value is not yet used in sys/net/if.c

(asked on -current a week ago, no feedback so i assume no objection).


222834 07-Jun-2011 zec

Set curvnet context in a callout-trigerred code path.

MFC after: 3 days


222651 03-Jun-2011 jhb

Properly return an ENOBUFS error if a write to a tun(4) device fails
due to m_uiotombuf() failing.

While here, trim unneeded error handling related to tuninit() since it
can never fail.

Submitted by: Martin Birgmeier la5lbtyi aon at
Reviewed by: glebius
MFC after: 1 week


222583 01-Jun-2011 rwatson

Add an optional netisr dispatch point at ether_input(), but set the
default dispatch method to NETISR_DISPATCH_DIRECT in order to force
direct dispatch. This adds a fairly negligble overhead without
changing default behavior, but in the future will allow deferred or
hybrid dispatch to other worker threads before link layer processing
has taken place.

For example, this could allow redistribution using RSS hashes
without ethernet header cache line hits, if the NIC was unable to
adequately implement load balancing to too small a number of input
queues -- perhaps due to hard queueset counts of 1, 3, or 8, but in
a modern system with 16-128 threads. This can happen on highly
threaded systems, where you want want an ithread per core,
redistributing work to other queues, but also on virtualised systems
where hardware hashing is (or is not) available, but only a single
queue has been directed to one VCPU on a VM.

Note: this adds a previously non-present assertion about the
equivalence of the ifnet from which the packet is received, and the
ifnet stamped in the mbuf header. I believe this assertion to
generally be true, but we'll find out soon -- if it's not, we might
have to add additional overhead in some cases to add an m_tag with
the originating ifnet pointer stored in it.

Reviewed by: bz
MFC after: 3 weeks
Sponsored by: Juniper Networks, Inc.


222531 31-May-2011 nwhitehorn

On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by: jhb


222249 24-May-2011 rwatson

Rework netisr policy mechanism so that per-protocol dispatch policies can
be represented:

- A single policy namespace is defined, consisting of four possible
policies: "default" to use the global default, "deferred" to force
deferred dispatch, "direct" to employ direct dispatch where possible, and
"hybrid" which makes a dynamic decision based on CPU affinity, ordering,
etc. Routines are implemented to convert between strings and an integer
namespace.

- A new global variable, netisr_dispatch_policy, subsumes existing global
variables for direct dispatch, forced direct dispatch, etc, and is used
for explicit policy interpretation and composition. Old variables remain
so that they can be exported by legacy sysctls for use by old netstat(1)
binaries. A new sysctl and tunable, netisr.dispatch.policy, accepts the
above strings for specifying a global policy default.

- The protocol registration structure, netisr_handler, grows an nh_dispatch
field, which accepts a per-policy policy override. The default value is
'0', which corresponds to "default", meaning that protocols will accept
the global default policy unless otherwise specified.

- Policies are now interpreted and composed explicitly at various points in
packet dispatch; protocol policies override global policies.

- Protocols grow the ability to express a non-opinion about affinity even
when implenting m2cpuid by returning NETISR_CPUID_NONE. In that case, the
framework falls back on source ordering, rather than simply using the
current CPU.

These changes are in support of allowing link layer re-dispatch based on
RSS or similar hashes provided by NICs, especially in the case where the
number of hardware receive queues matches hardware core count, rather than
hardware thread count, requiring further software redistributeon. (i.e.,
on RMI XLR).

MFC after: 3 weeks
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.


222247 24-May-2011 zec

Allow for vlan(4) interfaces with MTU of 1500 bytes to be configured
on top of epair(4) virtual interfaces, since there's no physical
hardware associated with epair interfaces which would imply any
constraints on MTU sizes.

MFC after: 3 days


222246 24-May-2011 zec

Let epair(4) virtual interfaces report fake link / media status,
by borrowing the skeleton of if_media manipulation and reporting
code from if_lagg(4). The main motivation behind this change is
to allow for epair(4) interfaces to participate in STP if_bridge(4)
configurations.

Reviewed by: bz
MFC after: 3 days


222143 20-May-2011 qingli

The statically configured (permanent) ARP entries are removed when an
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.

Reviewed by: delphij
MFC after: 5 days


221955 15-May-2011 marius

- Add 10baseT as an alias for 10baseT/UTP.
- Add shorthand aliases for common media+option combinations as announced
by miibus(4) so that one can actually supply the media strings found in
the dmesg output to ifconfig(8).

Obtained from: NetBSD (in principle)
MFC after: 2 weeks


221552 06-May-2011 yongari

Fix white space nits and style


221548 06-May-2011 yongari

Do not increment collision counter if transmit have failed.
Transmission error in tun(4) is queueing error(i.e. ENOBUFS) and it
has nothing to do with collision.

Reported by: Zeus V Panchenko (zeus <> ibs dot dn dot ua)


221270 30-Apr-2011 thompsa

LACP frames must not be send VLAN-tagged, check for that before processing.

PR: kern/156743
Submitted by: Dmitrij Tejblum
MFC after: 1 week


221130 27-Apr-2011 bz

Make various (pseudo) interfaces compile without INET in the kernel
adding appropriate #ifdefs. For module builds the framework needs
adjustments for at least carp.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 4 days


220317 04-Apr-2011 glebius

When removing ifnets, we should first remove the reference to ifnet
from the interface index, then decrease refcount, not vice versa.

Otherwise there is a race (reproducible) when if_free_internal()
contests on IFNET_WLOCK(), and we got a zero-refed ifnet in the
index for a long time. It may be picked by some other thread,
that runs ifnet_byindex_ref(), who takes the ifnet from index,
and bumps refcount. When reader drops the lock, if_free_internal()
proceeds with free. Then reader tries to free it a second time.


219819 21-Mar-2011 jeff

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


219791 20-Mar-2011 dchagin

Remove dead code.

MFC after: 1 Week


219786 19-Mar-2011 dchagin

ouch, newrt is used on the return path, my fault.
Partialy revert the previous change.

MFC after: 1 Week.


219783 19-Mar-2011 dchagin

A bit rearranged rtalloc1_fib() code.
Initialize a variable when it is really needed.
To avoid code duplication move the miss label to line up and jump on it.

MFC after: 1 Week


219776 19-Mar-2011 dchagin

Remove a now unused variable.

MFC after: 1 Week


219275 04-Mar-2011 eri

Fix a panic that can happen when trying to destroy a lagg(4) with scheduler set to none.

Approved by: thompsa(mentor)
MFC after: 1 week


219206 02-Mar-2011 bz

Hide the outer IP addresses of a tunnel interfaces (gif(4), gre(4))
from processes inside jails if the addresses do not belong to the jail.

Originally reported by: Pieter de Boer via remko
PR: kern/151119
Tested by: Piotr KUCHARSKI (nospam 42.pl) [gif]
MFC after: 1 week


218909 21-Feb-2011 brucec

Fix typos - remove duplicate "the".

PR: bin/154928
Submitted by: Eitan Adler <lists at eitanadler.com>
MFC after: 3 days


218757 16-Feb-2011 bz

Mfp4 CH=177274,177280,177284-177285,177297,177324-177325

VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.

While this reduces the number of vnet recursion in some places like
NFS, POSIX local sockets and some netgraph, .. recursions are
impossible to fix.

The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec

Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 2 weeks


218567 11-Feb-2011 bz

Mfp4 CH=177255:

Resort the CURVNET_SET* macros in the non-VNET_DEBUG case to match
the call order of the VNET_DEBUG case.

Add the VNET_ASSERT() to the non-VNET_DEBUG case as well so that
INVARIANTS will still catch problems.

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb

MFC after: 2 weeks


218559 11-Feb-2011 bz

Mfp4 CH=177255:

Make VNET_ASSERT() available with either VNET_DEBUG or INVARIANTS.

Change the syntax to match KASSERT() to allow more flexible panic
messages rather than having a printf with hardcoded arguments
before panic.

Adjust the few assertions we have to the new format (and enhance
the output).

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb

MFC after: 2 weeks


218555 11-Feb-2011 bz

Mfp4 CH=177255:

Use __func__ rather than __FUNCTION__.

MFC after: 2 weeks


218503 10-Feb-2011 mlaier

As info.rti_info[RTAX_DST] can point inside of rtm we must not free the rtm
until rt_dispatch is done with the sockaddr.

Found by: memguard
MFC after: 3 days


217805 24-Jan-2011 jhb

Fix a LOR by dropping the global ifnet locks while allocating a new ifnet
table in if_grow(). The order of the SYSINIT's for ifnet state were swapped
so that the various locks were initialized before being used.

Reviewed by: pluknet, bz
MFC after: 2 weeks


217586 19-Jan-2011 mdf

sysctl(8) should use the CTLTYPE to determine the type of data when
reading. (This was already done for writing to a sysctl). This
requires all SYSCTL setups to specify a type. Most of them are now
checked at compile-time.

Remove SYSCTL_*X* sysctl additions as the print being in hex should be
controlled by the -x flag to sysctl(8).

Succested by: bde


217322 12-Jan-2011 mdf

sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.

Commit the net* piece.


217265 11-Jan-2011 jhb

Remove unneeded includes of <sys/linker_set.h>. Other headers that use
it internally contain nested includes.

Reviewed by: bde


217203 09-Jan-2011 bz

MfP4 CH=185246 [1]:

Add FEATURE() to announce optional VIMAGE.

MFC after: 3 days
[1] for the moment put it in vnet.c.


217076 06-Jan-2011 jhb

- Restore dropping the priority of syncer down to PPAUSE when it is idle.
This was lost when it was converted to using a condition variable instead
of lbolt.
- Drop the priority of flowtable down to PPAUSE when it is idle as well
since it is a similar background task.

MFC after: 2 weeks


217013 05-Jan-2011 marius

Teach ifconfig(8) the handy shared option shortcut aliases the NetBSD
counterpart also takes, i.e. "fdx" for "full-duplex", "flow" for
"flowcontrol", "hdx" for "half-duplex" as well as "loop" and "loopback"
for "hw-loopback".

MFC after: 1 week


217010 05-Jan-2011 marius

Fix whitespace.

MFC after: 1 week


216859 31-Dec-2010 bz

Use NULL rather than 0 to invalidate a pointer.

Rather than duplicating the LLE_FREE_LOCKED() macro code in LLE_FREE(),
call it directly (like we do for the RT_* macros).

Sponsored by: ISPsystem [1]
Reviewed by: julian [1]
MFC After: 1 week

[1] Early 2010.


216856 31-Dec-2010 bz

Print the vnet pointer under DDB when iterating over flowtables of each
virtual network stack instance.

Sponsored by: ISPsystem [1]
Reviewed by: julian [1]
MFC after: 1 week

[1] Early 2010.


216855 31-Dec-2010 bz

Move the increment operation under the lock and split the condition
variable into two so that we can see on which one we are waiting.
This might also more properly propagate the update of the
flowclean_cycles flag and avoid "hangs" people were seeing.

Suggested by: rwatson [1]
Sponsored by: ISPsystem [1]
Reviewed by: julian [1]
Updated by: Mikolaj Golub (to.my.trociny gmail.com)
Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC After: 1 week

[1] Early 2010, initial version.


216699 25-Dec-2010 alc

Introduce and use a new VM interface for temporarily pinning pages. This
new interface replaces the combined use of vm_fault_quick() and
pmap_extract_and_hold() throughout the kernel.

In collaboration with: kib@


216268 07-Dec-2010 weongyo

Adds IFF_CANTCONFIG to IFF_CANTCHANGE that it shouldn't happen through
ioctl(2).


216267 07-Dec-2010 weongyo

Introduces IFF_CANTCONFIG interface flag to point that the interface
isn't configurable in a meaningful way. This is for ifconfig(8) or
other tools not to change code whenever IFT_USB-like interfaces are
registered at the interface list.

Reviewed by: brooks
No objections: gavin, jkim


215792 24-Nov-2010 maxim

o Swap descriptions for net.bpf.bufsize and net.bpf.maxbufsize.

PR: misc/152531
MFC after: 1 week


215726 22-Nov-2010 zec

Allow for vlan(4) ifnets to have overlapping unit numbers if they are
created in separated vnets. As a side-effect of having a separated
if_cloner instance for each vnet, all vlan ifnets created in a vnet
will be automatically destroyed when vnet teardown is initiated.

Disallow SIOCSETVLAN and SIOCGETVLAN ioctls on vlan ifnets which are
associated with physical ifnets residing in parent vnets.

This is an interim vlan-specific solution which will be superseded by a
more generic if_cloner V_irtualization change from p4. For nooptions
VIMAGE builds, this should be a no-op change.

Discussed with: bz
MFC after: 3 days


215701 22-Nov-2010 dim

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


215641 21-Nov-2010 bz

Add a missing ';' and change the debugging sysctl from xint to int.

Submitted by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 3 days


215318 14-Nov-2010 dim

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.


215317 14-Nov-2010 dim

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.


215316 14-Nov-2010 dim

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


215297 14-Nov-2010 marius

o Flesh out the generic IEEE 802.3 annex 31B full duplex flow control
support in mii(4):
- Merge generic flow control advertisement (which can be enabled by
passing by MIIF_DOPAUSE to mii_attach(9)) and parsing support from
NetBSD into mii_physubr.c and ukphy_subr.c. Unlike as in NetBSD,
IFM_FLOW isn't implemented as a global option via the "don't care
mask" but instead as a media specific option this. This has the
following advantages:
o allows flow control advertisement with autonegotiation to be
turned on and off via ifconfig(8) with the default typically
being off (though MIIF_FORCEPAUSE has been added causing flow
control to be always advertised, allowing to easily MFC this
changes for drivers that previously used home-grown support for
flow control that behaved that way without breaking POLA)
o allows to deal with PHY drivers where flow control advertisement
with manual selection doesn't work or at least isn't implemented,
like it's the case with brgphy(4), e1000phy(4) and ip1000phy(4),
by setting MIIF_NOMANPAUSE
o the available combinations of media options are readily available
from the `ifconfig -m` output
- Add IFM_FLOW to IFM_SHARED_OPTION_DESCRIPTIONS and IFM_ETH_RXPAUSE
and IFM_ETH_TXPAUSE to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so
these are understood by ifconfig(8).
o Make the master/slave support in mii(4) actually usable:
- Change IFM_ETH_MASTER from being implemented as a global option via
the "don't care mask" to a media specific one as it actually is only
applicable to IFM_1000_T to date.
- Let mii_phy_setmedia() set GTCR_MAN_MS in IFM_1000_T slave mode to
actually configure manually selected slave mode (like we also do in
the PHY specific implementations).
- Add IFM_ETH_MASTER to IFM_SUBTYPE_ETHERNET_OPTION_DESCRIPTIONS so it
is understood by ifconfig(8).
o Switch bge(4), bce(4), msk(4), nfe(4) and stge(4) along with brgphy(4),
e1000phy(4) and ip1000phy(4) to use the generic flow control support
instead of home-grown solutions via IFM_FLAGs. This includes changing
these PHY drivers and smcphy(4) to no longer unconditionally advertise
support for flow control but only if the selected media has IFM_FLOW
set (or MIIF_FORCEPAUSE is set) and implemented for these media variants,
i.e. typically only for copper.
o Switch brgphy(4), ciphy(4), e1000phy(4) and ip1000phy(4) to report and
set IFM_1000_T master mode via IFM_ETH_MASTER instead of via IFF_LINK0
and some IFM_FLAGn.
o Switch brgphy(4) to add at least the the supported copper media based on
the contents of the BMSR via mii_phy_add_media() instead of hardcoding
them. The latter approach seems to have developed historically, besides
causing unnecessary code duplication it was also undesirable because
brgphy_mii_phy_auto() already based the capability advertisement on the
contents of the BMSR though.
o Let brgphy(4) set IFM_1000_T master mode on all supported PHY and not
just BCM5701. Apparently this was a misinterpretation of a workaround
in the Linux tg3 driver; BCM5701 seem to require RGPHY_1000CTL_MSE and
BRGPHY_1000CTL_MSC to be set when configuring autonegotiation but
this doesn't mean we can't set these as well on other PHYs for manual
media selection.
o Let ukphy_status() report IFM_1000_T master mode via IFM_ETH_MASTER so
IFM_1000_T master mode support now is generally available with all PHY
drivers.
o Don't let e1000phy(4) set master/slave bits for IFM_1000_SX as it's
not applicable there.

Reviewed by: yongari (plus additional testing)
Obtained from: NetBSD (partially), OpenBSD (partially)
MFC after: 2 weeks


215238 13-Nov-2010 kib

Use 'z' modifier for size_t printing.


215212 12-Nov-2010 dim

Similar to r212647, remove the workaround in sys/net/vnet.h for an ld
bug (incorrect placement of __start_SECNAME in some cases) that was
fixed in r210245.

There is already an UPDATING entry about needing a recent ld.

MFC after: 1 month


215207 12-Nov-2010 gnn

Add a queue to hold packets while we await an ARP reply.

When a fast machine first brings up some non TCP networking program
it is quite possible that we will drop packets due to the fact that
only one packet can be held per ARP entry. This leads to packets
being missed when a program starts or restarts if the ARP data is
not currently in the ARP cache.

This code adds a new sysctl, net.link.ether.inet.maxhold, which defines
a system wide maximum number of packets to be held in each ARP entry.
Up to maxhold packets are queued until an ARP reply is received or
the ARP times out. The default setting is the old value of 1
which has been part of the BSD networking code since time
immemorial.

Expose the time we hold an incomplete ARP entry by adding
the sysctl net.link.ether.inet.wait, which defaults to 20
seconds, the value used when the new ARP code was added..

Reviewed by: bz, rpaulo
MFC after: 3 weeks


215138 11-Nov-2010 dim

Use the same treatment as in linker_set.h for the __start and __stop
symbols of the set_vnet and set_pcpu sections, so those symbols will
always be emitted in kernel modules, if they use vnet.h or pcpu.h.

Also, for pcpu.h, make the __(start|stop)_set_pcpu declarations, and
associated macros invisible to userland, to prevent it picking up these
symbols.

Reviewed by: kib


214517 29-Oct-2010 rpaulo

Sync DLTs with the latest pcap version.


214333 25-Oct-2010 bz

Factor out DDB commands from r204145, r204279 into if_debug.c for further
enhancements (1). Switch to a standard 2-clause BSD license for this (2).

Unfortunately we have to un-static the ifindex_table for this but do not
publicly export it.

Suggested by: rwatson (1) a while back.
Approved by: thompsa (2) for the change from r204279.
MFC after: 6 days


214136 21-Oct-2010 pluknet

Reshuffle SIOCGIFCONF32 handler from r155224.

- move all the chunks into one file, which allows to hide SIOCGIFCONF32
global definition as well.
- replace __amd64__ with proper COMPAT_FREEBSD32 around.
- handle 32bit capacity before going into the handler itself instead of
doing internal 32bit specific changes within it (e.g. as it's done for
SIOCGDEFIFACE32_IN6).
- use explicitely sized types for ABI compat.

Approved by: kib (mentor)
MFC after: 2 weeks


213930 16-Oct-2010 bz

Close a race acquiring the IF_ADDR_LOCK() for each entry while iterating
over all interfaces to make sure the address will neither change nor be
freed while we are working on it.

PR: kern/146250
Submitted by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 1 week


213929 16-Oct-2010 bz

lltable_drain() has never been used so far, thus #if 0 it for now.
While touching it add the missing locking to the now disabled code
for the time when we'll resurrect it.

MFC after: 3 days


213328 01-Oct-2010 bz

Only hide the ifa and not the tp under #ifdef INET as the tp is needed
for locking evenwhen there is no INET.

MFC after: 3 days


213028 22-Sep-2010 jhb

- Expand scope of tun/tap softc locks to cover more softc fields and
driver-maintained ifnet fields (such as if_drv_flags).
- Use soft locks as the mutex that protects each interface's knote list
rather than using the global knote list lock. Also, use the softc
for kn_hook instead of the cdev.
- Use mtx_sleep() instead of tsleep() when blocking in the read routines.
This fixes a lost wakeup race.
- Remove D_NEEDGIANT now that the cdevsw routines use the softc lock
where locking is needed.
- Lock IFQ when calculating the result for FIONREAD in tap(4). tun(4)
already did this.
- Remove remaining spl calls.

Submitted by: Marcin Cieslak saper of saper|info (3)
MFC after: 2 weeks


212757 16-Sep-2010 jkim

Fix a typo in a comment.

Submitted by: afiveg


212425 10-Sep-2010 mdf

Replace sbuf_overflowed() with sbuf_error(), which returns any error
code associated with overflow or with the drain function. While this
function is not expected to be used often, it produces more information
in the form of an errno that sbuf_overflowed() did.


212152 02-Sep-2010 bz

MFp4 CH=183259:

No reason to use if_free_type() as we don't change our type.
Just if_free() is fine.

MFC after: 3 days


212100 01-Sep-2010 emaste

Add a sysctl knob to accept input packets on any link in a failover lagg.


211904 27-Aug-2010 bz

MFp4 CH=182972:

Add explicit linkstate UP/DOWN for the epair. This is needed by carp(4)
and other things to work.

MFC after: 5 days


211616 22-Aug-2010 rpaulo

Add an extra comment to the SDT probes definition. This allows us to get
use '-' in probe names, matching the probe names in Solaris.[1]

Add userland SDT probes definitions to sys/sdt.h.

Sponsored by: The FreeBSD Foundation
Discussed with: rwaston [1]


211283 13-Aug-2010 zec

When moving an ethernet ifnet from one vnet to another, destroy the
associated ng_ether netgraph node in the current vnet, and create a
new one in the target vnet.

Reviewed by: julian
MFC after: 3 days


211193 11-Aug-2010 will

Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with
the appropriate ifdefs.

Reviewed by: bz
Approved by: ken (mentor)


211157 11-Aug-2010 will

Allow carp(4) to be loaded as a kernel module. Follow precedent set by
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.

Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.

This commit requires IP6PROTOSPACER, introduced in r211115.

Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks


210937 06-Aug-2010 jhb

Adjust the interface type in the link layer socket address for vlan(4)
interfaces to be a vlan (IFT_L2VLAN) rather than an Ethernet interface
(IFT_ETHER). The code already fixed if_type in the ifnet causing some
places to report the interface as a vlan (e.g. arp -a output) and other
places to report the interface as Ethernet (getifaddrs(3)). Now they
should all report IFT_L2VLAN.

Reviewed by: brooks
MFC after: 1 month


210805 03-Aug-2010 kib

Properly set ifi_datalen for compat32 struct if_data32.

PR: kern/149240
Submitted by: Stef Walter <stef memberwebs com>
MFC after: 1 weeks


210533 27-Jul-2010 glebius

Don't check malloc(M_WAITOK) result.


210532 27-Jul-2010 bz

Return NULL rather than 0 for a pointer.

MFC after: 3 days


210529 27-Jul-2010 glebius

When installing a new ARP entry via 'arp -S', lla_lookup() will
either find an existing entry, or allocate a new one. In the latter
case an entry would have flags, that were supplied as argument to
lla_lookup(). In case of an existing entry, flags aren't modified.

This lead to losing LLE_PUB and/or LLE_PROXY flags.

We should apply these flags either in lla_rt_output() or in the
in.c:in_lltable_lookup(). It seems to me that lla_rt_output() is
a more correct choice.

PR: kern/148784, kern/146539
Silence from: qingli, 5 days


210383 22-Jul-2010 jkim

Fix an obvious typo from r1.1. We were acquiring an exclusive writer lock
regardless of the given flags.

MFC after: 3 days


210122 15-Jul-2010 luigi

whitespace cleanup


210121 15-Jul-2010 luigi

small portability fix to build on linux/windows


209216 15-Jun-2010 jkim

Implement flexible BPF timestamping framework.

- Allow setting format, resolution and accuracy of BPF time stamps per
listener. Previously, we were only able to use microtime(9). Now we can
set various resolutions and accuracies with ioctl(2) BIOCSTSTAMP command.
Similarly, we can get the current resolution and accuracy with BIOCGTSTAMP
command. Document all supported options in bpf(4) and their uses.

- Introduce new time stamp 'struct bpf_ts' and header 'struct bpf_xhdr'.
The new time stamp has both 64-bit second and fractional parts. bpf_xhdr
has this time stamp instead of 'struct timeval' for bh_tstamp. The new
structures let us use bh_tstamp of same size on both 32-bit and 64-bit
platforms without adding additional shims for 32-bit binaries. On 64-bit
platforms, size of BPF header does not change compared to bpf_hdr as its
members are already all 64-bit long. On 32-bit platforms, the size may
increase by 8 bytes. For backward compatibility, struct bpf_hdr with
struct timeval is still the default header unless new time stamp format is
explicitly requested. However, the behaviour may change in the future and
all relevant code is wrapped around "#ifdef BURN_BRIDGES" for now.

- Add experimental support for tagging mbufs with time stamps from a lower
layer, e.g., device driver. Currently, mbuf_tags(9) is used to tag mbufs.
The time stamps must be uptime in 'struct bintime' format as binuptime(9)
and getbinuptime(9) do.

Reviewed by: net@


209059 11-Jun-2010 jhb

Update several places that iterate over CPUs to use CPU_FOREACH().


208743 02-Jun-2010 zec

Provide a macro for registering a virtualized sysctl handler for
VNET opaque data.

MFC after: 30 days


208553 25-May-2010 qingli

This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after: 3 days


208212 17-May-2010 jhb

Ignore failures from removing multicast addresses from the parent (trunk)
interface when tearing down a vlan interface. If a trunk interface is
detached, all of its multicast addresses are removed before the ifnet
departure eventhandlers are invoked. This means that all of the multicast
addresses are removed before the vlan interfaces are removed which causes
the if_delmulti() calls in the vlan teardown to fail.

In the VLAN_ARRAY case, this left vlan interfaces referencing a no longer
valid parent interface. In the !VLAN_ARRAY case, the eventhandler gets
stuck in an infinite loop retrying vlan_unconfig_locked() forever. In
general the callers of vlan_unconfig_locked() do not expect nor handle
failure, so I believe it is safer to ignore the errors and tear down as
much of the vlan state as possible.

Silence from: net@
MFC after: 4 days


208171 16-May-2010 kmacy

allocate ipv6 flows from the ipv6 flow zone

reported by: rrs@

MFC after: 3 days


208100 14-May-2010 bz

Fix an issue with the dynamic pcpu/vnet data allocators.

We cannot expect that modspace is the last entry in the linker
set and thus that modspace + possible extra space up to PAGE_SIZE
would be contiguous. For the moment do not support more than
*_MODMIN space and ignore the extra space (*).

(*) We know how to get it back but it'll need testing.

Discussed with: jeff, rwatson (briefly)
Reviewed by: jeff
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 4 days


207953 12-May-2010 kmacy

workaround bug with ipv6 where a flow can have a null rtentry


207708 06-May-2010 alc

Remove page queues locking from all sf_buf_mext()-like functions. The page
lock now suffices.

Fix a couple nearby style violations.


207617 04-May-2010 alc

Add page locking to the vm_page_cow* functions.

Push down the acquisition and release of the page queues lock into
vm_page_wire().

Reviewed by: kib


207554 03-May-2010 sobomax

Add new tunable 'net.link.ifqmaxlen' to set default send interface
queue length. The default value for this parameter is 50, which is
quite low for many of today's uses and the only way to modify this
parameter right now is to edit if_var.h file. Also add read-only
sysctl with the same name, so that it's possible to retrieve the
current value.

MFC after: 1 month


207548 03-May-2010 alc

This is the first step in transitioning responsibility for synchronizing
access to the page's wire_count from the page queues lock to the page lock.

Submitted by: kmacy


207410 30-Apr-2010 kmacy

On Alan's advice, rather than do a wholesale conversion on a single
architecture from page queue lock to a hashed array of page locks
(based on a patch by Jeff Roberson), I've implemented page lock
support in the MI code and have only moved vm_page's hold_count
out from under page queue mutex to page lock. This changes
pmap_extract_and_hold on all pmaps.

Supported by: Bitgravity Inc.

Discussed with: alc, jeffr, and kib


207369 29-Apr-2010 bz

MFP4: @176978-176982, 176984, 176990-176994, 177441

"Whitspace" churn after the VIMAGE/VNET whirls.

Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.

Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.

This also removes some header file pollution for putatively
static global variables.

Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.

Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days


207303 28-Apr-2010 kmacy

need to initialize the lock before it is used

MFC after: 3 days


207278 27-Apr-2010 bz

MFP4: @177254

Add missing CURVNET_RESTORE() calls for multiple code paths, to stop
leaking the currently cached vnet into callers and to the process.

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 4 days


207195 25-Apr-2010 kib

Provide compat32 shims for bpf(4), except zero-copy facilities.

bd_compat32 field of struct bpf_d is kept unconditionally to not
impose the requirement of including "opt_compat.h" on all numerous
users of bpfdesc.h.

Submitted by: jhb (version for 6.x)
Reviewed and tested by: emaste
MFC after: 2 weeks


207194 25-Apr-2010 kib

Provide 32bit compat shims for sysctl net.route NET_RT_IFLIST.
This allows getifaddrs(3) to work for compat32 binaries.

Submitted by: jhb (6.x version)
Reviewed by: emaste
Tested by: emaste and <pluknet gmail com>
MFC after: 2 weeks


206639 14-Apr-2010 julian

Move two copies of the same definition to a common include file.

MFC after: 3 weeks


206637 14-Apr-2010 delphij

When an underlying ioctl(2) handler returns an error, our ioctl(2)
interface considers that it hits a fatal error, and will not copyout
the request structure back for _IOW and _IOWR ioctls, keeping them
untouched.

The previous implementation of the SIOCGIFDESCR ioctl intends to
feed the buffer length back to userland. However, if we return
an error, the feedback would be defeated and ifconfig(8) would
trap into an infinite loop.

This commit changes SIOCGIFDESCR to set buffer field to NULL to
indicate the previous ENAMETOOLONG case.

Reported by: bschmidt
MFC after: 2 weeks


206488 11-Apr-2010 bz

Take a reference to make sure that the interface cannot go away during
if_clone_destroy() in case parallel threads try to.

PR: kern/116837
Submitted by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 10 days


206486 11-Apr-2010 bz

Check that the interface is on the list of cloned interfaces before trying
to remove it to avoid panics in case of two threads trying to remove it in
parallel.

PR: kern/116837
Submitted by: Takahiro Kurosawa (takahiro.kurosawa gmail.com) (orig version)
MFC after: 10 days


206481 11-Apr-2010 bz

Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c when freeing entire tables make sure that in case we cancel
a pending callout to remove the reference as well.

Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE


206470 11-Apr-2010 bz

In if_detach_internal() we cannot hold the af_data lock over the
dom_ifdetach() calls as they might sleep for callout_drain().
Do as we do in if_attachdomain1() [r121470] and handle
if_afdata_initialized earlier and call dom_ifdetach() unlocked.

Discussed with: rwatson
MFC after: 10 days


206469 11-Apr-2010 bz

In if_detach_internal() only try to do the detach run if if_attachdomain1()
has actually succeeded to initialize and attach. There is a theoretical
possibility to drop out early in if_attachdomain1() leaving the array
uninitialized if we cannot get the lock.

Discussed with: rwatson
MFC after: 10 days


205858 29-Mar-2010 jkim

Check the pointer to JIT binary filter before its de-allocation.

Submitted by: Alexander Sack (asack at niksun dot com)
MFC after: 3 days


205515 23-Mar-2010 rpaulo

Add MCS to the list of media types.

Sponsored by: iXsystems, inc.


205488 22-Mar-2010 kmacy

- boot-time size the ipv4 flowtable and the maximum number of flows
- increase flow cleaning frequency and decrease flow caching time
when near the flow limit
- stop allocating new flows when within 3% of maxflows don't start
allocating again until below 12.5%

MFC after: 7 days


205411 21-Mar-2010 emaste

Avoid holding the VLAN_LOCK() over the parent interface SIOCGIFMEDIA
ioctl call, as it may sleep.

Reviewed by: rwatson


205345 19-Mar-2010 bz

Split eventhandler_register() into an internal part and a wrapper function
that provides the allocated and setup eventhandler entry.

Add a new wrapper for VIMAGE that allocates extra space to hold the
callback function and argument in addition to an extra wrapper function.
While the wrapper function goes as normal callback function the
argument points to the extra space allocated holding the original func
and arg that the wrapper function can then call.

Provide an iterator function for the virtual network stack (vnet) that
will call the callback function for each network stack.

Provide a new set of macros for VNET that in the non-VIMAGE case will
just call eventhandler_register() while in the VIMAGE case it will use
vimage_eventhandler_register() passing in the extra iterator function
but will only register once rather than per-vnet.
We need a special macro in case we are interested in the tag returned
as we must check for curvnet and can neither simply assign the
return value, nor not change it in the non-vnet0 case without that.

Sponsored by: ISPsystem
Discussed with: jhb
Reviewed by: zec (earlier version), jhb
MFC after: 1 month


205276 18-Mar-2010 bz

Add ddb support to the "new" link layer code ("new-arp"):
- show all lltables [1] (optional flag to also show the llentries as well)
- show lltable <struct lltable *>
- show llentry <struct llentry *>

MFC after: 6 days


205222 16-Mar-2010 qingli

Verify interface up status using its link state only
if the interface has such capability. The interface
capability flag indicates whether such capability
exists. This approach is much more backward compatible.
Physical device driver changes will be part of another
commit.

Also updated the ifconfig utility to show the LINKSTATE
capability if present.

Reviewed by: rwatson, imp, juli
MFC after: 3 days


205197 15-Mar-2010 mlaier

Fix a small bug in drbr_dequeue_cond spotted while preparing MFC of r203834.

MFC after: 3 days


205097 12-Mar-2010 kmacy

flowtable_get_hashkey is only used by a DDB function - move under #ifdef DDB

pointed out by jkim@


205095 12-Mar-2010 jkim

Fix a style(9) nit.


205093 12-Mar-2010 kmacy

re-update copyright to 2010
pointed out by danfe@


205092 12-Mar-2010 jkim

Tidy up callout for select(2) and read timeout.

- Add a missing callout_drain(9) before the descriptor deallocation.[1]
- Prefer callout_init_mtx(9) over callout_init(9) and let the callout
subsystem handle the mutex for callout function.

PR: kern/144453
Submitted by: Alexander Sack (asack at niksun dot com)[1]
MFC after: 1 week


205077 12-Mar-2010 qingli

The flow-table module retrieves the destination and source
address as well as the transport protocol port information
from the outbound packets. The routing code is generic and
compares every byte in the given sockaddr object. Therefore
the temporary sockaddr objects must be cleared due to padding
bytes. In addition, the port information must be stripped
or the route search will either fail or return the incorrect
route entry.

Unit testing is done using OpenVPN over the if_tun interface.

MFC after: 7 days


205069 12-Mar-2010 kmacy

fix stats reporting sysctl


205066 12-Mar-2010 kmacy

- restructure flowtable to support ipv6
- add a name argument to flowtable_alloc for printing with ddb commands
- extend ddb commands to print destination address or 4-tuples
- don't parse ports in ulp header if FL_HASH_ALL is not passed
- add kern_flowtable_insert to enable more generic use of flowtable
(e.g. system calls for adding entries)
- don't hash loopback addresses
- cleanup whitespace
- keep statistics per-cpu for per-cpu flowtables to avoid cache line contention
- add sysctls to accumulate stats and report aggregate

MFC after: 7 days


205024 11-Mar-2010 qingli

The if_tap interface is of IFT_ETHERNET type, but it
does not set or update the if_link_state variable.
As such RT_LINK_IS_UP() fails for the if_tap interface.

Also, the RT_LINK_IS_UP() needs to bypass all loopback
interfaces because loopback interfaces are considered
up logically as long as the system is running.

This patch fixes the above issues by setting and updating
the if_link_state variable when the tap interface is
opened or closed respectively. Similary approach is
already done in the if_tun device.

MFC after: 3 days


204902 09-Mar-2010 qingli

One of the advantages of enabling ECMP (a.k.a RADIX_MPATH) is to
allow for connection load balancing across interfaces. Currently
the address alias handling method is colliding with the ECMP code.
For example, when two interfaces are configured on the same prefix,
only one prefix route is installed. So connection load balancing
among the available interfaces is not possible.

The other advantage of ECMP is for failover. The issue with the
current code, is that the interface link-state is not reflected
in the route entry. For example, if there are two interfaces on
the same prefix, the cable on one interface is unplugged, new and
existing connections should switch over to the other interface.
This is not done today and packets go into a black hole.

Also, there is a small bug in the kernel where deleting ECMP routes
in the userland will always return an error even though the command
is successfully executed.

MFC after: 5 days


204901 09-Mar-2010 delphij

Remove the check for IFF_DRV_OACTIVE right before adding a port into lagg
interface. The check itself seems to be coming from OpenBSD but does not
seem to be useful for our code.

Discussed with: thomasa
MFC after: 1 month


204837 07-Mar-2010 bz

Not only flush the ipfw tables when unloading ipfw or tearing
down a virtual netowrk stack, but also free the Radix Node Head.

Sponsored by: ISPsystem
Reviewed by: julian
MFC after: 5 days


204808 06-Mar-2010 bz

Introduce a function rn_detachhead() that will free the
radix table root nodes. This is only needed (and available)
in the virtualization case to free the resources when tearing
down a virtual network stack.

Sponsored by: ISPsystem
Reviewed by: julian, zec
MFC after: 5 days


204805 06-Mar-2010 bz

Rework reference counting in case we queue into the netisr,
or overflow the netisr queue and fall back to the interface
queue so that we can garuantee that the ifnet pointer stays
valid. Formerly we ended up with reference counts <= 0 in
case the netisr had returned ENOBUFS. The idea is to track
any packet in the netisr queue and only change the refount
on edge operations for the fallback interface queue. This
also avoids problems in case the if_snd.ifq_len lies to us.

Also rework refount assertions to make sure they trigger if
we go below 1. Formerly a negative refence count did not
trigger the assert as the refcount variable is u_int.

Sponsored by: ISPsystem
MFC after: 5 days


204591 02-Mar-2010 luigi

Bring in the most recent version of ipfw and dummynet, developed
and tested over the past two months in the ipfw3-head branch. This
also happens to be the same code available in the Linux and Windows
ports of ipfw and dummynet.

The major enhancement is a completely restructured version of
dummynet, with support for different packet scheduling algorithms
(loadable at runtime), faster queue/pipe lookup, and a much cleaner
internal architecture and kernel/userland ABI which simplifies
future extensions.

In addition to the existing schedulers (FIFO and WF2Q+), we include
a Deficit Round Robin (DRR or RR for brevity) scheduler, and a new,
very fast version of WF2Q+ called QFQ.

Some test code is also present (in sys/netinet/ipfw/test) that
lets you build and test schedulers in userland.

Also, we have added a compatibility layer that understands requests
from the RELENG_7 and RELENG_8 versions of the /sbin/ipfw binaries,
and replies correctly (at least, it does its best; sometimes you
just cannot tell who sent the request and how to answer).
The compatibility layer should make it possible to MFC this code in a
relatively short time.

Some minor glitches (e.g. handling of ipfw set enable/disable,
and a workaround for a bug in RELENG_7's /sbin/ipfw) will be
fixed with separate commits.

CREDITS:
This work has been partly supported by the ONELAB2 project, and
mostly developed by Riccardo Panicucci and myself.
The code for the qfq scheduler is mostly from Fabio Checconi,
and Marta Carbone and Francesco Magno have helped with testing,
debugging and some bug fixes.


204582 02-Mar-2010 luigi

remove unnecessary casts leftover from a bogus fix to a previous bug


204552 02-Mar-2010 alfred

Merge projects/enhanced_coredumps (r204346) into HEAD:

Enhanced process coredump routines.

This brings in the following features:
1) Limit number of cores per process via the %I coredump formatter.
Example:
if corefilename is set to %N.%I.core AND num_cores = 3, then
if a process "rpd" cores, then the corefile will be named
"rpd.0.core", however if it cores again, then the kernel will
generate "rpd.1.core" until we hit the limit of "num_cores".

this is useful to get several corefiles, but also prevent filling
the machine with corefiles.

2) Encode machine hostname in core dump name via %H.

3) Compress coredumps, useful for embedded platforms with limited space.
A sysctl kern.compress_user_cores is made available if turned on.

To enable compressed coredumps, the following config options need to be set:
options COMPRESS_USER_CORES
device zlib # brings in the zlib requirements.
device gzio # brings in the kernel vnode gzip output module.

4) Eventhandlers are fired to indicate coredumps in progress.

5) The imgact sv_coredump routine has grown a flag to pass in more
state, currently this is used only for passing a flag down to compress
the coredump or not.

Note that the gzio facility can be used for generic output of gzip'd
streams via vnodes.

Obtained from: Juniper Networks
Reviewed by: kan


204522 01-Mar-2010 joel

The NetBSD Foundation has granted permission to remove clause 3 and 4 from
their software.

Obtained from: NetBSD


204498 01-Mar-2010 rwatson

Whitespace tweak.

MFC after: 3 days


204497 01-Mar-2010 rwatson

Changes to support crashdump analysis of netisr:

- Rename the netisr protocol registration array, 'np' to 'netisr_proto',
in order to reduce the chances of symbol name collisions. It remains
statically defined, but it will be looked up by netstat(1).

- Move certain internal structure definitions from netisr.c to
netisr_internal.h so that netstat(1) can find them. They remain
private, and should not be used for any other purpose (for example,
they should not be used by kernel modules, which must instead use the
public interfaces in netisr.h).

- Store a kernel-compiled version of NETISR_MAXPROT in the global variable
netisr_maxprot, and export via a sysctl, so that it is available for use
by netstat(1). This is especially important for crashdump
interpretation, where the size of the workstream structure is determined
by the maximum number of protocols compiled into the kernel.

MFC after: 1 week
Sponsored by: Juniper Networks


204464 28-Feb-2010 kib

In both if_tun and if_tap:

Do not do additional dev_ref() on the newly created interface in the
if_clone create method [1]. This reference is not needed and never
removed, causing struct cdevpriv leakage. Remove the setting of
SI_CHEAPCLONE flag as well, since it is unused.

For dev_clone handlers, create cdevs with the call make_dev_credf(MAKEDEV_REF)
instead of calling make_dev() and then dev_ref(), to avoid a race.

Call drain_dev_clone_events() at the module unload time after dev_clone
handler is deinstalled.

Submitted by: Mikolaj Golub <to.my.trociny gmail com> [1]
MFC after: 1 week


204303 25-Feb-2010 rwatson

Fix edge cases in several KASSERTs: use <= rather than < when testing that
counters have not gone about MAXCPU or NETISR_MAXPROT. These problems
caused panics on UP kernels with INVARIANTS when using sysctl -a, but
would also have caused problems for 32-core boxes or if the netisr
protocol vector was fully populated.

Reported by: nwhitehorn, Neel Natu <neelnatu@gmail.com>
MFC after: 4 days


204279 24-Feb-2010 bz

Use the DB_SHOW_ALL_COMMAND() macro to register the formerly 'show ifnets'
in the db_show_all_table as 'show all ifnets' and with that follow the
convention for showing complete lists.

Submitted by: thompsa
MFC after: 3 days


204208 22-Feb-2010 rwatson

Fix constant assignment for netisr protocol information sysctl.

MFC after: 1 week
Spotted by: bz


204199 22-Feb-2010 rwatson

Export netisr configuration and statistics to userspace via sysctl(9).

MFC after: 1 week
Sponsored by: Juniper Networks


204173 21-Feb-2010 rwatson

ifconfig(8) expects interface fooX to be supported by the module if_foo,
and will try to load it if it's not present. To better meet these
expectations, change the module name for the loopback interface from
'loop' to 'if_lo'. The loopback interface is always compiled into the
base kernel, so there are no resulting changes in kld files, etc.

Discussed with: brooks (ages ago)
MFC after: 1 week


204156 21-Feb-2010 yongari

Add __FBSDID.

Reviewed by: sam


204149 20-Feb-2010 yongari

Add TSO support on VLANs. Intentionally separated IFCAP_VLAN_HWTSO
from IFCAP_VLAN_HWTAGGING. I think some hardwares may be able to
TSO over VLAN without VLAN hardware tagging.
Driver changes and userland support will follow.

Reviewed by: thompsa


204145 20-Feb-2010 bz

Start to implement ifnet DDB support:
- 'show ifnets' prints a list of ifnet *s per virtual network stack,
- 'show ifnet <struct ifnet *>' prints fields matching the given ifp.

We do not yet print the complete set of fields and might want to
factor this out to an extra if_debug.c file in case this grows
a lot[1]. We may also want to grow 'show ifnet <if_xname>' support[1].

Sponsored by: ISPsystem
Suggested by: rwatson [1]
Reviewed by: rwatson
MFC after: 5 days


204142 20-Feb-2010 bz

Enhance a panic string to contain more useful debugging information.

Sponsored by: ISPsystem
Reviewed by: rwatson
MFC after: 5 days


204105 20-Feb-2010 jkim

Return partially filled buffer for non-blocking read(2)
in non-immediate mode.

PR: kern/143855


203913 15-Feb-2010 pjd

Mark various sysctls also as tunables.

Reviewed by: rwatson
MFC after: 1 week


203834 13-Feb-2010 mlaier

Fix drbr and altq interaction:
- introduce drbr_needs_enqueue that returns whether the interface/br needs
an enqueue operation: returns true if altq is enabled or there are
already packets in the ring (as we need to maintain packet order)
- update all drbr consumers
- fix drbr_flush
- avoid using the driver queue (IFQ_DRV_*) in the altq case as the
multiqueue consumer does not provide enough protection, serialize altq
interaction with the main queue lock
- make drbr_dequeue_cond work with altq

Discussed with: kmacy, yongari, jfv
MFC after: 4 weeks


203729 09-Feb-2010 bz

Add DDB support for printing vnet_sysinit and vnet_sysuninit
ordered call lists. Try to lookup function/symbol names and print
those in addition to the pointers, along with the constants for
subsystem and order.
This is useful for debugging vnet teardown ordering issues.

Make it possible to call the actual printing frunction from normal
code at runtime, ie. from vnet_sysuninit(), if DDB support is there.

Sponsored by: ISPsystem
MFC After: 8 days


203727 09-Feb-2010 bz

Add an SDT provider for "vnet"s along with probes for vnet_alloc
and vnet_destroy.
Use the line number rather than NULL as dummy argument.

Note: the fbt provider does not reliably provide :return probes
(depending on optimization levels used at compile time) making
it unusable for scripts to generate complete call-traces with
well defined boundaries over allocations or destructions of
virtual network stacks.

Sponsored by: ISPsystem
MFC After: 8 days


203548 06-Feb-2010 eri

Propagate the vlan eventis to the underlying interfaces/members so they can do initialization of hw related features.

PR: kern/141646
Reviewed by: thompsa
Approved by: thompsa(co-mentor)
MFC after: 2 weeks


203483 04-Feb-2010 zec

Instead of spamming the console on each curvnet recursion event, print
out each such call graph only once, along with a stack backtrace. This
should make kernels built with VNET_DEBUG reasonably usable again in
busy / production environments.

Introduce a new DDB command "show vnetrcrs" which dumps the whole log
of distinctive curvnet recursion events. This might be useful when
recursion reports get burried / lost too deep in the message buffer.
In the later case stack backtraces are not available.

Reviewed by: bz
MFC after: 3 days


203272 31-Jan-2010 hrs

- Check if_type of "addm <interface>" before setting the
interface's MTU to the if_bridge(4) interface. This fixes a
bug that MTU value of "addm <interface>" is used even when it
is invalid for the if_bridge(4) member:

# ifconfig bridge0 create
# ifconfig bridge0
bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
...
# ifconfig bridge0 addm lo0
ifconfig: BRDGADD lo0: Invalid argument
# ifconfig bridge0
bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 16384
...

- Do not ignore MTU value of an interface even when if_type == IFT_GIF.
This fixes MTU mismatch when an if_bridge(4) interface has a
gif(4) interface and no other interface as the member, and it
is directly used for L2 communication with EtherIP tunneling
enabled.

- Implement SIOCSIFMTU ioctl. Changing the MTU is allowed only
when all members have the same MTU value.


203052 27-Jan-2010 delphij

Revised revision 199201 (add interface description capability as inspired
by OpenBSD), based on comments from many, including rwatson, jhb, brooks
and others.

Sponsored by: iXsystems, Inc.
MFC after: 1 month


202935 24-Jan-2010 syrinx

While flushing the multicast filter of an interface, do not zero the relevant
ifmultiaddr structures' reference to the parent interface, unless the parent
interface is really detaching. While here, program only link layer multicast
filters to a wlan's hardware parent interface.

PR: kern/142391, kern/142392
Reviewed by: sam, rpaolo, bms
MFC after: 1 week


202611 19-Jan-2010 thompsa

Do not hold the lock over if_setlladdr() as it calls into the interface driver
init routine.


202588 18-Jan-2010 thompsa

Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev


201995 10-Jan-2010 bz

Correct a typo.

MFC after: 5 days


201803 08-Jan-2010 trasz

Stop GCC from complaining about lagg_port_checkstacking() being unused.


201758 07-Jan-2010 mbr

Remove extraneous semicolons, no functional changes.

Submitted by: Marc Balmer <marc@msys.ch>
MFC after: 1 week


201734 07-Jan-2010 luigi

put ip_var before ip_fw_private.h as this will be needed in
the near future


201527 04-Jan-2010 luigi

Various cleanup done in ipfw3-head branch including:
- use a uniform mtag format for all packets that exit and re-enter
the firewall in the middle of a rulechain. On reentry, all tags
containing reinject info are renamed to MTAG_IPFW_RULE so the
processing is simpler.

- make ipfw and dummynet use ip_len and ip_off in network format
everywhere. Conversion is done only once instead of tracking
the format in every place.

- use a macro FREE_PKT to dispose of mbufs. This eases portability.

On passing i also removed a few typos, staticise or localise variables,
remove useless declarations and other minor things.

Overall the code shrinks a bit and is hopefully more readable.

I have tested functionality for all but ng_ipfw and if_bridge/if_ethersubr.
For ng_ipfw i am actually waiting for feedback from glebius@ because
we might have some small changes to make.
For if_bridge and if_ethersubr feedback would be welcome
(there are still some redundant parts in these two modules that
I would like to remove, but first i need to check functionality).


201351 31-Dec-2009 jhb

Use stricter checking to match possible vlan clones by not allowing extra
garbage characters around or within the tag.

Reviewed by: brooks
MFC after: 3 days


201350 31-Dec-2009 brooks

The devices that supported EVFILT_NETDEV kqueue filters were removed in
r195175. Remove all definitions, documentation, and usage.

fifo_misc.c:
Remove all kqueue tests as fifo_io.c performs all those that
would have remained.

Reviewed by: rwatson
MFC after: 3 weeks
X-MFC note: don't change vlan_link_state() function signature


201319 31-Dec-2009 qingli

Remove a deleted comment line that was brought back by
my previous commit.

MFC after: 5 days


201282 30-Dec-2009 qingli

The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations due to multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after: 5 days


201196 29-Dec-2009 jhb

Change vlan interfaces to cope more usefully with the parent interface being
renamed. Previously the vlan interfaces would lose their configuration as if
the parent interface had been physically removed. Now vlan interfaces ignore
rename events.
- Add a new ifnet flag (IFF_RENAMING) that is set while an ifnet is being
renamed. This flag can be checked in ifnet departure/arrival event
handlers to treat rename events differently.
- Change the ifnet departure event handler in the if_vlan(4) driver to
ignore departure events due to a trunk interface being renamed.

Reviewed by: brooks, rwatson
MFC after: 1 week


201122 28-Dec-2009 luigi

bring in several cleanups tested in ipfw3-head branch, namely:

r201011
- move most of ng_ipfw.h into ip_fw_private.h, as this code is
ipfw-specific. This removes a dependency on ng_ipfw.h from some files.

- move many equivalent definitions of direction (IN, OUT) for
reinjected packets into ip_fw_private.h

- document the structure of the packet tags used for dummynet
and netgraph;

r201049
- merge some common code to attach/detach hooks into
a single function.

r201055
- remove some duplicated code in ip_fw_pfil. The input
and output processing uses almost exactly the same code so
there is no need to use two separate hooks.
ip_fw_pfil.o goes from 2096 to 1382 bytes of .text

r201057 (see the svn log for full details)
- macros to make the conversion of ip_len and ip_off
between host and network format more explicit

r201113 (the remaining parts)
- readability fixes -- put braces around some large for() blocks,
localize variables so the compiler does not think they are uninitialized,
do not insist on precise allocation size if we have more than we need.

r201119
- when doing a lookup, keys must be in big endian format because
this is what the radix code expects (this fixes a bug in the
recently-introduced 'lookup' option)

No ABI changes in this commit.

MFC after: 1 week


200899 23-Dec-2009 rwatson

When warning about possible netisr configuration problems during boot,
report using "netisr_init" rather than "netisr2", which was the development
name for the project.

MFC after: 3 days


200898 23-Dec-2009 rwatson

Refine netisr.c comments a bit.


200855 22-Dec-2009 luigi

merge code from ipfw3-head to reduce contention on the ipfw lock
and remove all O(N) sequences from kernel critical sections in ipfw.

In detail:

1. introduce a IPFW_UH_LOCK to arbitrate requests from
the upper half of the kernel. Some things, such as 'ipfw show',
can be done holding this lock in read mode, whereas insert and
delete require IPFW_UH_WLOCK.

2. introduce a mapping structure to keep rules together. This replaces
the 'next' chain currently used in ipfw rules. At the moment
the map is a simple array (sorted by rule number and then rule_id),
so we can find a rule quickly instead of having to scan the list.
This reduces many expensive lookups from O(N) to O(log N).

3. when an expensive operation (such as insert or delete) is done
by userland, we grab IPFW_UH_WLOCK, create a new copy of the map
without blocking the bottom half of the kernel, then acquire
IPFW_WLOCK and quickly update pointers to the map and related info.
After dropping IPFW_LOCK we can then continue the cleanup protected
by IPFW_UH_LOCK. So userland still costs O(N) but the kernel side
is only blocked for O(1).

4. do not pass pointers to rules through dummynet, netgraph, divert etc,
but rather pass a <slot, chain_id, rulenum, rule_id> tuple.
We validate the slot index (in the array of #2) with chain_id,
and if successful do a O(1) dereference; otherwise, we can find
the rule in O(log N) through <rulenum, rule_id>

All the above does not change the userland/kernel ABI, though there
are some disgusting casts between pointers and uint32_t

Operation costs now are as follows:

Function Old Now Planned
-------------------------------------------------------------------
+ skipto X, non cached O(N) O(log N)
+ skipto X, cached O(1) O(1)
XXX dynamic rule lookup O(1) O(log N) O(1)
+ skipto tablearg O(N) O(1)
+ reinject, non cached O(N) O(log N)
+ reinject, cached O(1) O(1)
+ kernel blocked during setsockopt() O(N) O(1)
-------------------------------------------------------------------

The only (very small) regression is on dynamic rule lookup and this will
be fixed in a day or two, without changing the userland/kernel ABI

Supported by: Valeria Paoli
MFC after: 1 month


200805 21-Dec-2009 jhb

Remove commented out prototype for ifinit(). This prototype has been
commented out since 1.1 and has not been present in <sys/systm.h> since at
least 1.1 of that file. It is also not needed in FreeBSD due to SYSINIT().


200580 15-Dec-2009 luigi

Start splitting ip_fw2.c and ip_fw.h into smaller components.
At this time we pull out from ip_fw2.c the logging functions, and
support for dynamic rules, and move kernel-only stuff into
netinet/ipfw/ip_fw_private.h

No ABI change involved in this commit, unless I made some mistake.
ip_fw.h has changed, though not in the userland-visible part.

Files touched by this commit:

conf/files
now references the two new source files

netinet/ip_fw.h
remove kernel-only definitions gone into netinet/ipfw/ip_fw_private.h.

netinet/ipfw/ip_fw_private.h
new file with kernel-specific ipfw definitions

netinet/ipfw/ip_fw_log.c
ipfw_log and related functions

netinet/ipfw/ip_fw_dynamic.c
code related to dynamic rules

netinet/ipfw/ip_fw2.c
removed the pieces that goes in the new files

netinet/ipfw/ip_fw_nat.c
minor rearrangement to remove LOOKUP_NAT from the
main headers. This require a new function pointer.

A bunch of other kernel files that included netinet/ip_fw.h now
require netinet/ipfw/ip_fw_private.h as well.
Not 100% sure i caught all of them.

MFC after: 1 month


200537 14-Dec-2009 luigi

Move the scan for max_keylen into route.c::route_init(),
and make max_keylen an argument for rn_init().
This removes an unnecessary dependency on domain.h from radix.c

MFC after: 7 days


200473 13-Dec-2009 bz

Throughout the network stack we have a few places of
if (jailed(cred))
left. If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.

Discussed with: julian, zec, jamie, rwatson (back in Sept)
MFC after: 5 days


200439 12-Dec-2009 luigi

Make the code buildable in userland so it is easier to test it:
this requires a small reordering of headers and a few #defines to
map functions not available in userland.

Remove a useless #ifndef block at the beginning of the file.

Introduce (temporarily) rn_init2(), see the comment in the code
for the proper long term change.

No ABI or functional change.

MFC after: 7 days


200354 10-Dec-2009 luigi

No functional changes (who dares to touch this code!) but:

- cast the result of LEN() to int as this is the main usage.
- use LEN() in one place where it was forgotten.
- Document the use of a static variable in rw mode.

More small changes to follow.

MFC after: 7 days


199975 30-Nov-2009 jhb

Remove if_timer/if_watchdog now that they are no longer used. The space
used by if_timer is reserved for expanding if_index to an int in the
future.

Reviewed by: rwatson, brooks


199615 20-Nov-2009 jkim

General style cleanup, no functional change.


199603 20-Nov-2009 jkim

- Allocate scratch memory on stack instead of pre-allocating it with
the filter as we do from bpf_filter()[1].
- Revert experimental use of contigmalloc(9)/contigfree(9). It has no
performance benefit over malloc(9)/free(9)[2].

Requested by: rwatson[1]
Pointed out by: rwatson, jhb, alc[2]


199498 18-Nov-2009 jkim

- Change internal function bpf_jit_compile() to return allocated size of
the generated binary and remove page size limitation for userland.
- Use contigmalloc(9)/contigfree(9) instead of malloc(9)/free(9) to make
sure the generated binary aligns properly and make it physically contiguous.


199492 18-Nov-2009 jkim

- Make BPF JIT compiler working again in userland. We are limiting size of
generated native binary to page size for now.
- Update copyright date and fix some style nits.


199365 17-Nov-2009 tuexen

Fix a LOR showing up with sctp_bsd_addr(): Do not hold a rt lock
when calling rt_newaddrmsg().

Reviewed by: qingli
Approved by: rrs (mentor)
MFC after: 1 month


199231 12-Nov-2009 delphij

Revert revision 199201 for now as it has introduced a kernel vulnerability
and requires more polishing.


199201 11-Nov-2009 delphij

Add interface description capability as inspired by OpenBSD.

MFC after: 3 months


198988 06-Nov-2009 jhb

Take a step towards removing if_watchdog/if_timer. Don't explicitly set
if_watchdog/if_timer to NULL/0 when initializing an ifnet. if_alloc()
sets those members to NULL/0 already.


198417 23-Oct-2009 rwatson

Remove unneeded blank line from bpf_drvinit().

MFC after: 3 days


198357 22-Oct-2009 brueffer

Check pointer for NULL before dereferencing it, not after.

PR: 138390
Submitted by: Patroklos Argyroudis <argp@census-labs.com>
MFC after: 1 week


198353 22-Oct-2009 qingli

Verify "smp_started" is true before calling
sched_bind() and sched_unbind().

Reviewed by: kmacy
MFC after: 3 days


198306 20-Oct-2009 qingli

The flow-table function flowtable_route_flush() may be called
during system initialization time. Since the flow-table is
designed to maintain per CPU flow cache, the existing code
did not check whether "smp_started" is true before calling
sched_bind() and sched_unbind(), which triggers a page fault.

Reviewed by: jeff
MFC after: immediately


198233 19-Oct-2009 rwatson

Clean up comments, white space, and style in pfil.c (especially new VNET
bits).

MFC after: 3 days (not VNET bits)


198219 18-Oct-2009 rwatson

Remove unused pfil_flags field in packet_filter_hook.

MFC after: 3 days


198218 18-Oct-2009 rwatson

Sort function prototypes in pfil.h, clean up white space, and better
align fields for printing.

MFC after: 3 days


198198 18-Oct-2009 rwatson

Line-wrap pfil.c so that it prints more nicely.

MFC after: 3 days


198075 14-Oct-2009 bz

Unbreak the VIMAGE build with IPSEC, broken with r197952 by
virtualizing the pfil hooks.
For consistency add the V_ to virtualize the pfil hooks in here as well.

MFC after: 55 days
X-MFC after: julian MFCed r197952.


197952 11-Oct-2009 julian

Virtualize the pfil hooks so that different jails may chose different
packet filters. ALso allows ipfw to be enabled on on ejail and disabled
on another. In 8.0 it's a global setting.

Sitting aroung in tree waiting to commit for: 2 months
MFC after: 2 months


197727 03-Oct-2009 bz

Put #ifdef INET around parts of the FLOWTABLE code, to unbreak
nooptions INET kernel builds.

MFC after: 3 days
X-MFC: with r197687


197687 01-Oct-2009 qingli

The flow-table associates TCP/UDP flows and IP destinations with
specific routes. When the routing table changes, for example,
when a new route with a more specific prefix is inserted into the
routing table, the flow-table is not updated to reflect that change.
As such existing connections cannot take advantage of the new path.
In some cases the path is broken. This patch will update the affected
flow-table entries when a more specific route is added. The route
entry is properly marked when a route is deleted from the table.
In this case, when the flow-table performs a search, the stale
entry is updated automatically. Therefore this patch is not
necessary for route deletion.

Submitted by: simon, phk
Reviewed by: bz, kmacy
MFC after: 3 days


197364 20-Sep-2009 qingli

A wrong variable is used when setting up the interface
address route, which broke source address selection in
some code paths.

Submitted by: noted by bz
Reviewed by: hrs
MFC after: immediately


197306 18-Sep-2009 zec

Style fix - break too long a line in two.

Spotted by: bz
MFC after: 3 days


197286 17-Sep-2009 zec

V_irtualize the lltables list, making ARP and ND reasonably
usable again with options VIMAGE kernels.

Submitted by: bz (the original version, probably identical to this one)
Reviewed by: many @ DevSummit Cambridge
MFC after: 3 days


197227 15-Sep-2009 qingli

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
MFC after: immediately


197134 12-Sep-2009 rwatson

Use C99 initialization for struct filterops.

Obtained from: Mac OS X
Sponsored by: Apple Inc.
MFC after: 3 weeks


197010 09-Sep-2009 emaste

Compare pointer with NULL, not 0.


196995 08-Sep-2009 np

Add arp_update_event. This replaces route_arp_update_event, which
has not worked since the arp-v2 rewrite.

The event handler will be called with the llentry write-locked and
can examine la_flags to determine whether the entry is being added
or removed.

Reviewed by: gnn, kmacy
Approved by: gnn (mentor)
MFC after: 1 month


196871 05-Sep-2009 qingli

The addresses that are assigned to the loopback interface
should be part of the kernel routing table.

Reviewed by: bz
MFC after: immediately


196864 05-Sep-2009 qingli

This patch fixes the following issues:
- Interface link-local address is not reachable within the
node that owns the interface, this is due to the mismatch
in address scope as the result of the installed interface
address loopback route. Therefore for each interface
address loopback route, the rt_gateway field (of AF_LINK
type) will be used to track which interface a given
address belongs to. This will aid the address source to
use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
only for validation reason. Doing so will eliminate as much
of the special case (loopback addresses) handling code
as possible, however, these empty nd6 entries should not
be returned to the userland applications such as the
"ndp" command.
Since both of the above issues contain common files, these
files are committed together.

Reviewed by: bz
MFC after: immediately


196797 03-Sep-2009 gnn

Add ARP statistics to the kernel and netstat.

New counters now exist for:
requests sent
replies sent
requests received
replies received
packets received
total packets dropped due to no ARP entry
entrys timed out
Duplicate IPs seen

The new statistics are seen in the netstat command
when it is given the -s command line switch.

MFC after: 2 weeks
In collaboration with: bz


196678 31-Aug-2009 qingli

As part of r196609, a call to "rtalloc" did not take the fib into
account. So call the appropriate "rtalloc_ign_fib()" instead of
calling "rtalloc_ign()".

Reviewed by:i pointed out by bz
MFC after: immediately


196633 28-Aug-2009 zec

Introduce a separate sx lock for protecting lists of vnet sysinit
and sysuninit handlers.

Previously, sx_vnet, which is a lock designated for protecting
the vnet list, was (ab)used for protecting vnet sysinit / sysuninit
handler lists as well. Holding exclusively the sx_vnet lock while
invoking sysinit and / or sysuninit handlers turned out to be
problematic, since some of the handlers may attempt to wake up
another thread and wait for it to walk over the vnet list, hence
acquire a shared lock on sx_vnet, which in turn leads to a deadlock.
Protecting vnet sysinit / sysuninit lists with a separate lock
mitigates this issue, which was first observed with
flowtable_flush() / flowtable_cleaner() in sys/net/flowtable.c.

Reviewed by: rwatson, jhb
MFC after: 3 days


196609 28-Aug-2009 qingli

In ip_output(), the flow-table module must not try to cache L2/L3
information for interface of IFF_POINTOPOINT or IFF_LOOPBACK type.
Since the L2 information (rt_lle) is invalid for these interface
types, accidental caching attempt will trigger panic when the invalid
rt_lle reference is accessed.

When installing a new route, or when updating an existing route, the
user supplied gateway address may be an interface address (this is
particularly true for point-to-point interface related modules such
as ppp, if_tun, if_gif). Currently the routing command handler always
set the RTF_GATEWAY flag if the gateway address is given as part of the
command paramters. Therefore the gateway address must be verified against
interface addresses or else the route would be treated as an indirect
route, thus making that route unusable.

Reviewed by: kmacy, julia, rwatson
Verified by: marcus
MFC after: 3 days


196559 26-Aug-2009 rwatson

Add IFNET_HOLD reserved pointer value for the ifindex ifnet array,
which allows an index to be reserved for an ifnet without making
the ifnet available for management operations. Use this in if_alloc()
while the ifnet lock is released between initial index allocation and
completion of ifnet initialization.

Add ifindex_free() to centralize the implementation of releasing an
ifindex value. Use in if_free() and if_vmove(), as well as when
releasing a held index in if_alloc().

Reviewed by: bz
MFC after: 3 days


196553 25-Aug-2009 rwatson

Break out allocation of new ifindex values from if_alloc() and if_vmove(),
and centralize in a single function ifindex_alloc(). Assert the
IFNET_WLOCK, and add missing IFNET_WLOCK in if_alloc(). This does not
close all known races in this code.

Reviewed by: bz
MFC after: 3 days


196535 25-Aug-2009 rwatson

Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.

Reviewed by: bz, kmacy
MFC after: 3 days


196519 24-Aug-2009 jfv

When bridging LRO is causing a problem, the believe
that it would work as long as all interfaces have TSO
seems to be false, until the matter gets sorted out
just disable LRO completely.


196510 24-Aug-2009 rwatson

Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.

MFC after: 3 days


196504 24-Aug-2009 zec

When moving ifnets from one vnet to another, and the ifnet
has ifaddresses of AF_LINK type which thus have an embedded
if_index "backpointer", we must update that if_index backpointer
to reflect the new if_index that our ifnet just got assigned.

This change affects only options VIMAGE builds.

Submitted by: bz
Reviewed by: bz
Approved by: re (rwatson), julian (mentor)


196482 23-Aug-2009 rwatson

Rather than using IFNET_RLOCK() when iterating over (and modifying) the
ifnet list during if_ef load, directly acquire the ifnet_sxlock
exclusively. That way when if_alloc() recurses the lock, it's a write
recursion rather than a read->write recursion.

This code structure is arguably a bug, so add a comment indicating that
this is the case. Post-8.0, we should fix this, but this commit
resolves panic-on-load for if_ef.

Discussed with: bz, julian
Reported by: phk
MFC after: 3 days


196481 23-Aug-2009 rwatson

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stablize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian
MFC after: 3 days


196419 21-Aug-2009 julian

Don't allow access to the internals until it has all been set up.
Specifically, not until the per-vnet parts have been set up.

Submitted by: kmacy@
Reviewed by: julian@, zec@
Approved by: re(rwatson)
MFC after: immediately


196388 19-Aug-2009 kmacy

This change fixes a comment and addresses a complaint by kib@ by
moving a frequently executed flowtable syslog statement from being
conditional on bootverbose to conditional on a per-vnet flowtable
sysctl.

Approved by: re@


196368 18-Aug-2009 kmacy

- change the interface to flowtable_lookup so that we don't rely on
the mbuf for obtaining the fib index
- check that a cached flow corresponds to the same fib index as the
packet for which we are doing the lookup
- at interface detach time flush any flows referencing stale rtentrys
associated with the interface that is going away (fixes reported
panics)
- reduce the time between cleans in case the cleaner is running at
the time the eventhandler is called and the wakeup is missed less
time will elapse before the eventhandler returns
- separate per-vnet initialization from global initialization
(pointed out by jeli@)

Reviewed by: sam@
Approved by: re@


196342 17-Aug-2009 kmacy

fix netboot issue by disabling flowtable lookups until initialization has been run

Reviewed by: rwatson@
Approved by: re@


196263 15-Aug-2009 rwatson

Remove unused if_rawoutput() macro; it has been unused since at least
FreeBSD 2.

Approved by: re (kib)


196230 14-Aug-2009 zec

Appease VNET_DEBUG - in if_vmove we temporarily switch i.e.
recurse from one vnet to another which is OK, so no need
to flood the console with warnings here.

Approved by: re (rwatson), julian (mentor)


196228 14-Aug-2009 zec

Make VNET_DEBUG a standalone compile-time option, i.e. decouple it from
INVARIANTS.

Reviewed by: bz
Approved by: re (rwatson), julian (mentor)


196176 13-Aug-2009 bz

Make it possible to change the vnet sysctl variables on jails
with their own virtual network stack. Jails only inheriting a
network stack cannot change anything that cannot be changed from
within a prison.

Reviewed by: rwatson, zec
Approved by: re (kib)


196174 13-Aug-2009 bz

Put multiple instructions into a block when iterating; unbreaks
NET_RT_DUMP, which otherwise only returned information of AF_MAX.
This was broken in r193232 (save your time - my bug, my fix).

PR: kern/137700
Reported by: Larry Baird (lab gta.com)
Tested by: Larry Baird (lab gta.com)
Reviewed by: zec, lstewart, qing
Approved by: re (kib)


196150 12-Aug-2009 jkim

Always embed pointer to BPF JIT function in BPF descriptor
to avoid inconsistency when opt_bpf.h is not included.

Reviewed by: rwatson
Approved by: re (rwatson)


196129 12-Aug-2009 bz

Update DDB show vnet command to print all used and available information.

Reviewed by: rwatson, zec
Approved by: re


196118 12-Aug-2009 bz

Put minimum alignment on the dpcpu and vnet section so that ld
when adding the __start_ symbol knows the expected section alignment
and can place the __start_ symbol correctly.

These sections will not support symbols with super-cache line alignment
requirements.

For full details, see posting to freebsd-current, 2009-08-10,
Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>.

Debugging and testing patches by:
Kamigishi Rei (spambox haruhiism.net),
np, lstewart, jhb, kib, rwatson
Tested by: Kamigishi Rei, lstewart
Reviewed by: kib
Approved by: re


196039 02-Aug-2009 rwatson

Many network stack subsystems use a single global data structure to hold
all pertinent statatistics for the subsystem. These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.

Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored. This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.

The following modules are affected by this change:

if_bridge
if_cxgb
if_gif
ip_mroute
ipdivert
pf

In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack. However, that change is too agressive for
this point in the release cycle.

Reviewed by: bz
Approved by: re (kib)


196026 01-Aug-2009 rwatson

The colour was red as shall be the letters of this warning to people upon
boot if the experimental VIMAGE feature was compiled into the kernel.

Submitted by: bz
Reviewed by: zec
Approved by: re (vimage blanket)


196025 01-Aug-2009 rwatson

Minor style tweaks.

Approved by: re (vimage blanket)


196024 01-Aug-2009 rwatson

Make the vnet alloc/destroy paths a bit easier to followg by merging
vnet_data_init/vnet_data_destroy into vnet_alloc/vnet_destroy.

Reviewed by: bz, zec
Approved by: re (vimage blanket)


196020 01-Aug-2009 rwatson

Remove vnet_foreach() utility function, which previously allowed
vnet.c to iterate virtual network stacks without being aware of
the implementation details previously hidden in kern_vimage.c.
Now they are in the same file, so remove this added complexity.

Reviewed by: bz
Approved by: re (vimage blanket)


196019 01-Aug-2009 rwatson

Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)


195972 30-Jul-2009 rwatson

Reorder and recomment vnet.c and vnet.h on the basis that they are no longer
solely about the virtual network stack memory allocator.

Approved by: re (vimage blanket)


195927 28-Jul-2009 rwatson

Revise header comments for vnet.h as we now implement VNET_SYSINIT, not
just VNET_DEFINE in vnet.h.

Approved by: re (vimage blanket)


195921 28-Jul-2009 qingli

The new flow table caches both the routing table entry as well as the
L2 information. For an indirect route the cached L2 entry contains the
MAC address of the gateway. Typically the default route is used to
transmit multicast packets when explicit multicast routes are not
available. The ether_output() function bypasses L2 resolution function
if it verifies the L2 cache is valid, because the cached L2 address
(a unicast MAC address) is copied into the packets as the destination
MAC address. This validation, however, does not apply to broadcast and
multicast packets because the destination MAC address is mapped
according to a standard method instead.

Submitted by: Xin Li
Reviewed by: bz
Approved by: re


195914 27-Jul-2009 qingli

This patch does the following:

- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface addreess when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.

Reviewed by: bz
Approved by: re


195892 26-Jul-2009 bz

Update epair(4) to the new netisr implementation and polish
things a bit:
- use dpcpu data to track the ifps with packets queued up,
- per-cpu locking and driver flags
- along with .nh_drainedcpu and NETISR_POLICY_CPU.
- Put the mbufs in flight reference count, preventing interfaces
from going away, under INVARIANTS as this is a general problem
of the stack and should be solved in if.c/netisr but still good
to verify the internal queuing logic.
- Permit changing the MTU to virtually everythinkg like we do for loopback.

Hook epair(4) up to the build.

Approved by: re (kib)


195891 26-Jul-2009 bz

Make the in-kernel logic for the SIOCSIFVNET, SIOCSIFRVNET ioctls
(ifconfig ifN (-)vnet <jname|jid>) work correctly.

Move vi_if_move to if.c and split it up into two functions(*),
one for each ioctl.

In the reclaim case, correctly set the vnet before calling if_vmove.

Instead of silently allowing a move of an interface from the current
vnet to the current vnet, return an error. (*)

There is some duplicate interface name checking before actually moving
the interface between network stacks without locking and thus race
prone. Ideally if_vmove will correctly and automagically handle these
in the future.

Suggested by: rwatson (*)
Approved by: re (kib)


195837 23-Jul-2009 rwatson

Introduce and use a sysinit-based initialization scheme for virtual
network stacks, VNET_SYSINIT:

- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
occur each time a network stack is instantiated and destroyed. In the
!VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
For the VIMAGE case, we instead use SYSINIT's to track their order and
properties on registration, using them for each vnet when created/
destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
previously, as well as its dependency scheme: we now just use the
SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
they want init functions to be called for each virtual network stack
rather than just once at boot, compiling down to DOMAIN_SET() in the
non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
of modinfo or DOMAIN_SET() for init/uninit events. In some cases,
convert modular components from using modevent to using sysinit (where
appropriate). In some cases, do minor rejuggling of SYSINIT ordering
to make room for or better manage events.

Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup)
Discussed with: jhb, bz, julian, zec
Reviewed by: bz
Approved by: re (VIMAGE blanket)


195814 21-Jul-2009 bz

sysctl_msec_to_ticks is used with both virtualized and
non-vrtiualized sysctls so we cannot used one common function.

Add a macro to convert the arg1 in the virtualized case to
vnet.h to not expose the maths to all over the code.

Add a wrapper for the single virtualized call, properly handling
arg1 and call the default implementation from there.

Convert the two over places to use the new macro.

Reviewed by: rwatson
Approved by: re (kib)


195782 20-Jul-2009 rwatson

Garbage collect vnet module registrations that have neither constructors
nor destructors, as there's no actual work to do.

In most cases, the constructors weren't needed because of the existing
protocol initialization functions run by net_init_domain() as part of
VNET_MOD_NET, or they were eliminated when support for static
initialization of virtualized globals was added.

Garbage collect dependency references to modules without constructors or
destructors, notably VNET_MOD_INET and VNET_MOD_INET6.

Reviewed by: bz
Approved by: re (vimage blanket)


195778 20-Jul-2009 rwatson

Add macros VNET_SETNAME and VNET_SYMPREFIX, and expose to userspace if
_WANT_VNET is defined. This way we don't need separate definitions in
libkvm.

Reviewed by: bz
Approved by: re (vimage blanket)


195769 19-Jul-2009 rwatson

Normalize field naming for struct vnet, fix two debugging printfs that
print them.

Reviewed by: bz
Approved by: re (kensmith, kib)


195760 19-Jul-2009 rwatson

Reimplement and/or implement vnet list locking by replacing a mostly
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use). Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.

Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions. Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.

Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.

Update various consumers of these KPIs based on whether they may sleep
or not.

Reviewed by: bz
Approved by: re (kib)


195741 17-Jul-2009 jamie

Remove the interim vimage containers, struct vimage and struct procg,
and the ioctl-based interface that supported them.

Approved by: re (kib), bz (mentor)


195727 16-Jul-2009 rwatson

Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)


195705 15-Jul-2009 rwatson

Add missing license line for vnet.h, correct white space nit.

Approved by: re (kensmith) (implicit)


195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


195624 11-Jul-2009 kmacy

Re-factoring for adding weighted routes introduced a
fairly irritating bug where the system will panic
when RADIX_MPATH is enabled. This change fixes this.

Approved by: re@


195618 11-Jul-2009 rpaulo

Implementation of the upcoming Wireless Mesh standard, 802.11s, on the
net80211 wireless stack. This work is based on the March 2009 D3.0 draft
standard. This standard is expected to become final next year.
This includes two main net80211 modules, ieee80211_mesh.c
which deals with peer link management, link metric calculation,
routing table control and mesh configuration and ieee80211_hwmp.c
which deals with the actually routing process on the mesh network.
HWMP is the mandatory routing protocol on by the mesh standard, but
others, such as RA-OLSR, can be implemented.

Authentication and encryption are not implemented.

There are several scripts under tools/tools/net80211/scripts that can be
used to test different mesh network topologies and they also teach you
how to setup a mesh vap (for the impatient: ifconfig wlan0 create
wlandev ... wlanmode mesh).

A new build option is available: IEEE80211_SUPPORT_MESH and it's enabled
by default on GENERIC kernels for i386, amd64, sparc64 and pc98.

Drivers that support mesh networks right now are: ath, ral and mwl.

More information at: http://wiki.freebsd.org/WifiMesh

Please note that this work is experimental. Also, please note that
bridging a mesh vap with another network interface is not yet supported.

Many thanks to the FreeBSD Foundation for sponsoring this project and to
Sam Leffler for his support.
Also, I would like to thank Gateworks Corporation for sending me a
Cambria board which was used during the development of this project.

Reviewed by: sam
Approved by: re (kensmith)
Obtained from: projects/mesh11s


195182 30-Jun-2009 bz

In case we cannot queue a packet reaching the queue limit, retain the
semantics netisr_queue() always had and free the mbuf along with
returning the error.

Reviewed by: rwatson
Approved by: re (kensmith)


195175 29-Jun-2009 brooks

Remove support for the /dev/net/* per-interface devices. They serve
little purpose and are unused in the base system.

The IOCTL functionality is entirely duplicated and routing sockets
provide a richer interface than the kqueue functionality.

Further, it is not practical for these devices to be made sensible in
the face of VIMAGE.

Bump __FreeBSD_version on the off chance that there is any code out
there that actually uses this stuff.

Reviewed by: rwatson
Discussed with: bz, zec
Approved by: re@ (kensmith)


195097 27-Jun-2009 rwatson

Remove unnecessary include of kdb.h that snuck in during ifaddr refcount
work.

Reported by: pluknet <pluknet at gmail.com>
Approved by: re (kib)


195078 26-Jun-2009 rwatson

In light of DPCPU use by netisr, revise various for loops from using
MAXCPU to mp_maxid, and handling and reporting of requests to use more
threads than we have CPUs to run them on.

Reviewed by: bz
Approved by: re (kib)
MFC after: 6 weeks


195070 26-Jun-2009 rwatson

Use if_addr_rlock/if_addr_runlock for if_spp when iterating if_addrhead,
as it is loadable as a module.

Approved by: re (kib)
MFC after: 6 weeks


195022 26-Jun-2009 rwatson

Update if_stf and if_tun to use if_addr_rlock()/if_addr_runlock() rather
than IF_ADDR_LOCK()/IF_ADDR_UNLOCK() when iterating ifp->if_addrhead.

MFC after: 6 weeks


195020 26-Jun-2009 rwatson

Define four wrapper functions for interface address locking,
if_addr_rlock() and if_addr_runlock() for regular address lists, and
if_maddr_rlock() and if_maddr_runlock() for multicast address lists.

We will use these in various kernel modules to avoid encoding specific
type and locking strategy information into modules that currently use
IF_ADDR_LOCK() and IF_ADDR_UNLOCK() directly.

MFC after: 6 weeks


195019 26-Jun-2009 rwatson

Convert netisr to use dynamic per-CPU storage (DPCPU) instead of sizing
arrays to [MAXCPU], offering moderate memory savings. In some places,
this requires using CPU_ABSENT() to handle less common platforms with
sparse CPU IDs. In several places, assert that the selected CPUID for
work placement or statistics is not CPU_ABSENT() to be on the safe side.

Discussed with: bz, jeff


194990 25-Jun-2009 kib

Change the type of uio_resid member of struct uio from int to ssize_t.
Note that this does not actually enable full-range i/o requests for
64 architectures, and is done now to update KBI only.

Tested by: pho
Reviewed by: jhb, bde (as part of the review of the bigger patch)


194951 25-Jun-2009 rwatson

Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support. As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done. This means that one
class of reader-writer races still exists.

MFC after: 6 weeks
Reviewed by: bz


194927 24-Jun-2009 bz

Merge from p4: CH154790,154793,154874

Import if_epair(4), a virtual cross-over Ethernet-like interface pair.

Note these files are 1:1 from p4 and not yet connected to the build
not knowing about the new netisr interface.

Sponsored by: The FreeBSD Foundation


194918 24-Jun-2009 np

Add 10Gbase-T to known ethernet media types.

Approved by: gnn (mentor)
MFC after: 1 week.


194917 24-Jun-2009 np

About to add 10Gbase-T to known media types, this is just a whitespace
cleanup before that commit. No functional impact.

Approved by: gnn (mentor)


194821 24-Jun-2009 rwatson

In if_setlladdr(), use IF_ADDR_LOCK() and ifaddr references to improve
the safety of link layer address manipulation.

MFC after: 6 weeks


194819 24-Jun-2009 rwatson

Break at_ifawithnet() into two variants:

- at_ifawithnet(), which acquires an locks it needs and returns an
at_ifaddr reference.
- at_ifawithnet_locked(), which relies on the caller locking
at_ifaddr_list, and returns a pointer rather than a reference.

Update various consumers to prefer one or the other, including ether
and fddi output, to properly release at_ifaddr references.

Rework at_control() to manage locking and references in a manner
identical to in_control().

MFC after: 6 weeks


194813 24-Jun-2009 rwatson

Lock if_addrhead when iterating, and where necessary acquire and release
ifadr references in if_sppp.

MFC after: 6 weeks


194812 24-Jun-2009 rwatson

Make stf_getsrcifa6() return a reference to an in6_ifaddr rather than
a pointer, and dispose of the references when no longer needed.

MFC after: 6 weeks


194760 23-Jun-2009 rwatson

Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:

ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)


194739 23-Jun-2009 bz

After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.


194700 23-Jun-2009 bz

Remove duplicate #include <net/route.h> from the middle of the file.


194660 22-Jun-2009 zec

V_irtualize flowtable state.

This change should make options VIMAGE kernel builds usable again,
to some extent at least.

Note that the size of struct vnet_inet has changed, though in
accordance with one-bump-per-day policy we didn't update the
__FreeBSD_version number, given that it has already been touched
by r194640 a few hours ago.
Reviewed by: bz
Approved by: julian (mentor)


194641 22-Jun-2009 bz

Updates after r194640:
- shrink size guards for vnet_net.
vnet_rtable does not need size guards as it is self-contained.
- remove a bunch of defines from vnet.h no longer valid.


194640 22-Jun-2009 bz

Move virtualization of routing related variables into their own
Vimage module, which had been there already but now is stateful.

All variables are now file local; so this further limits the global
spreading of routing related things throughout the kernel.

Add a missing function local variable in case of MPATHing.

Reviewed by: zec


194629 22-Jun-2009 bz

Collect all VIMAGE_GLOBALS variables in one place.

No longer export rt_tables as all lookups go through
rt_tables_get_rnh().

We cannot make rt_tables (and rtstat, rttrash[1]) static as
netstat -r (-rs[1]) would stop working on a stripped
VIMAGE_GLOBALS kernel.

Reviewed by: zec
Presumably broken by: phk 13.5y ago in r12820 [1]


194622 22-Jun-2009 rwatson

Add a new function, ifa_ifwithaddr_check(), which rather than returning
a pointer to an ifaddr matching the passed socket address, returns a
boolean indicating whether one was present. In the (near) future,
ifa_ifwithaddr() will return a referenced ifaddr rather than a raw
ifaddr pointer, and the new wrapper will allow callers that care only
about the boolean condition to avoid having to free that reference.

MFC after: 3 weeks


194620 22-Jun-2009 bz

After the update to fxp(4) in r194573 we should no longer need
this DELAY(100) hack introduced in r56938.

Thanks to: yongari
MFC after: 6 weeks
X-MFC note: not before the fxp(4) changes


194602 21-Jun-2009 rwatson

Clean up common ifaddr management:

- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Instead of using a u_int protected by a mutex to refcount(9) for
reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after: 3 weeks


194581 21-Jun-2009 rdivacky

Switch cmd argument to u_long. This matches what if_ethersubr.c does and
allows the code to compile cleanly on amd64 with clang.

Reviewed by: rwatson
Approved by: ed (mentor)


194577 21-Jun-2009 rdivacky

In non-debugging mode make this define (void)0 instead of nothing. This
helps to catch bugs like the below with clang.

if (cond); <--- note the trailing ;
something();

Approved by: ed (mentor)
Discussed on: current@


194518 19-Jun-2009 kmacy

add helper function for flushing software queues


194512 19-Jun-2009 csjp

Implement the -z (zero counters) option for the various bpf counters.
Add necessary changes to the kernel for this (basically introduce a
bpf_zero_counters() function). As well, update the man page.

MFC after: 1 month
Discussed with: rwatson


194368 17-Jun-2009 bz

Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.


194357 17-Jun-2009 bz

Add the explicit include of vimage.h to another five .c files still
missing it.

Remove the "hidden" kernel only include of vimage.h from ip_var.h added
with the very first Vimage commit r181803 to avoid further kernel poisoning.


194259 15-Jun-2009 sam

r193336 moved ifq_detach to if_free which broke if_alloc followed
by if_free (w/o doing if_attach); move ifq_attach to if_alloc and
rename ifq_attach/detach to ifq_init/ifq_delete to better identify
their purpose

Reviewed by: jhb, kmacy


194252 15-Jun-2009 jamie

Get vnets from creds instead of threads where they're available, and from
passed threads instead of curthread.

Reviewed by: zec, julian
Approved by: bz (mentor)


194251 15-Jun-2009 jamie

Manage vnets via the jail system. If a jail is given the boolean
parameter "vnet" when it is created, a new vnet instance will be created
along with the jail. Networks interfaces can be moved between prisons
with an ioctl similar to the one that moves them between vimages.
For now vnets will co-exist under both jails and vimages, but soon
struct vimage will be going away.

Reviewed by: zec, julian
Approved by: bz (mentor)


194201 14-Jun-2009 bz

Add an optional callback function that will be invoked when a per-CPU
queue was drained. It will never fire for a directly dispatched packet.

You will most likely never want to use this for any ordinary netisr usage
and you will never blame netisr in case you try to use it and it does
not work as expected.

Reviewed by: rwatson


194077 12-Jun-2009 bz

Garbage collect an extern for a non-existent variable.
While here let the comment end in a '.' and mark the #endif of _KERNEL.

Reviewed by: rwatson (as part of a larger patch)


194076 12-Jun-2009 bz

Move the kernel option FLOWTABLE chacking from the header file to the
actual implementation.
Remove the accessor functions for the compiled out case, just returning
"unavail" values. Remove the kernel conditional from the header file as
it is no longer needed, only leaving the externs.
Hide the improperly virtualized SYSCTL/TUNABLE for the flowtable size
under the kernel option as well.

Reviewed by: rwatson


194062 12-Jun-2009 vanhu

Added support for NAT-Traversal (RFC 3948) in IPsec stack.

Thanks to (no special order) Emmanuel Dreyfus (manu@netbsd.org), Larry
Baird (lab@gta.com), gnn, bz, and other FreeBSD devs, Julien Vanherzeele
(julien.vanherzeele@netasq.com, for years of bug reporting), the PFSense
team, and all people who used / tried the NAT-T patch for years and
reported bugs, patches, etc...

X-MFC: never

Reviewed by: bz
Approved by: gnn(mentor)
Obtained from: NETASQ


193983 11-Jun-2009 bz

carp(4) allows people to share a set of IP addresses and can only
use IPv4/v6 for inter-node communication (according to my reading).

Properly wrap the carp callouts in INET || INET6 and refelect this
in sys/conf/files as well. While in theory this should be ok,
it might be a bit optimistic to think that carp could build with
inet6 only[1].

Discussed with: mlaier [1]


193951 10-Jun-2009 kib

Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use
vnode interlock to protect the knote fields [1]. The locking assumes
that shared vnode lock is held, thus we get exclusive access to knote
either by exclusive vnode lock protection, or by shared vnode lock +
vnode interlock.

Do not use kl_locked() method to assert either lock ownership or the
fact that curthread does not own the lock. For shared locks, ownership
is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared
lock not owned by curthread, causing false positives in kqueue subsystem
assertions about knlist lock.

Remove kl_locked method from knlist lock vector, and add two separate
assertion methods kl_assert_locked and kl_assert_unlocked, that are
supposed to use proper asserts. Change knlist_init accordingly.

Add convenience function knlist_init_mtx to reduce number of arguments
for typical knlist initialization.

Submitted by: jhb [1]
Noted by: jhb [2]
Reviewed by: jhb
Tested by: rnoland


193926 10-Jun-2009 bz

SCTP needs either IPv4 or IPv6 as lower layer[1].
So properly hide the already #ifdef SCTP code with
#if defined(INET) || defined(INET6) as well to get us
closer to a non-INET/INET6 kernel.

Discussed with: tuexen [1]


193913 10-Jun-2009 bz

ip_gif_ttl/GIF_TTL are only used by the inet part in in_gif.c,
so put the initialization under #ifdef INET.


193891 10-Jun-2009 bz

The llentry *lle is only used in cases of INET or INET6.
Put the variable declaration under proper #ifdefs.

In case variables are only needed for one of the two AFs
more them into proper scope.


193863 09-Jun-2009 kmacy

revert to opt-in flowtable


193859 09-Jun-2009 oleg

Close long existed race with net.inet.ip.fw.one_pass = 0:
If packet leaves ipfw to other kernel subsystem (dummynet, netgraph, etc)
it carries pointer to matching ipfw rule. If this packet then reinjected back
to ipfw, ruleset processing starts from that rule. If rule was deleted
meanwhile, due to existed race condition panic was possible (as well as
other odd effects like parsing rules in 'reap list').

P.S. this commit changes ABI so userland ipfw related binaries should be
recompiled.

MFC after: 1 month
Tested by: Mikolaj Golub


193856 09-Jun-2009 kmacy

make flowtable opt-out


193854 09-Jun-2009 kmacy

move jenkins hash to its own header in libkern


193848 09-Jun-2009 kmacy

- add drbr routines for accessing #qentries and conditionally dequeueing
- track bytes enqueued in buf_ring


193820 09-Jun-2009 bz

Remove one INET dependency by calling the general
AF agnostic version for doing the routing lookup.

Reviewed by: kmacy


193815 09-Jun-2009 hrs

Style fix.

Submitted by: bz


193796 09-Jun-2009 hrs

- Fix sanity check of GIFSOPTS ioctl.
- Rename option mask s/GIF_FULLOPTS/GIF_OPTMASK/

Spotted by: Eygene Ryabinkin, delphij


193748 08-Jun-2009 bz

Remove two unneeded, hidden includes.


193744 08-Jun-2009 bz

After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.


193731 08-Jun-2009 zec

Introduce an infrastructure for dismantling vnet instances.

Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state. The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers. Many of such issues are
already known and will be incrementaly fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels. Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by: bz, julian
Approved by: rwatson, kib (re), julian (mentor)


193664 07-Jun-2009 hrs

Fix and add a workaround on an issue of EtherIP packet with reversed
version field sent via gif(4)+if_bridge(4). The EtherIP
implementation found on FreeBSD 6.1, 6.2, 6.3, 7.0, 7.1, and 7.2 had
an interoperability issue because it sent the incorrect EtherIP
packets and discarded the correct ones.

This change introduces the following two flags to gif(4):

accept_rev_ethip_ver: accepts both correct EtherIP packets and ones
with reversed version field, if enabled. If disabled, the gif
accepts the correct packets only. This flag is enabled by
default.

send_rev_ethip_ver: sends EtherIP packets with reversed version field
intentionally, if enabled. If disabled, the gif sends the correct
packets only. This flag is disabled by default.

These flags are stored in struct gif_softc and can be set by
ifconfig(8) on per-interface basis.

Note that this is an incompatible change of EtherIP with the older
FreeBSD releases. If you need to interoperate older FreeBSD boxes and
new versions after this commit, setting "send_rev_ethip_ver" is
needed.

Reviewed by: thompsa and rwatson
Spotted by: Shunsuke SHINOMIYA
PR: kern/125003
MFC after: 2 weeks


193511 05-Jun-2009 rwatson

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


193502 05-Jun-2009 luigi

More cleanup in preparation of ipfw relocation (no actual code change):

+ move ipfw and dummynet hooks declarations to raw_ip.c (definitions
in ip_var.h) same as for most other global variables.
This removes some dependencies from ip_input.c;

+ remove the IPFW_LOADED macro, just test ip_fw_chk_ptr directly;

+ remove the DUMMYNET_LOADED macro, just test ip_dn_io_ptr directly;

+ move ip_dn_ruledel_ptr to ip_fw2.c which is the only file using it;

To be merged together with rev 193497

MFC after: 5 days


193336 02-Jun-2009 sam

move ifq_detach from if_detach to if_free; this permits callers to
reference if_snd in the period between detach+free which helps simplify
detach code

Reviewed by: jhb, rwatson


193243 01-Jun-2009 rwatson

Revert a recent netisr2 change: when billing packets to the current
CPU, don't lock the workstream, as its mutexes may not have been
initialized if there are fewer workstreams than CPUs.

Run into by: hps, ps


193232 01-Jun-2009 bz

Convert the two dimensional array to be malloced and introduce
an accessor function to get the correct rnh pointer back.

Update netstat to get the correct pointer using kvm_read()
as well.

This not only fixes the ABI problem depending on the kernel
option but also permits the tunable to overwrite the kernel
option at boot time up to MAXFIBS, enlarging the number of
FIBs without having to recompile. So people could just use
GENERIC now.

Reviewed by: julian, rwatson, zec
X-MFC: not possible


193230 01-Jun-2009 rwatson

Garbage collect NETISR_POLL and NETISR_POLLMORE, which are no longer
required for options DEVICE_POLLING.

De-fragment the NETISR_ constant space and lower NETISR_MAXPROT from
32 to 16 -- when sizing queue arrays using this compile-time constant,
significant amounts of memory are saved.

Warn on the console when tunable values for netisr are automatically
adjusted during boot due to exceeding limits, invalid values, or as a
result of DEVICE_POLLING.


193219 01-Jun-2009 rwatson

Reimplement the netisr framework in order to support parallel netisr
threads:

- Support up to one netisr thread per CPU, each processings its own
workstream, or set of per-protocol queues. Threads may be bound
to specific CPUs, or allowed to migrate, based on a global policy.

In the future it would be desirable to support topology-centric
policies, such as "one netisr per package".

- Allow each protocol to advertise an ordering policy, which can
currently be one of:

NETISR_POLICY_SOURCE: packets must maintain ordering with respect to
an implicit or explicit source (such as an interface or socket).

NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work,
as well as allowing protocols to provide a flow generation function
for mbufs without flow identifers (m2flow). Falls back on
NETISR_POLICY_SOURCE if now flow ID is available.

NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for
each packet handled by netisr (m2cpuid).

- Provide utility functions for querying the number of workstreams
being used, as well as a mapping function from workstream to CPU ID,
which protocols may use in work placement decisions.

- Add explicit interfaces to get and set per-protocol queue limits, and
get and clear drop counters, which query data or apply changes across
all workstreams.

- Add a more extensible netisr registration interface, in which
protocols declare 'struct netisr_handler' structures for each
registered NETISR_ type. These include name, handler function,
optional mbuf to flow ID function, optional mbuf to CPU ID function,
queue limit, and ordering policy. Padding is present to allow these
to be expanded in the future. If no queue limit is declared, then
a default is used.

- Queue limits are now per-workstream, and raised from the previous
IFQ_MAXLEN default of 50 to 256.

- All protocols are updated to use the new registration interface, and
with the exception of netnatm, default queue limits. Most protocols
register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use
NETISR_POLICY_FLOW, and will therefore take advantage of driver-
generated flow IDs if present.

- Formalize a non-packet based interface between interface polling and
the netisr, rather than having polling pretend to be two protocols.
Provide two explicit hooks in the netisr worker for start and end
events for runs: netisr_poll() and netisr_pollmore(), as well as a
function, netisr_sched_poll(), to allow the polling code to schedule
netisr execution. DEVICE_POLLING still embeds single-netisr
assumptions in its implementation, so for now if it is compiled into
the kernel, a single and un-bound netisr thread is enforced
regardless of tunable configuration.

In the default configuration, the new netisr implementation maintains
the same basic assumptions as the previous implementation: a single,
un-bound worker thread processes all deferred work, and direct dispatch
is enabled by default wherever possible.

Performance measurement shows a marginal performance improvement over
the old implementation due to the use of batched dequeue.

An rmlock is used to synchronize use and registration/unregistration
using the framework; currently, synchronized use is disabled
(replicating current netisr policy) due to a measurable 3%-6% hit in
ping-pong micro-benchmarking. It will be enabled once further rmlock
optimization has taken place. However, in practice, netisrs are
rarely registered or unregistered at runtime.

A new man page for netisr will follow, but since one doesn't currently
exist, it hasn't been updated.

This change is not appropriate for MFC, although the polling shutdown
handler should be merged to 7-STABLE.

Bump __FreeBSD_version.

Reviewed by: bz


193166 31-May-2009 zec

Introduce an interm userland-kernel API for creating vnets and
assigning ifnets from one vnet to another. Deletion of vnets is not
yet supported.

The interface is implemented as an ioctl extension so that no syscalls
had to be introduced. This should be acceptable given that the new
interface will be used for a short / interim period only, until the
new jail management framwork gains the capability of managing vnets.
This method for managing vimages / vnets has been in use for the past
7 years without any observable issues.

The userland tool to be used in conjunction with the interim API can be
found in p4: //depot/projects/vimage-commit2/src/usr.sbin/vimage/... and
will most probably never get commited to svn.

While here, bump copyright notices in kern_vimage.c and vimage.h to
cover work done in year 2009.

Approved by: julian (mentor)
Discussed with: bz, rwatson


193096 30-May-2009 attilio

When user_frac in the polling subsystem is low it is going to busy the
CPU for too long period than necessary. Additively, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations a new generic mechanism can be
implemented in proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.

In order to implement such mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).

Bump __FreeBSD_version in order to signal such situation.

Reviewed by: emaste
Sponsored by: Sandvine Incorporated


193030 29-May-2009 rwatson

Make the rmlock(9) interface a bit more like the rwlock(9) interface:

- Add rm_init_flags() and accept extended options only for that variation.
- Add a flags space specifically for rm_init_flags(), rather than borrowing
the lock_init() flag space.
- Define flag RM_RECURSE to use instead of LO_RECURSABLE.
- Define flag RM_NOWITNESS to allow an rmlock to be exempt from WITNESS
checking; this wasn't possible previously as rm_init() always passed
LO_WITNESS when initializing an rmlock's struct lock.
- Add RM_SYSINIT_FLAGS().
- Rename embedded mutex in rmlocks to make it more obvious what it is.
- Update consumers.
- Update man page.


192895 27-May-2009 jamie

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


192763 25-May-2009 sam

rev bpf attach/detach event api to include the dlt


192669 23-May-2009 zec

V_irtualize the if_clone framework, thus allowing for clonable ifnets
to optionally have overlapping unit numbers if attached in different
vnets.

At this stage if_loop is the only clonable ifnet class that has been
extended to allow for such overlapping allocation of unit numbers, i.e.
in each vnet it is possible to have a lo0 interface. Other clonable ifnet
classes remain to operate with traditional semantics, i.e. each instance
of a clonable ifnet will be assigned a globally unique unit number,
regardless in which vnet such an ifnet becomes instantiated.

While here, garbage collect unused _lo_list field in struct vnet_net,
as well as improve indentation for #defines in sys/net/vnet.h.

The layout of struct vnet_net has changed, therefore bump
__FreeBSD_version.

This change has no functional impact on nooptions VIMAGE kernel builds.

Reviewed by: bz, brooks
Approved by: julian (mentor)


192608 22-May-2009 zec

Set ifp->if_afdata_initialized to 0 while holding IF_AFDATA_LOCK on ifp,
not after the lock has been released.

Reviewed by: bz
Discussed with: rwatson


192605 22-May-2009 zec

Introduce the if_vmove() function, which will be used in the future
for reassigning ifnets from one vnet to another.

if_vmove() works by calling a restricted subset of actions normally
executed by if_detach() on an ifnet in the current vnet, and then
switches to the target vnet and executes an appropriate subset of
if_attach() actions there.

if_attach() and if_detach() have become wrapper functions around
if_attach_internal() and if_detach_internal(), where the later
variants have an additional argument, a flag indicating whether a
full attach or detach sequence is to be executed, or only a
restricted subset suitable for moving an ifnet from one vnet to
another. Hence, if_vmove() will not call if_detach() and if_attach()
directly, but will call the if_detach_internal() and
if_attach_internal() variants instead, with the vmove flag set.

While here, staticize ifnet_setbyindex() since it is not referenced
from outside of sys/net/if.c.

Also rename ifccnt field in struct vimage to ifcnt, and do some minor
whitespace garbage collection where appropriate.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by: bz, rwatson, brooks?
Approved by: julian (mentor)


192476 20-May-2009 qingli

When an interface address is removed and the last prefix
route is also being deleted, the link-layer address table
(arp or nd6) will flush those L2 llinfo entries that match
the removed prefix.

Reviewed by: kmacy


192313 18-May-2009 sam

add bpf_track eventhandler for monitoring bpf taps attached/detached

Reviewed by: csjp


192302 18-May-2009 rwatson

Garbage collect unused NETISR_{ATM,NETGRAPH,PPP} netisr constants.


192049 13-May-2009 rwatson

Garbage collect now-unused NETISR_FORCEQUEUE, which overrode the global
direct dispatch policy for specific protocols (NETISR_USB). We leave
the additional 'flags' argument to netisr_register() for the time being,
even though it is no longer required.


192048 13-May-2009 rwatson

Remove now-unused NETISR_USB.


191816 05-May-2009 zec

Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The currvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discuouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, sockets's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, whith an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typicall cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by: julian (mentor)


191738 02-May-2009 zec

Make indentation more uniform accross vnet container structs.

This is a purely cosmetic / NOP change.

Reviewed by: bz
Approved by: julian (mentor)
Verified by: svn diff -x -w producing no output


191734 02-May-2009 zec

Unbreak options VIMAGE + nooptions INVARIANTS kernel builds.

Submitted by: julian
Approved by: julian (mentor)


191729 01-May-2009 thompsa

Reorder the bridge add and delete routines to avoid calling ifpromisc() with
the bridge lock held.


191692 30-Apr-2009 thompsa

Use the flowid if its available for selecting the tx port.


191688 30-Apr-2009 zec

Permit buiding kernels with options VIMAGE, restricted to only a single
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:

options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

INIT_VNET_NET(ifp->if_vnet); becomes

struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by: bz, rwatson
Approved by: julian (mentor)


191608 27-Apr-2009 kmacy

replace IFQ_ENQUEUE + if_start with if_transmit


191607 27-Apr-2009 kmacy

replace IFQ_HANDOFF with if_transmit


191606 27-Apr-2009 kmacy

remove gratuitous memory barrier, a remnant of unified L2 / L3


191605 27-Apr-2009 kmacy

remove call to IFQ_HANDOFF is it called by if_transmit in the default case
and doing so allows the ifnet driver to define its own queueing mechanism


191603 27-Apr-2009 sam

use if_transmit intead of direct frobbing of the if_snd q; this is no
longer allowed

Identified by: rwatson
Reviewed by: kmacy


191548 26-Apr-2009 zec

In preparation for turning on options VIMAGE in next commits,
rearrange / replace / adjust several INIT_VNET_* initializer
macros, all of which currently resolve to whitespace.

Reviewed by: bz (an older version of the patch)
Approved by: julian (mentor)


191424 23-Apr-2009 rwatson

As with ifnet_byindex_ref(), don't return IFF_DYING interfaces from
ifunit_ref(). ifunit() continues to return them.

MFC after: 3 weeks


191423 23-Apr-2009 rwatson

Add ifunit_ref(), a version of ifunit(), that returns not just an
interface pointer, but also a reference to it.

Modify ifioctl() to use ifunit_ref(), holding the reference until
all ioctls, etc, have completed.

This closes a class of reader-writer races in which interfaces
could be removed during long-running ioctls, leading to crashes.
Many other consumers of ifunit() should now use ifunit_ref() to
avoid similar races.

MFC after: 3 weeks


191418 23-Apr-2009 rwatson

During if_detach(), invoke if_dead() to set the ifnet's function
pointers to "dead" implementations that no-op rather than invoking
the device driver. This would generally be unexpected and
possibly quite badly handled by most device drivers after
if_detach() has completed.

Reviewed by: bms
MFC after: 3 weeks


191417 23-Apr-2009 rwatson

Move portions of data structure initialization from if_attach() to
if_alloc(), and portions of data structure destruction from if_detach()
to if_free(). These changes leave more of the struct ifnet in a
safe-to-access condition between alloc and attach, and between detach
and free, and focus on attach/detach as stack usage events rather than
data structure initialization.

Affected fields include the linkstate task queue, if_afdata lock,
address lists, kqueue state, and MAC labels. ifq_attach() ifq_detach()
are not moved as ifq_attach() may use a queue length set by the device
driver between if_alloc() and if_attach().

MFC after: 3 weeks


191416 23-Apr-2009 rwatson

Add a new interface flag, IFF_DYING, which is set when a device driver
calls if_free(), and remains set if the refcount is elevated. IF_DYING
skips the bit in the if_flags bitmask previously used by IFF_NEEDSGIANT,
so that an MFC can be done without changing which bit is used, as
IFF_NEEDSGIANT is still present in 7.x.

ifnet_byindex_ref() checks for IFF_DYING and returns NULL if it is set,
preventing new references from by acquired by index, preventing
monitoring sysctls from seeing it. Other lookup mechanisms currently
do not check IFF_DYING, but may need to in the future.

MFC after: 3 weeks


191367 21-Apr-2009 rwatson

Start to address a number of races relating to use of ifnet pointers
after the corresponding interface has been destroyed:

(1) Add an ifnet refcount, ifp->if_refcount. Initialize it to 1 in
if_alloc(), and modify if_free_type() to decrement and check the
refcount.

(2) Add new if_ref() and if_rele() interfaces to allow kernel code
walking global interface lists to release IFNET_[RW]LOCK() yet
keep the ifnet stable. Currently, if_rele() is a no-op wrapper
around if_free(), but this may change in the future.

(3) Add new ifnet field, if_alloctype, which caches the type passed
to if_alloc(), but unlike if_type, won't be changed by drivers.
This allows asynchronous free's of the interface after the
driver has released it to still use the right type. Use that
instead of the type passed to if_free_type(), but assert that
they are the same (might have to rethink this if that doesn't
work out).

(4) Add a new ifnet_byindex_ref(), which looks up an interface by
index and returns a reference rather than a pointer to it.

(5) Fix if_alloc() to fully initialize the if_addr_mtx before hooking
up the ifnet to global lists.

(6) Modify sysctls in if_mib.c to use ifnet_byindex_ref() and release
the ifnet when done.

When this change is MFC'd, it will need to replace if_ispare fields
rather than adding new fields in order to avoid breaking the binary
interface. Once this change is MFC'd, if_free_type() should be
removed, as its 'type' argument is now optional.

This refcount is not appropriate for counting mbuf pkthdr references,
and also not for counting entry into the device driver via ifnet
function pointers. An rmlock may be appropriate for the latter.
Rather, this is about ensuring data structure stability when reaching
an ifnet via global ifnet lists and tables followed by copy in or out
of userspace.

MFC after: 3 weeks
Reported by: mdtancsa
Reviewed by: brooks


191365 21-Apr-2009 rwatson

Acquire the interface address list lock over some iterations over
if_addrhead. This closes some reader-writer races associated with
the address list.

MFC after: 2 weeks


191343 20-Apr-2009 rwatson

Acquire interfce address list lock while walking the interface address
list during tun device initialization.

MFC after: 2 weeks


191342 20-Apr-2009 rwatson

Acquire address list lock before walking an interface's address list to
identify possible jail addresses on it for IPv4 and IPv6.

MFC after: 2 weeks


191339 20-Apr-2009 rwatson

Prefer ifa_link (structure field) to ifa_list (macro alias for it).

MFC after: 2 weeks


191335 20-Apr-2009 rwatson

Prefer if_addrhead (FreeBSD) to if_addrlist (BSD compat) naming for the
interface address list in if_stf.c.

Acquire interface address list locks around address list access.

MFC after: 2 months


191324 20-Apr-2009 kmacy

simplify code by removing bit_fns and replacing with the use of a temporary mask


191258 19-Apr-2009 kmacy

update TODO list


191257 19-Apr-2009 kmacy

- put larger flowtable members at the end
- fix bug where tail pointer of the free list would not get advanced
- clear entry's next pointer when it is added to the freelist to avoid freeing
an entry that it still points to


191255 19-Apr-2009 kmacy

- Import infrastructure for caching flows as a means of accelerating L3 and L2 lookups
as well as providing stateful load balancing when used with RADIX_MPATH.
- Currently compiled in to i386 and amd64 but disabled by default, it can be enabled at
runtime with 'sysctl net.inet.flowtable.enable=1'.

- Embedded users can remove it entirely from the kernel by adding 'nooption FLOWTABLE' to
their kernel config files.

- A minimal hookup will be added to ip_output in a subsequent commit. I would like to see
more review before bringing in changes that require more churn.

Supported by: Bitgravity Inc.


191253 18-Apr-2009 rwatson

Remove IFF_NEEDSGIANT interface flag: we no longer provide ifnet-layer
infrastructure to support non-MPSAFE network device drivers.


191221 17-Apr-2009 kmacy

clarify state of llentry that is passed back


191217 17-Apr-2009 jhb

The vlan code has not required the miibus code since 6.0 when
if_link_state_change() was added and the vlan link-state hook was moved
out of miibus and into net/if.c.

MFC after: 1 month


191161 16-Apr-2009 kmacy

export if_qflush for use by driver if_qflush routines
only set ifp->if_{transmit, qflush} if not already set
KASSERT that neither or both are set


191159 16-Apr-2009 kmacy

add comment to llentry_update

Requested by: sam


191154 16-Apr-2009 kmacy

add utility routine for updating an struct llentry *


191148 16-Apr-2009 kmacy

Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by: rwatson


191124 15-Apr-2009 kmacy

revert RTM_VERSION change - it doesn't do what I thought it does and changing breaks
ifconfig needlessly


191117 15-Apr-2009 kmacy

add an llentry to struct route{_in6} to allow it to be passed around with
the rtentry


191112 15-Apr-2009 zec

In the !VIMAGE_GLOBALS case, make sure not to call vnet_net_iattach()
both via the vnet_mod_register() framework and then directly, but only
once.

Reviewed by: bz
Approved by: julian (mentor)


191080 14-Apr-2009 kmacy

Extend route command:
- add show as alias for get
- add weights to allow mpath to do more than equal cost
- add sticky / nostick to disable / re-enable per-connection load balancing

This adds a field to rt_metrics_lite so network bits of world will need to be re-built.

Reviewed by: jeli & qingli


191037 14-Apr-2009 kmacy

call default if_qflush on ifq if default method isn't used by the driver


191033 14-Apr-2009 kmacy

Adapt buf_ring abstraction interface to allow consumers to interoperate with ALTQ


190951 11-Apr-2009 rwatson

Update stats in struct ipstat using four new macros, IPSTAT_ADD(),
IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly
manipulating the fields across the kernel. This will make it easier
to change the implementation of these statistics, such as using
per-CPU versions of the data structures.

MFC after: 3 days


190909 11-Apr-2009 zec

Introduce vnet module registration / initialization framework with
dependency tracking and ordering enforcement.

With this change, per-vnet initialization functions introduced with
r190787 are no longer directly called from traditional initialization
functions (which cc in most cases inlined to pre-r190787 code), but are
instead registered via the vnet framework first, and are invoked only
after all prerequisite modules have been initialized. In the long run,
this framework should allow us to both initialize and dismantle
multiple vnet instances in a correct order.

The problem this change aims to solve is how to replay the
initialization sequence of various network stack components, which
have been traditionally triggered via different mechanisms (SYSINIT,
protosw). Note that this initialization sequence was and still can be
subtly different depending on whether certain pieces of code have been
statically compiled into the kernel, loaded as modules by boot
loader, or kldloaded at run time.

The approach is simple - we record the initialization sequence
established by the traditional mechanisms whenever vnet_mod_register()
is called for a particular vnet module. The vnet_mod_register_multi()
variant allows a single initializer function to be registered multiple
times but with different arguments - currently this is only used in
kern/uipc_domain.c by net_add_domain() with different struct domain *
as arguments, which allows for protosw-registered initialization
routines to be invoked in a correct order by the new vnet
initialization framework.

For the purpose of identifying vnet modules, each vnet module has to
have a unique ID, which is statically assigned in sys/vimage.h.
Dynamic assignment of vnet module IDs is not supported yet.

A vnet module may specify a single prerequisite module at registration
time by filling in the vmi_dependson field of its vnet_modinfo struct
with the ID of the module it depends on. Unless specified otherwise,
all vnet modules depend on VNET_MOD_NET (container for ifnet list head,
rt_tables etc.), which thus has to and will always be initialized
first. The framework will panic if it detects any unresolved
dependencies before completing system initialization. Detection of
unresolved dependencies for vnet modules registered after boot
(kldloaded modules) is not provided.

Note that the fact that each module can specify only a single
prerequisite may become problematic in the long run. In particular,
INET6 depends on INET being already instantiated, due to TCP / UDP
structures residing in INET container. IPSEC also depends on INET,
which will in turn additionally complicate making INET6-only kernel
configs a reality.

The entire registration framework can be compiled out by turning on the
VIMAGE_GLOBALS kernel config option.

Reviewed by: bz
Approved by: julian (mentor)


190903 10-Apr-2009 mlaier

Follow up for r190895 It's not only the "all" group that is affected, but
all groups on the given interface.

PR: kern/130977, kern/131310
MFC after: 3 days (%vnet)


190895 10-Apr-2009 mlaier

Remove interfaces from IFG_ALL on detach. This cures a couple of pf panics
when using the "self" keyword in tables or as ()-style host address and
fixes "ifconfig -g all" output.

PR: kern/130977, kern/131310
Submitted by: Mikolaj Golub
MFC after: 3 days


190818 07-Apr-2009 ed

Add parentheses to under-parenthesized macro.

Submitted by: Christoph Mallon <christoph.mallon@gmx.de>


190787 06-Apr-2009 zec

First pass at separating per-vnet initializer functions
from existing functions for initializing global state.

At this stage, the new per-vnet initializer functions are
directly called from the existing global initialization code,
which should in most cases result in compiler inlining those
new functions, hence yielding a near-zero functional change.

Modify the existing initializer functions which are invoked via
protosw, like ip_init() et. al., to allow them to be invoked
multiple times, i.e. per each vnet. Global state, if any,
is initialized only if such functions are called within the
context of vnet0, which will be determined via the
IS_DEFAULT_VNET(curvnet) check (currently always true).

While here, V_irtualize a few remaining global UMA zones
used by net/netinet/netipsec networking code. While it is
not yet clear to me or anybody else whether this is the right
thing to do, at this stage this makes the code more readable,
and makes it easier to track uncollected UMA-zone-backed
objects on vnet removal. In the long run, it's quite possible
that some form of shared use of UMA zone pools among multiple
vnets should be considered.

Bump __FreeBSD_version due to changes in layout of structs
vnet_ipfw, vnet_inet and vnet_net.

Approved by: julian (mentor)


190751 05-Apr-2009 ed

Remove if_ppp(4) and if_sl(4).

Not only did these two drivers depend on IFF_NEEDSGIANT, they were
broken 7 months ago during the MPSAFE TTY import. if_ppp(4) has been
replaced by ppp(8). There is no replacement for if_sl(4).

If we see regressions in for example the ports tree, we should just use
__FreeBSD_version 800045 to check whether if_ppp(4) and if_sl(4) are
present. Version 800045 is used to denote the import of MPSAFE TTY.

Discussed with: rwatson, but also rwatson's IFF_NEEDSGIANT emails on the
lists.


190639 02-Apr-2009 rpaulo

Sync DLTs with latest libpcap version.


190508 28-Mar-2009 sam

enable setting the mac address of 802.11 devices


190151 20-Mar-2009 jamie

Call the interface's if_ioctl from ifioctl(), if the protocol didn't
handle the ioctl. There are other paths that already call it, but this
allows for a non-interface socket (like AF_LOCAL which ifconfig now
uses) to use a broader class of interface ioctls.

Approved by: bz (mentor), rwatson


189907 17-Mar-2009 scf

Remove the splimp()/splx() calls around the setting of the MTU. They are
no-op's that I inadvertently added. Even if locking is needed in general
for the ioctl's, setting a single long will not need it due to the operation
being atomic.

Reported by: rwatson


189873 16-Mar-2009 rwatson

Define and use two macros for loopback checksum offload:

LO_CSUM_FEATURES - a bitmask of supported transmit offload features, which
will be stored in if_hwassist if IFCAP_TXCSUM is enabled, and be cleared
from mbuf packet header csum flags on transmit. (1)

LO_CSUM_SET - a bitmask of supported receive offload features, which will
be set on the mbuf packet header csum flags on transmit if IFCAP_RXCSUM
is enabled.

While here, fix SCTP offload for loopback: offer generation on the
transmit side, don't just skip validation on the receive side.

Obtained from: DragonflyBSD (1)
MFC after: 1 week


189871 16-Mar-2009 rwatson

if_hwassist should be initialized with CSUM, rather than IFCAP, flags.

Submitted by: yongari
MFC after: 1 week


189866 16-Mar-2009 scf

Add the SIOCSIFMTU ioctl handling directly to tap(4) permitting it to
have its MTU set higher than 1500 (ETHERMTU). Its new limit is now
65535 as enforced by ifhwioctl() in if.c

This allows a tap(4) device to be added to a bridge, which requires all
interface members to have the same MTU, with an interface configured for
jumbo frames. QEMU may now connect to a network via tap(4) without
requiring the real interface to have its MTU set to 1500 or lower.

Reviewed by: rpaulo, bms
MFC after: 1 week


189863 15-Mar-2009 rwatson

Teach the loopback interface about checksum generation and validation
avoidance:

- Enable setting the RXCSUM and TXCSUM flags for loopback interfaces;
set both by default.
- When RXCSUM is set, flag packets sent over the loopback interface as
having checked and valid IP, UDP, TCP checksums so that higher
protocol layers won't check them.
- Always clear CSUM_{IP,UDP_TCP} checksum required flags on transmit,
as they will have gotten there as a result of TXCSUM being set.

This is done only for packets explicitly sent over the loopback, not
simulated loopback via if_simloop() due to !SIMPLEX interfaces, etc.

Note that enabling TXCSUM but not RXCSUM will lead to unhappiness, as
checksums won't be generated but will be validated.

Kris reports that this leads to significant performance improvements
in loopback benchmarking with TCP and UDP for throughput:

RXCSUM RXCSUM+TXCSUM
TCP 15% 37%
UDP 10% 74%

Update man page.

Reviewed by: sam
Tested by: kris
MFC after: 1 week


189851 15-Mar-2009 rwatson

Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd

Drivers that were already disabled because of tty changes:

if_ppp
if_sl

Discussed on: arch@


189800 14-Mar-2009 sam

remove stray ;


189620 10-Mar-2009 csjp

Disable zerocopy by default for now. It's causing some problems in pcap
consumers which fork after the shared pages have been setup. pflogd(8)
is an example. The problem is understood and there is a fix coming in
shortly.

Folks who want to continue using it can do so by setting

net.bpf.zerocopy_enable

to 1.

Discussed with: rwatson


189501 07-Mar-2009 rwatson

When resetting a BPF descriptor, properly check that zero-copy buffers
are not currently owned by userspace before clearing or rotating them.

Otherwise we may not play by the rules of the shared memory protocol,
potentially corrupting packet data or causing userspace applications
that are playing by the rules to spin due to being notified that a
buffer is complete but the shared memory header not reflecting that.

This behavior was seen with pflogd by a number of reporters; note that
this fix is not sufficient to get pflogd properly working with
zero-copy BPF, due to pflogd opening the BPF device before forking,
leading to the shared memory buffer not being propery inherited in the
privilege-separated child. We're still deciding how to fix that
problem.

This change exposes buffer-model specific strategy information in
reset_d(), which will be fixed at a later date once we've decided how
best to improve the BPF buffer abstraction.

Reviewed by: csjp
Reported by: keramida


189494 07-Mar-2009 marius

On architectures with strict alignment requirements compensate
the misalignment of the IP header that prepending the EtherIP
header might have caused.

PR: 131921
MFC after: 1 week


189490 07-Mar-2009 csjp

Mark the bpf stats sysctl as being mpsafe. We do not require
Giant here.


189489 07-Mar-2009 rwatson

Clarify some comments, fix some types, and rename ZBUF_FLAG_IMMUTABLE to
ZBUF_FLAG_ASSIGNED to make it clear why the buffer can't be written to:
it is assigned to userspace.


189344 04-Mar-2009 bms

Reserve a netisr slot for the IGMPv3 output queue.


189286 02-Mar-2009 csjp

Switch the default buffer mode in bpf(4) to zero-copy buffers.

Discussed with: rwatson


189230 01-Mar-2009 rwatson

Do a bit of struct ifnet cleanup in preparation for 8.0: group function
pointers together, move padding to the bottom of the structure, and add
two new integer spares due to attrition over time. Remove unused spare
"flags" field, we can use one of the spare ints if we need it later.

This change requires a rebuild of device driver modules that depend on
the layout of ifnet for binary compatibility reasons.

Discussed with: kmacy


189225 01-Mar-2009 bz

Add size-guards evaluated at compile-time to the main struct vnet_*
which are not in a module of their own like gif.

Single kernel compiles and universe will fail if the size of the struct
changes. Th expected values are given in sys/vimage.h.
See the comments where how to handle this.

Requested by: peter


189106 27-Feb-2009 bz

For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.


188675 16-Feb-2009 luigi

we need if_var.h not if.h


188674 16-Feb-2009 luigi

remove unnecessary forward declaration


188668 16-Feb-2009 rwatson

IFF_NEEDSGIANT will no longer be supported, so remove compatibility code
from if_sppp framework for interfaces requiring Giant.


188626 15-Feb-2009 luigi

remove unnecessary #include from vnet.h and vinet.h

Approved by: Marko Zec


188594 13-Feb-2009 thompsa

bridge_delete_member is called via the event handler from if_detach
after the LLADDR is reclaimed which causes a null pointer deref with
inherit_mac enabled. Record the ifnet pointer of the interface and then compare
that to find when to re-assign the bridge address.

Submitted by: sam


188575 13-Feb-2009 maxim

o In case of the error do not forget to deallocate a cloned device unit.

PR: kern/131642
Submitted by: Dmitrij Tejblum
MFC after: 1 week


188546 13-Feb-2009 rwatson

Remove unused ifaddr local variable in ioctl routine.

MFC after: 3 days


188149 05-Feb-2009 jamie

Call prison_if from rtm_get_jailed, instead of splitting it out into
prison_check_ip4 and prison_check_ip6. As prison_if includes a jailed()
check, remove that check before calling rtm_get_jailed.

Approved by: bz (mentor)


188144 05-Feb-2009 jamie

Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise. The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family. For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes). Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by: bz (mentor)


188066 03-Feb-2009 rrs

Adds support for SCTP checksum offload. This means
we, like TCP and UDP, move the checksum calculation
into the IP routines when there is no hardware support
we call into the normal SCTP checksum routine.

The next round of SCTP updates will use
this functionality. Of course the IGB driver needs
a few updates to support the new intel controller set
that actually does SCTP csum offload too.

Reviewed by: gnn, rwatson, kmacy


187946 31-Jan-2009 bz

Like with r185713 make sure to not leak a lock as rtalloc1(9) returns
a locked route. Thus we have to use RTFREE_LOCKED(9) to get it unlocked
and rtfree(9)d rather than just rtfree(9)d.

Since the PR was filed, new places with the same problem were added
with new code. Also check that the rt is valid before freeing it
either way there.

PR: kern/129793
Submitted by: Dheeraj Reddy <dheeraj@ece.gatech.edu>
MFC after: 2 weeks
Committed from: Bugathon #6


187684 25-Jan-2009 bz

For consistency with prison_{local,remote,check}_ipN rename
prison_getipN to prison_get_ipN.

Submitted by: jamie (as part of a larger patch)
MFC after: 1 week


187648 23-Jan-2009 jhb

Only start the if_slowtimo timer (which drives the if_watchdog methods of
network interfaces) if we have at least one interface with an if_watchdog
routine.

MFC after: 2 weeks


187328 16-Jan-2009 qingli

The RTF_LLINFO was revived unconditionally, but within the kernel the
check on the sysctl argument value being RTF_LLINFO is conditioned on
the COMPAT_ROUTE_FLAGS kernel option. This mismatch caused the L2
table retrieval failure, and the arp/ndp -an command displays empty L2
tables.

Reviewed by: pjd


187094 12-Jan-2009 qingli

Revive the RTF_LLINFO flag in route.h. The kernel code is guarded
by the new kernel option COMPAT_ROUTE_FLAGS for binary backward
compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO.
RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is
always returned to the userland regardless whether the COMPAT_ROUTE_FLAGS
is defined.


187039 10-Jan-2009 rwatson

Do invoke mac_ifnet_check_transmit() and mac_ifnet_create_mbuf()
in the loopback and synthetic loopback code so that packets are
access control checked and relabeled. Previously, the MAC
Framework enforced that packets sent over the loopback weren't
relabeled, but this will allow policies to make explicit choices
about how and whether to relabel packets on the loopback. Also,
for SIMPLEX devices, this produces more consistent behavior for
looped back packets to the local MAC address by labeling those
packets as coming from the interface.

Discussed with: csjp
Obtained from: TrustedBSD Project


186986 10-Jan-2009 bz

Rather than using the cred from curthread, take it from the thread
referenced in the sysctl req argument.

Reviewed by: rwatson
MFC after: 2 weeks


186980 09-Jan-2009 bz

Restrict arp, ndp and theoretically the FIB listing (if not
read with libkvm) to the addresses of a prison, when inside a
jail. [1]
As the patch from the PR was pre-'new-arp', add checks to the
llt_dump handlers as well.

While touching RTM_GET in route_output(), consistently use
curthread credentials rather than the creds from the socket
there. [2]

PR: kern/68189
Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1]
Discussed with: rwatson [2]
Reviewed by: rwatson
MFC after: 4 weeks


186956 09-Jan-2009 bz

Take the cred from curthread rather than curproc as curproc would need
locking but the credential from curthread (usually) never changes.

Discussed with: jhb
MFC after: 2 weeks


186705 02-Jan-2009 qingli

The log message should terminate with a newline instead
of a tab character.


186500 26-Dec-2008 qingli

This checkin addresses a couple of issues:
1. The "route" command allows route insertion through the interface-direct
option "-iface". During if_attach(), an sockaddr_dl{} entry is created
for the interface and is part of the interface address list. This
sockaddr_dl{} entry describes the interface in detail. The "route"
command selects this entry as the "gateway" object when the "-iface"
option is present. The "arp" and "ndp" commands also interact with the
kernel through the routing socket when adding and removing static L2
entries. The static L2 information is also provided through the
"gateway" object with an AF_LINK family type, similar to what is
provided by the "route" command. In order to differentiate between
these two types of operations, a RTF_LLDATA flag is introduced. This
flag is set by the "arp" and "ndp" commands when issuing the add and
delete commands. This flag is also set in each L2 entry returned by the
kernel. The "arp" and "ndp" command follows a convention where a RTM_GET
is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills
in the fields for a "rtm" object, which is reinjected into the kernel by
a subsequent RTM_ADD/DELETE command. The entry returend from RTM_GET
is a prefix route, so the RTF_LLDATA flag must be specified when issuing
the RTM_ADD/DELETE messages.

2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the
specification for retrieving L2 information. Also optimized the
code logic.

Reviewed by: julian


186497 25-Dec-2008 qingli

The "tun?" dev need not be opened at all. One is allowed to perform
the following operations, e.g.:
1) ifconfig tun0 create
2) ifconfig tun0 10.1.1.1 10.1.1.2
3) route add -net 192.103.54.0/24 -iface tun0
4) ifconfig tun0 destroy
If cv wait on the TUN_CLOSED flag, then the last operation (4) will
block forever.

Revert the previous changes and fix the mtx_unlock() leak.


186483 25-Dec-2008 kmacy

- Close a race during which the open flag could be cleared but the tun_softc would still be referenced
by adding a separate TUN_CLOSED flag that is set after tunclose is done referencing it.

- drop the tun_mtx after the flag check to avoid holding it across if_detach which can recurse in to
if_tun.c


186391 22-Dec-2008 qingli

Provide a condition variable to delay the cloned interface
destroy operation until the referenced clone device has
been closed by the process properly. The behavior is now
consistently with the previous release.

Reviewed by: Kip Macy


186275 18-Dec-2008 kmacy

if_rtdel is always called with the RADIX_NODE_HEAD lock held


186266 18-Dec-2008 kmacy

add ifnet_byindex_locked to allow for use of IFNET_RLOCK


186260 17-Dec-2008 gnn

Add TWINAX (Twin Axial Copper for 10G networking) media types.

Add code to the Chelsio driver so that it can recognize different
module types which may be plugged into it, including SR, LR lasers
and TWINAX copper cables.

Obtained from: Chelsio Inc.
MFC after: 1 week


186255 17-Dec-2008 thompsa

- Protect against sc->sc_primary being null
- Initialise speed where its used


186254 17-Dec-2008 thompsa

Update the interface baudrate taking into account the max speed for the
different aggregation protocols.


186217 17-Dec-2008 qingli

Remove the rt argument from nd6_storelladdr() because
rt is no longer accessed.


186213 17-Dec-2008 kmacy

Keep stats in drbr_enqueue

Discussed with: ps


186209 17-Dec-2008 kmacy

avoid trying to acquire a shared lock while holding an exclusive lock
by making the ifnet lock acquisition exclusive


186207 17-Dec-2008 kmacy

merge in 2 buf_ring helper routines for enqueueing and freeing buf_rings


186199 17-Dec-2008 kmacy

convert ifnet and afdata locks from mutexes to rwlocks


186195 16-Dec-2008 thompsa

Also propagate the if_hwassist value to the parent so that cksum offload works.

Submitted by: Tom Hicks (thicks_averesys.com)


186187 16-Dec-2008 rwatson

A few locking fixes and cleanups to pfil hook registration,
unregistration, and execution:

- Add some brackets for clarity and trim a bit of vertical whitespace.
- Remove comments that may not contribute to clarity, such as "Lock"
before acquiring a lock and "Get memory" before allocating memory.
- During hook registration, don't drop pfil_list_lock between checking
for a duplicate and registering the hook, as this leaves a race
condition by failing to enforce the "no duplicate hooks" invariant.
- Don't lock the hook during registration, since it's not yet in use.
- Document assumption that hooks will be quiesced before being
unregistered.
- Don't write-lock hooks during removal because they are assumed
quiesced.
- Rename "done" label to "locked_error" to be clear that it's an error
path on the way out of hook execution.

MFC after: pretty soon


186176 16-Dec-2008 kmacy

remove assertion checks for now - ipfw uses its own lock for protecting its radix tree instance


186167 16-Dec-2008 kmacy

style and spelling fix


186166 16-Dec-2008 kmacy

assert that the radix node head is locked when manipulating the tree


186149 16-Dec-2008 kmacy

add macro for destroying an llentry's rwlock


186121 15-Dec-2008 kmacy

Add arpv2 management code


186119 15-Dec-2008 qingli

This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
me maintaining that branch before the svn conversion


186061 13-Dec-2008 thompsa

Dont leak the rnh lock on error.


186048 13-Dec-2008 bz

Second round of putting global variables, which were virtualized
but formerly missed under VIMAGE_GLOBAL.

Put the extern declarations of the virtualized globals
under VIMAGE_GLOBAL as the globals themsevles are already.
This will help by the time when we are going to remove the globals
entirely.

Sponsored by: The FreeBSD Foundation


186036 13-Dec-2008 rwatson

Line wrap very long line in struct packet_filter_hook definition.

MFC after: pretty soon


185963 12-Dec-2008 csjp

Consider processes attaching/detaching from tun(4) devices as being link
state changes. This change modifies tunopen and tunclose to call the
if_link_state_change() function. Among other things, this will result in
devd(8) receiving events from devctl(4) for linkup/link down. This allows
us to do several useful things, including initializing tunnel parameters
and adding routes.

Discussed on: freebsd-net@
MFC after: 2 weeks


185937 11-Dec-2008 bz

Put a global variables, which were virtualized but formerly
missed under VIMAGE_GLOBAL.

Start putting the extern declarations of the virtualized globals
under VIMAGE_GLOBAL as the globals themsevles are already.
This will help by the time when we are going to remove the globals
entirely.

While there garbage collect a few dead externs from ip6_var.h.

Sponsored by: The FreeBSD Foundation


185931 11-Dec-2008 bz

Whitespace changes only - tabs must have been converted to spaces
somehow, when moving the code from p4 to svn.

Sponsored by: The FreeBSD Foundation


185895 10-Dec-2008 zec

Conditionally compile out V_ globals while instantiating the appropriate
container structures, depending on VIMAGE_GLOBALS compile time option.

Make VIMAGE_GLOBALS a new compile-time option, which by default will not
be defined, resulting in instatiations of global variables selected for
V_irtualization (enclosed in #ifdef VIMAGE_GLOBALS blocks) to be
effectively compiled out. Instantiate new global container structures
to hold V_irtualized variables: vnet_net_0, vnet_inet_0, vnet_inet6_0,
vnet_ipsec_0, vnet_netgraph_0, and vnet_gif_0.

Update the VSYM() macro so that depending on VIMAGE_GLOBALS the V_
macros resolve either to the original globals, or to fields inside
container structures, i.e. effectively

#ifdef VIMAGE_GLOBALS
#define V_rt_tables rt_tables
#else
#define V_rt_tables vnet_net_0._rt_tables
#endif

Update SYSCTL_V_*() macros to operate either on globals or on fields
inside container structs.

Extend the internal kldsym() lookups with the ability to resolve
selected fields inside the virtualization container structs. This
applies only to the fields which are explicitly registered for kldsym()
visibility via VNET_MOD_DECLARE() and vnet_mod_register(), currently
this is done only in sys/net/if.c.

Fix a few broken instances of MODULE_GLOBAL() macro use in SCTP code,
and modify the MODULE_GLOBAL() macro to resolve to V_ macros, which in
turn result in proper code being generated depending on VIMAGE_GLOBALS.

De-virtualize local static variables in sys/contrib/pf/net/pf_subr.c
which were prematurely V_irtualized by automated V_ prepending scripts
during earlier merging steps. PF virtualization will be done
separately, most probably after next PF import.

Convert a few variable initializations at instantiation to
initialization in init functions, most notably in ipfw. Also convert
TUNABLE_INT() initializers for V_ variables to TUNABLE_FETCH_INT() in
initializer functions.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


185849 10-Dec-2008 kmacy

fix a reported panic when adding a route and one hit here when deleting a route

- pass RTF_RNH_LOCKED to rtalloc1_fib in 2 cases where the lock is held
- make sure the rnh lock is held across rt_setgate and rt_getifa_fib


185810 09-Dec-2008 bz

It does not make much sense to include net/route.h twice.
Remove one #include.


185808 09-Dec-2008 bz

Add rwlock.h (and lock.h for that) to keep no-INET kernels compiling
after RADIX_NODE_HEAD_{,UN}LOCK() were added. Must have been "learned"
by pollution before (most likely: route.h -> radix.h -> rwlock.h)


185807 09-Dec-2008 bz

Fix a bug introduced in r185747: rather than dereferencing an uninitialized
*rt to something undefined, use the fibnum that came in as function argument.

Found with: Coverity Prevent(tm)
CID: 4168


185774 08-Dec-2008 kmacy

- avoid recursively locking the radix node head lock
- assert that it is held if RTF_RNH_LOCKED is not passed


185751 08-Dec-2008 imp

Add missing include to sys/lock.h before sys/rwlock.h


185747 07-Dec-2008 kmacy

- convert radix node head lock from mutex to rwlock
- make radix node head lock not recursive
- fix LOR in rtexpunge
- fix LOR in rtredirect

Reviewed by: sam


185571 02-Dec-2008 bz

Rather than using hidden includes (with cicular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation


185435 29-Nov-2008 bz

MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the afore mentioned and in kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible


185419 28-Nov-2008 zec

Unhide declarations of network stack virtualization structs from
underneath #ifdef VIMAGE blocks.

This change introduces some churn in #include ordering and nesting
throughout the network stack and drivers but is not expected to cause
any additional issues.

In the next step this will allow us to instantiate the virtualization
container structures and switch from using global variables to their
"containerized" counterparts.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


185348 26-Nov-2008 zec

Merge more of currently non-functional (i.e. resolving to
whitespace) macros from p4/vimage branch.

Do a better job at enclosing all instantiations of globals
scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.

De-virtualize and mark as const saorder_state_alive and
saorder_state_any arrays from ipsec code, given that they are never
updated at runtime, so virtualizing them would be pointless.

Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


185253 24-Nov-2008 sam

use consistent style


185164 22-Nov-2008 kmacy

convert calls to IFQ_HANDOFF to if_transmit


185162 22-Nov-2008 kmacy

- bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.


185088 19-Nov-2008 zec

Change the initialization methodology for global variables scheduled
for virtualization.

Instead of initializing the affected global variables at instatiation,
assign initial values to them in initializer functions. As a rule,
initialization at instatiation for such variables should never be
introduced again from now on. Furthermore, enclose all instantiations
of such global variables in #ifdef VIMAGE_GLOBALS blocks.

Essentialy, this change should have zero functional impact. In the next
phase of merging network stack virtualization infrastructure from
p4/vimage branch, the new initialization methology will allow us to
switch between using global variables and their counterparts residing in
virtualization containers with minimum code churn, and in the long run
allow us to intialize multiple instances of such container structures.

Discussed at: devsummit Strassburg
Reviewed by: bz, julian
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


184837 11-Nov-2008 kmacy

- Use RTFREE_LOCKED macro
- Don't clone route on lookup (was causing arpresolve to fail)
- u_int_32 -> uint32_t

Reviewed by: qingli
MFC after: 3 days


184726 06-Nov-2008 bz

Include if_arp.h for IFP2AC so that the netgraph parts in if.c
are happy even if compiled without INET or INET6.

MFC after: 2 months


184711 06-Nov-2008 bz

Check for INET not AF_INET in #ifdef. Makes it compile without INET.

MFC after: 2 months


184710 06-Nov-2008 bz

Hide an unused variable in case we compile without INET.
Include ethernet.h and if_arp.h directly so that the constants are
always defined.
Makes token compile without INET.

MFC after: 2 months


184709 06-Nov-2008 bz

Hide an unused variable in case we compile without INET.
Include ethernet.h directly so that the constants are always defined.
Makes fddi compile without INET.

MFC after: 2 months


184682 05-Nov-2008 bz

Make compile without INET.

The change is modelled after the way it was done for (without) INET6.

MFC after: 2 months


184681 05-Nov-2008 bz

Hide the IPv4 init function if the kernel is compiled without INET.
It is not used in that case and would not compile.


184680 05-Nov-2008 bz

Make compile without INET.

MFC after: 2 months


184679 05-Nov-2008 bz

Make tun(4) compile without INET.

MFC after: 2 months


184678 05-Nov-2008 bz

Do only define the variable if either INET or INET6 is defined.

To prevent it from compiling without INET and INET6 we should put
an explicit #error in there like we have in other files,
but not rely on an unused variable.

MFC after: 2 months


184214 23-Oct-2008 des

Fix a number of style issues in the MALLOC / FREE commit. I've tried to
be careful not to fix anything that was already broken; the NFSv4 code is
particularly bad in this respect.


184205 23-Oct-2008 des

Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months


183550 02-Oct-2008 zec

Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation


183498 30-Sep-2008 glebius

Do not mangle if_oerrors of the underlying interface. This counter
belongs solely to the driver.
We don't lose any statistics with this change, because in a error
case the drop counter on the interface output queue is always incremented.

Reviewed by: thompsa


183397 27-Sep-2008 ed

Replace all calls to minor() with dev2unit().

After I removed all the unit2minor()/minor2unit() calls from the kernel
yesterday, I realised calling minor() everywhere is quite confusing.
Character devices now only have the ability to store a unit number, not
a minor number. Remove the confusion by using dev2unit() everywhere.

This commit could also be considered as a bug fix. A lot of drivers call
minor(), while they should actually be calling dev2unit(). In -CURRENT
this isn't a problem, but it turns out we never had any problem reports
related to that issue in the past. I suspect not many people connect
more than 256 pieces of the same hardware.

Reviewed by: kib


183381 26-Sep-2008 ed

Remove unit2minor() use from kernel code.

When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by: kib


183351 25-Sep-2008 dwmalone

Some people's 6to4 routers seem to have been blowing up because of
the unlocked route caching in if_stf. Add a mutex that protects
access to cached route. This seemed to fix problems for Pekka Savola.

Nick Sayer had similar problems, and in his case completly disabling
the route cache seemed to help. Add a sysctl net.link.stf.route_cache
that can be used to turn off route caching in if_stf.

PR: 122283
MFC after: 2 weeks
Tested by: Pekka Savola, Nick Sayer.


183210 20-Sep-2008 thompsa

Fix clone destruction, can't use the simple api because that does not remove
the ifnet from cloner's list. Expose if_clone_destroyif api to do this.

Submitted by: sam


183200 20-Sep-2008 zec

Move #defines for MRT-related constants from net/route.c to
net/route.h, because the vnet code will need those constants as
well.

Reviewed by: bz
Approved by: julian (mentor)
MFC after: never


183160 18-Sep-2008 thompsa

Move the protocol and port count checks to outside the loop, these conditions
can not change while we have the lock so no point retesting.


183135 18-Sep-2008 thompsa

Make sure there is at least one port to avoid divide by zero when choosing the
tx port.

PR: kern/122794
MFC after: 3 days


183034 15-Sep-2008 julian

Hey, committed the same typo twice! must be a record


183032 15-Sep-2008 julian

rewrite rt_check. Ztake into account that whiel teh rtentry is unlocked,
someone else might change it, so after we re-acquire the lock on it,
we need to check it is still valid. People have been panicing in this
function due to soem edge cases which I have hopefully removed.

Reviewed by: keramida @
Obtained from: 1 week


183017 14-Sep-2008 julian

come on Julian, make up if you're committing one change or the other.
fix braino


183013 14-Sep-2008 julian

Revert a part of the MRT commit that proved un-needed.
rt_check() in its original form proved to be sufficient and
rt_check_fib() can go away (as can its evil twin in_rt_check()).

I believe this does NOT address the crashes people have been seeing
in rt_check.

MFC after: 1 week


182904 10-Sep-2008 bms

Add a missing break statement; IFDATA_LINKSPECIFIC would fall through
to IFDATA_DRIVERNAME otherwise.

Reviewed by: brooks
MFC after: 1 week


182880 08-Sep-2008 emax

Add new TAPGIFNAME tap(4) character device ioctl. This is a
convenient shortcut to obtain network interface name using
file descriptor for character device.

Obtained from: NetBSD
MFC after: 1 week


182862 08-Sep-2008 thompsa

Put the bridge mac inheritance behind a sysctl with the default off as this
still needs all the edge cases fixed.

Submitted by: Eygene Ryabinkin


182801 05-Sep-2008 julian

Be consistent about whether these multi-lined macros are separated by
a blank line. Some were, some weren't. Decide in favour of the line
as it matches what an inline would do, and it's easier to read.


182615 01-Sep-2008 brooks

Wrap a line that became too long with the addition of V_.

(This file contains many more unwrapped or badly wrapped lines.)


182462 29-Aug-2008 jkim

Make bpf_maxinsns visible from ng_bpf.c.
Pass me the pointyhat, please.


182456 29-Aug-2008 jkim

Fix the last missing parentheses for a return statement in bpf_filter.c.


182455 29-Aug-2008 jkim

More convergence towards style(9).


182454 29-Aug-2008 jkim

- Directly match code wherever possible instead of using macros.
- Macrofy bitmap table lookup. Constify the table while I am here.
- Add missing continue statements in the for loop.

Functionally it should be the last remaining fix from:

PR: kern/89752
MFC after: 1 month


182425 29-Aug-2008 jkim

Simplify jump instruction range checks.

MFC after: 1 month


182413 28-Aug-2008 jfv

Fix to bug kern/126850. Only dispatch event hander if the
interface had a parent (was attached).

Reviewed by: EvilSam
MFC after: 1 week


182412 28-Aug-2008 jkim

Check invalid BPF codes from bpf_validate(9).

Note that it is not critical because bpf_filter(9) returns zero
when it encounters invalid code at run time.

MFC after: 1 month


182380 28-Aug-2008 jkim

Validate scratch memory addresses for BPF_STX and BPF_LDX|BPF_MEM.
A badly written filter was able to reference invalid addresses,
even cause kernel crash.

MFC after: 3 days


182376 28-Aug-2008 jkim

Initialize scratch memory for JIT-compiled filter when it is allocated.
Previously it may have contained unnecessary (even sensitive) data from
the previous allocation.
As a (good) side effect, scratch memory may be used to store the previous
filter state(s) safely because it is allocated and freed with filter itself.
However, use it carefully because bpf_filter(9) does not have this behavior.

MFC after: 3 days


182285 27-Aug-2008 emaste

Move CTASSERT of ether header sizes out of the header file and into
if_ethersubr.c. CTASSERT is implemented using a dummy typedef, which if
used in a header file may conflict with another CTASSERT in a source file
using that header.

I'll make a note of this in CTASSERT's man page.

Approved by: imp


182220 26-Aug-2008 jkim

Move empty filter handling to MI source.

MFC after: 3 days


182197 26-Aug-2008 jkim

Revert the previous commit to fix buildworld for now.

We have constified 'struct bpf_insn *' for bpf_filter(9) and bpf_validate(9)
since r1.19 but they conflict with pcap.h from libpcap.


182184 26-Aug-2008 jkim

Make sys/net/bpf_filter.c build cleanly on user land.


182173 25-Aug-2008 jkim

Fix a typo in copyrights.


182172 25-Aug-2008 jkim

Embed scratch memory in the filter structure.

MFC after: 3 days


182121 24-Aug-2008 imp

MFp4:

Remove all the OtherBSD ifdefs. They are very out of date at this
point. OtherBSD doesn't use this file verbatim, and they don't have
FreeBSD ifdefs in their code.

Reviewed by: bms@, joerg@


182106 24-Aug-2008 bz

Make the checks for ptp interfaces in ifa_ifwithdstaddr() and
ifa_ifwithnet() look more similar by comparing the pointer to NULL
in both cases.

MFC after: 3 months


181900 20-Aug-2008 thompsa

ifnet_setbyindex() is only used locally, go back to being static.


181892 20-Aug-2008 kmacy

Fix build


181887 20-Aug-2008 julian

A bunch of formatting fixes brough to light by, or created by the Vimage commit
a few days ago.


181846 18-Aug-2008 jkim

- Make these files compilable on user land.
- Update copyrights and fix style(9).


181803 17-Aug-2008 bz

Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch


181795 17-Aug-2008 thompsa

LRO combined packets can actually be bridged as long as all the interfaces also
support TSO, this can always be disabled manually if undesirable.

Pointed out by: gallatin


181690 13-Aug-2008 ed

Change bpf(4) to use the cdevpriv API.

Right now the bpf(4) driver uses the cloning API to generate /dev/bpf%u.
When an application such as tcpdump needs a BPF, it opens /dev/bpf0,
/dev/bpf1, etc. until it opens the first available device node. We used
this approach, because our devfs implementation didn't allow
per-descriptor data.

Now that we can, make it use devfs_get_cdevpriv() to obtain the private
data. To remain compatible with the existing implementation, add a
symlink from /dev/bpf0 to /dev/bpf. I've already changed libpcap to
compile with HAVE_CLONING_BPF, which makes it use /dev/bpf. There may be
other applications in the base system (dhclient) that use the loop to
obtain a valid bpf.

Discussed on: src-committers
Approved by: csjp


181627 12-Aug-2008 vanhu

Increase statistic counters for enc0 interface when enabled
and processing IPSec traffic.

Approved by: gnn (mentor)
MFC after: 1 week


181138 01-Aug-2008 antoine

Make "1000baseT" the description and "1000baseTX" the alias for
IFM_1000_T instead of the reverse. It is possible FreeBSD doesn't
even support 1000baseTX.
This changes ifconfig(8) output.

Requested by: gavin@ and bms@
See also: http://docs.freebsd.org/cgi/mid.cgi?20050307191901.H32508


181137 01-Aug-2008 antoine

Remove trailing ';' in BPFD_LOCK_ASSERT macro.

MFC after: 1 month
X-MFC-to: stable/7, stable/6 has it right


181135 01-Aug-2008 csjp

Annotate why we do not call BPF_CHECK_DIRECTION() in this tapping routine.
There is no way for the caller to tell us which direction this packet is
going. With the bpf_mtap{2} routines, we can check the interface pointer.

MFC after: 2 weeks


181118 01-Aug-2008 rwatson

Remove further trailing white space.


181016 30-Jul-2008 jhb

Trim some noise from some #ifdef's. This had leaked into the compat32
support for bpf(4) due to hacks in the Y! tree for a truss32 binary
(since superseded by native support for 32-bit binaries in truss itself).

MFC after: 1 week


180840 27-Jul-2008 julian

Add the ability to add new addresses for interfacesto just one FIB
(Other more specific related options will follow)
This allows one to set multiple p2p links to the same place
and select which to use by having each in different FIBS.


180817 26-Jul-2008 trhodes

Fill in BPF sysctl descriptions.

Reviewed by: csjp


180639 20-Jul-2008 julian

Add support for actually sending WCCP return packets via GRE.
This MAY be combined by a clever person with the 'key' code recently
added, however a cursary glance suggest that it would be safer to just keep
the patches as it is unlikely that the two modes would be used together
and the separate patch has been extensively tested.

Obtained from: here and there
MFC after: 1 week


180515 14-Jul-2008 jkim

Allow injecting big packets via bpf(4) up to min(MTU, 16K-byte).

MFC after: 1 week


180511 14-Jul-2008 jfv

Add event notification at attach/detach so the NIC
is able to detect it and do hardware filtering.


180391 09-Jul-2008 rwatson

Rather than checking for a NULL so_pcb in raw_attach(), assert that
it's non-NULL, as all callers can and should already do the required
checking. Update comments a bit more to talk about rawcb allocation
for consumers.

Reviewed by: bz
MFC after: 3 weeks


180390 09-Jul-2008 rwatson

Add sysctl subtree net.raw for generic raw socket infrastructure;
expose default send and receive socket buffer sizes using sysctls
so that they can be administered centrally.

Reviewed by: bz
MFC after: 3 weeks


180385 09-Jul-2008 rwatson

Remove unused support for local and foreign addresses in generic raw
socket support. These utility routines are used only for routing and
pfkey sockets, neither of which have a notion of address, so were
required to mock up fake socket addresses to avoid connection
requirements for applications that did not specify their own fake
addresses (most of them).

Quite a bit of the removed code is #ifdef notdef, since raw sockets
don't support bind() or connect() in practice. Removing this
simplifies the raw socket implementation, and removes two (commented
out) uses of dtom(9).

Fake addresses passed to sendto(2) by applications are ignored for
compatibility reasons, but this is now done in a more consistent way
(and with a comment). Possibly, EINVAL could be returned here in
the future if it is determined that no applications depend on the
semantic inconsistency of specifying a destination address for a
protocol without address support, but this will require some amount
of careful surveying.

NB: This does not affect netinet, netinet6, or other wire protocol
raw sockets, which provide their own independent infrastructure with
control block address support specific to the protocol.

MFC after: 3 weeks
Reviewed by: bz


180337 07-Jul-2008 dwmalone

Add a new ioctl for changing the read filter (BIOCSETFNR). This is
just like BIOCSETF but it doesn't drop all the packets buffered on
the discriptor and reset the statistics.

Also, when setting the write filter, don't drop packets waiting to
be read or reset the statistics.

PR: 118486
Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz>
MFC after: 1 month


180310 05-Jul-2008 csjp

Make sure we are clearing the ZBUF_FLAG_IMMUTABLE any time a free buffer
is reclaimed by the kernel. This fixes a bug resulted in the kernel
over writing packet data while user-space was still processing it when
zerocopy is enabled. (Or a panic if invariants was enabled).

Discussed with: rwatson


180307 05-Jul-2008 rwatson

Clarify comments and prototypes in raw_cb.h:

- the protosw entries are used directly
- the usrreq functions are library routines, generally wrapped by
consumers rather than being used directly
- the usrreq structure entries are likewise typically wrapped

Remove the rather incorrect #if 0'd pr_input_t prototype for raw_input.

MFC after: 3 days


180305 05-Jul-2008 rwatson

Improve approximation of style(9) in raw socket code.


180249 04-Jul-2008 thompsa

port % count will never be greater than LAGG_MAX_PORTS so nuke the test.


180239 04-Jul-2008 rwatson

Remove NETISR_MPSAFE, which allows specific netisr handlers to be directly
dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific
netisr handlers to always be dispatched via a queue (deferred). Mark the
usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly
acquire Giant in those handlers.

Previously, any netisr handler not marked NETISR_MPSAFE would necessarily
run deferred and with Giant acquired. This change removes Giant
scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows
non-MPSAFE handlers to continue to force deferred dispatch so as to avoid
lock order reversals between their acqusition of Giant and any calling
context.

It is likely we will be able to remove NETISR_FORCEQUEUE once
IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no
longer be supported.

Reviewed by: bz
MFC after: 1 month
X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons,
but the rest can go back.


180220 03-Jul-2008 thompsa

Be smarter about disabling interface capabilities. TOE/TSO/TXCSUM will only be
disabled if one (or more) of the member interfaces does not support it. Always
turn off LRO since we can not bridge a combined frame.

Tested by: Stefan Lambrev


180140 01-Jul-2008 philip

Set bridge MAC addresses to the MAC address of their first interface unless
locally configured. This is more in line with the behaviour of other popular
bridging implementations and makes bridges more predictable after reboots for
example.

Reviewed by: thompsa
MFC after: 1 week


180094 29-Jun-2008 ed

Remove the unused softc from the lo(4) driver.

Now that the pseudo-interface cloner has an internal list of instances,
there is no need to create a softc. The softc only contains a pointer to
the ifp, which means there is no valid reason to keep it. While there,
remove the corresponding malloc-pool.

Approved by: philip (mentor)


180042 26-Jun-2008 rwatson

Introduce locking around use of ifindex_table, whose use was previously
unsynchronized. While races were extremely rare, we've now had a
couple of reports of panics in environments involving large numbers of
IPSEC tunnels being added very quickly on an active system.

- Add accessor functions ifnet_byindex(), ifaddr_byindex(),
ifdev_byindex() to replace existing accessor macros. These functions
now acquire the ifnet lock before derefencing the table.
- Add IFNET_WLOCK_ASSERT().
- Add static accessor functions ifnet_setbyindex(), ifdev_setbyindex(),
which set values in the table either asserting of acquiring the ifnet
lock.
- Use accessor functions throughout if.c to modify and read
ifindex_table.
- Rework ifnet attach/detach to lock around ifindex_table modification.

Note that these changes simply close races around use of ifindex_table,
and make no attempt to solve the probem of disappearing ifnets. Further
refinement of this work, including with respect to ifindex_table
resizing, is still required.

In a future change, the ifnet lock should be converted from a mutex to an
rwlock in order to reduce contention.

Reviewed and tested by: brooks


180041 26-Jun-2008 julian

change a variable name ot stop it from colliding with other names in
some situations. (i.e. in vimage)

MFC after: 1 week


179894 20-Jun-2008 thompsa

Add support for the optional key in the GRE header.

PR: kern/114714
Submitted by: Cristian KLEIN


179735 11-Jun-2008 jfv

Duh, wrong directory, needed to be in netinet


179734 11-Jun-2008 jfv

Add generic TCP LRO code, moved from the ixgbe driver into net


179726 11-Jun-2008 ed

Don't enforce unique device minor number policy anymore.

Except for the case where we use the cloner library (clone_create() and
friends), there is no reason to enforce a unique device minor number
policy. There are various drivers in the source tree that allocate unr
pools and such to provide minor numbers, without using them themselves.

Because we still need to support unique device minor numbers for the
cloner library, introduce a new flag called D_NEEDMINOR. All cdevsw's
that are used in combination with the cloner library should be marked
with this flag to make the cloning work.

This means drivers can now freely use si_drv0 to store their own flags
and state, making it effectively the same as si_drv1 and si_drv2. We
still keep the minor() and dev2unit() routines around to make drivers
happy.

The NTFS code also used the minor number in its hash table. We should
not do this anymore. If the si_drv0 field would be changed, it would no
longer end up in the same list.

Approved by: philip (mentor)


179426 30-May-2008 qingli

When RADIX_MPATH is enabled, the route selection is not rotating
through the multipath entries. The hash value was a signed integer
and was always giving a -1 value.

PR: 123991
Submitted by: Barrett Lyon


179066 17-May-2008 brooks

The if_check() function performed three actions:
- verified that the ifp->if_snd.ifq_mtx was initalized for
all attached interfaces. This was pointless because it was
initalized for all interfaces in if_attach() so I've removed it.
- Checked that ifp->if_snd.ifq_maxlen is initalized and set it to
ifqmaxlen if unset. This makes more sense in if_attach() so
I moved it there.
- The first call of if_slowtimo(). Delete if_check() and call
if_slowtimo() directly from the SYSINIT().


179036 16-May-2008 scf

Spelling and capitalization fixes.

MFC after: 3 days


178920 10-May-2008 antoine

Add missing braces in #if 0ed code.

Approved by: rwatson (mentor)
MFC after: 1 month


178898 10-May-2008 julian

move a #define from a place it shouldn't have been to a place it should
have been. Basically my testign didn't ocver one case that this broke.
thanks tinderbox!


178897 10-May-2008 julian

undef MAXFIBS before redefining it


178888 09-May-2008 julian

Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as a EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extent that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. if nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility call setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being reponded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect withthe proces that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from anay to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibilty of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the desination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco


178883 09-May-2008 rwatson

Trim trailing whitespace at ends of lines.


178882 09-May-2008 jhb

Set D_TRACKCLOSE to avoid a race in devfs that could lead to orphaned bpf
devices never getting fully closed.

MFC after: 3 days


178674 29-Apr-2008 julian

Add an option (compiled out by default)
to profile outoing packets for a number of mbuf chain
related parameters
e.g. number of mbufs, wasted space.
probably will do with further work later.

Reviewed by: various


178639 28-Apr-2008 jkim

Check packet directions more properly instead of just checking received
interface is null.

PR: kern/123138
Submitted by: Dmitry (hanabana at mail dot ru)
MFC after: 1 week


178454 24-Apr-2008 qingli

In function rtalloc_mpath(), do not try to release the lock if the ro_rt
pointer is NULL.

Reported by: (pluknet at gmail dot com)


178333 20-Apr-2008 antoine

Move "1000baseT" from IFM_SUBTYPE_ETHERNET_DESCRIPTIONS to
IFM_SUBTYPE_ETHERNET_ALIASES: there is already "1000baseTX" in
IFM_SUBTYPE_ETHERNET_DESCRIPTIONS. This doesn't change ifconfig
behaviour.

PR: 45793 (maybe)
Approved by: rwatson (mentor)
MFC after: 1 month


178323 19-Apr-2008 brooks

Delay the global registration of the struct ifnet in if_alloc() until after
we're certain the allocation will entierly succeed. This fixes a leak in a
fairly unlikely case.

Reported by: vijay singh <vijjus at rocketmail dot com>
MFC after: 1 week


178223 15-Apr-2008 jkim

Revert the previous commit and use M_PROMISC flag instead.
It is safer because it will never be used for outgoing packets.


178221 15-Apr-2008 emax

Fix possible buffer overrun on 64-bit arch when generating MAC
address for tap interface.

Reported by: Marc Lorner < marc dot loerner at hob dot de >
Reviewed by: bms
MFC after: 3 days


178208 15-Apr-2008 jkim

Remove M_SKIP_FIREWALL abuse and add more appropriate check.

Pointyhat to: jkim
Reported by: Eugene Grosbein (eugen at kuzbass dot ru)
MFC after: 3 days


178187 13-Apr-2008 qingli

Make this file compile on IPv6 kernels.


178183 13-Apr-2008 phk

Make this compile also on non-IPv6 kernels.


178176 13-Apr-2008 bz

Fix the build in case RADIX_MPATH is not defined.


178168 13-Apr-2008 qingli

These files handle the radix tree for the ECMP routes.
The original code from KAME did not take care of address
aliases or multiple ip addresses that have the same
prefix.

Reviewed by: rwatson, gnn, sam, kmacy, julian


178167 13-Apr-2008 qingli

This patch provides the back end support for equal-cost multi-path
(ECMP) for both IPv4 and IPv6. Previously, multipath route insertion
is disallowed. For example,

route add -net 192.103.54.0/24 10.9.44.1
route add -net 192.103.54.0/24 10.9.44.2

The second route insertion will trigger an error message of
"add net 192.103.54.0/24: gateway 10.2.5.2: route already in table"

Multiple default routes can also be inserted. Here is the netstat
output:

default 10.2.5.1 UGS 0 3074 bge0 =>
default 10.2.5.2 UGS 0 0 bge0

When multipath routes exist, the "route delete" command requires
a specific gateway to be specified or else an error message would
be displayed. For example,

route delete default

would fail and trigger the following error message:

"route: writing to routing socket: No such process"
"delete net default: not in table"

On the other hand,

route delete default 10.2.5.2

would be successful: "delete net default: gateway 10.2.5.2"

One does not have to specify a gateway if there is only a single
route for a particular destination.

I need to perform more testings on address aliases and multiple
interfaces that have the same IP prefixes. This patch as it
stands today is not yet ready for prime time. Therefore, the ECMP
code fragments are fully guarded by the RADIX_MPATH macro.
Include the "options RADIX_MPATH" in the kernel configuration
to enable this feature.

Reviewed by: robert, sam, gnn, julian, kmacy


177966 07-Apr-2008 rwatson

Maintain and observe a ZBUF_FLAG_IMMUTABLE flag on zero-copy BPF
buffer kernel descriptors, which is used to allow the buffer
currently in the BPF "store" position to be assigned to userspace
when it fills, even if userspace hasn't acknowledged the buffer
in the "hold" position yet. To implement this, notify the buffer
model when a buffer becomes full, and check that the store buffer
is writable, not just for it being full, before trying to append
new packet data. Shared memory buffers will be assigned to
userspace at most once per fill, be it in the store or in the
hold position.

This removes the restriction that at most one shared memory can
by owned by userspace, reducing the chances that userspace will
need to call select() after acknowledging one buffer in order to
wait for the next buffer when under high load. This more fully
realizes the goal of zero system calls in order to process a
high-speed packet stream from BPF.

Update bpf.4 to reflect that both buffers may be owned by userspace
at once; caution against assuming this.


177965 07-Apr-2008 rwatson

Coerce if_loop.c in the general direction of style(9):

- Use ANSI function declarations
- Remove use of 'register' keyword
- Prefer style(9) return parens, white space

MFC after: 1 month


177669 27-Mar-2008 iedowse

Add IFF_NEEDSGIANT to IFF_CANTCHANGE, to prevent user-level code
from clearing the IFF_NEEDSGIANT flag on Giant-locked interfaces.
In particular, wpa_supplicant was doing this on USB interfaces,
causing panics when Giant-locked code was then called without Giant.

Submitted by: Alexey Popov
Reviewed by: rwatson
MFC after: 3 days


177647 26-Mar-2008 rwatson

Add a comment explaining that we initialize the 'a' buffer for
zero-copy to the store buffer position on the BPF descriptor,
and the 'b' buffer as the free buffer in order to fill them in
the order documented in bpf(4).

MFC after: 4 months
Suggested by: csjp


177617 25-Mar-2008 sam

expose if_purgemaddrs, it will be used by the vap code unless someone
redesigns the mcast support code in the next few weeks

MFC after: 3 weeks


177616 25-Mar-2008 sam

IFM_IEEE80211_IBSSMASTER hasn't been used in many years; replace it
with IFM_IEEE80211_WDS which will be used by the forthcoming vap code

MFC after: 3 weeks


177599 25-Mar-2008 ru

Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by: arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.


177596 25-Mar-2008 rwatson

Check for a NULL free buffer pointer in BPF before invoking
bpf_canfreebuf() in order to avoid potentially calling a non-inlinable
but trivial function in zero-copy buffer mode for every packet
received when we couldn't free the buffer anyway.

MFC after: 4 months


177585 24-Mar-2008 jkim

Fix build with option BPF_JITTER.


177584 24-Mar-2008 jkim

Remove redundant inclusions of net/bpfdesc.h.


177548 24-Mar-2008 csjp

Introduce support for zero-copy BPF buffering, which reduces the
overhead of packet capture by allowing a user process to directly "loan"
buffer memory to the kernel rather than using read(2) to explicitly copy
data from kernel address space.

The user process will issue new BPF ioctls to set the shared memory
buffer mode and provide pointers to buffers and their size. The kernel
then wires and maps the pages into kernel address space using sf_buf(9),
which on supporting architectures will use the direct map region. The
current "buffered" access mode remains the default, and support for
zero-copy buffers must, for the time being, be explicitly enabled using
a sysctl for the kernel to accept requests to use it.

The kernel and user process synchronize use of the buffers with atomic
operations, avoiding the need for system calls under load; the user
process may use select()/poll()/kqueue() to manage blocking while
waiting for network data if the user process is able to consume data
faster than the kernel generates it. Patchs to libpcap are available
to allow libpcap applications to transparently take advantage of this
support. Detailed information on the new API may be found in bpf(4),
including specific atomic operations and memory barriers required to
synchronize buffer use safely.

These changes modify the base BPF implementation to (roughly) abstrac
the current buffer model, allowing the new shared memory model to be
added, and add new monitoring statistics for netstat to print. The
implementation, with the exception of some monitoring hanges that break
the netstat monitoring ABI for BPF, will be MFC'd.

Zerocopy bpf buffers are still considered experimental are disabled
by default. To experiment with this new facility, adjust the
net.bpf.zerocopy_enable sysctl variable to 1.

Changes to libpcap will be made available as a patch for the time being,
and further refinements to the implementation are expected.

Sponsored by: Seccuris Inc.
In collaboration with: rwatson
Tested by: pwood, gallatin
MFC after: 4 months [1]

[1] Certain portions will probably not be MFCed, specifically things
that can break the monitoring ABI.


177436 20-Mar-2008 kmacy

back out last change as Sam believes that it breaks multicast - need to revisit after following up with pyun


177433 20-Mar-2008 kmacy

Don't re-initialize the interface if it is already running.

This one line change makes the following code found in many ethernet device drivers
(at least em, igb, ixgbe, and cxgb) gratuitous

case SIOCSIFADDR:
if (ifa->ifa_addr->sa_family == AF_INET) {
/*
* XXX
* Since resetting hardware takes a very long time
* and results in link renegotiation we only
* initialize the hardware only when it is absolutely
* required.
*/
ifp->if_flags |= IFF_UP;
if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
EM_CORE_LOCK(adapter);
em_init_locked(adapter);
EM_CORE_UNLOCK(adapter);
}
arp_ifinit(ifp, ifa);
} else
error = ether_ioctl(ifp, command, data);
break;


177416 19-Mar-2008 julian

Replace really convoluted code that simplifies to "a ^= 0x01;"


177289 17-Mar-2008 thompsa

Remove extra semicolons.

Pointed out by: antoine


177274 16-Mar-2008 thompsa

Switch the LACP state machine over to its own mutex to protect the internals,
this means that it no longer grabs the lagg rwlock. Use two port table arrays
which list the active ports for Tx and switch between them with an atomic op.
Now the lagg rwlock is only exclusively locked for management (ioctls) and
queuing of lacp control frames isnt needed.


177253 16-Mar-2008 rwatson

In keeping with style(9)'s recommendations on macros, use a ';'
after each SYSINIT() macro invocation. This makes a number of
lightweight C parsers much happier with the FreeBSD kernel
source, including cflow's prcc and lxr.

MFC after: 1 month
Discussed with: imp, rink


177003 09-Mar-2008 rwatson

Improve convergence of bpf_filter.c toward style(9).

MFC after: 3 weeks
Submitted by: csjp


176906 07-Mar-2008 rwatson

Move IFF_NEEDSGIANT warning from if_ethersubr.c to if.c so it is displayed
for all network interfaces, not just ethernet-like ones.

Upgrade it to a louder WARNING and be explicit that the flag is obsolete.
Support for IFF_NEEDSGIANT will be removed in a few months (see arch@ for
details) and will not appear in 8.0.

Upgrade if_watchdog to a WARNING.


176879 06-Mar-2008 thompsa

Improve EtherIP interaction with the bridge
- Set M_BCAST|M_MCAST for incoming frames
- Send the frame to a local interface if the bridge returns the mbuf

Submitted by: Eugene Grosbein
Tested by: Boris Kochergin


176244 13-Feb-2008 jhb

Use RTFREE_LOCKED() instead of rtfree() when releasing a reference on the
'rt' route in rtredirect() as 'rt' is always locked.

MFC after: 1 week
PR: kern/117913
Submitted by: Stefan Lambrev stefan.lambrev of moneybookers.com


175903 02-Feb-2008 rwatson

Add comment that bpfread() has multi-threading issues.

Fix minor white space nit.


175432 18-Jan-2008 thompsa

Remove a chunk of duplicated code, test the destination address against the
bridge the same way we check member interfaces.


175419 18-Jan-2008 thompsa

IEEE 802.1D-2004 states, frames containing any of the group MAC Addresses
specified in Table 7-10 in their destination address field shall not be relayed
by the Bridge. Add a check in bridge_forward() to adhere to this.

PR: kern/119744


175396 17-Jan-2008 thompsa

Sync from OpenBSD r1.118, nuke clause 3 & 4.


175018 31-Dec-2007 rwatson

Update netisr comment for the SMPng world order: netisr is no longer
implemented using the ISR facility, and cannot be triggered by calling
splnet()/splx().

MFC after: 3 weeks


175005 31-Dec-2007 thompsa

Pass any unmatched slowprotocols frames up the stack instead of dropping them,
there are more subtypes than just LACP.


174934 27-Dec-2007 mux

Add a workaround for a deadlock between the rt_setgate() and rt_check()
functions. It is easily triggered by running routed, and, I expect, by
running any other daemon that uses routing sockets.

Reviewed by: net@
MFC after: 1 week


174913 26-Dec-2007 thompsa

Fix a panic where if the mbuf was consumed by the filter for requeueing
(dummynet), ipsec_filter() would return the empty error code and the ipsec code
would continue to forward/deference the null mbuf.

Found by: m0n0wall
Reviewed by: bz
MFC after: 3 days


174895 25-Dec-2007 rwatson

Use __FBSDID() in the kernel BPF implementation.

MFC after: 3 days


174876 23-Dec-2007 rwatson

Remove trailing whitespace from lines in BPF.

MFC after: 3 days


174749 18-Dec-2007 thompsa

Simplify the error handling and use the dereferenced sc->sc_ifp pointer.


174746 18-Dec-2007 thompsa

When the bridge has an address and a packet comes in for it then drop it if the
link has been marked discarding by Spanning Tree. This would cause the bridge
to see duplicate packets to itself even if STP has correctly calculated the
topology and blocked redundant links.

Reported by: trasz
Tested by: trasz
MFC after: 3 days


174742 18-Dec-2007 thompsa

- Use the macro to check the port status has it will also test if its
administratively down (!IFF_UP)
- Use the same parameters to lagg_link_active() to get the backup port as in
the output path, this didnt actually matter in practice as sc_primary is
always the first on the port list.

MFC after: 3 days


174721 17-Dec-2007 thompsa

Add myself to the copyright.


174703 17-Dec-2007 kmacy

widen the routing event interface (arp update, redirect, and eventually pmtu change)
into separate functions

revert previous commit's changes to arpresolve and add a new interface
arpresolve2 which does arp resolution without an mbuf


174628 15-Dec-2007 kmacy

fix bonehead cut and paste error in last commit


174625 15-Dec-2007 kmacy

Create separate capability flags for TCP over IPv4 and TCP over IPv6


174624 15-Dec-2007 kmacy

add interface capability for TOE


174559 12-Dec-2007 kmacy

add interface for allowing consumers to register for ARP updates,
redirects, and path MTU changes

Reviewed by: silby


174505 10-Dec-2007 sam

Wake On Lan (WOL) infrastructure

Submitted by: Stefan Sperling <stsp@stsp.name>
Reviewed by: brooks


174493 09-Dec-2007 thompsa

Fix spelling.

Obtained from: OpenBSD


174388 07-Dec-2007 kmacy

Add padding for anticipated functionality
- vimage
- TOE
- multiq
- host rtentry caching

Rename spare used by 80211 to if_llsoftc

Reviewed by: rwatson, gnn
MFC after: 1 day


174374 06-Dec-2007 julian

No need to assert that a == b when we just set a = b.


174278 05-Dec-2007 thompsa

Support monitor mode where the frame is discarded after bpf and stats processing.


174054 28-Nov-2007 bz

Add sysctls to if_enc(4) to control whether the firewalls or
bpf will see inner and outer headers or just inner or outer
headers for incoming and outgoing IPsec packets.

This is useful in bpf to not have over long lines for debugging
or selcting packets based on the inner headers.
It also properly defines the behavior of what the firewalls see.

Last but not least it gives you if_enc(4) for IPv6 as well.

[ As some auxiliary state was not available in the later
input path we save it in the tdbi. That way tcpdump can give a
consistent view of either of (authentic,confidential) for both
before and after states. ]

Discussed with: thompsa (2007-04-25, basic idea of unifying paths)
Reviewed by: thompsa, gnn


173904 25-Nov-2007 mlaier

pfil(9) locking take 3: Switch to rmlock(9)
This has the benefit that rmlocks have proper support for reader recursion
(in contrast to rwlock(9) which could potential lead to writer stravation).
It also means a significant performance gain, eventhough only visible in
microbenchmarks at the moment.

Discussed on: -arch, -net


173895 25-Nov-2007 thompsa

Have the lagg interface generate link up/down events, the interface is marked
as up if at least one of its ports also has a link up. This fixes using
carp+lagg together and any other system that relies on linkstate events.

PR: kern/113956
MFC after: 3 days


173804 21-Nov-2007 thompsa

Use the safer callout_init_rw() to allow the softclock to grab the
rwlock for us.


173399 06-Nov-2007 oleg

1) dummynet_io() declaration has changed.
2) Alter packet flow inside dummynet: allow certain packets to bypass
dummynet scheduler. Benefits are:

- lower latency: if packet flow does not exceed pipe bandwidth, packets
will not be (up to tick) delayed (due to dummynet's scheduler granularity).
- lower overhead: if packet avoids dummynet scheduler it shouldn't reenter ip
stack later. Such packets can be fastforwarded.
- recursion (which can lead to kernel stack exhaution) eliminated. This fix
long existed panic, which can be triggered this way:
kldload dummynet
sysctl net.inet.ip.fw.one_pass=0
ipfw pipe 1 config bw 0
for i in `jot 30`; do ipfw add 1 pipe 1 icmp from any to any; done
ping -c 1 localhost

3) Three new sysctl nodes are added:
net.inet.ip.dummynet.io_pkt - packets passed to dummynet
net.inet.ip.dummynet.io_pkt_fast - packets avoided dummynet scheduler
net.inet.ip.dummynet.io_pkt_drop - packets dropped by dummynet

P.S. Above comments are true only for layer 3 packets. Layer 2 packet flow
is not changed yet.

MFC after: 3 month


173320 04-Nov-2007 thompsa

Add an option to limit the number of source MACs that can be behind a bridge
interface. Once the limit is reached packets with unknown source addresses are
dropped until an existing host cache entry expires or is removed. Useful to
use with the STICKY cache option.

Sponsored by: miniSuperHappyDevHouse NZ


173076 27-Oct-2007 yar

Add a comment explaining why disc(4) bears the IFF_LOOPBACK flag.
It should be the final follow-up to an old yet unfinished discussion
on whether IFF_LOOPBACK is necessary for disc(4) and why.


173074 27-Oct-2007 yar

if_loop doesn't need to keep the list of lo(4) interfaces. Today
a private softc list is needed neither for tracking clones in general
nor for destroying all clones before the module unload -- if_clone
takes care of all that. (Note that some other interface drivers do
need a softc list to be able to scan it for their private purposes.)


172930 24-Oct-2007 rwatson

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


172885 22-Oct-2007 jhb

Close a race when trying to lookup a gateway route in rt_check().
Specifically, if two threads were doing concurrent lookups and the existing
gateway was marked down, the the first thread would drop a reference on the
gateway route and then unlock the "root" route while it tried to allocate
a new route. The second thread could then also drop a reference on the
same gateway route resulting in a reference underflow. Fix this by
clearing the gateway route pointer after dropping the reference count but
before dropping the lock. Secondly, in this same case, the second thread
would overwrite the gateway route pointer w/o free'ing a reference to the
route installed by the first thread. In practice this would probably just
fix a lost reference that would result in a route never being freed.

This fixes panics observed in rt_check() and rtexpunge().

MFC after: 1 week
PR: kern/112490
Insight from: mehuljv at yahoo.com
Reviewed by: ru (found the "not-setting it to NULL" part)
Tested by: several


172851 21-Oct-2007 mlaier

Additions from libpcap 0.9.8 unbreak the build.

Pointy hat to: mlaier
X-MFC after: RELENG_7 buildworld


172825 20-Oct-2007 thompsa

Use ETHER_BPF_MTAP so that the vlan tags are visible to bpf(4) when stacked
under a vlan.

MFC after: 3 days


172824 20-Oct-2007 thompsa

Use ETHER_BPF_MTAP so that the vlan tags are visible to bpf(4) when bridging a
vlan trunk.

Discussed with: csjp
MFC after: 3 days


172777 18-Oct-2007 thompsa

Use a uint16_t type for the vlan tag rather an int.


172770 18-Oct-2007 thompsa

The bridging output function puts the mbuf directly on the interfaces send
queue so the output network card must support the same tagging mechanism as
how the frame was input (prepended Ethernet header tag or stripped HW mflag).

Now the vlan Ethernet header is _always_ stripped in ether_input and the mbuf
flagged, only only network cards with VLAN_HWTAGGING enabled would properly
re-tag any outgoing vlan frames.

If the outgoing interface does not support hardware tagging then readd the vlan
header to the front of the frame. Move the common vlan encapsulation in to
ether_vlanencap().

Reported by: Erik Osterholm, Jon Otterholm
MFC after: 1 week


172582 12-Oct-2007 csjp

Make sure that we refresh the PID on read(2) and write(2) operations.
This fixes the process portion of the bpf(4) stats if the peer forks
into the background after it's opened the descriptor. This bug
results in the following behavior for netstat -B:

# netstat -B
Pid Netif Flags Recv Drop Match Sblen Hblen Command
netstat: kern.proc.pid failed: No such process
78023 em0 p--s-- 2237404 43119 2237404 13986 0 ??????

MFC after: 1 week


172554 12-Oct-2007 thompsa

Fix two panics in lagg.

1. The locking was changed to shared but roundrobin mode still updated a
pointer in the softc with the next tx interface to use. This will panic
under high load. Change this to an atomically incremented sequence number in
order to choose the tx port in round robin.

2. IFQ_HANDOFF will free the mbuf if the queue is full, this will then be freed
again by lagg_start() and panic. Reorganised the error handling and freeing
to fix this.

MFC after: 3 days


172307 23-Sep-2007 csjp

Certain consumers of rtalloc like gif(4) and if_stf(4) lookup the
route and once they are done with it, call rtfree(). rtfree() should
only be used when we are certain we hold the last reference to the
route. This bug results in console messages like the following:

rtfree: 0xc40f7000 has 1 refs

This patch switches the rtfree() to use RTFREE_LOCKED() instead,
which should handle the reference counting on the route better.

Approved by: re@ (gnn)
Reviewed by: bms
Reported by: many via net@ and current@
Tested by: many


172223 18-Sep-2007 sam

remove IFM_IEEE80211_HT40PLUS and IFM_IEEE80211_HT40MINUS; they
never got used so nuke 'em before we branch

Approved by: re (blanket wireless)


172201 16-Sep-2007 thompsa

Allow additional packet filtering on the physical interface for locally
destined packets, disabled by default.

PR: kern/116051
Submitted by: Eygene Ryabinkin
Approved by: re (bmah)
MFC after: 2 weeks


172170 14-Sep-2007 julian

Remove DIAG code that discards oversized packets.
There has been general consensus that this was a bad idea/

Approved by: re (bmah)


172154 13-Sep-2007 dwmalone

Make the type of the memory used by the BPF filter unsigned, so it
matches the BPF registers (which are the only thing that is assigned
to/from BPF memory). This is a pedantic change that shouldn't change
any behaviour.

PR: 115931
Submitted by: Matthew Luckie <mjl@luckie.org.nz>
Approved by: re (bmah)
MFC after: 3 weeks


172108 10-Sep-2007 thompsa

Check for multicast destination on bpf injected packets and update the M_*CAST
flags, the absense of these flags causes problems in other areas such as
bridging which expect them to be correct.

At the moment only Ethernet DLTs are checked.

Reviewed by: bms, csjp, sam
Approved by: re (bmah)


172092 08-Sep-2007 cognet

Do not set the RTF_GATEWAY flag if RTF_LLINFO is set, it doesn't make much
sense in that context, and leads to unusable routes.
This should unbreak bootpd.

Discussed with: glebius
Submitted by: bms
Approved by: re (bmah)


172020 30-Aug-2007 thompsa

Show the ACTIVE flag in ifconfig for the single interface that is actaully
active in failover mode rather than all interfaces with a link. This makes it
clear if the master interface is in use or one of the backup links.

Found by: Writing the Handbook section
Approved by: re (kensmith)


171886 18-Aug-2007 thompsa

If the STP state machine is stopped then clear the bridge-id and root-id.

Approved by: re (kensmith)


171744 06-Aug-2007 rwatson

Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet. As that
has now been removed, they are no longer required. Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option. Clean up some related gotos for
consistency.

Reviewed by: bz, csjp
Tested by: kris
Approved by: re (kensmith)


171724 04-Aug-2007 thompsa

- Ensure the path cost does not exceed 65535 in legacy STP mode.
- If the path cost is calculated when the link is down, set a pending flag so
it is calculated again when it comes back up.
- To not use 00:00:00:00:00:00 as the bridge id, all interfaces are scanned and
the lowest number wins. All zeros is too low.

Approved by: re (rwatson)


171678 01-Aug-2007 thompsa

Add a bridge interface flag called PRIVATE where any private port can not
communicate with another private port.

All unicast/broadcast/multicast layer2 traffic is blocked so it works much the
same way as using firewall rules but scales better and is generally easier as
firewall packages usually do not allow ARP blocking.

An example usage would be having a number of customers on separate vlans
bridged with a server network. All the vlans are marked private, they can all
communicate with the server network unhindered, but can not exchange any
traffic whatsoever with each other.

Approved by: re (rwatson)


171661 30-Jul-2007 thompsa

- Propagate the largest set of interface capabilities supported by all lagg
ports to the lagg interface.
- Use the MTU from the first interface as the lagg MTU, all extra interfaces
must be the same.

This fixes using a lagg interface for a vlan or enabling jumbo frames, etc.

Approved by: re (kensmith)
MFC After: 3 days


171637 28-Jul-2007 rwatson

Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove
definition of NET_CALLOUT_MPSAFE, which is no longer required now that
debug.mpsafenet has been removed.

The once over: bz
Approved by: re (kensmith)


171613 27-Jul-2007 rwatson

First in a series of changes to remove the now-unused Giant compatibility
framework for non-MPSAFE network protocols:

- Remove debug_mpsafenet variable, sysctl, and tunable.
- Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force
debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel.
- Remove logic to automatically flag interrupt handlers as non-MPSAFE if
debug.mpsafenet is set for an INTR_TYPE_NET handler.
- Remove logic to automatically flag netisr handlers as non-MPSAFE if
debug.mpsafenet is set.
- Remove references in a few subsystems, including NFS and Cronyx drivers,
which keyed off debug_mpsafenet to determine various aspects of their own
locking behavior.
- Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into
no-op's, as their entire behavior was determined by the value in
debug_mpsafenet.
- Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.

Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still
present in subsystems, and will be removed in followup commits.

Reviewed by: bz, jhb
Approved by: re (kensmith)


171603 26-Jul-2007 thompsa

Avoid holding the softc lock when using copyout().

Reported by: dfr
Approved by: re (rwatson)


171247 05-Jul-2007 thompsa

Allow the LACP state to be queried from userland which at the moment is the
actor and partner peer info. Print out the active aggregator and per port data
in verbose mode from ifconfig.

Approved by: re (mux)


171173 03-Jul-2007 mlaier

Link pf 4.1 to the build:
- move ftp-proxy from libexec to usr.sbin
- add tftp-proxy
- new altq mtag link

Approved by: re (kensmith)


171167 03-Jul-2007 gnn

Commit the change from FAST_IPSEC to IPSEC. The FAST_IPSEC
option is now deprecated, as well as the KAME IPsec code.
What was FAST_IPSEC is now IPSEC.

Approved by: re
Sponsored by: Secure Computing


171157 02-Jul-2007 rwatson

Continue pre-7.0 privilege cleanup: update suser(9) comments to be priv(9)
comments.

Approved by: re (bmah)


171056 26-Jun-2007 rwatson

Sync comments to code: we now use priv_check() rather than suser() to
determine privilege.

Approved by: re (bmah)


170995 22-Jun-2007 thompsa

Check the correct port to see if synced is true.

PR: misc/113958
Submitted by: Aaron Needles
Approved by: re (bmah)
MFC after: 1 week


170896 17-Jun-2007 csjp

Silence some gcc 4 warnings. It is expected that the bpf_movein() routine
will intialize the the header length and re-initialize the mbuf pointer
to reference the mbuf that is allocated after moving user supplied packet
data in.


170749 15-Jun-2007 csjp

- Conditionally pickup Giant around the network interface
ioctl routines if we are running with !mpsafenet
- Change un-conditional Giant acquisition around ifpromisc
to occur only if we are running with !mpsafenet

With these locking bits in place, we can now remove the Giant
requirement from BPF, so drop the D_NEEDGIANT device flag.
This change removes Giant acquisitions around BPF device
handlers (read, write, ioctl etc).

MFC after: 1 month
Discussed with: rwatson


170681 13-Jun-2007 thompsa

Add the vlan tag to the bridge route table. This allows a vlan trunk to be
bridged, previously legitimate traffic was not passed as the bridge could not
tell that it was on a different Ethernet segment.

All non-tagged traffic is treated as vlan1 as per IEEE 802.1Q-2003


170664 13-Jun-2007 rwatson

Remove IPX over IP tunneling support, which allows IPX routing over IP
tunnels, and was not MPSAFE. The code can be easily restored in the
event that someone with an IPX over IP tunnel configuration can work
with me to test patches.

This removes one of five remaining consumers of NET_NEEDS_GIANT.

Approved by: re (kensmith)


170632 12-Jun-2007 gallatin

Use if_capenable to allow LRO enabled drivers to bypass
the MTU check in ether_input().


170599 12-Jun-2007 thompsa

non-functional cleanup
- remove dead code
- use consistent variable names
- gc unused defines
- whitespace cleanup


170576 11-Jun-2007 andre

Add IFCAP_LRO flag for drivers to announce their TCP Large Receive Offload
capabilities.


170567 11-Jun-2007 gallatin

Move the oversize ethernet frame size check into DIAGNOSTIC,
as was proposed when it was originally added. This allows
LRO to work on non-DIAGNOSTIC kernels without consuming
any mbuf flags.

Discussed with: sam


170565 11-Jun-2007 gallatin

Back out the previous commit which added an M_LRO mbuf flag
to defeat the mtu check in ether_input. Mbuf flags are too scarce.
Discussed with: sam


170560 11-Jun-2007 gallatin

Allow drivers, such as cxgb and mxge, which support LRO to bypass
the MTU check in ether_input() on LRO merged frames.

Discussed with: kmacy


170557 11-Jun-2007 phk

Add missing \n to printf


170530 11-Jun-2007 sam

Update 802.11 wireless support:
o major overhaul of the way channels are handled: channels are now
fully enumerated and uniquely identify the operating characteristics;
these changes are visible to user applications which require changes
o make scanning support independent of the state machine to enable
background scanning and roaming
o move scanning support into loadable modules based on the operating
mode to enable different policies and reduce the memory footprint
on systems w/ constrained resources
o add background scanning in station mode (no support for adhoc/ibss
mode yet)
o significantly speedup sta mode scanning with a variety of techniques
o add roaming support when background scanning is supported; for now
we use a simple algorithm to trigger a roam: we threshold the rssi
and tx rate, if either drops too low we try to roam to a new ap
o add tx fragmentation support
o add first cut at 802.11n support: this code works with forthcoming
drivers but is incomplete; it's included now to establish a baseline
for other drivers to be developed and for user applications
o adjust max_linkhdr et. al. to reflect 802.11 requirements; this eliminates
prepending mbufs for traffic generated locally
o add support for Atheros protocol extensions; mainly the fast frames
encapsulation (note this can be used with any card that can tx+rx
large frames correctly)
o add sta support for ap's that beacon both WPA1+2 support
o change all data types from bsd-style to posix-style
o propagate noise floor data from drivers to net80211 and on to user apps
o correct various issues in the sta mode state machine related to handling
authentication and association failures
o enable the addition of sta mode power save support for drivers that need
net80211 support (not in this commit)
o remove old WI compatibility ioctls (wicontrol is officially dead)
o change the data structures returned for get sta info and get scan
results so future additions will not break user apps
o fixed tx rate is now maintained internally as an ieee rate and not an
index into the rate set; this needs to be extended to deal with
multi-mode operation
o add extended channel specifications to radiotap to enable 11n sniffing

Drivers:
o ath: add support for bg scanning, tx fragmentation, fast frames,
dynamic turbo (lightly tested), 11n (sniffing only and needs
new hal)
o awi: compile tested only
o ndis: lightly tested
o ipw: lightly tested
o iwi: add support for bg scanning (well tested but may have some
rough edges)
o ral, ural, rum: add suppoort for bg scanning, calibrate rssi data
o wi: lightly tested

This work is based on contributions by Atheros, kmacy, sephe, thompsa,
mlaier, kevlo, and others. Much of the scanning work was supported by
Atheros. The 11n work was supported by Marvell.


170490 10-Jun-2007 mjacob

Cast the ioctl define to the type of the variable being switched on.


170432 08-Jun-2007 gallatin

Correct the definition of PFIL_HOOKED() so that it compares
the value of ph_nhooks to zero, not the address. This removes
extranious calls to pfil_run_hooks (and an rw lock) from the
network stack's critical path when no pfil hooks are active.

Reviewed by: csjp
Sponsored by: Myricom Inc.


170326 05-Jun-2007 simokawa

Remove GIANT_REQUIRED for upcoming changes in FireWire stack.


170311 05-Jun-2007 davidch

- Added a new Ethernet media type (2500BaseSX) to support BCM5708 controllers
which support a 2.5Gbps mode over fiber using next page extensions during
autonegotiation. Typically only found in blade systems which also include
a Broadcom 2.5Gbps capable switch.

MFC after: 2 weeks


170139 30-May-2007 thompsa

Remove a KASSERT intended to help the developer, the condition is no longer
valid since the span code was added.

PR: kern/113170
MFC after: 1 week


170097 29-May-2007 yar

Sync ether_ioctl() with ioctl(2) and ifnet.if_ioctl
as to the type of the command argument: int -> u_long.
These types have different widths in the 64-bit world.

Add a note to UPDATING because the change breaks KBI
on 64-bit platforms.

Discussed on: -net, -current
Reviewed by: bms, ru


169872 22-May-2007 glebius

Some minor cleanups:
- In rt_check() remove the senderr() macro and the "bad" label. They
used to simplify code, but now aren't.
- Remove extra RT_LOCK_ASSERT() in rt_setgate(). The RT_REMREF macro
does this.
- In rtfree() convert panics to KASSERTs.
- Strict the routing API: rtfree() should be called only in a case
when we are completely sure we've got the last reference on the
rtentry. In all other cases RTFREE_LOCKED() macro should be used.
If the reference isn't the last one spit out a warning printf.
Correct the only(?) case for this in rt_check().
- Fix typos in comments.


169783 20-May-2007 thompsa

- packets on the input interface were counted twice
- Use IFQ_HANDOFF instead of rolling our own


169741 19-May-2007 thompsa

Compare the partner system priority when choosing the aggregator.


169739 19-May-2007 thompsa

Implement the Marker Protocol. A marker frame is placed on the interface queue
of each port and any further packets are blocked, when the all the marker frames
have been returned to us from the remote network device then we can be sure
that all interface queues are empty.

This is needed when a port is added or removed from the aggregation since it
will affect the hash based distribution, if the queues are not empty then a
packet from an existing connection may be placed on a different interface and
arrive out of order. This was previously achieved by suppressing transmission for
1 second, now that there is an active feedback this timeout as been increased
to 3 seconds and used as a fallback.


169735 19-May-2007 rwatson

Check return value of m_pullup() in firewire_input().

CID: 2105
Found with: Coverity Prevent(tm)


169698 19-May-2007 thompsa

Fix a mbuf leak where sc_start fails or the protocol is none.


169688 18-May-2007 thompsa

Fix locking assert where we should hold the reader lock.


169619 16-May-2007 brooks

Update the comments on if_alloc(), if_free(), if_free_type(), and
if_attach.

Remove a comment about pre-3.0 network drivers from if_attach().

Be a bit more consistant about whitespace near comments.


169614 16-May-2007 brooks

The struct if_data members ifi_recvquota and ifi_xmitquota have been
unused for ages. Rename them to ifi_spare_char1 and ifi_spare_char2
respectively to indicate this face.


169583 15-May-2007 thompsa

Fix unused variable error with !INET6

Reported by: Artem Naluzhny, Frank Terhaar-Yonkers


169570 15-May-2007 thompsa

Feed ipv6 flowlabel to hash calculation.

Obtained from: NetBSD


169569 15-May-2007 thompsa

Change from a mutex to a read/write lock. This allows the tx port to be
selected simultaneously by multiple senders and transmit/receive is not
serialised between aggregated interfaces.


169529 13-May-2007 rwatson

Add prototypes for ether_aton_r() and ether_ntoa_r() missed in previous
commit.


169425 09-May-2007 gnn

Integrate the Camellia Block Cipher. For more information see RFC 4132
and its bibliography.

Submitted by: Tomoyuki Okazaki <okazaki at kick dot gr dot jp>
MFC after: 1 month


169340 07-May-2007 thompsa

- Correctly check if lp_ioctl is null
- Remove lagg_ether_purgemulti as its no longer needed
- Mark the interface as up if any ports are active rather than just the primary


169330 07-May-2007 thompsa

The purgemulti call is not needed since all the ports have already been detached.


169329 07-May-2007 thompsa

Call if_setlladdr() on the aggregation port from a taskqueue so the softc lock
is not held. The short delay between aggregating the port and setting the MAC
address is fine.


169328 07-May-2007 thompsa

Avoid touching various unsafe parts if the interface is disappearing.


169327 07-May-2007 thompsa

Change from using if_delmulti() to if_delmulti_ifma() as it simplifies the code
and is safe to use if the ifp has disappeared.

Suggested by: bms


169228 03-May-2007 thompsa

Fix flag descriptions.


169227 03-May-2007 thompsa

- Add a disabled state for ports that can not be aggregated
- Refine check for lacp links, set to disabled if not suitable


169207 02-May-2007 yar

Fix a couple of typos in a comment.


169204 02-May-2007 thompsa

Set the master flag on the right variable.


169203 02-May-2007 thompsa

Test for IFM_FDX rather than IFM_HDX as the half-duplex bit may not be set even
if the link is not full-duplex.


168793 17-Apr-2007 thompsa

Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.

The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on: current@


168639 12-Apr-2007 thompsa

Fix a case where the multicast addresses were not removed from some ports. The
first port to be removed from the trunk would free the multicast list so
subsequent removed ports didnt have their multicast addresses removed.


168573 10-Apr-2007 thompsa

Fix an uninitialized variable warning.


168561 10-Apr-2007 thompsa

Add the trunk(4) driver for providing link aggregation, failover and fault
tolerance. This driver allows aggregation of multiple network interfaces as
one virtual interface using a number of different protocols/algorithms.

failover - Sends traffic through the secondary port if the master becomes
inactive.
fec - Supports Cisco Fast EtherChannel.
lacp - Supports the IEEE 802.3ad Link Aggregation Control Protocol
(LACP) and the Marker Protocol.
loadbalance - Static loadbalancing using an outgoing hash.
roundrobin - Distributes outgoing traffic using a round-robin scheduler
through all active ports.

This code was obtained from OpenBSD and this also includes 802.3ad LACP support
from agr(4) in NetBSD.


167949 27-Mar-2007 glebius

Fix regression in rev. 1.140.

Reported by: Yuriy Tsibizov <Yuriy.Tsibizov gfk.ru>, bsam


167943 27-Mar-2007 bms

Fix a case where hardware removal of an interface caused an attempt to
announce an ll_ifma which has gone away. Add a KASSERT to catch regressions.

Bug found by: Tom Uffner


167897 26-Mar-2007 yar

Fix some statements in disc(4) and about it:
- ifnet is no more embedded in softc;
- the interface name is `disc', not `ds'.


167896 26-Mar-2007 yar

Give a hint that softc can contain many things besides ifp.


167894 26-Mar-2007 yar

We no longer embed ifnet in softc, and the pointer to ifnet
doesn't need to be first in softc now. (It was the whole
ifnet structure itself that needed to be first in the good
old days.) Fix the respective comment accordingly.

Add xrefs to ifnet(9) in some other comments while I'm here.

Pointed out by: thompsa


167892 26-Mar-2007 yar

Introduce a new toy interface, edsc(4). It's a discard interface
imitating an Ethernet device, so vlan(4) and if_bridge(4) can be
attached to it for testing and benchmarking purposes. Its source
can be an introduction to the anatomy of a network interface driver
due to its simplicity as well as to a bunch of comments in it.

(The rest of needed changes were in my previous commit, which got
interrupted in the middle. Alas, CVS commits are not atomic.)


167816 22-Mar-2007 bms

Fix a typo, and update a comment.

Submitted by: yar


167797 22-Mar-2007 glebius

When working on an RTM_CHANGE do the route editing in the following
sequence. First, if rt_ifa is going to be changed, then call
ifa_rtrequest(RTM_DELETE). Second, if gateway is going to be changed,
then call rt_setgate(). Third, change rt_ifa.

With this change we are able to change a link level route to a
gateway one, that wasn't possible before:

# ifconfig em0 192.168.22.1/24
# arp -s 192.168.22.99 00:11:22:33:44:55
# route change 192.168.22.99 192.168.22.199
# ping 192.168.22.99
db>

Reported by: avatar


167740 20-Mar-2007 bms

Make the m_pullup() diagnostic message compile-time conditional on DIAGNOSTIC.

Requested by: glebius


167732 20-Mar-2007 bms

Fix tinderbox; ng_ether needs to see if_findmulti().


167729 20-Mar-2007 bms

Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from: p4 branch bms_netdev
Reviewed by: andre
Sponsored by: Garance A Drosehn
Idea from: NetBSD
MFC after: 1 month


167725 19-Mar-2007 thompsa

etherbroadcastaddr is now unused.


167722 19-Mar-2007 thompsa

M_BCAST & M_MCAST are now set by ether_input before passing to the bridge.


167716 19-Mar-2007 bms

Clean up the ether_input() path by using the M_PROMISC flag.

Main points of this change:
* Drop frames immediately if the interface is not marked IFF_UP.
* Always trim off the frame checksum if present.
* Always use M_VLANTAG in preference to passing 802.1Q frames
to consumers.
* Use __func__ consistently for KASSERT().
* Use the M_PROMISC flag to detect situations where ether_input()
may reenter itself on the same call graph with the same mbuf which
was promiscuously received on behalf of subsystems such as
netgraph, carp, and vlan.
* 802.1P frames (that is, VLAN frames with an ID of 0) will now be
passed to layer 3 input paths.
* Deal with the special case for CARP in a sane way.

This is a significant rewrite of code on the critical path. Please report
any issues to me if they arise. Frames will now only pass through dummynet
if M_PROMISC is cleared, to avoid problems with re-entry.

The handling of CARP needs to be revisited architecturally. The M_PROMISC
flag may potentially be demoted to a link-layer flag only as it is in
NetBSD, where the idea originated.

Discussed on: net
Idea from: NetBSD
Reviewed by: yar
MFC after: 1 month


167713 19-Mar-2007 bms

Add a sysctl net.link.tap.up_on_open which defaults to zero; when it
is non-zero, tap(4) instances will be marked IFF_UP on attach.

PR: 110383
Requested by: Frank Behrens
MFC after: 2 weeks


167711 19-Mar-2007 yar

Now <net/if_arp.h> is unused here.


167708 19-Mar-2007 yar

Fix a nameless constant: 6 -> ETHER_ADDR_LEN

Tested with: md5(1)


167704 19-Mar-2007 yar

Now that this driver uses ether_ioctl(), it no longer needs
the INET related include files.


167683 18-Mar-2007 rik

Give a chance for packet to appear with a correct input interfaces
in case of multiple interfaces with the same MAC in the same bridge.
This commit do not solve the entire problem. Only case where packet
arrived from such interface.

PR: kern/109815
MFC after: 7 days
Submitted by: Eygene Ryabinkin and rik@
Discussed with: bms@, thompsa@, yar@


167601 15-Mar-2007 yar

Remove a spurious blank line at the start of vlan_growhash().
Add a diagnostic message to the function about resizing vlan
hash table.


167575 14-Mar-2007 thompsa

Properly move the setting of bstp_linkstate_p to the bridgestp module.


167559 14-Mar-2007 yar

Let vlan_ioctl() pass some work on to ether_ioctl()
and so reduce code duplication a bit.


167484 12-Mar-2007 yar

Emit load and unload messages under bootverbose.
This can help to spot bugs (which it did for me,)
and let people know which mode the vlan module is
actually using if they suspect it isn't picking its
options from the main kernel config file.


167483 12-Mar-2007 yar

Fix some minor issues in the internal vlan lists:

- ifv_list member of struct ifvlan is unneeded in array mode,
it's used only in hash mode to resolve hash collisions.

- We don't need the list of trunks at all. (The initial reason for
having it was to be able to destroy all trunks in the MOD_UNLOAD
handler, but a trunk is not to be destroyed forcibly -- it will
go away when all vlan interfaces on it have been deleted.
Note that if_clone_detach() called first of all under MOD_UNLOAD
will delete all vlan interfaces and thus make all trunks go away
quietly.)

- It's enough to use a single [S]LIST_FIRST() in a typical list
destruction loop.


167379 09-Mar-2007 thompsa

Change the passing of callbacks to a struct in case this needs to be extended in the future.


167290 07-Mar-2007 bms

Add Ethertype for 802.3ad LACP.


167126 28-Feb-2007 bms

Prepare for 802.1p:
Add macro EVL_APPLY_VLID() which may be used to apply an 802.1q VLAN ID
to the M_VLANTAG field in an mbuf packet header non-destructively.
This will be used by net80211 to begin with.

Add macro EVL_APPLY_PRI() which may be used to apply an 802.1p priority
class to the M_VLANTAG field in an mbuf packet header non-destructively.

Add other macros for manipulating tags and the CFI bit.

Submitted by: Boris Kovalenko (EVL_CFIOFTAG(), EVL_MAKETAG())


167035 26-Feb-2007 jkim

Add three new ioctl(2) commands for bpf(4).

- BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining
whether incoming, outgoing, or all packets on the interface should be
returned by BPF. Set to BPF_D_IN to see only incoming packets on the
interface. Set to BPF_D_INOUT to see packets originating locally and
remotely on the interface. Set to BPF_D_OUT to see only outgoing
packets on the interface. This setting is initialized to BPF_D_INOUT
by default. BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but
kept for backward compatibility.

- BIOCFEEDBACK sets packet feedback mode. This allows injected packets
to be fed back as input to the interface when output via the interface is
successful. When BPF_D_INOUT direction is set, injected outgoing packet
is not returned by BPF to avoid duplication. This flag is initialized to
zero by default.

Note that libpcap has been modified to support BPF_D_OUT direction for
pcap_setdirection(3) and PCAP_D_OUT direction is functional now.

Reviewed by: rwatson


166916 23-Feb-2007 thompsa

Move the lock init until after if_alloc in case the allocation fails and we
free the softc and return.

MFC after: 3 days


166888 22-Feb-2007 csjp

Use ETHER_BPF_MTAP() instead of BPF_MTAP() here. It's possible
incoming packets have had their 802.1Q tags processed by the
hardware, resulting in them being stripped from the packets, and
placed on the mbuf. This fixes the processing of 802.1Q tags when
hardware offload of 802.1Q tags is enabled.


166879 22-Feb-2007 bms

Fix a bug in if_findmulti(), whereby it would not find (and thus delete)
a link-layer multicast group membership.
Such memberships are needed in order to support protocols such as
IS-IS without putting the interface into PROMISC or ALLMULTI modes.

sa_equal() is not OK for comparing sockaddr_dl as it has deeper structure
than a simple byte array, so add sa_dl_equal() and use that instead.

Reviewed by: rwatson
Verified with: /usr/sbin/mtest
Bug found by: Jouke Witteveen
MFC after: 2 weeks


166847 20-Feb-2007 rwatson

Replace a suser() check with an explicit check for PRIV_NET_SETIFMTU.


166577 09-Feb-2007 cognet

Use __NO_STRICT_ALIGNMENT, instead of special casing ia64 and sparc64.
This fixes panics I got on arm, with struct ip aligned on 4 bytes.

MFC After: 1 week


166514 05-Feb-2007 bms

Fix devfs cloning for non-superusers when net.link.tap.user_open is non-zero.
Note: 'ifconfig tapX create' still requires PRIV_NET_IFCREATE privilege.

Reviewed by: rwatson


166512 05-Feb-2007 bms

Clean up after tun(4) properly; remove routes whose ifp is set to
that of the tun instance even for the !AF_INET case, and properly
remove configured addresses by calling if_purgeaddrs().

Maintain the TUN_DSTADDR behaviour for compatibility with the OS/390
emulator.

MFC after: 3 weeks
PR: 100080
Reviewed by: bz


166497 04-Feb-2007 bms

Implement ifnet cloning for tun(4)/tap(4).
Make devfs cloning a sysctl/tunable which defaults to on.

If devfs cloning is enabled, only the super-user may create
tun(4)/tap(4)/vmnet(4) instances. Devfs cloning is still enabled by
default; it may be disabled from the loader or via sysctl with
"net.link.tap.devfs_cloning" and "net.link.tun.devfs_cloning".

Disabling its use affects potentially all tun(4)/tap(4) consumers
including OpenSSH, OpenVPN and VMware.

PR: 105228 (potentially also 90413, 105570)
Submitted by: Landon Fuller
Tested by: Andrej Tobola
Approved by: core (rwatson)
MFC after: 4 weeks


166443 03-Feb-2007 bms

Drop unicast Ethernet frames not destined for the configured address
of a tap(4) instance, if IFF_PROMISC is not set.

In tap(4), we should emulate the effect IFF_PROMISC would have on
hardware, otherwise we risk introducing layer 2 loops if tap(4) is
used with bridges. This means not even bpf(4) gets to see them.

This patch has been tested in a variety of situations. Multicast and
broadcast frames are correctly allowed through. I have observed this
behaviour causing problems with multiple QEMU instances hosted on
the same FreeBSD machine.

The checks in in ether_demux() [if_ethersubr.c, rev 1.222, line 638]
are insufficient to prevent this bug from occurring, as ifp->if_vlantrunk
will always be NULL for the non-vlan case.

MFC after: 3 weeks
PR: 86429
Submitted by: Pieter de Boer (with changes)


166438 02-Feb-2007 bms

Use int instead of u_int for the 'extra' argument to the
clone_create() KPI.
This fixes a signedness bug in unit number comparisons.

Submitted by: imp, Landon Fuller
PR: kern/105228
MFC after: 2 weeks


166329 29-Jan-2007 rwatson

Update comment for struct bpf_d: we now store buffered packets for BPF
in malloc'd storage, not in mbuf clusters.


166311 28-Jan-2007 rwatson

Remove slightly dubious comment; add descriptive strings for several
sysctls.

MFC after: 3 days


166282 27-Jan-2007 rwatson

Remove BSD < 199103 compatibility entries in the bpf_d structure: they are
not used in any of our code. Also remove explicit padding variable that
kept the bpf_d structure the same size before and after the change in
select implementation, since binary compatibility is not required for this
data structure on 7-CURRENT.


166281 27-Jan-2007 rwatson

Remove now unused bpf_compat.h. This compatibility file emulates malloc(9)
using the mbuf allocator.


166083 18-Jan-2007 thompsa

Set topology change propagation on all ports _except_ the caller.


165724 01-Jan-2007 csjp

style(9) nit. Prefer struct[space]name[space]{ to make grep searches more
in line with that we find in the rest of the tree.


165662 30-Dec-2006 yar

- Don't defer the removal of an 802.1q header for no real reason.
- Micro-optimize the addition of an 802.1q header to match the removal code.
- Consistently check for interfaces being up and running.
- Consistently use NULL instead of 0 with pointers.


165632 29-Dec-2006 jhb

Various bpf(4) related fixes to catch places up to the new bpf(4)
semantics.
- Stop testing bpf pointers for NULL. In some cases use
bpf_peers_present() and then call the function directly inside the
conditional block instead of the macro.
- For places where the entire conditional block is the macro, remove the
test and make the macro unconditional.
- Use BPF_MTAP() in if_pfsync on FreeBSD instead of an expanded version of
the old semantics.

Reviewed by: csjp (older version)


165569 27-Dec-2006 sam

First cut at half/quarter-rate 11a channel support (e.g. for use
in the Public Safety Band):
o add channel flags to identify half/quarter-rate operation
o add rate sets (need to check spec on 4Mb/s in 1/4 rate)
o add if_media definitions for new rates
o split net80211 channel setup out into ieee80211_chan_init
o fixup ieee80211_mhz2ieee and ieee80211_ieee2mhz to understand half/quarter
rate channels: note we temporarily use a nonstandard/hack numbering that
avoids overlap with 2.4G channels because we don't (yet) have enough
state to identify and/or map overlapping channel sets
o fixup ieee80211_ifmedia_init so it can be called post attach and will
recalculate the channel list and associated state; this enables changing
channel-related state like the regulatory domain after attach (will be
needed for 802.11d support too)
o add ieee80211_get_suprates to return a reference to the supported rate
set for a given channel
o add 3, 4.5, and 27 MB/s tx rates to rate <-> media conversion routines
o const-poison channel arg to ieee80211_chan2mode


165522 24-Dec-2006 yar

Note that rev. 1.221 introduced a local workaround for a general problem.
Add a pointer to the relevant PR for future reference. The whole comment
will be OK to remove as soon as the general solution is applied.

PR: kern/105943


165118 12-Dec-2006 bz

MFp4: 92972, 98913 + one more change

In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.


165105 11-Dec-2006 thompsa

These days P2P means peer-2-peer (also well known from serveral filesharing
protocols) while PointToPoint has been PtP links. Change the variables
accordingly while the code is still fresh and undocumented.

Requested by: bz


165008 08-Dec-2006 luigi

Fix an oscure bug triggered by a recent change in kern_socket.c.
The symptoms were that outgoing DHCP requests for diskless kernels
had the IP header corrupt. After long investigations, the source of
the problem was found in ether_output() - for SIMPLEX interfaces
and broadcast traffic, a copy of the packet is passed back to the kernel
through if_simloop(). However if_simloop() modifies the mbuf, while
the copy obtained through m_copym() is a readonly one.

The bug has been there forever, but it has been triggered only recently
by a change in sosend_dgram() which passed down mbufs with sufficient
space to prepend the header.

This fix is trivial - use m_dup() instead of m_copy() to create
the copy. As an alternative, we could try and modify if_simloop()
to play safely with readonly mbufs, but i don't think it is worthwhile
because 1) this is a relatively infrequent code path so we do not need
to worry too much about performance, and 2) the cost of doing an
extra m_pullup in if_simloop() is probably the same as doing the
copy of the cluster, anyways.

MFC after: 1 week


164921 05-Dec-2006 ume

Use callout mechanism instead of timeout()/untimeout().

MFC after: 1 week


164880 04-Dec-2006 syrinx

Add two new flags to if_bridge(4) indicating whether the edge flag
of the bridge port and path cost have been administratively set or
calculated automatically by RSTP.

Make sure to transition from non-edge to edge when the port goes down
and the edge flag was manually set before.
This is needed to comply with the condition
((!portEnabled && AdminEdge) || ....)
in the Bridge Detection State Machine (IEE802.1D-2004, p. 171).

Reviewed by: thompsa
Approved by: bz (mentor)


164861 03-Dec-2006 syrinx

Fix SIOCGDRVSPEC/BRDGGIFSSTP ioctl: make it copyin() the user
provided buffer length before trying to use it.

Reviewed by: thompsa
Approved by: bz (mentor)
MFC after: 3 days


164812 01-Dec-2006 rwatson

Remove obfuscating OpenBSD/NetBSD/BSDI/FreeBSD 2.x/FreeBSD 5.x ifdefs
from around printfs and address list iteration.


164807 01-Dec-2006 imp

fix typo in last commit


164806 01-Dec-2006 imp

Use FreeBDS standard __packed as opposed to the gcc centric
__attribute__(__packed__).


164804 01-Dec-2006 imp

Move the __packed declarations. This makes sizeof(struct llc) 8 again
on the arm. Add an assert to ensure that the size is 8 to prefent others
from falling into this trap (we should have more of these).

Why the construct:

struct foo {
union bar {
struct {
...
} __packed fred;
...
} __packed wilma;
} __packed;

has a different packing than:

struct foo {
union bar {
struct {
...
} fred __packed;
...
} wilma __packed;
} __packed;

is beyond my ability to ferret out of the gcc documentation. Most
likely some subtle binding issue (eg before it says the struct itself
is packed, while after it means that the whole struct is packed into
the thing it is in). Pointers to relevant documentation would be
appreciated.


164785 01-Dec-2006 imp

Use CTASSERT to make sure:
sizeof ether_header is 2 * ETHER_ADDR_LEN + 2 (14) bytes long
sizeof ether_addr is ETHER_ADDR_LEN bytes long

On arm, this shows that struct ether_addr needs to be __packed.

The first condition muts be true for the bridging code to not dump core.
The second one appears to be implicitly relied upon by wi (but many
of the rids it sends down likely need __packed too to be safe) and
maybe others. It appears to not hurt anything.


164772 30-Nov-2006 glebius

The recent issues with em(4) interface has shown that the old 4.4BSD
if_watchdog/if_timer interface doesn't fit modern SMP network
stack design.

Device drivers that need watchdog to monitor their hardware should
implement it theirselves.

Eventually the if_watchdog/if_timer API will be removed. For now,
warn that driver uses it.

Reviewed by: scottl


164716 28-Nov-2006 rwatson

Change net.isr.direct from defaulting to 0 to 1 in 7-CURRENT. This
enables direct dispatch of the network stack from the device driver
ithread, enabling input path parallelism by default when multiple
interfaces are present.

The strategy for network stack parallelism is something being actively
discussed, and this is just one of several possible (and perfectly
reasonable) strategies, but has the distinct advantage of reducing the
number of context switches and preemptions significantly, resulting in
higher efficiency in many cases. In some caes, this may reduce
network stack parallelism due to work not being deferred from the
ithread to the netisr. Therefore, the strategy may change in the
future, but this offers a reasonable first pass and enabling
parallelism while maintaining strong ordering.

Hopefully this will trigger lots of nice new bugs.

This change is not intended for MFC.


164653 27-Nov-2006 thompsa

Sync with the OpenBSD port of RSTP
- use flags rather than sperate ioctls for edge, p2p
- implement p2p and autop2p flags
- define large pathcost constant as ULL
- show bridgeid and rootid in ifconfig

Obtained from: Reyk Floeter <reyk@openbsd.org>


164638 26-Nov-2006 thompsa

Initialize the port info, this shouldnt have been removed in r1.28


164632 26-Nov-2006 thompsa

Remove redundant setting of port state.


164626 26-Nov-2006 thompsa

use two stage creation of stp ports, this means that the stp variables can be
set before the port is marked STP and they will no longer be overwrittten


164549 23-Nov-2006 bde

Initialize a local variable in 2 places just before it is used, not always
at the start of rtalloc1(). This backs out part of revs 1.83 and 1.85.

Profiling on an i386 showed that that for sending tiny packets using
bge, -current takes 7 bzero()s where RELENG_4 takes only 1, and that
bzero()ing is now the dominant overhead (10-12%, up from 1%, but
profiling overestimated this a bit). This commit backs out 2 of the
6 extra bzero()s (1 in each of 2 calls per packet to rtalloc1()). They
were the largest ones by byte count (48 bytes each) but perhaps not
by time (small misaligned ones might take longer).


164414 19-Nov-2006 thompsa

Do not call bstp_stop() internally as it clears the running flag which causes
the timer to never be restarted.

Reported by: bz


164398 18-Nov-2006 csjp

Fix typo in comment

Pointed out by: ru


164396 18-Nov-2006 csjp

Currently, drivers that support hardware offload of VLAN tag
processing are forced to toggle this functionality when the card
is put in and out of promiscuous mode. The main reason for this
is because the hardware strips the VLAN tag, making it impossible
for the tag information to show up in network diagnostic tools like
tcpdump(1).

This change introduces ether_vlan_mtap(), which is called if the
mbuf has M_VLANTAG set. VLAN information is extracted from the
mbuf and inserted into a stack allocated ether vlan header which
is then inserted through the bpf machinery via bpf_mtap2(). The
original mbuf's data pointer and lengths are temporarily adjusted
to eliminate the original Ethernet header for the duration of the
tap operation. This should have no long term effects on the mbuf.

Also, define a new macro, ETHER_BPF_MTAP which should be used
by drivers which support hardware offload of VLAN tag processing.

The fixes for the relevant drivers will follow shortly.

Discussed with: rwatson, andre, jhb (and others)
Much feedback from: sam, ru
MFC after: 1 month [1]

[1] The version that is eventually MFCed will be somewhat
different then this, as there has been significant work
done to the VLAN code in HEAD.


164381 18-Nov-2006 sam

mark struct ether_header packed so gcc honors alignment
constratins on arm; this fixes bridging when packets are
rx'd so ip headers are 32-bit aligned

Reviewed by: imp (and discussed elsewhere)
MFC after: 2 weeks


164180 11-Nov-2006 ume

Teach an IPv6 to ppp(4).

Obtained from: NetBSD
MFC after: 1 week


164141 09-Nov-2006 thompsa

MFp4
- Each stp port is added sequentially so it was possible for our bridgeid to
change every time because the new port has a lower MAC address. Instead
just find the lowest MAC address from all Ethernet adapters in the machine
as the value only needs to be unique, this stops a lot of churn on the
protocol.
- Update the states after enabling or disabling a port.
- Keep tabs if we have been stopped or started by our parent bridge.
- The callout only needs to be drained before destroying the mutex, move it to
bstp_detach.


164112 09-Nov-2006 thompsa

Add a new address cache type called sticky. On an interface marked sticky any
address learned by the bridge is made permanent, the address will not age out
and most importantly will not migrate to another interface.

This can be used to stop mac address poisoning or clients roaming in much the
same way as static entries without the hassle of preloading the table.


164033 06-Nov-2006 rwatson

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


164002 05-Nov-2006 csjp

Fix possible leak when bridge is in monitor mode. Use m_freem() which will
free the entire chain, instead of using m_free() which will free just the
mbuf that was passed.

Discussed with: thompsa
MFC after: 3 days


163986 04-Nov-2006 csjp

Currently, we initialize "error" to zero when it's declared, then
we never initialize it to anything else. However, in the case that
m_uiotombuf fails, we return error (effectively reporting success).

This appears to be a relic of an older revision of this file, where
"error" used to be doing something useful. (See revision 1.1, where
error is used in a loop with uiomove() instead of using m_uiotomubf).

So instead on unconditionally reporting success in the case there is
a failure in m_uiotombuf, explicitly return ENOBUFS. While we are
here, garbage collect the error variable since it's no longer required.

MFC after: 2 weeks


163984 04-Nov-2006 thompsa

When the packet is for the bridge then note which interface to send the reply
to, previously it was always broadcast to all interfaces (a bug). This is
useful when the bridge is the default gateway and vlans are used to isolate
each client, the reply is now kept private to the vlan which the client
resides.

Reported by: Jon Otterholm
Tested by: Jon Otterholm
MFC after: 3 days


163953 03-Nov-2006 rrs

Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
****
peterlei@cisco.com
tuexen@fh-muenster.de
****
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0

So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.

I will see about comitting some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)

There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but thats another beast too..

If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)

Reviewed by: gnn
Approved by: gnn


163926 03-Nov-2006 thompsa

Defer sending the bpdu from bstp_update_info as all code paths will test this
flag anyway.


163915 02-Nov-2006 andre

Rename m_getm() to m_getm2() and rewrite it to allocate up to page sized
mbuf clusters. Add a flags parameter to accept M_PKTHDR and M_EOR mbuf
chain flags. Provide compatibility macro for m_getm() calling m_getm2()
with M_PKTHDR set.

Rewrite m_uiotombuf() to use m_getm2() for mbuf allocation and do the
uiomove() in a tight loop over the mbuf chain. Add a flags parameter to
accept mbuf flags to be passed to m_getm2(). Adjust all callers for the
extra parameter.

Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 month


163904 02-Nov-2006 thompsa

Do not test all the conditions if the port is already forwarding. Also print a
debug message if the port is agreed as it is an important condition of the
protocol.


163903 02-Nov-2006 thompsa

Fix a resource leak when the mbuf pointer changes.

CID: 1564, 1565
Found by: Coverity Prevent (tm)


163895 02-Nov-2006 thompsa

If the port is agreed or edge then allow it go go straight to forwarding rather
than waiting another tick (1s) for the states to be checked again.


163863 01-Nov-2006 thompsa

Bring in support for the Rapid Spanning Tree Protocol (802.1w).

RSTP provides faster spanning tree convergence, the protocol will exchange
information with neighboring switches to quickly transition to forwarding
without creating loops. The code will default to RSTP mode but will downgrade
any port connected to a legacy STP network so is fully backward compatible.

Reviewed by: syrinx
Tested by: syrinx


163606 22-Oct-2006 rwatson

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


163561 21-Oct-2006 glebius

Fix error in rev. 1.68. The intention was to break out the switch(){},
but actually exited from the for(){} loop. This fixes the PPPIOCSCOMPRESS
ioctl.

PR: kern/101333
Submitted by: Igor Popov <igorpopov newmail.ru>


163232 11-Oct-2006 glebius

- Update the baudrate every time the parent changes its link state.
- Rearrange the curly braces so that this piece of code is more
readable.


163142 09-Oct-2006 thompsa

Use LIST_FOREACH_SAFE instead of a hand rolled version.


162906 01-Oct-2006 thompsa

Remove licence clauses 3 & 4

OKed by: Jason L. Wright


162711 27-Sep-2006 ru

Fix our ioctl(2) implementation when the argument is "int". New
ioctls passing integer arguments should use the _IOWINT() macro.
This fixes a lot of ioctl's not working on sparc64, most notable
being keyboard/syscons ioctls.

Full ABI compatibility is provided, with the bonus of fixing the
handling of old ioctls on sparc64.

Reviewed by: bde (with contributions)
Tested by: emax, marius
MFC after: 1 week


162561 22-Sep-2006 thompsa

Revert r1.80 as the ethernet header was inadvertently stripped from ARP
packets. Reimplement this correctly and use a sysctl that defaults to off so
the user doesnt get any suprises if ipfw blocks the ARP packet.

MFC after: 3 days


162539 22-Sep-2006 suz

fixed a bug that local IPv6 traffic (to an address configured on an
interface other than lo0) does not show up properly on any bpf.

Reported by: mlaier
Reviewed by: gnn, csjp
MFC after: 1 week


162375 17-Sep-2006 andre

Move ethernet VLAN tags from mtags to its own mbuf packet header field
m_pkthdr.ether_vlan. The presence of the M_VLANTAG flag on the mbuf
signifies the presence and validity of its content.

Drivers that support hardware VLAN tag stripping fill in the received
VLAN tag (containing both vlan and priority information) into the
ether_vtag mbuf packet header field:

m->m_pkthdr.ether_vtag = vlan_id; /* ntohs()? */
m->m_flags |= M_VLANTAG;

to mark the packet m with the specified VLAN tag.

On output the driver should check the mbuf for the M_VLANTAG flag to
see if a VLAN tag is present and valid:

if (m->m_flags & M_VLANTAG) {
... = m->m_pkthdr.ether_vtag; /* htons()? */
... pass tag to hardware ...
}

VLAN tags are stored in host byte order. Byte swapping may be necessary.

(Note: This driver conversion was mechanic and did not add or remove any
byte swapping in the drivers.)

Remove zone_mtag_vlan UMA zone and MTAG_VLAN definition. No more tag
memory allocation have to be done.

Reviewed by: thompsa, yar
Sponsored by: TCP/IP Optimization Fundraise 2005


162368 17-Sep-2006 thompsa

Rearrange things so that ARP packets can be filtered or rate limited with IPFW.

Requested by: Jon Otterholm
Tested by: Jon Otterholm


162084 06-Sep-2006 andre

First step of TSO (TCP segmentation offload) support in our network stack.

o add IFCAP_TSO[46] for drivers to announce this capability for IPv4 and IPv6
o add CSUM_TSO flag to mbuf pkthdr csum_flags field
o add tso_segsz field to mbuf pkthdr
o enhance ip_output() packet length check to allow for large TSO packets
o extend tcp_maxmtu[46]() with a flag pointer to pass interface capabilities
o adjust all callers of tcp_maxmtu[46]() accordingly

Discussed on: -current, -net
Sponsored by: TCP/IP Optimization Fundraise 2005


162070 06-Sep-2006 andre

Improve description of if_capabilities, if_capenable and ifi_hwassist.

Sponsored by: TCP/IP Optimization Fundraise 2005


162068 06-Sep-2006 andre

Fix the socket option IP_ONESBCAST by giving it its own case in ip_output()
and skip over the normal IP processing.

Add a supporting function ifa_ifwithbroadaddr() to verify and validate the
supplied subnet broadcast address.

PR: kern/99558
Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru>
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days


162010 04-Sep-2006 sam

more juniper dlt's

MFC after: 1 month


161626 25-Aug-2006 thompsa

Move the bridge hook after the loopback check so that IFF_SIMPLEX is honoured
on member interfaces. This makes us the same as OpenBSD/NetBSD.

MFC after: 3 days


161625 25-Aug-2006 thompsa

The bridge cant hear its own transmissions so set IFF_SIMPLEX.

PR: kern/102361
Tested by: Radim Kolar <hsn@netmag.cz>
MFC after: 3 days


161603 25-Aug-2006 thompsa

Fix spelling.


161556 23-Aug-2006 cperciva

Correct buffer overflow in the handling of LCP options in ppp(4)

Security: FreeBSD-SA-06:18.ppp


161407 17-Aug-2006 thompsa

Remove unneeded asserts from bridge_ioctl_* since these are just
extensions of bridge_ioctl() which has the correct locking.


161403 17-Aug-2006 thompsa

Remove two lock asserts that are unneeded due to subsequent unlocks.


161401 17-Aug-2006 thompsa

Call bridge_span before dropping the lock.

MFC after: 5 days


161329 15-Aug-2006 yar

This XXX remark was rendered false by rev. 103, which made the
VLAN_ARRAY case subject to rw locking, too.


161326 15-Aug-2006 yar

Make it a tad easier to base other encapsulation schemes on this driver
by restoring the ifv_proto field in the vlan softc and putting it to use
this time. It's a good companion for ifv_encaplen, which has already been
used throughout this driver.


161321 15-Aug-2006 yar

Set IFF_DRV_RUNNING on vlan(4) once in vlan_config(),
not at many places after each call to vlan_config().
This is consistent with IFF_DRV_RUNNING being unset
in vlan_unconfig().


161255 12-Aug-2006 thompsa

Add the module version to fix the loading with if_bridge.

Reported by: keramida
Tested by: keramida


161210 11-Aug-2006 yar

Optionally pad outgoing frames to the minimum of 60 bytes (excl. FCS)
before tagging them. This can help to work around brain-damage in some
switches that fail to pad a frame after untagging it if its length drops
below the minimum. This option is blessed by IEEE Std 802.1Q (2003 Ed.),
paragraph C.4.4.3.b. It's controlled by sysctl net.link.vlan.soft_pad.

Idea by: az
MFC after: 1 week


161124 09-Aug-2006 rwatson

Since bpf_allocbufs() uses malloc() with M_WAITOK, don't check return
values for NULL or return an error state. Assert that all three bpf
buffer pointers are NULL before starting.

MFC after: 1 week


161103 08-Aug-2006 rwatson

Add kqueue support to if_tun. Loosely based on if_tap changes.

Two almost identical patches based on the if_tap work were submitted
via GNATS; I started out with the patch in 100796 from David Gilbert,
but could have easily started with the patch from Vilmos Nebehaj which
I found only later.

MFC after: 1 week
PR: 93976, 100796


160981 04-Aug-2006 brooks

With exception of the if_name() macro, all definitions in net_osdep.h
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.

Longer term we may want to kill off if_name() entierly since all modern
BSDs have if_xname variables rendering it unnecessicary.


160951 03-Aug-2006 yar

Should vlan_input() ever be called with ifp pointing to a non-Ethernet
interface, do not just assign -1 to tag because it breaks the logic of
the code to follow. The better way is to handle this case as an unsupported
protocol and return unless INVARIANTS is in effect and we can panic.
Panic is good there because the scenario can happen only because of a
coding error elsewhere.

We also should show the interface name in the panic message for easier
debugging of the problem, should it ever emerge.

Submitted by: qingli (initially)


160950 03-Aug-2006 yar

Back out rev. 1.107 because it introduced as many problems
as it tried to solve:

- it smuggled hidden 802.1q details into otherwise protocol-neutral code;
- it put an important code consistency check under DEBUG, which was never
defined by anyone but a developer hacking this file for the moment;
- lastly, the former bcopy() call had been correct as long as the "dead"
code was there.

(A new version of the fix for tag of -1 to come in the next commit.)

Agreed by: qingli


160902 02-Aug-2006 thompsa

- Use the new bridgestp callback to once again flush our bridge routes when an
interface is disabled.
- Log port changes to syslog, defaulting to off


160901 02-Aug-2006 thompsa

Tell bridgestp that we are about to free the memory so it can cleanup.


160900 02-Aug-2006 thompsa

Fix style in the last commit, the variable declaration goes at the top of the
function.


160899 02-Aug-2006 thompsa

Add a callback so we can notify the parent bridge that a port state change has
occured, we need to do this from a taskqueue to avoid a LOR with the if_bridge
mutex.


160897 02-Aug-2006 thompsa

Be sure to disable the port when removing it from STP.


160884 01-Aug-2006 qingli

In vlan_input(), if the network interface does not perform h/w based
vlan tag processing, the code will use bcopy() to remove the vlan
tag field but the code copies 2 bytes too many, which essentially
overwrites the protocol type field.

Also, a tag value of -1 is generated for unrecognized interface type,
which would cause an invalid memory access in the vlans[] array.

In addition, removed a line of dead code and its associated comments.

Reviewed by: sam


160867 31-Jul-2006 thompsa

Add some statistics that are needed to support RFC4188 as part of the SoC2006
work on a bridge monitoring module for BSNMP.

Submitted by: shteryana (SoC 2006)


160769 27-Jul-2006 thompsa

Remove the dependency of bridgestp.h on if_bridgevar.h by moving a couple of
private structures to if_bridge.c.


160735 27-Jul-2006 avatar

Fixing compilation bustage: net/if_bridgevar.h depends on net/bridgestp.h.


160730 26-Jul-2006 thompsa

bridgestp is now a seperate module.


160726 26-Jul-2006 thompsa

Remove stp variables that are already initialised in bstp_attach().


160703 26-Jul-2006 thompsa

/tmp/cvsuusTrc


160702 26-Jul-2006 thompsa

Remove variables that are overridden by ether_ifattach(). This clears up any
confusion especially as *if_output was pointed to a different function.


160690 26-Jul-2006 sam

add support for 802.11 packet injection via bpf

Together with: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Reviewed by: arch@
MFC after: 1 month


160620 24-Jul-2006 dwmalone

Rather than calling mircotime() in catchpacket(), make catchpacket()
take a timeval indicating when the packet was captured. Move
microtime() to the calling functions and grab the timestamp as soon
as we know that we're going to call catchpacket at least once.

This means that we call microtime() once per matched packet, as
opposed to once per matched packet per bpf listener. It also means
that we return the same timestamp to all bpf listeners, rather than
slightly different ones.

It would be more accurate to call microtime() even earlier for all
packets, as you have to grab (1+#listener) locks before you can
determine if the packet will be logged. You could always grab a
timestamp before the locks, but microtime() can be costly, so this
didn't seem like a good idea.

(I guess most ethernet interfaces will have a bpf listener these
days because of dhclient. That means that we could be doing two bpf
locks on most packets going through the interface.)

PR: 71711


160549 21-Jul-2006 rwatson

Change semantics of socket close and detach. Add a new protocol switch
function, pru_close, to notify protocols that the file descriptor or
other consumer of a socket is closing the socket. pru_abort is now a
notification of close also, and no longer detaches. pru_detach is no
longer used to notify of close, and will be called during socket
tear-down by sofree() when all references to a socket evaporate after
an earlier call to abort or close the socket. This means detach is now
an unconditional teardown of a socket, whereas previously sockets could
persist after detach of the protocol retained a reference.

This faciliates sharing mutexes between layers of the network stack as
the mutex is required during the checking and removal of references at
the head of sofree(). With this change, pru_detach can now assume that
the mutex will no longer be required by the socket layer after
completion, whereas before this was not necessarily true.

Reviewed by: gnn


160377 15-Jul-2006 brooks

Use TAILQ_FOREACH instead of poking around in the guts of the list
macros.


160376 15-Jul-2006 brooks

Drop a pointless cast of ifp->if_softc to (struct tap_softc *).


160233 10-Jul-2006 thompsa

Catch up with the revised network interface cloning which takes an optional
opaque parameter that can specify configuration parameters.


160195 09-Jul-2006 sam

Revise network interface cloning to take an optional opaque
parameter that can specify configuration parameters:
o rev cloner api's to add optional parameter block
o add SIOCCREATE2 that accepts parameter data
o rev vlan support to use new api (maintain old code)

Reviewed by: arch@


160124 06-Jul-2006 oleg

Adjust rt_(set|get)metrics() to do kernel <-> userland timebase conversion.
We need it since kernel timebase has changed (time_second -> time_uptime).

Approved by: glebius (mentor)


160099 04-Jul-2006 thompsa

Fix a braino in the last revision, enc_clone_destroy needs return void instead
of int. The clone system will ensure that our first interface is not destroyed
so we dont need the extra checking anyway.

Tested by: Scott Ullrich


160087 03-Jul-2006 csjp

Adjust descriptor locking to tell the kqueue subsystem that our descriptor is
already locked. The reason to do this is to avoid two lock+unlock operations
in a row. We need the lock here to serialize access to bd_pid for stats
collection purposes.

Drop the locks all together on detach, as they will be picked up by
knlist_remove.

This should fix a failed locking assertion when kqueue is being used with bpf
descriptors.

Discussed with: jmg


160038 29-Jun-2006 yar

There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures. Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
to the system crashing honestly instead of masking possible bugs.

Suggested by: glebius, jhb, ru


160035 29-Jun-2006 yar

Use TAILQ_FOREACH in the __FreeBSD__ case, too.
Funnily enough, rev. 1.15 changed the __Net and __Open cases only.


160034 29-Jun-2006 yar

Use TAILQ_FOREACH.


160033 29-Jun-2006 yar

Use the nifty TAILQ_FOREACH.


160020 29-Jun-2006 yar

Detach the interface first, do vlan_unconfig() then.
Previously, another thread could get a pointer to the
interface by scanning the system-wide list and sleep
on the global vlan mutex held by vlan_unconfig().
The interface was gone by the time the other thread
woke up.

In order to be able to call vlan_unconfig() on a detached
interface, remove the purely cosmetic bzero'ing of IF_LLADDR
from the function because a detached interface has no addresses.

Noticed by: a stress-testing script by maxim
Reviewed by: glebius


160019 29-Jun-2006 yar

Remove a few unused things.
Fix some style and consistency points.


160018 29-Jun-2006 yar

Reduce unneeded code duplication.


160011 28-Jun-2006 thompsa

A small race existed where the lock was dropped between when encif was
tested and then set. [1]

Reorganise things to eliminate this, we now ensure that enc0 can not be
destroyed which as the benefit of no longer needing to lock in
ipsec_filter and ipsec_bpf. The cloner will create one interface during the
init so we can guarantee that encif will be valid before any SPD entries are
added to ipsec.

Spotted by: glebius [1]


159969 27-Jun-2006 thompsa

Simplify ipsec_bpf by using bpf_mtap2().


159965 26-Jun-2006 thompsa

Add a pseudo interface for packet filtering IPSec connections before or after
encryption. There are two functions, a bpf tap which has a basic header with
the SPI number which our current tcpdump knows how to display, and handoff to
pfil(9) for packet filtering.

Obtained from: OpenBSD
Based on: kern/94829
No objections: arch, net
MFC after: 1 month


159838 21-Jun-2006 yar

Fix the VLAN_ARRAY case, mostly regarding improper use of atomic(9)
in place of conventional rw locking. Alas, atomic(9) can't buy us
lockless operation so easily.


159823 21-Jun-2006 yar

Track interface department events and detach vlans from
departing trunk so that we don't get into trouble later
by dereferencing a stale pointer to dead trunk's things.

Prodded by: oleg
Sponsored by: RiNet (Cronyx Plus LLC)
MFC after: 1 week


159822 21-Jun-2006 glebius

- First initialize ifnet, and then insert it into global
list.
- First remove from global list, then start destroying.

PR: kern/97679
Submitted by: Alex Lyashkov <shadow itt.net.ru>
Reviewed by: rwatson, brooks


159807 20-Jun-2006 thompsa

Allow gif interfaces to be added as span ports, the user may want to send a
copy of all packets to the other side of the world.


159781 19-Jun-2006 mlaier

Import interface groups from OpenBSD. This allows to group interfaces in
order to - for example - apply firewall rules to a whole group of
interfaces. This is required for importing pf from OpenBSD 3.9

Obtained from: OpenBSD (with changes)
Discussed on: -net (back in April)


159759 19-Jun-2006 thompsa

Fix spelling mistake in comment.


159641 15-Jun-2006 csjp

Since we are doing some bpf(4) clean up, change a couple of function prototypes
to be consistent. Also, ANSI'fy function definitions. There is no functional
change here.


159595 14-Jun-2006 csjp

If bpf(4) has not been compiled into the kernel, initialize the bpf interface
pointer to a zeroed, statically allocated bpf_if structure. This way the
LIST_EMPTY() macro will always return true. This allows us to remove the
additional unconditional memory reference for each packet in the fast path.

Discussed with: sam


159555 12-Jun-2006 thompsa

Use bit operations to get a locally administered address rather than using a
hardcoded OUI code.


159528 11-Jun-2006 fjoe

Fix KASSERT conditions in if_deregister_com_alloc().


159446 08-Jun-2006 thompsa

Allow bridge and carp to play nicely together by returning the packet if its
destined for a carp interface.

Obtained from: OpenBSD
MFC after: 2 weeks


159305 05-Jun-2006 qingli

Assuming the interface has an address of x.x.x.195, a mask of
255.255.255.0, and a default route with gateway x.x.x.1. Now if
the address mask is changed to something more specific, e.g.,
255.255.255.128, then after the mask change the default gateway
is no longer reachable.

Since the default route is still present in the routing table,
when the output code tries to resolve the address of the default
gateway in function rt_check(), again, the default route will be
returned by rtalloc1(). Because the lock is currently held on the
rtentry structure, one more attempt to hold the lock will trigger
a crash due to "lock recursed on non-recursive mutex ..."

This is a general problem. The fix checks for the above condition
so that an existing route entry is not mistaken for a new cloned
route. Approriately, an ENETUNREACH error is returned back to the
caller

Approved by: andre


159193 03-Jun-2006 csjp

Back out previous two commits, this caused some problems in the namespace
resulting in some build failures. Instead, to fix the problem of bpf not
being present, check the pointer before dereferencing it.

This is a temporary bandaid until we can decide on how we want to handle
the bpf code not being present. This will be fixed shortly.


159192 03-Jun-2006 csjp

Temporarily include files so that our macro checks do something useful.


159186 03-Jun-2006 csjp

Make sure we don't try to dereference the the if_bpf pointer when bpf has
not been compiled into the the kernel.

Submitted by: benno


159183 02-Jun-2006 sam

add missed calls to bpf_peers_present


159180 02-Jun-2006 csjp

Fix the following bpf(4) race condition which can result in a panic:

(1) bpf peer attaches to interface netif0
(2) Packet is received by netif0
(3) ifp->if_bpf pointer is checked and handed off to bpf
(4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
initialized to NULL.
(5) ifp->if_bpf is dereferenced by bpf machinery
(6) Kaboom

This race condition likely explains the various different kernel panics
reported around sending SIGINT to tcpdump or dhclient processes. But really
this race can result in kernel panics anywhere you have frequent bpf attach
and detach operations with high packet per second load.

Summary of changes:

- Remove the bpf interface's "driverp" member
- When we attach bpf interfaces, we now set the ifp->if_bpf member to the
bpf interface structure. Once this is done, ifp->if_bpf should never be
NULL. [1]
- Introduce bpf_peers_present function, an inline operation which will do
a lockless read bpf peer list associated with the interface. It should
be noted that the bpf code will pickup the bpf_interface lock before adding
or removing bpf peers. This should serialize the access to the bpf descriptor
list, removing the race.
- Expose the bpf_if structure in bpf.h so that the bpf_peers_present function
can use it. This also removes the struct bpf_if; hack that was there.
- Adjust all consumers of the raw if_bpf structure to use bpf_peers_present

Now what happens is:

(1) Packet is received by netif0
(2) Check to see if bpf descriptor list is empty
(3) Pickup the bpf interface lock
(4) Hand packet off to process

From the attach/detach side:

(1) Pickup the bpf interface lock
(2) Add/remove from bpf descriptor list

Now that we are storing the bpf interface structure with the ifnet, there is
is no need to walk the bpf interface list to locate the correct bpf interface.
We now simply look up the interface, and initialize the pointer. This has a
nice side effect of changing a bpf interface attach operation from O(N) (where
N is the number of bpf interfaces), to O(1).

[1] From now on, we can no longer check ifp->if_bpf to tell us whether or
not we have any bpf peers that might be interested in receiving packets.

In collaboration with: sam@
MFC after: 1 month


159174 02-Jun-2006 glebius

Fix gif_output() so that GIF_UNLOCK() is performed only in case
we have locked the softc.

PR: kern/98298
Submitted by: Eugene Grosbein


159164 02-Jun-2006 rwatson

raw_disconnect() now disconnects but does not detach the raw pcb. As a
result, raw_uabort() now needs to call raw_detach() directly. As
raw_uabort() is never called, and raw_disconnect() is probably not ever
actually called in practice, this is likely not a functional change, but
improves congruence between protocols, and avoids a NULL raw cb pointer
after disconnect, which could result in a panic.

MFC after: 1 month


159162 02-Jun-2006 glebius

- Add definition for IFM_10G_CX4.
- Put IFM_10G_CX4 and IFM_10G_SR into IFMEDIA_BAUDRATE array.

Requested by: Jack Vogel <jfvogel gmail.com>


159126 01-Jun-2006 thompsa

Announce all interfaces to devd on attach/detach. This adds a new devctl
notification so all interfaces including pseudo are reported. When netif
creates the clones at startup devctl_disable has not been turned off yet so the
interfaces will not be initialised twice, enforce this by adding an explicit
order between rc.d/netif and rc.d/devd.

This change allows actions to taken in userland when an interface is cloned
and the pseudo interface will be automatically configured if a ifconfig_<int>=""
line exists in rc.conf.

Reviewed by: brooks
No objections on: net


159079 30-May-2006 marius

Revert the (int *) -> (intptr_t *) conversion done as part of rev. 1.59
for IOCTLs where casting data to intptr_t * isn't the right thing to do
as _IO() isn't used for them but _IOR(..., int)/_IOW(..., int) are (i.e.
for all IOCTLs except VMIO_SIOCSIFFLAGS), fixing tap(4) on big-endian
LP64 machines.

PR: sparc64/98084
OK'ed by: emax
MFC after: 1 week


159078 30-May-2006 ru

Fix -Wundef warnings.


159018 28-May-2006 dwmalone

Avoid unwanted sign extension of indexed byte load in bpf code.

PR: 89748
Submitted by: Guy Harris <guy@alum.mit.edu>
Obtained from: NetBSD via OpenBSD
MFC after: 2 weeks


158697 17-May-2006 emax

Do not call knlist_destroy() in tapclose(). Instead call it when device is
actually destroyed. Also move call to knlist_init() into tapcreate(). This
should fix panic described in kern/95357.

PR: kern/95357
No response from: freebsd-current@
MFC after: 3 days


158667 16-May-2006 thompsa

Fix style(9) nits, whitespace and parentheses.


158661 16-May-2006 qingli

The current routing code allows insertion of indirect routes that have
gateways which are unreachable except through the default router. For
example, assuming there is a default route configured, and inserting
a route

"route add 64.102.54.0/24 60.80.1.1"

is currently allowed even when 60.80.1.1 is only reachable through
the default route. However, an error is thrown when this route is
utilized, say,

"ping 64.102.54.1" will return an error

This type of route insertion should be disallowed becasue:

1) Let's say that somehow our code allowed this packet to flow to
the default router, and the default router knows the next hop is
60.80.1.1, then the question is why bother inserting this route in
the 1st place, just simply use the default route.

2) Since we're not talking about source routing here, the default
router could very well choose a different path than using 60.80.1.1
for the next hop, again it defeats the purpose of adding this route.

Reviewed by: ru, gnn, bz
Approved by: andre


158592 15-May-2006 dhartmei

Recalculate IP checksum after running pfil hooks.

Reviewed by: thompsa
Tested by: Adam McDougall <mcdouga9@egr.msu.edu>


158500 12-May-2006 mlaier

Remove ip6fw. Since ipfw has full functional IPv6 support now and - in
contrast to ip6fw - is properly lockes, it is time to retire ip6fw.


158471 12-May-2006 jhb

Remove various bits of conditional Alpha code and fixup a few comments.


158416 11-May-2006 hsu

Correct test for fragmented packet.


158345 07-May-2006 csjp

Pickup locks for the BPF interface structure. It's quite possible that
bpf(4) descriptors can be added and removed on this interface while we
are processing stats.

MFC after: 2 weeks


158294 04-May-2006 bz

In rtrequest and rtinit check for sa_len != 0 for the given
destination. These checks are needed so we do not install
a route looking like this:
(0) 192.0.2.200 UH tun0 =>

When removing this route the kernel will start to walk
the address space which looks like a hang on 64bit platforms
because it'll take ages while on 32bit you should see a panic
when kernel debugging options are turned on.

The problem is in rtrequest1:
if (netmask) {
rt_maskedcopy(dst, ndst, netmask);
} else
bcopy(dst, ndst, dst->sa_len);

In both cases the len might be 0 if the application forgot to
set it. If so ndst will be all-zero leading to above
mentioned strange routes.

This is an application error but we must not fail/hang/panic
because of this.

Looks ok: gnn
No objections: net@ (silence)
MFC after: 8 weeks


158140 29-Apr-2006 thompsa

Add support for fragmenting ipv4 packets.

The packet filter may reassemble the ip fragments and return a packet that is
larger than the MTU of the sending interface. There is no check for DF or icmp
replies as we can only get a large packet to fragment by reassembling a
previous fragment, and this only happens after a call to pfil(9).

Obtained from: OpenBSD (mostly)
Glanced at by: mlaier
MFC after: 1 month


157681 12-Apr-2006 rwatson

Use ANSI C function protypes and declarations for if_arcsubr.

MFC after: 1 month


157604 09-Apr-2006 rwatson

Correct an assertion in raw_uattach(): this is a library call that other
protocols invoke after allocating a PCB, so so_pcb should be non-NULL.
It is only used by the two IPSEC implementations, so I didn't hit it in
my testing.

Reported by: pjd
MFC after: 3 months


157506 04-Apr-2006 andre

Undo damage from wrong MFC to HEAD.

Pointed out by: jkim, remko


157503 04-Apr-2006 andre

MFC rev. 1.32: Add link status descriptions and related structures for userland
applications.

Approved by: re


157372 01-Apr-2006 rwatson

In raw and raw-derived socket types, maintain and enforce invariant that
the so_pcb pointer on the socket is always non-NULL. This eliminates
countless unnecessary error checks, replacing them with assertions.

MFC after: 3 months


157370 01-Apr-2006 rwatson

Chance protocol switch method pru_detach() so that it returns void
rather than an error. Detaches do not "fail", they other occur or
the protocol flags SS_PROTOREF to take ownership of the socket.

soclose() no longer looks at so_pcb to see if it's NULL, relying
entirely on the protocol to decide whether it's time to free the
socket or not using SS_PROTOREF. so_pcb is now entirely owned and
managed by the protocol code. Likewise, no longer test so_pcb in
other socket functions, such as soreceive(), which have no business
digging into protocol internals.

Protocol detach routines no longer try to free the socket on detach,
this is performed in the socket code if the protocol permits it.

In rts_detach(), no longer test for rp != NULL in detach, and
likewise in other protocols that don't permit a NULL so_pcb, reduce
the incidence of testing for it during detach.

netinet and netinet6 are not fully updated to this change, which
will be in an upcoming commit. In their current state they may leak
memory or panic.

MFC after: 3 months


157366 01-Apr-2006 rwatson

Change protocol switch pru_abort() API so that it returns void rather
than an int, as an error here is not meaningful. Modify soabort() to
unconditionally free the socket on the return of pru_abort(), and
modify most protocols to no longer conditionally free the socket,
since the caller will do this.

This commit likely leaves parts of netinet and netinet6 in a situation
where they may panic or leak memory, as they have not are not fully
updated by this commit. This will be corrected shortly in followup
commits to these components.

MFC after: 3 months


157288 30-Mar-2006 rwatson

Add IFF_NEEDSGIANT to kernel PPP support. I have no idea why this wasn't
here, but it should have been.

MFC after: 3 days


157155 26-Mar-2006 thompsa

Assert that the mbuf is not shared to ensure problems like the last commit are
not reintroduced.


157057 23-Mar-2006 rik

m_dup () packet not m_copypacket () since we will modify it. For more
details see PR kern/94448.

PR: kern/94448

Original patch: Eygene A. Ryabinkin <rea-fbsd at rea dot mbslab dot kiae dot ru>Final patch: thompsa@
Tested by: thompsa@, Eygene A. Ryabinkin

MFC after: 7 days


156948 21-Mar-2006 glebius

No direct call to carp_ifdetach() anymore. It is called by
event handler.

PR: kern/82908
Submitted by: Dan Lukes <dan obluda.cz>


156783 16-Mar-2006 emax

Add kqueue(2) support on if_tap(4) interfaces. While I'm here, replace
K&R style function declarations with ANSI style. Also fix endian bugs
accessing ioctl arguments that are passed by value.

PR: kern/93897
Submitted by: Vilmos Nebehaj < vili at huwico dot hu >
MFC after: 1 week


156751 15-Mar-2006 andre

Add link status descriptions and related structures for userland
applications.

Open[BGP|OSPF]D make use of this to determine the link status of
interfaces to make the right routing descisions.

Obtained from: OpenBSD
MFC after: 3 days


156750 15-Mar-2006 andre

- Fill in the correct rtm_index for RTM_ADD and RTM_CHANGE messages.

- Allow RTM_CHANGE to change a number of route flags as specified by
RTF_FMASK.

- The unused rtm_use field in struct rt_msghdr is redesignated as
rtm_fmask field to communicate route flag changes in RTM_CHANGE
messages from userland. The use count of a route was moved to
rtm_rmx a long time ago. For source code compatibility reasons
a define of rtm_use to rtm_fmask is provided.

These changes faciliate running of multiple cooperating routing
daemons at the same time without causing undesired interference.
Open[BGP|OSPF]D make use of these features to have IGP routes
override EGP ones.

Obtained from: OpenBSD (claudio@)
MFC after: 3 days


156495 09-Mar-2006 ru

Don't acquire a lock before calling vlan_unconfig().
This fixes a panic when doing "ifconfig ... -vlandev".

OK'ed by: glebius


156328 06-Mar-2006 thompsa

If we miss the LINK_UP event from the network interface then the bridge port
will remain in the disabled state until another link event happens in the
future (if at all). Add a timer to periodically check the interface state and
recover.

Reported by: Nik Lam <freebsdnik j2d.lam.net.au>
MFC after: 3 days


156246 03-Mar-2006 csjp

Unbreak byte counters when network interfaces are in monitor mode by
re-organizing the monitor return logic. We perform interface monitoring
checks after we have determined if the CRC is still on the packet, if
it is, m_adj() is called which will adjust the packet length. This
ensures that we are not including CRC lengths in the byte counters for
each packet.

Discussed with: andre, glebius


156238 03-Mar-2006 thompsa

Since we are using random ethernet addresses for the bridge, it is possible
that we might have address collisions, so make sure that this hardware address
isn't already in use on another bridge.

Submitted by: csjp
MFC after: 1 month


156235 03-Mar-2006 csjp

Slightly re-worked bpf(4) code associated with bridging: if we have a
destination interface as a member of our bridge or this is a unicast packet,
push it through the bpf(4) machinery.

For broadcast or multicast packets, don't bother with the bpf(4) because it will
be re-injected into ether_input. We do this before we pass the packets through
the pfil(9) framework, as it is possible that pfil(9) will drop the packet or
possibly modify it, making it very difficult to debug firewall issues on the
bridge.

Further, implemented IFF_MONITOR for bridge interfaces. This does much the same
thing that it does for regular network interfaces: it pushes the packet to any
bpf(4) peers and then returns. This bypasses all of the bridge machinery,
saving mutex acquisitions, list traversals, and other operations performed by
the bridging code.

This change to the bridging code is useful in situations where individuals use a
bridge to multiplex RX/TX signals from two interfaces, as is required by some
network taps for de-multiplexing links and transmitting the RX/TX signals
out through two separate interfaces. This behaviour is quite common for network
taps monitoring links, especially for certain manufacturers.

Reviewed by: thompsa
MFC after: 1 month
Sponsored by: Seccuris Labs


156096 28-Feb-2006 thompsa

Fix up the Bridge Identifier field in the BPDU packet.

- use the cu_bridge_id rather than the cu_rootid for the bridge address [1]
- the memcmp return value is not signed so the wrong interface may have been
selected
- fix up the calculation of sc_bridge_id

PR: kern/93909 [1]
MFC after: 3 days


156072 27-Feb-2006 wkoszek

This patch fixes a problem, which exists if you have IPSEC in your kernel
and want to have crypto support loaded as KLD. By moving zlib to separate
module and adding MODULE_DEPEND directives, it is possible to use such
configuration without complication. Otherwise, since IPSEC is linked with
zlib (just like crypto.ko) you'll get following error:

interface zlib.1 already present in the KLD 'kernel'!

Approved by: cognet (mentor)


155986 24-Feb-2006 yar

Don't to forget to unlock the rwlock on trunk before destroying it.
This should fix panic on "kldunload if_vlan" while vlanX are still there.

Reviewed by: glebius


155708 15-Feb-2006 glebius

Fix build.


155669 14-Feb-2006 glebius

- Introduce ifmedia_baudrate(), which returns correct baudrate of the
given media status. [1]
- Utilize ifmedia_baudrate() in miibus_statchg() to update ifp->if_baudrate.

Obtained from: NetBSD [1]


155509 10-Feb-2006 emaste

Bump the MODULE_VERSION for HEAD, as the vlan(4) API is different in
RELENG_6, and would require a lower version number.

Requested by: glebius
Approved by: rwatson (mentor)


155502 10-Feb-2006 yar

Avoid frobbing IFF_UP at any cost (which is close to
zero in this case.) A kernel driver has IFF_DRV_RUNNING
at its full disposal while IFF_UP may be toggled only by
humans or their daemonic deputies from the userland.

MFC after: 3 days


155493 09-Feb-2006 emaste

Add a MODULE_VERSION so that other modules (perhaps third-party) can
depend on this one.

Approved by: rwatson (mentor)


155442 07-Feb-2006 qingli

The code in rn_walktree_from() that checks if we backed up too far
did not stop at the right node. Change the backtracking check from
smaller-than to smaller-or-equal to prevent this from happening.
While here fix one additional problem where the insertion of the
default route traversed the entire tree.

PR: kern/38752
Submitted by: qingli (before I became committer)
Reviewed by: andre
MFC after: 3 days


155440 07-Feb-2006 qingli

Remove two unnecessary type casts, of which both had a typo in
it anyways.

Approved by: andre
MFC after: 3 days


155268 03-Feb-2006 oleg

Properly initialize args structure before passing it to ipfw_chk(): having
uninitialized args.inp is unhealthy for uid/gid/jail ipfw rules.

PR: kern/92589
Approved by: glebius (mentor)
MFC after: 1 week


155231 02-Feb-2006 glebius

In vlan_config() first call vlan_inithash(), then lock mutex, because
vlan_inithash() calls malloc(M_WAITOK).


155226 02-Feb-2006 csjp

define lock.h before rwlock.h for DEBUG_LOCKS


155224 02-Feb-2006 ps

Implement SIOCGIFCONF for 32bit binaries.


155221 02-Feb-2006 csjp

Use PFIL_HOOKED macros in if_bridge and pass the right argument to
rw_assert. This un-breaks the build.

Submitted by: Kostik Belousov
Pointy hat to: csjp


155201 02-Feb-2006 csjp

Somewhat re-factor the read/write locking mechanism associated with the packet
filtering mechanisms to use the new rwlock(9) locking API:

- Drop the variables stored in the phil_head structure which were specific to
conditions and the home rolled read/write locking mechanism.
- Drop some includes which were used for condition variables
- Drop the inline functions, and convert them to macros. Also, move these
macros into pfil.h
- Move pfil list locking macros intp phil.h as well
- Rename ph_busy_count to ph_nhooks. This variable will represent the number
of IN/OUT hooks registered with the pfil head structure
- Define PFIL_HOOKED macro which evaluates to true if there are any
hooks to be ran by pfil_run_hooks
- In the IP/IP6 stacks, change the ph_busy_count comparison to use the new
PFIL_HOOKED macro.
- Drop optimization in pfil_run_hooks which checks to see if there are any
hooks to be ran, and returns if not. This check is already performed by the
IP stacks when they call:

if (!PFIL_HOOKED(ph))
goto skip_hooks;

- Drop in assertion which makes sure that the number of hooks never drops
below 0 for good measure. This in theory should never happen, and if it
does than there are problems somewhere
- Drop special logic around PFIL_WAITOK because rw_wlock(9) does not sleep
- Drop variables which support home rolled read/write locking mechanism from
the IPFW firewall chain structure.
- Swap out the read/write firewall chain lock internal to use the rwlock(9)
API instead of our home rolled version
- Convert the inlined functions to macros

Reviewed by: mlaier, andre, glebius
Thanks to: jhb for the new locking API


155143 31-Jan-2006 thompsa

Fix two bugs with the bridge

- code expects memcmp() to return a signed value, our memcmp() returns 0 if
args are equal and > 0 if not.

- It's possible to hijack interface for static entry. If bridge recieves
packet from interface marked as learning it will replace the bridge_rtnode
entry for the source address even if such entry marked as static.

Submitted by: Gleb Kurtsov <k-gleb yandex.ru>
MFC after: 3 days


155114 31-Jan-2006 yar

Set IFF_BROADCAST and IFF_MULTICAST on vlan interfaces from the
beginning and simply refuse to attach to a parent without either
flag.

Our network stack cannot handle well IFF_BROADCAST or IFF_MULTICAST
on an interface changing on the fly. E.g., IP will or won't assign
a broadcast address to an interface and join the all-hosts multicast
group on it depending on its IFF_BROADCAST and IFF_MULTICAST settings.
Should the flags alter later, IP will miss the change and keep using
bogus settings. This can lead to evil things like supplying an
invalid broadcast address or trying to leave a multicast group that
hasn't been joined. So just avoid touching the flags since an
interface was created. This has no practical purpose.

Discussed with: -net, glebius, oleg
MFC after: 1 week


155051 30-Jan-2006 glebius

Merge the //depot/user/yar/vlan branch into CVS. It contains some collective
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.

The most important changes:

o Instead of global linked list of all vlan softc use a per-trunk
hash. The size of hash is dynamically adjusted, depending on
number of entries. This changes struct ifnet, replacing counter
of vlans with a pointer to trunk structure. This change is an
improvement for setups with big number of VLANs, several interfaces
and several CPUs. It is a small regression for a setup with a single
VLAN interface.
An alternative to dynamic hash is a per-trunk static array with
4096 entries, which is a compile time option - VLAN_ARRAY. In my
experiments the array is not an improvement, probably because such
a big trunk structure doesn't fit into CPU cache.
o Introduce an UMA zone for VLAN tags. Since drivers depend on it,
the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
This change is a big improvement for any setup utilizing vlan(4).
o Use rwlock(9) instead of mutex(9) for locking. We are the first
ones to do this! :)
o Some drivers can do hardware VLAN tagging + hardware checksum
offloading. Add an infrastructure for this. Whenever vlan(4) is
attached to a parent or parent configuration is changed, the flags
on vlan(4) interface are updated.

In collaboration with: yar, thompsa
In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>


155037 30-Jan-2006 glebius

Add some initial locking to gif(4). It doesn't covers the whole driver,
however IPv4-in-IPv4 tunnels are now stable on SMP. Details:

- Add per-softc mutex.
- Hold the mutex on output.

The main problem was the rtentry, placed in softc. It could be
freed by ip_output(). Meanwhile, another thread being in
in_gif_output() can read and write this rtentry.

Reported by: many
Tested by: Alexander Shiryaev <aixp mail.ru>


154806 25-Jan-2006 cperciva

Make sure buffers in if_bridge are fully initialized before copying
them to userland.

Security: FreeBSD-SA-06:06.kmem


154708 23-Jan-2006 yar

Be consistent in checking ifa->ifa_addr for NULL.

Found by: Coverity Prevent (tm)
MFC after: 3 days


154625 21-Jan-2006 bz

Fix stack corruptions on amd64.

Vararg functions have a different calling convention than regular
functions on amd64. Casting a varag function to a regular one to
match the function pointer declaration will hide the varargs from
the caller and we will end up with an incorrectly setup stack.

Entirely remove the varargs from these functions and change the
functions to match the declaration of the function pointers.
Remove the now unnecessary casts.

Lots of explanations and help from: peter
Reviewed by: peter
PR: amd64/89261
MFC after: 6 days


154518 18-Jan-2006 andre

Return mbuf pointer or NULL from ip_fastforward() as the mbuf pointer
may have changed by m_pullup() during fastforward processing.

While this is a bug it is actually never triggered in real world
situations and it is not remotely exploitable.

Found by: Coverity Prevent(tm)
Coverity ID: CID780
Sponsored by: TCP/IP Optimization Fundraise 2005


154336 14-Jan-2006 thompsa

Add code that clears certain capabilities from the member interface, these are
restored when its removed from the bridge.

At the moment we only clear IFCAP_TXCSUM. Since a locally generated packet on
the bridge may be sent out any one or more interfaces it cant be assumed that
every card does hardware csums. Most bridges don't generate a lot of traffic
themselves so turning off offloading won't hurt, bridged packets are
unaffected.

Tested by: Bruce Walker (bmw borderware.com)
MFC after: 5 days


154318 13-Jan-2006 rwatson

Check the right ifnet pointer to see if if_alloc() failed or not in
ef_clone(); we were testing the original ifnet, not the one allocated.

When aborting ef_clone() due to if_alloc() failing, free the allocated
efnet structure rather than leaking it.

Noticed by: Coverity Prevent analysis tool
MFC after: 3 days


154317 13-Jan-2006 rwatson

When freeing the chain of if_ef devices on an aborted load, use
SLIST_FOREACH_SAFE() rather than SLIST_FOREACH(), as elements are
freed on each iteration of the loop. This prevents use-after-free.

Noticed by: Coverity Prevent analysis tool
MFC after: 3 days


154209 11-Jan-2006 brooks

Get rid of the bogus IFP2FC() macro and use IFP2FWC(). IFP2FC()
attempted to cast a struct ifnet to a struct fw_com which resulted in
data corruption.

PR: kern/91307
Submitted by: Alex Semenyaka <alex at semenyaka do ru>
MFC After: 6 days


154023 04-Jan-2006 harti

Add a new leaf to the net.link.generic.ifdata.%d sysctl to retrieve
the name and unit number assigned by the driver. This is needed by
SNMP to find interfaces after they have been renamed.

MFC after: 4 weeks


153996 03-Jan-2006 jkim

Correctly check the filter length. I committed the wrong version.
Pointy hat to me.


153995 03-Jan-2006 jkim

- Explicitly validate an empty filter to match bpf_filter() comment[1].
- Do not use BPF JIT compiler for an empty filter.

[1] Pointed out by: darrenr


153979 02-Jan-2006 thompsa

Fix a brain-o in the last commit, the conditional was always false.


153978 02-Jan-2006 thompsa

Reorganise bridge_rtupdate slightly to reduce duplication.


153977 02-Jan-2006 thompsa

Reset the route expiry time on each update rather than always letting them get
GC'd and recreated.


153976 02-Jan-2006 thompsa

It is better to use time_uptime here since it is monotonic.

Pointed out by: glebius


153967 02-Jan-2006 thompsa

Minor whitespace cleanup.


153965 02-Jan-2006 thompsa

Read time_second directly rather than calling getmicrotime().

Obtained from: DragonflyBSD


153831 29-Dec-2005 thompsa

When pfil(9) is enabled the bridge only considers ETHERTYPE_ARP, ETHERTYPE_IP and
ETHERTYPE_IPV6 frames. Change this to be a sysctl knob so that is able to still
bridge non-IP packets if desired.

Also return early if all pfil_* sysctls are turned off, the user obviously does
not want to filter on the bridge.


153723 25-Dec-2005 sam

add a sysctl to turn debug msgs on/off when built with IFMEDIA_DEBUG


153640 22-Dec-2005 oleg

1) remove useless check of loop_copy - corresponding code was removed in
rev. 1.70 five years ago.
2) convert loop_copy to "non-negative" flag

Approved by: glebius (mentor)
MFC after: 2 weeks


153621 21-Dec-2005 thompsa

Add RFC 3378 EtherIP support. This change makes it possible to add gif
interfaces to bridges, which will then send and receive IP protocol 97 packets.
Packets are Ethernet frames with an EtherIP header prepended.

Obtained from: NetBSD
MFC after: 2 weeks


153606 21-Dec-2005 thompsa

As of r1.21 all broadcast packets are reprocessed by ether_input as arriving on
the bridge, this caused these packets to show up twice via bpf. Do not process
them twice with BPF_TAP.

MFC after: 3 days


153512 18-Dec-2005 glebius

- Fix VLAN_INPUT_TAG() macro, so that it doesn't touch mtag in
case if memory allocation failed.
- Remove fourth argument from VLAN_INPUT_TAG(), that was used
incorrectly in almost all drivers. Indicate failure with
mbuf value of NULL.

In collaboration with: yongari, ru, sam


153498 17-Dec-2005 thompsa

Use M_ZERO for the bridge_iflist to ensure there are no unexpected suprises.


153497 17-Dec-2005 thompsa

Minor whitespace cleanup.


153494 17-Dec-2005 thompsa

Change from a callback in if_ethersubr to using EVENTHANDLER in order to detach
span ports when they disappear. The span port does not have a pointer to the
softc so revert r1.31 and bring back the softc linked-list.

MFC after: 2 weeks


153458 15-Dec-2005 thompsa

It is not safe to use m_copypacket() here as the returned mbuf is readonly,
change to m_dup and keep the alignment on the layer3 header.

MFC after: 1 week


153408 14-Dec-2005 thompsa

Add support for creating span ports so that one can snoop bridged traffic
from another interface/machine/network.

Obtained from: OpenBSD
MFC after: 2 weeks


153221 08-Dec-2005 jkim

Do not accept an empty bpf program.


153213 07-Dec-2005 jkim

Add BPF Just-In-Time compiler support for ng_bpf(4).

The sysctl is changed from net.bpf.jitter.enable to net.bpf_jitter.enable
and this controls both bpf(4) and ng_bpf(4) now.


153157 06-Dec-2005 jkim

s/M_WAITOK/M_NOWAIT/ while mutex is held.

Pointed out by: csjp


153151 06-Dec-2005 jkim

Add experimental BPF Just-In-Time compiler for amd64 and i386.

Use the following kernel configuration option to enable:

options BPF_JITTER

If you want to use bpf_filter() instead (e. g., debugging), do:

sysctl net.bpf.jitter.enable=0

to turn it off.

Currently BIOCSETWF and bpf_mtap2() are unsupported, and bpf_mtap() is
partially supported because 1) no need, 2) avoid expensive m_copydata(9).

Obtained from: WinPcap 3.1 (for i386)


153084 04-Dec-2005 ru

Fix -Wundef from compiling the amd64 LINT.


153072 04-Dec-2005 ru

Fix -Wundef.


152932 29-Nov-2005 thompsa

The bridge is capable of sending broadcast packets so enable IFF_BROADCAST

Requested by: des


152882 28-Nov-2005 glebius

Take if_baudrate from the parent. This fixes problem with SNMP
daemons reporting zero speed for vlan(4) interfaces.


152779 24-Nov-2005 ru

Fix the following bugs:

- In ifc_name2unit(), disallow leading zeroes in a unit.

Exploit: ifconfig lo01 create

- In ifc_name2unit(), properly handle overflows. Otherwise,
either of two local panic()'s can occur, either because
no interface with such a name could be found after it was
successfully created, or because the code will bogusly
assume that it's a wildcard (unit < 0 due to overflow).

Exploit: ifconfig lo<overflowed_integer> create

- Previous revision made the following sequence trigger
a KASSERT() failure in queue(3):

Exploit: ifconfig lo0 destroy; ifconfig lo0 destroy

This is because IFC_IFLIST_REMOVE() is always called
before ifc->ifc_destroy() has been run, not accounting
for the fact that the latter can fail and leave the
interface operating (like is the case for "lo0").
So we ended up calling LIST_REMOVE() twice. We cannot
defer IFC_IFLIST_REMOVE() until after a call to
ifc->ifc_destroy() because the ifnet may have been
removed and its memory has been freed, so recover from
this by re-inserting the ifnet in the cloned interfaces
list if ifc->ifc_destroy() indicates a failure.


152583 18-Nov-2005 andre

Purge layer specific mbuf flags on layer crossings to avoid confusing
upper or lower layers.

Sponsored by: TCP/IP Optimization Fundraise 2005


152393 13-Nov-2005 thompsa

Fix a second missed case where the refcount is not decremented.

MFC after: 3 days


152392 13-Nov-2005 thompsa

Fix a mbuf and refcnt leak in the broadcast code.

If the packet is rejected from pfil(9) then continue the loop rather than
returning, this means that we can still try to send it out the remaining
interfaces but more importantly the mbuf is freed and refcount decremented on
exit.


152315 11-Nov-2005 ru

- Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.


152312 11-Nov-2005 ru

Use the more appropriate ifnet_byindex() instead of ifaddr_byindex().


152308 11-Nov-2005 glebius

Force this interface to be RUNNING.


152296 11-Nov-2005 ru

- Make IFP2ENADDR() a pointer to IF_LLADDR() rather than another
copy of Ethernet address.

- Change iso88025_ifattach() and fddi_ifattach() to accept MAC
address as an argument, similar to ether_ifattach(), to make
this work.


152242 09-Nov-2005 ru

Use sparse initializers for "struct domain" and "struct protosw",
so they are easier to follow for the human being.


152209 08-Nov-2005 thompsa

Move the cloned interface list management in to if_clone. For some drivers the
softc lists and associated mutex are now unused so these have been removed.

Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all thats needed to unload.

Idea by: brooks
Reviewed by: brooks


152139 06-Nov-2005 glebius

- Do not raise IFF_DRV_OACTIVE flag in vlan_start, because this
can lead to stalled interface
- Explain this fact in a comment.

Reviewed by: rwatson, thompsa, yar


151967 02-Nov-2005 andre

Retire MT_HEADER mbuf type and change its users to use MT_DATA.

Having an additional MT_HEADER mbuf type is superfluous and redundant
as nothing depends on it. It only adds a layer of confusion. The
distinction between header mbuf's and data mbuf's is solely done
through the m->m_flags M_PKTHDR flag.

Non-native code is not changed in this commit. For compatibility
MT_HEADER is mapped to MT_DATA.

Sponsored by: TCP/IP Optimization Fundraise 2005


151594 23-Oct-2005 thompsa

If we have been called from ether_ifdetach() then do not try and clear the
promisc flag from the member interface, this is a no-op anyway since the
interface is disappearing. The driver may have already released
its resources such as miibus and this is likely to panic the kernel.

Submitted and tested by: Wojciech A. Koszek
MFC after: 2 weeks


151569 23-Oct-2005 csjp

Before we export network interface data through the ifmibdata structure,
OR the flags bits with the driver managed status flags. This fixes an
issue where RUNNING flags would not be reported to processes, which
conflicts with the flags information provided by ifconfig(8).


151387 16-Oct-2005 phk

Use new (inline) functions for calls into driver.


151345 14-Oct-2005 thompsa

Make four more functions static that were missed in the last commit.


151313 14-Oct-2005 thompsa

Change most of the bridge and stp funtions to static. This has highlighted
that the following funtions are not used, wrap in '#ifdef noused' for the
moment.

bstp_enable_change_detection
bstp_disable_change_detection
bstp_set_bridge_priority
bstp_set_port_priority
bstp_set_path_cost


151305 14-Oct-2005 thompsa

Further clean up the bridge hooks in if_ethersubr.c and ng_ether.c

- move the function pointer definitions to if_bridgevar.h
- move most of the logic to the new BRIDGE_INPUT and BRIDGE_OUTPUT macros
- remove unneeded functions from if_bridgevar.h and sort a little.


151301 13-Oct-2005 thompsa

From 101 ways to panic your kernel.

Use bridge_ifdetach() to notify the bridge that a member has been detached. The
bridge can then remove it from its interface list and not try to send out via a
dead pointer.


151298 13-Oct-2005 julian

Consolidate two adjacent conditional blocks
I actually believe the code in question should be elsewhere (in the preceding
function).

MFC after: 1 week


151288 13-Oct-2005 ru

Remove a stale comment.


151282 13-Oct-2005 thompsa

Clean up the if_bridge hooks a bit in if_ethersubr.c and ng_ether.c, move
the broadcast/multicast test to bridge_input().

Requested by: glebius


151266 12-Oct-2005 thompsa

Change the reference counting to count the number of cloned interfaces for each
cloner. This ensures that ifc->ifc_units is not prematurely freed in
if_clone_detach() before the clones are destroyed, resulting in memory modified
after free. This could be triggered with if_vlan.

Assert that all cloners have been destroyed when freeing the memory.

Change all simple cloners to destroy their clones with ifc_simple_destroy() on
module unload so the reference count is properly updated. This also cleans up
the interface destroy routines and allows future optimisation.

Discussed with: brooks, pjd, -current
Reviewed by: brooks


151265 12-Oct-2005 imp

Be pedantic here: We're converting from network byte order to host
byte order in these cases. This is a nop in terms of the generated
code, but is logically incorrect.

PR: 73852


151227 11-Oct-2005 thompsa

Do not unconditionally set a spanning tree port to forwarding as the link may be
down when we attach. We wont get updated until a linkstate change happens.

Go via bstp_ifupdstatus() which checks the media status first.


151063 07-Oct-2005 glebius

A deja vu of:

http://lists.freebsd.org/pipermail/cvs-src/2004-October/033496.html

The same problem applies to if_bridge(4), too.

- Copy-and-paste the if_bridge(4) related block from
if_ethersubr.c to ng_ether.c
- Add XXXs, so that copy-and-paste would be noticed by
any future editors of this code.
- Also add XXXs near if_bridge(4) declarations.

Silence from: thompsa


150988 06-Oct-2005 avatar

Fixing a boot time panic(when if_fwip is compiled into kernel) by renaming
module name to something that wouldn't conflict with
sys/dev/firewire/firewire.c.

Submitted by: Cai, Quanqing <caiquanqing at gmail dot com>
PR: kern/82727
MFC after: 3 days


150987 06-Oct-2005 thompsa

Fix KASSERT function name in ether_output, use __func__ while I am here.


150968 05-Oct-2005 glebius

- Don't pollute opt_global.h with DEVICE_POLLING and introduce
opt_device_polling.h
- Include opt_device_polling.h into appropriate files.
- Embrace with HAVE_KERNEL_OPTION_HEADERS the include in the files that
can be compiled as loadable modules.

Reviewed by: bde


150929 04-Oct-2005 csjp

Protect PID initializations for statistics by the bpf descriptor
locks. Also while we are here, protect the bpf descriptor during
knlist_remove{add} operations.

Discussed with: rwatson


150903 04-Oct-2005 rwatson

Rename net.isr.enable to net.isr.dispatch.

No compatibility code is provided, as this will be the production name
as of 6.0.

MFC after: 3 days
Requested by: scottl


150846 03-Oct-2005 yar

Improve handling flags that must be propagated
to the parent interface, such as IFF_PROMISC and
IFF_ALLMULTI. In addition, vlan(4) gains ability
to migrate from one parent to another w/o losing
its own flags.

PR: kern/81978
MFC after: 2 weeks


150845 03-Oct-2005 yar

Clean up consistency checks in if_setflag():
. use KASSERT for all checks so that the source of an error can be detected;
. use __func__ instead of spelling function name each time;
. fix a typo.


150844 03-Oct-2005 yar

Log a message about entering or leaving permanently promiscuous mode,
as it is done for usual promiscuous mode already. This info is important
because promiscuous mode in the hands of a malicious party can jeopardize
the whole network.


150837 02-Oct-2005 thompsa

Do not packet filter in the bridge_start() routine, locally generated packets
are already filtered by the higher layers.

Approved by: mlaier (mentor)
MFC after: 3 days


150789 01-Oct-2005 glebius

Big polling(4) cleanup.

o Axe poll in trap.

o Axe IFF_POLLING flag from if_flags.

o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.

o Make registration and deregistration from polling in a
functional way, insted of next tick/interrupt.

o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.

Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. The are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enabled/disables polling.
A message that kern.polling.enable is deprecated is printed.

Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If there is no, then unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns.This is important to protect from spurious
interrupts.

Reviewed by: ru, sam, jhb


150636 27-Sep-2005 mlaier

Remove bridge(4) from the tree. if_bridge(4) is a full functional
replacement and has additional features which make it superior.

Discussed on: -arch
Reviewed by: thompsa
X-MFC-after: never (RELENG_6 as transition period)


150444 22-Sep-2005 thompsa

Fix an alignment panic my preserving the 2byte padding (ETHER_ALIGN) on our
copied mbuf, which keeps the IP header 32-bit aligned. This copied mbuf is
reinjected back into ether_input and off to the IP routines.

Reported and tested by: Peter van Dijk
Approved by: mlaier (mentor)
MFC after: 3 days


150414 21-Sep-2005 glebius

Several fixes to rt_setgate(), that fix problems with route changing:

- Rearrange code so that in a case of failure the affected
route is not changed. Otherwise, a bogus rtentry will be
left and later rt_check() can recurse on its lock. [1]
- Remove comment about protocol cloning.
- Fix two places where rtentry mutex was recursed on, because
accessed via two different pointers, that were actually pointing
to the same rtentry in some cases. [1]
- Return EADDRINUSE instead of bogus EDQUOT, in case when gateway
uses the same route. [2]

Reported & tested by: ps, Andrej Zverev <az inec.ru> [1]
PR: kern/64090 [2]


150351 19-Sep-2005 andre

Use monotonic 'time_uptime' instead of 'time_second' as timebase
for rt->rt_rmx.rmx_expire.


150349 19-Sep-2005 andre

Use monotonic time_uptime instead of 'time_second' as timebase
for timeouts.


150331 19-Sep-2005 glebius

Drop current rtentry lock before calling rt_getifa(). This fixes a LOR
and a possible recursive use of rtentry mutex.

PR: kern/69356
Reviewed by: sam


150296 18-Sep-2005 rwatson

Take a first cut at cleaning up ifnet removal and multicast socket
panics, which occur when stale ifnet pointers are left in struct
moptions hung off of inpcbs:

- Add in_ifdetach(), which matches in6_ifdetach(), and allows the
protocol to perform early tear-down on the interface early in
if_detach().

- Annotate that if_detach() needs careful consideration.

- Remove calls to in_pcbpurgeif0() in the handling of SIOCDIFADDR --
this is not the place to detect interface removal! This also
removes what is basically a nasty (and now unnecessary) hack.

- Invoke in_pcbpurgeif0() from in_ifdetach(), in both raw and UDP
IPv4 sockets.

It is now possible to run the msocket_ifnet_remove regression test
using HEAD without panicking.

MFC after: 3 days


150232 16-Sep-2005 ru

The arguments to printf() were swapped.


150219 16-Sep-2005 yar

Do assorted nitpicking in diagnostics while I'm here:
- Use __func__ consistently instead of copying function name
to message strings. Code tends to migrate around source files.
- DIAGNOSTIC is for information, INVARIANTS is for panics.


150217 16-Sep-2005 yar

It's nice to have relevant comments both in if {} and else {},
not in just one of them.


150216 16-Sep-2005 yar

Test the new M_VLANTAG packet flag before calling
m_tag_locate(). This adds little overhead of a simple
bitwise operation in case hardware VLAN acceleration
is on, yet saves the more expensive function call if
the acceleration is off.

Reviewed by: ru, glebius
X-MFC-after: 6.0


150135 14-Sep-2005 andre

Undo a tad little optimization to bpf_mtap() introduced in rev. 1.95
which broke the correct handling of the BIOCGSEESENT flag in the bpf
listener.

PR: kern/56441
Submitted by: <vys at renet.ru>
MFC after: 3 days


150130 14-Sep-2005 andre

Remove bogous semicolons at the end of the definitions of
'do { ... } while (0)' macros.

PR: kern/83088
Sumbitted by: <antoine.brodin at laposte.net>


150063 12-Sep-2005 rwatson

In netkqfilter(), return EINVAL instead of 1 (EPERM) when a filter type
is requested on a network interface file descriptor that is non-applicable.

MFC after: 3 days


149993 11-Sep-2005 rodrigc

Forward declare z_errmsg with static linkage since it is defined
with static linkage later in the file. Eliminates GCC 4.0 error.


149943 10-Sep-2005 csjp

Protect interface and address lists using the appropriate mutex. These
locks were not aquired because the user buffers were not wired, thus it was
possible that that SYSCTL_OUT could sleep, causing a number of different
problems such as lock ordering issues and dead locks.

-Wire user supplied buffer to ensure SYSCTL_OUT will not sleep.
-Pickup ifnet locks to protect the list.
-Where applicable pickup address locks.
-Pickup radix node head locks.
-Remove splnet stubs
-Remove various comments about locking here, because they are no
longer needed.

It is the hope that these changes will make sysctl_rtsock MP safe.

MFC after: 3 weeks


149848 07-Sep-2005 obrien

Forward declaring static variables as extern is invalid ISO-C. Now that
GCC can properly handle forward static declarations, do this properly.


149829 06-Sep-2005 thompsa

Add support for multicast to the bridge and allow inet6 addresses to be
assigned to the interface.

IPv6 auto-configuration is disabled. An IPv6 link-local address has a
link-local scope within one link, the spec is unclear for the bridge case and
it may cause scope violation.

An address can be assigned in the usual way;
ifconfig bridge0 inet6 xxxx:...

Tested by: bmah
Reviewed by: ume (netinet6)
Approved by: mlaier (mentor)
MFC after: 1 week


149809 05-Sep-2005 csjp

Instead of caching the PID which opened the bpf descriptor, continuously
refresh the PID which has the descriptor open. The PID is refreshed in various
operations like ioctl(2), kevent(2) or poll(2). This produces more accurate
information about current bpf consumers. While we are here remove the bd_pcomm
member of the bpf stats structure because now that we have an accurate PID we
can lookup the via the kern.proc.pid sysctl variable. This is the trick that
NetBSD decided to use to deal with this issue.

Special care needs to be taken when MFC'ing this change, as we have made a
change to the bpf stats structure. What will end up happening is we will leave
the pcomm structure but just mark it as being un-used. This way we keep the ABI
in tact.

MFC after: 1 month
Discussed with: Rui Paulo < rpaulo at NetBSD dot org >


149782 04-Sep-2005 sam

reclaim sbuf and clear lock on error in ifconf

Submitted by: Ted Unangst
Reviewed by: rwatson
MFC after: 3 days


149662 31-Aug-2005 yar

Use VLAN_TAG_VALUE() not only to read a dot1q tag
value from an m_tag, but also to set it. This reduces
complex code duplication and improves its readability.

Alas, we shouldn't rename the macro to VLAN_TAG_LVALUE()
globally because that would cause pain for kernel module
port maintainers and vendors using FreeBSD as their codebase.
Added a clarifying comment instead.

Discussed with: ru, glebius
X-MFC-After: 6.0-RELEASE (MFC is good just to reduce the diff)


149619 30-Aug-2005 glebius

Fix fallout from revision 1.77, mark outgoing packets with M_VLANTAG flag.

PR: kern/80646
Reviewed by: yar
MFC after: 3 days


149522 27-Aug-2005 thompsa

Fix a panic in softclock() if the interface is destroyed with a bpf consumer
attached.

This is caused by bpf_detachd clearing IFF_PROMISC on the interface which does
a SIOCSIFFLAGS ioctl. The problem here is that while the interface has been
stopped, IFF_UP has not been cleared so IFF_UP != IFF_DRV_RUNNING, this causes
the ioctl function to init() the interface which resets the callouts.

The destroy then completes and frees the softc but softclock will panic on a
dead callout pointer.

Ensure ifp->if_flags matches reality by clearing IFF_UP when we destroy.

Silence from: rwatson
Approved by: mlaier (mentor)
MFC after: 3 days


149452 25-Aug-2005 rwatson

De-spl parts of the routing socket code now generally protected
through locking; leave some spl references around code where there
are open questions about global variable references. Also, add
an XXX regarding locking in sysctl.

MFC after: 3 days


149396 23-Aug-2005 thompsa

The mtu check in bridge_enqueue is bogus as the maximum Ethernet frame is
actually 1514, so comparing the mbuf length which includes the Ethernet header
to the interface MTU is wrong.

The check was a little over the top so just remove it.

Approved by: mlaier (mentor)
MFC after: 3 days


149389 23-Aug-2005 mlaier

Don't loop back packets that have been routed by pf. This fixes an endless
loop where the same packet is sent over and over again.

Obtained from: OpenBSD
Reported by: Sergey Lapin
Tested by: Sergey Lapin
MFC after: 7 days


149376 22-Aug-2005 csjp

Introduce two new ioctl(2) commands, BIOCLOCK and BIOCSETWF. These commands
enhance the security of bpf(4) by further relinquishing the privilege of
the bpf(4) consumer (assuming the ioctl commands are being implemented).

Once BIOCLOCK is executed, the device becomes locked which prevents the
execution of ioctl(2) commands which can change the underly parameters of the
bpf(4) device. An example might be the setting of bpf(4) filter programs or
attaching to different network interfaces.

BIOCSETWF can be used to set write filters for outgoing packets. Currently if
a bpf(4) consumer is compromised, the bpf(4) descriptor can essentially be used
as a raw socket, regardless of consumer's UID. Write filters give users the
ability to constrain which packets can be sent through the bpf(4) descriptor.

These features are currently implemented by a couple programs which came from
OpenBSD, such as the new dhclient and pflogd.

-Modify bpf_setf(9) to accept a "cmd" parameter. This will be used to specify
whether a read or write filter is to be set.
-Add a bpf(4) filter program as a parameter to bpf_movein(9) as we will run the
filter program on the mbuf data once we move the packet in from user-space.
-Rather than execute two uiomove operations, (one for the link header and the
other for the packet data), execute one and manually copy the linker header
into the sockaddr structure via bcopy.
-Restructure bpf_setf to compensate for write filters, as well as read.
-Adjust bpf(4) stats structures to include a bd_locked member.

It should be noted that the FreeBSD and OpenBSD implementations differ a bit in
the sense that we unconditionally enforce the lock, where OpenBSD enforces it
only if the calling credential is not root.

Idea from: OpenBSD
Reviewed by: mlaier


149255 18-Aug-2005 csjp

Add missing braces around bpf_filter which were missed when I
merged the bpfstat code.

Pointed out by: iedowse
Pointy hat to: csjp
MFC after: 3 days


149253 18-Aug-2005 thompsa

Mark the callouts as MPSAFE as if_bridge has been giant-free since day 1.

Use the SMP friendly callout_init_mtx() while we are here.

Approved by: mlaier (mentor)
MFC after: 3 days


149243 18-Aug-2005 brooks

When we started calling if_findindex() from if_alloc() with an empty
struct ifnet most of if_findindex() become a complex no-op. Remove it
and replace it with a corrected version of the four line for loop it
devolved to plus some error handling. This should probably be replaced
with subr_unit at some point.

Switch from checking ifaddr_byindex to ifnet_byindex when looking for
empty indexes. Since we're doing this from if_alloc/if_free, we can
only be sure that ifnet_byindex will be correct. This fixes panics when
loading the ef(4) module. The panics were caused by the fact that
if_alloc was called four time before if_attach was called and thus
ifaddr_byindex was not set and the same unit was allocated again. This
in turn caused the first if_attach to fail because the ifp was not the
one in ifnet_byindex(ifp->if_index).

Reported by: "Wojciech A. Koszek" <dunstan at freebsd dot czest dot pl>
PR: kern/84987
MFC After: 1 day


149141 16-Aug-2005 brooks

- Move IF_ADDR_LOCK_DESTROY(ifp) from if_free to if_free_type.
- Add a note that additions should be made to if_free_type and not
if_free to help avoid this in the future.

This apparently fixes a use after free in if_bridge and may fix bugs
in other direct if_free_type consumers.

Reported by: thompsa


149110 15-Aug-2005 brooks

Vlan interfaces change their type after ether_ifattach() so we needs to
use if_free_type(ifp, IFT_ETHER) to delete them and stop leaking struct
arpcoms.

Reported by: thompsa
MFC After: 3 days


149065 15-Aug-2005 thompsa

Ensure that we are holding the lock when initialising the bridge interface. We
could initialise while unlocked if the bridge is not up when setting the inet
address, ether_ioctl() would call bridge_init.

Change it so bridge_init is always called unlocked and then locks before
calling bstp_initialization().

Reported by: Michal Mertl
Approved by: mlaier (mentor)
MFC after: 3 days


149064 15-Aug-2005 thompsa

Ensure that we are holding the lock when initialising the bridge interface. We
could initialise while unlocked if the bridge is not up when setting the inet
address, ether_ioctl() would call bridge_init.

Change it so bridge_init is always called unlocked and then locks before
calling bstp_initialization().

Reported by: Michal Mertl
Approved by: mlaier (mentor)
MFC after: 3 days


148983 12-Aug-2005 glebius

Axe ppp_for_tty(). Use tty->t_lsc pointer to store sc. This
also eliminates recursive use of ppp_softc_list_mtx.

PR: kern/84686
Reviewed by: phk
MFC after: 1 week


148956 11-Aug-2005 glebius

o To prevent a race between RTM_DELETE message and
arptimer() deleting stale entry, we need to lock
rtentry before unlocking radix head.

Reviewed by: sam


148954 11-Aug-2005 glebius

o Make rt_check() function more strict:
- rt0 passed to rt_check() must not be NULL, assert this.
- rt returned by rt_check() must be valid locked rtentry,
if no error occured.
o Modify callers, so that they never pass NULL rt0
to rt_check().

Reviewed by: sam, ume (nd6.c)


148894 09-Aug-2005 rwatson

For each interface flag, indicate whether or not it is owned by the
device driver, owned by the network stack, or initialized by the device
driver before attach and read-only from then on.

Not all device drivers and network stack components currently follow
these rules, especially with respect to IFF_UP, and a few exceptions
with IFF_ALLMULTI.

MFC after: 7 days


148887 09-Aug-2005 rwatson

Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by: pjd, bz
MFC after: 7 days


148886 09-Aug-2005 rwatson

Rename IFF_RUNNING to IFF_DRV_RUNNING, IFF_OACTIVE to IFF_DRV_OACTIVE,
and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making
and documenting the locking of these flags the responsibility of the
device driver, not the network stack. The flags for these two fields
will be mutually exclusive so that they can be exposed to user space as
though they were stored in the same variable.

Provide #defines to provide the old names #ifndef _KERNEL, so that user
applications (such as ifconfig) can use the old flag names. Using the
old names in a device driver will result in a compile error in order to
help device driver writers adopt the new model.

When exposing the interface flags to user space, via interface ioctls
or routing sockets, or the two fields together. Since the driver flags
cannot currently be set for user space, no new logic is currently
required to handle this case.

Add some assertions that general purpose network stack routines, such
as if_setflags(), are not improperly used on driver-owned flags.

With this change, a large number of very minor network stack races are
closed, subject to correct device driver locking. Most were likely
never triggered.

Driver sweep to follow; many thanks to pjd and bz for the line-by-line
review they gave this patch.

Reviewed by: pjd, bz
MFC after: 7 days


148883 09-Aug-2005 glebius

In preparation for fixing races in ARP (and probably in other
L2/L3 mappings) make rt_check() return a locked rtentry.


148874 08-Aug-2005 thompsa

Use m_copypacket() which is an optimization of the common case
m_copym(m, 0, M_COPYALL, how).

This is required for strict alignment architectures where we align the IP
header in the input path but m_copym() will create an unaligned copy in
bridge_broadcast(). m_copypacket() preserves alignment of the first mbuf.

Noticed by: Petri Simolin
Approved by: mlaier (mentor)
MFC after: 3 days


148868 08-Aug-2005 rwatson

Merge the dev_clone and dev_clone_cred event handlers into a single
event handler, dev_clone, which accepts a credential argument.
Implementors of the event can ignore it if they're not interested,
and most do. This avoids having multiple event handler types and
fall-back/precedence logic in devfs.

This changes the kernel API for /dev cloning, and may affect third
party packages containg cloning kernel modules.

Requested by: phk
MFC after: 3 days


148799 06-Aug-2005 sam

destroy lock _before_ free'ing the structure it resides in


148696 04-Aug-2005 jhb

Initialize the if_addr mutex in if_alloc() rather than waiting until
if_attach(). This allows ethernet drivers to use it in their routines
to program their MAC filters before ether_ifattach() is called (de(4) is
one such driver). Also, the if_addr mutex is destroyed in if_free()
rather than if_detach(), so there was another potential bug in that a
driver that failed during attach and called if_free() without having
called ether_ifattach() would have tried to destroy an uninitialized mutex.

Reported by: Holm Tiffe holm at freibergnet dot de
Discussed with: rwatson


148652 02-Aug-2005 rwatson

Protect link layer network interface multicast address list manipulation
using ifp->if_addr_mtx:

- Initialize if_addr_mtx when ifnet is initialized.

- Destroy if_addr_mtx when ifnet is torn down.

- Rename ifmaof_ifpforaddr() to if_findmulti(); assert if_addr_mtx.
Staticize.

- Extract ifmultiaddr allocation and initialization into if_allocmulti();
accept a 'mflags' argument to indicate whether or not sleeping is
permitted. This centralizes error handling and address duplication.

- Extract ifmultiaddr tear-down and deallocation in if_freemulti().

- Re-structure if_addmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Make use of non-sleeping allocations. Annotate the fact that we only
generate routing socket events for explicit address addition, not
implicit link layer address addition.

- Re-structure if_delmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Annotate the lack of a routing socket event for implicit link layer
address removal.

- De-spl all and sundry.

Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week


148641 02-Aug-2005 rwatson

When allocating link layer ifnet address list entries in
ifp->if_resolvemulti(), do so with M_NOWAIT rather than M_WAITOK, so
that a mutex can be held over the call. In the FDDI code, add a
missing M_ZERO. Consumers are already aware that if_resolvemulti()
can fail.

MFC after: 1 week


148640 02-Aug-2005 rwatson

Add if_addr_mtx to struct ifnet, a mutex to protect ifnet-related address
lists. Add accessor macros.

This changes the size of struct ifnet, but ideally, all ifnet consumers
are now using if_alloc() to allocate these structures rather than
embedding them into device driver softc's, so this won't modify the
network device driver ABI.

MFC after: 1 week


148613 01-Aug-2005 bz

Add support for IPv6 over GRE [1]. PR kern/80340 includes the
FreeBSD specific ip_newid() changes NetBSD does not have.
Correct handling of non AF_INET packets passed to bpf [2].

PR: kern/80340[1], NetBSD PRs 29150[1], 30844[2]
Obtained from: NetBSD ip_gre.c rev. 1.34,1.35, if_gre.c rev. 1.56
Submitted by: Gert Doering <gert at greenie.muc.de>[2]
MFC after: 4 days


148418 26-Jul-2005 csjp

Rather than hold a mutex over calls to SYSCTL_OUT allocate a
temporary buffer then pass the array to user-space once we have
dropped the lock.

While we are here, drop an assertion which could result in a
kernel panic under certain race conditions.

Pointed out by: rwatson


148385 25-Jul-2005 ume

scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.

Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from: KAME


148372 25-Jul-2005 thompsa

We check that all the member interfaces have the same MTU on attach to the
bridge but the interface can still be changed afterwards.

This falls under the 'dont do that' category but log an warning when INVARIANTS
is defined.

Approved by: mlaier (mentor)
MFC after: 3 days


148366 24-Jul-2005 csjp

Introduce new sysctl variable: net.bpf.stats. This sysctl variable can
be used to pass statistics regarding dropped, matched and received
packet counts from the kernel to user-space. While we are here
introduce a new counter for filtered or matched packets. We currently
keep track of packets received or dropped by the bpf device, but not
how many packets actually matched the bpf filter.

-Introduce net.bpf.stats sysctl OID
-Move sysctl variables after the function prototypes so we can
reference bpf_stats_sysctl(9) without build errors.
-Introduce bpf descriptor counter which is used mainly for sizing
of the xbpf_d array.
-Introduce a xbpf_d structure which will act as an external
representation of the bpf_d structure.
-Add a the following members to the bpfd structure:

bd_fcount - Number of packets which matched bpf filter
bd_pid - PID which opened the bpf device
bd_pcomm - Process name which opened the device.

It should be noted that it's possible that the process which opened
the device could be long gone at the time of stats collection. An
example might be a process that opens the bpf device forks then exits
leaving the child process with the bpf fd.

Reviewed by: mdodd


148265 21-Jul-2005 rwatson

Allocate one of the spare ifnet integer fields to hold if_drv_flags,
which in the future will hold IFF_OACTIVE and IFF_RUNNING, and have
its access synchronized by the device driver rather than the
protocol stack. This will avoid potential races in the management
of flags in if_flags.

Discussed with: various (scottl, jhb, ...)
MFC after: 1 week


148228 21-Jul-2005 phk

Add some KASSERTS to catch null pointers.


148202 20-Jul-2005 thompsa

Clear the PROMISC flag from the vlan interface when we remove a member. We
checked for IFT_L2VLAN in bridge_ioctl_add() but not bridge_delete_member().

Approved by: mlaier (mentor)


148153 19-Jul-2005 rwatson

In multicast routines:

Compare pointers with NULL rather than treating them as booleans.

Compare pointers with NULL rather than 0 to make it more clear
they are pointers.

Assign pointers value of NULL rather than 0 to make it more clear
they are pointers.

MFC after: 3 days


148152 19-Jul-2005 rwatson

Rename equal() macro to sa_equal(), which matches the definitions
of sa_equal() in other files, and makes it more clear what equal()
is comparing.

MFC after: 3 days


148125 18-Jul-2005 rwatson

Lock down netnatm and mark as MPSAFE:

- Introduce a subsystem mutex, natm_mtx, manipulated with accessor macros
NATM_LOCK_INIT(), NATM_LOCK(), NATM_UNLOCK(), NATM_LOCK_ASSERT(). It
protects the consistency of pcb-related data structures. Finer grained
locking is possible, but should be done in the context of specific
measurements (as very little work is done in netnatm -- most is in the
ATM device driver or socket layer, so there's probably not much
contention).

- Remove GIANT_REQUIRED, mark as NETISR_MPSAFE, remove
NET_NEEDS_GIANT("netnatm").

- Conditionally acquire Giant when entering network interfaces for
ifp->if_ioctl() using IFF_LOCKGIANT(ifp)/IFF_UNLOCKGIANT(ifp) in order
to coexist with non-MPSAFE atm ifnet drivers..

- De-spl.

MFC after: 2 weeks
Reviewed by: harti, bms (various versions)


148037 15-Jul-2005 gnn

Fix for PR 82974. We were not checking that the route looked up in
the case of an RTM_CHANGE was specific, i.e. that it matched completely. This
led to a route change of a non-existent route changing the default route
as the radix code would simply back track to that point and hand that
route back to the routing socket code.

PR: 82974
Reviewed by: Tai-hwa Liang <avatar@mmlab.cse.yzu.edu.tw>
Ben Kaduk <minimarmot@gmail.com>
Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net>
Obtained from: OpenBSD with modifications.
MFC after: 2 weeks


148010 14-Jul-2005 mlaier

Move eventhandler for 'ifnet_departure_event' at the end of the progress.
Some of the (IPv6) cleanup functions send packets to inform peers of the
departure. These packets confused users of ifnet_departure_event (pf at the
moment).

PR: kern/80627
Tested by: Divacky Roman
MFC after: 1 week


147986 14-Jul-2005 yar

MFp4:

- Introduce a helper function if_setflag() containing the code common
to ifpromisc() and if_allmulti() instead of duplicating the code poorly,
with different bugs.
- Call ifp->if_ioctl() in a consistent way: always use more compatible C
syntax and check whether ifp->if_ioctl is not NULL prior to the call.

MFC after: 1 month


147976 13-Jul-2005 thompsa

Previously the bridge MTU was set to ETHERMTU and could not be changed. Since
we can only bridge interfaces with the same value it meant that all members had
to be set at ETHERMTU as well.

Allow the first member to be added to define the MTU for the bridge, the check
still applies to all additional members.

Print an informative message if the MTU is incorrect [1]

Requested by: Niki Denev [1]
Approved by: mlaier (mentor)
MFC after: 3 days


147893 11-Jul-2005 sam

additions from libpcap 0.9.1 release

Approved by: re (scottl)


147786 06-Jul-2005 thompsa

- Previously when broadcasting to N number of interfaces we would run pfil
hooks for each outgoing interface but also run pfil hooks _N times_ on the
bridge interface. This is changed so pfil hooks are run once for the bridge
interface (bridge0) and then only on the outgoing interfaces in the broadcast
loop.

- Simplify bridge_enqueue() by moving bridge_pfil() to the callers.

- Check (inet6_pfil_hook.ph_busy_count >= 0), it may be possible to have a
packet filter hooked for only ipv6 but we were only checking if ipv4 hooks
were busy.

- Minor optimisation for null mbuf check after bridge_pfil(), move it into the
if-block as it couldnt possibly be null outside.

Prodded by: mlaier
Approved by: re (scottl), mlaier (mentor)


147785 05-Jul-2005 rwatson

Eliminate MAC entry point mac_create_mbuf_from_mbuf(), which is
redundant with respect to existing mbuf copy label routines. Expose
a new mac_copy_mbuf() routine at the top end of the Framework and
use that; use the existing mpo_copy_mbuf_label() routine on the
bottom end.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA, SPAWAR
Approved by: re (scottl)


147760 03-Jul-2005 thompsa

Check the alignment of the IP header before passing the packet up to the
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64, ia64 and ppc.

This uses the code block from if_bridge and the newly added macro
IP_HDR_ALIGNED_P().

This /might/ be a temporary messure before all NIC drivers are educated
to align the header themself.

PR: ia64/81284
Obtained from: NetBSD (if_bridge)
Approved by: re (dwhite), mlaier (mentor)


147744 02-Jul-2005 thompsa

Check the alignment of the IP header before passing the packet up to the
packet filter. This would cause a panic on architectures that require strict
alignment such as sparc64 (tier1) and ia64/ppc (tier2).

This adds two new macros that check the alignment, these are compile time
dependent on __NO_STRICT_ALIGNMENT which is set for i386 and amd64 where
alignment isn't need so the cost is avoided.

IP_HDR_ALIGNED_P()
IP6_HDR_ALIGNED_P()

Move bridge_ip_checkbasic()/bridge_ip6_checkbasic() up so that the alignment
is checked for ipfw and dummynet too.

PR: ia64/81284
Obtained from: NetBSD
Approved by: re (dwhite), mlaier (mentor)


147730 01-Jul-2005 ssouhlal

Fix the recent panics/LORs/hangs created by my kqueue commit by:

- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.

Reviewed by: jmg
Approved by: re (scottl), scottl (mentor override)
Pointyhat to: ssouhlal
Will be happy: everyone


147724 01-Jul-2005 glebius

Use m_uiotombuf() instead of own implementation. This is not just
a cosmetic change. m_uiotombuf() produces a packet header mbuf, while
original implementation did not. When kernel is compiled with MAC
support, headerless mbuf will cause panic.

Reported by: Alexander Nikiforenko <asn rambler-co.ru>
Approved by: re (scottl)
MFC After: 2 weeks


147665 29-Jun-2005 thompsa

Sync if_bridge to NetBSD r1.31

Rename conflicting variables when handling SNAP Ethernet frames.

Obtained from: NetBSD
Approved by: mlaier (mentor)
Approved by: re (blanket)


147650 28-Jun-2005 qingli

Require gateways for routes to be of the same address family as the
route itself.

It fixes a bug where an IPv4 route for example has an IPv6 gateway
specified:

route add 10.1.1.1 -inet6 fe80::1%fxp0

Destination Gateway Flags Refs Use Netif Expire
10.1.1.1 fe80::1%fxp0 UGHS 0 0 fxp0

The fix rejects these illegal combinations:

route: writing to routing socket: Invalid argument
add host 10.1.1.1: gateway fe80::1%fxp0: Invalid argument

Reviewed by: KAME jinmei@isl.rdc.toshiba.co.jp
Reviewed by: andre (mentor)
Approved by: re
MFC after: 5


147643 28-Jun-2005 bz

Fix panic after ifnet changes in rev. 1.30. sc->sc_ifp is a
pointer now and needs to be allocated before using.

Reviewed by: gnn
Approved by: re (scottl), rwatson (mentor)


147634 27-Jun-2005 thompsa

Fix a panic when bringing up the bridge interface. We were casting a ifnet
pointer to a softc which is no longer valid since the ifnet struct was split
out from the softc.

Approved by: mlaier (mentor)
Approved by: re (blanket)


147611 26-Jun-2005 dwmalone

Fix some long standing bugs in writing to the BPF device attached to
a DLT_NULL interface. In particular:

1) Consistently use type u_int32_t for the header of a
DLT_NULL device - it continues to represent the address
family as always.
2) In the DLT_NULL case get bpf_movein to store the u_int32_t
in a sockaddr rather than in the mbuf, to be consistent
with all the DLT types.
3) Consequently fix a bug in bpf_movein/bpfwrite which
only permitted packets up to 4 bytes less than the MTU
to be written.
4) Fix all DLT_NULL devices to have the code required to
allow writing to their bpf devices.
5) Move the code to allow writing to if_lo from if_simloop
to looutput, because it only applies to DLT_NULL devices
but was being applied to other devices that use if_simloop
possibly incorrectly.

PR: 82157
Submitted by: Matthew Luckie <mjl@luckie.org.nz>
Approved by: re (scottl)


147470 17-Jun-2005 brooks

Spelling/grammer fixes in comment.

Reported by: Hans Petter Selasky <hselasky at c2i dot net>
Approved by: re (ifnet blanked)


147346 13-Jun-2005 brooks

Initialze ifp->if_softc.

Submitted by: ume


147308 12-Jun-2005 brooks

Return NULL instead of a bogus pointer from if_alloc when if_com_alloc
fails.

Move detaching the ifnet from the ifindex_table into if_free so we can
both keep the sanity checks and actually delete the ifnets. [0]

Reported by: gallatin [0]
Approved by: re (blanket)


147281 10-Jun-2005 thompsa

Catch up with the struct ifnet changes and use if_alloc().

Reviewed by: brooks
Approved by: mlaier (mentor)


147256 10-Jun-2005 brooks

Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.

Reviewed by: sobomax, sam


147251 10-Jun-2005 mlaier

Add missing {} in last commit.


147205 10-Jun-2005 thompsa

Add dummynet(4) support to if_bridge, this code is largely based on bridge.c.

This is the final piece to match bridge.c in functionality, we can now be a
drop-in replacement.

Approved by: mlaier (mentor)


147165 09-Jun-2005 harti

When returing an RTM_GET message through the routing socket fill
in the rtm_index field whenever we have an interface pointer. This
is consistent with the RTM_GET messages returned by sysctl().


147111 07-Jun-2005 thompsa

Bring in IPFW layer2 filtering from bridge.c, this allows Ethernet filtering
using the layer2, mac and mac-type keywords.

This is one of the last features that bridge.c has over if_bridge and gets us
very close to a full functional replacement.

Approved by: mlaier (mentor)


147065 06-Jun-2005 csjp

Change the maximum bpf program instruction limitation from being hard-
coded at 512 (BPF_MAXINSNS) to being tunable. This is useful for users
who wish to use complex or large bpf programs when filtering traffic.
For now we will default it to BPF_MAXINSNS. I have tested bpf programs
with well over 21,000 instructions without any problems.

Discussed with: phk


147059 06-Jun-2005 brooks

Send link state change notifications to /dev/devctl. This is needed to
start the OpenBSD dhclient when links come up.


147040 06-Jun-2005 thompsa

Change ipv6 packet filtering to match ipv4. It now checks pfil_member and
pfil_bridge to determine which interfaces to filter on.

Approved by: mlaier (mentor)


146990 05-Jun-2005 thompsa

Fix indentation of two comment blocks from the last commit.

Approved by: mlaier (mentor)


146986 05-Jun-2005 thompsa

Add hooks into the networking layer to support if_bridge. This changes struct
ifnet so a buildworld is necessary.

Approved by: mlaier (mentor)
Obtained from: NetBSD


146985 05-Jun-2005 thompsa

Add if_bridge, which provides more advanced Ethernet bridging and 802.1d
spanning tree support.

Based on Jason Wright's bridge driver from OpenBSD, and modified by Jason R.
Thorpe in NetBSD.

Reviewed by: mlaier, bms, green
Silence from: -net
Approved by: mlaier (mentor)
Obtained from: NetBSD


146729 28-May-2005 sam

integrate changes from libpcap-0.9.1-096

Reviewed by: bms


146702 28-May-2005 brooks

Update refrenced URL for SNMP list of ifTypes to refer to iana.org
instead of a dead location on ftp.isi.edu.


146635 26-May-2005 glebius

Plug mbuf leak, that I have introduced in 1.85. Also restore important comment
from if_ethersubr.c:1.178. While here adjust formatting, to make code more
readable.

Reported by: Alexey Kamyshev, rwatson


146620 25-May-2005 peadar

Separate out address-detaching part of if_detach into if_purgeaddrs,
so if_tap doesn't need to rely on locally-rolled code to do same.

The observable symptom of if_tap's bzero'ing the address details
was a crash in "ifconfig tap0" after an if_tap device was closed.

Reported By: Matti Saarinen (mjsaarin at cc dot helsinki dot fi)


146550 23-May-2005 mlaier

Fix semantics of ph_busy_count == -1 to pass instead of block.

PR: kern/81128
Submitted by: Joost Bekkers
MFC-after: 2 weeks


145953 06-May-2005 cperciva

If we are going to
1. Copy a NULL-terminated string into a fixed-length buffer, and
2. copyout that buffer to userland,
we really ought to
0. Zero the entire buffer
first.

Security: FreeBSD-SA-05:08.kmem


145883 04-May-2005 emax

Change m_uiotombuf so it will accept offset at which data should be copied
to the mbuf. Offset cannot exceed MHLEN bytes. This is currently used to
fix Ethernet header alignment problem on alpha and sparc64. Also change all
users of m_uiotombuf to pass proper offset.

Reviewed by: jmg, sam
Tested by: Sten Spans "sten AT blinkenlights DOT nl"
MFC after: 1 week


145852 04-May-2005 csjp

-introduce net.bpf sysctl instead of the less intuitive debug.*

debug.bpf_bufsize is now net.bpf.bufsize
debug.bpf_maxbufsize is now net.bpf.maxbufsize

-move function prototypes for bpf_drvinit and bpf_clone up to the
top of the file with the others
-assert bpfd lock in catchpacket() and bpf_wakeup()

MFC after: 2 weeks


145323 20-Apr-2005 glebius

- Call if_link_state_change() for each vlan, when link changes
on parent.
- Remove route.h include.
- Fix comment about MII.

Sponsored by: Rambler
Reviewed by: yar


145320 20-Apr-2005 glebius

Do not call all link state callbacks directly, but schedule
a taskqueue(9) task. This fixes LORs and adds possibility
to serve such events pseudorecursively, when link state
change of interface causes subsequent change on other
interfaces.

Sponsored by: Rambler
Reviewed by: sam, brooks, mux


145095 15-Apr-2005 cperciva

Zero the ifr.ifr_name buffer in ifconf() in order to avoid
accidental disclosure of kernel memory to userland.

Security: FreeBSD-SA-05:04.ifconf


145002 13-Apr-2005 mdodd

Add #defines for control fields and address bits.


144979 13-Apr-2005 mdodd

Provide a sysctl (net.link.tap.user_open) to allow unpriviliged
acces to tap(4) device nodes based on file system permission.

Duplicate the 'debug.if_tap_debug' sysctl under the
'net.link.tap' hierarchy.


144389 31-Mar-2005 phk

Explicitly hold a reference to the cdev we have just cloned. This
closes the race where the cdev was reclaimed before it ever made it
back to devfs lookup.


144198 27-Mar-2005 green

You must selwakeup{,pri}() when closing a selectable object or the
td->td_sel will get trashed and crash the system. Fix BPF's mistake
in this area.

MFC after: 1 day


144160 26-Mar-2005 sam

rt_newaddrmsg will blow up if given something other than RTM_ADD
or RTM_DELETE; add an assertion, may want to do something more
heavyhanded in the future

Noticed by: Coverity Prevent analysis tool
Reviewed by: mdodd


144114 25-Mar-2005 gallatin

Zero the reserved fields of the header, as per rfc 2734. This change
results in connectivty to MacOSX hosts via fwip.

Thanks to Apple's Arulchandran Paramasivam <arulchandranp@apple.com> for
letting us know what we were doing wrong.

Reviewed by: dfr
MFC After: 7 days


144045 24-Mar-2005 mdodd

- Break after nested switch.
- Default returns an error.


143881 20-Mar-2005 glebius

ifma_protospec is a pointer. Use NULL when assigning or compating it.


143464 12-Mar-2005 glebius

Add a sysctl net.link.log_link_state_change, which allows to
suppress logging of interface link state changes.

Requested by: sam, kan


143196 06-Mar-2005 sobomax

When neither of supported frame type is enabled via kernel options enable
them all, otherwise the driver will be useless and will only confuse user
as manual page says nothing about the need to enable one of those frame
types explicitly in the kernel config.

PR: kern/47152
Submitted by: Andriy Gapon <avg@icyb.net.ua>
MFC after: 3 days


143195 06-Mar-2005 sobomax

Fix ef(4) driver when kernel compiled w/o IPX.

MFC after: 3 days


143064 02-Mar-2005 jmg

fix a bug where bpf would try to wakeup before updating the state.. This
was causing kqueue not to see the correct state and not wake up a process
that is waiting...

Submitted by: nCircle Network Security, Inc.


142906 01-Mar-2005 glebius

Use NET_CALLOUT_MPSAFE macro.


142901 01-Mar-2005 glebius

Revert change to struct ifnet. Use ifnet pointer in softc. Embedding
ifnet into smth will soon be removed.

Requested by: brooks


142793 28-Feb-2005 rwatson

In bpf_setf(), protect against races between multiple user threads
attempting to change the BPF filter on a BPF descriptor at the same
time: retrieve the old filter pointer under the same locked region
as setting the new pointer.

MFC after: 3 days


142787 28-Feb-2005 rwatson

Update a comment describing bpf_iflist to indicate that the BPF interface
structures correspond to specific link layers, so the same network
interface may appear more than once.

MFC after: 3 days


142564 26-Feb-2005 glebius

Remove carp_softc.sc_ifp member in favor of union pointers in struct ifnet.

Obtained from: OpenBSD


142501 25-Feb-2005 brooks

Change the definition of struct if_data's member ifi_epoch from wall
clock time to uptime because wall clock time may go backwards.

This is a change in the API which will impact SNMP agents who are using
ifi_epoch to set RFC2233's ifCounterDiscontinuityTime. None are know to
exist today. This will not impact applications that are using the
<index, epoch> tuple to verify interface uniqueness except that it
eliminates a race which could lead to a false assumption of uniqueness.

Because this is a behavior change, bump __FreeBSD_version.

Discussed with: re (jhb, scottl)
MFC after: 3 days
Pointed out by: pkh (way back at EuroBSDCon)
Pointy hat: brooks


142378 24-Feb-2005 maxim

o Move ifcr_count sanity check up and reject negative values before we
panic at kmem_alloc() via malloc(9).

PR: kern/77748
Submitted by: Wojciech A. Koszek
OK'ed by: brooks
Security: local DoS, a sample code in the PR.
MFC after: 3 days


142374 24-Feb-2005 glebius

Fix long lines in comment introduced in previous commit.


142352 24-Feb-2005 sam

the rt parameter to ifa_rtrequest callbacks should always be non-null;
eliminate grauitous ptr checks that follow ptr deref's

Noticed by: Coverity Prevent analysis tool


142335 23-Feb-2005 sam

eliminate dead code and collapse the remainder

Noticed by: Coverity Prevent analysis tool
Reviewed by: rwatson


142240 22-Feb-2005 glebius

Typo in comment.


142237 22-Feb-2005 rwatson

When prepending an LCC SNAP header to an atalk outgoing ethernet packet,
allocate the additional mbuf (if needed) using a non-sleeping memory
allocation.

MFC after: 7 days


142228 22-Feb-2005 glebius

- In if_link_state_change() extract function body from if-block, to improve
readability.
- Call carp_carpdev_state() from if_link_state_change() if interface has
associated CARP interface.

Sponsored by: Rambler


142215 22-Feb-2005 glebius

Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by: mlaier
Obtained from: OpenBSD (mickey, mcbride)


142069 18-Feb-2005 ru

Allocate the M_VLANTAG m_pkthdr flag, and use it to indicate that
a packet has VLAN mbuf tag attached. This is faster to check than
m_tag_locate(), and allows us to use the tags in non-vlan(4) VLAN
producers.

The first argument to VLAN_OUTPUT_TAG() is now unused but retained
for backward compatibility.

While here, embellish a fix in rev. 1.174 of if_ethersubr.c -- it
now checks for packets with VLAN (mbuf) tags, and it should now
be possible to bridge(4) on vlan(4)'s whose parent interfaces
support VLAN decapsulation in hardware.

Reviewed by: sam


141909 14-Feb-2005 glebius

Check for non-NULL ac_netgraph field in interface arpcom, instead of
checking global presence of ng_ether(4).

Reviewed by: ru


141891 14-Feb-2005 ru

If no vlan(4) interfaces are configured for the interface, and the
driver did VLAN decapsulation in hardware, we were passing a frame
as if it came for the parent (non-VLAN) interface. Stop this from
happening.

Reminded by: glebius
Security: This could pose a security risk in some setups


141749 12-Feb-2005 delphij

Validate ifc->ifc_len before submitting its incarnation to sbuf_new,
which will finally lead to kernel panic.

Security: This prevents a local (root-launched) DoS
Submitted by: Wojciech A. Koszek [dunstan at freebsd czest pl]
PR: 77421
MFC After: 1 week


141616 10-Feb-2005 phk

Make a bunch of malloc types static.

Found by: src/tools/tools/kernxref


141051 30-Jan-2005 glebius

Log changes of link state.

Reviewed by: rwatson


140775 24-Jan-2005 rwatson

Acquire the raw_cb mutex around LIST_REMOVE() of a raw socket control
block from the global raw socket list.

Submitted by: Roselyn Lee <rosel at verniernetworks dot com>
MFC after: 1 week


140745 24-Jan-2005 yar

Fix spelling in a comment.


140686 23-Jan-2005 yar

Reduce the global name space pollution.
The cloner structure isn't referenced by name outside this file.


140345 16-Jan-2005 glebius

- Reduce number of arguments passed to dummynet_io(), we already have cookie
in struct ip_fw_args itself.
- Remove redundant &= 0xffff from dummynet_io().


140323 15-Jan-2005 glebius

Remove ip_fw.h and ip_dummynet.h from includes.


140224 14-Jan-2005 glebius

o Clean up interface between ip_fw_chk() and its callers:

- ip_fw_chk() returns action as function return value. Field retval is
removed from args structure. Action is not flag any more. It is one
of integer constants.
- Any action-specific cookies are returned either in new "cookie" field
in args structure (dummynet, future netgraph glue), or in mbuf tag
attached to packet (divert, tee, some future action).

o Convert parsing of return value from ip_fw_chk() in ipfw_check_{in,out}()
to a switch structure, so that the functions are more readable, and a future
actions can be added with less modifications.

Approved by: andre
MFC after: 2 months


140057 11-Jan-2005 keramida

Fix a typo in a comment that may be confusing if one doesn't really
check what the code does. Separators are spaces, commas or tabs;
not '*' characters (as one may assume by reading the old comment).


140045 11-Jan-2005 ume

don't see NBPFILTER.


140044 11-Jan-2005 ume

remove HAVE_OLD_BPF part.


140043 11-Jan-2005 ume

we are not OLD_BPF system.


140042 11-Jan-2005 ume

fix typo.


139903 08-Jan-2005 glebius

This change adds reliability for Ethernet trunks built with ng_one2many:

- Introduce another ng_ether(4) callback ng_ether_link_state_p, which
is called from if_link_state_change(), every time link is changed.
- In ng_ether_link_state() send netgraph control message notifying
of link state change to a node connected to "lower" hook.

Reviewed by: sam
MFC after: 2 weeks


139823 07-Jan-2005 imp

/* -> /*- for license, minor formatting changes


139365 28-Dec-2004 rik

Add FR support to sppp (MFCronyx).

Silence on: net@, current@, hackers@.
No objections: joerg

Requested by: by many (mostly Cronyx) users for a long long time.
MFC after: 10 days

PR: kern/21771, kern/66348


139358 27-Dec-2004 pjd

Fix mbuf leak.

Submitted by: Johnny Eriksson <bygg@cafax.se>
MFC after: 5 days


139208 22-Dec-2004 phk

Include fcntl.h
Include selinfo.h (don't rely on vnode.h to do so)
Check O_NONBLOCK instead of IO_NELAY
Don't include vnode.h


139207 22-Dec-2004 phk

Don't include filedesc.h
Include fcntl.h
Include selinfo.h (don't rely on vnode.h to do so)
Check O_NONBLOCK instead of IO_NDELAY
Don't include vnode.h


139206 22-Dec-2004 phk

Include fcntl.h
Check O_NONBLOCK instead of IO_NDELAY
Include uio.h
Don't include vnode.h
Don't include filedesc.h


139200 22-Dec-2004 phk

Check O_NONBLOCK instead of IO_NDELAY.
Don't include <sys/vnode.h>


138950 17-Dec-2004 jmg

don't try to recurse on the bpf lock.. kqueue already locks the bpf lock
now...

Submitted by: Ed Maste of Sandvine Inc.
MFC after: 1 week


138855 14-Dec-2004 rik

Kill double inclusion for <netinet/in.h> and <netinet/in_systm.h>.


138745 12-Dec-2004 rik

Make sppp MPSAFE.
MPSAFE could be turned off by IFF_NEEDSGIANT.

Silence on: net@, current@, hackers@.
No objections: joerg


138542 08-Dec-2004 sam

Cleanup link state change notification:
o add new if_link_state_change routine that deals with link state changes
o change mii to use if_link_state_change


138540 08-Dec-2004 sam

Don't require a device to be marked up when issuing BIOCSETIF.


138239 30-Nov-2004 mlaier

Implement the check I was talking about in the previous message already.
Introduce domain_init_status to keep track of the init status of the domains
list (surprise). 0 = uninitialized, 1 = initialized/unpopulated, 2 =
initialized/done. Higher values can be used to support late addition of
domains which right now "works", but is potential dangerous. I choose to
only give a warning when doing so.

Use domain_init_status with if_attachdomain[1]() to ensure that we have a
complete domains list when we init the if_afdata array. Store the current
value of domain_init_status in if_afdata_initialized. This way we can update
if_afdata after a new protocol has been added (once that is allowed).

Submitted by: se (with changes)
Reviewed by: julian, glebius, se
PR: kern/73321 (partly)


138039 23-Nov-2004 rwatson

Assign if_broadcastaddr to NULL not 0 in if_attach().

Printf() a warning if if_attachdomain() is called more than once on an
interface to generate some noise on mailing lists when this occurs.

Fix up style in if_start(), where spaces crept in instead of tabs at
some point.

MFC after: 1 week
MFC note: Not the printf().


137824 17-Nov-2004 jmg

sync comment on IFF_OACTIVE with reality.. IFF_OACTIVE is set when the
hardware cannot take anymore packets, and so will supress the calling of
the device's if_start method...

Submitted by: bde


137476 09-Nov-2004 mlaier

Remove the #if 0 wrapping around !ALTQ stuff that can't be used due to ABI
stability anyway.


137386 08-Nov-2004 phk

Initialize struct pr_userreqs in new/sparse style and fill in common
default elements in net_init_domain().

This makes it possible to grep these structures and see any bogosities.


137336 07-Nov-2004 cognet

Don't abuse tp->t_sc in sl(4) either.


137335 07-Nov-2004 cognet

Don't abuse tp->t_sc, as it is now used by tty drivers.
This fixes the panic that occurs when using ppp(4)

Reported and tested by: Yann Berthier (yb at sainte-barbe dot org)


137101 31-Oct-2004 glebius

Utilize m_uiotombuf() in device write method, instead of home-grown
implementation. This also gives a performance improvement, because
m_uiotombuf() utilizes clusters.

Approved by: julian (mentor)
MFC after: 1 month


137065 30-Oct-2004 rwatson

Move if_handoff() from an inline in if_var.h to a function to if.c
in orden to harden the ABI for 5.x; this will permit us to modify
the locking in the ifnet packet dispatch without requiring drivers
to be recompiled.

MFC after: 3 days
Discussed at: EuroBSDCon Developer's Summit


137062 30-Oct-2004 rwatson

Add additional "spare" fields to 'struct ifnet' in order to improve
the resistance of the network driver ABI to changes that will be
required as we optimize locking.

MFC after: 3 days
Discussed at: Developer Summit


136950 25-Oct-2004 jmg

use NULL instead of 0 when casting/comparing w/ a pointer...


136704 19-Oct-2004 rwatson

Define IFF_LOCKGIANT() and IFF_UNLOCKGIANT() macros, which conditionally
acquire Giant if the passed interface has IFF_NEEDSGIANT set on it.
Modify calls into (ifp)->if_ioctl() in if.c to use these macros in order
to ensure that Giant is held.

MFC after: 3 days
Bumped into by: jmg


136682 18-Oct-2004 rwatson

Push acquisition of the accept mutex out of sofree() into the caller
(sorele()/sotryfree()):

- This permits the caller to acquire the accept mutex before the socket
mutex, avoiding sofree() having to drop the socket mutex and re-order,
which could lead to races permitting more than one thread to enter
sofree() after a socket is ready to be free'd.

- This also covers clearing of the so_pcb weak socket reference from
the protocol to the socket, preventing races in clearing and
evaluation of the reference such that sofree() might be called more
than once on the same socket.

This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket. The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets. The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.

RELENG_5_3 candidate.

MFC after: 3 days
Reviewed by: dwhite
Discussed with: gnn, dwhite, green
Reported by: Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by: Vlad <marchenko at gmail dot com>


136428 12-Oct-2004 glebius

Fix packet flow when both ng_ether(4) and bridge(4) are in use:

- push all bridge logic from if_ethersubr.c into bridge.c
make bridge_in() return mbuf pointer (or NULL).
- call only bridge_in() from ether_input(), after ng_ether_input()
was optinally called.
- call bridge_in() from ng_ether_rcv_upper().

Long description: http://lists.freebsd.org/mailman/htdig/freebsd-net/2004-May/003881.html
Reported by: Jian-Wei Wang <jwwang at FreeBSD.csie.NCTU.edu.tw>
Tested by: myself, Sergey Lyubka
Reviewed by: sam
Approved by: julian (mentor)
MFC after: 2 months


136393 11-Oct-2004 andre

Correctly unregister a netisr by clearing the ni->ni_queue field to NULL as
well. This field is actually used by various netisr functions to determine
the availablility of the specified netisr. This uncomplete unregister leads
directly to a crash when the KLD unregistering the netisr is unloaded.

Submitted by: Sam <sah@softcardsystems.com>
MFC after: 3 days


136376 11-Oct-2004 rwatson

When harvesting entropy from an ethernet mbuf, do so before freeing the
mbuf.

RELENG_5 candidate.


136371 11-Oct-2004 glebius

Assign pointer NULL, not 0.

Approved by: julian (mentor)


136258 08-Oct-2004 mlaier

Change pfil starvation prevention from fail-open to fail-close.
We return ENOBUF to indicate the problem, which is an errno that should be
handled well everywhere.

Requested & Submitted by: green
Silently okay'ed by: The rest of the firewall gang
MFC after: 3 days


136243 08-Oct-2004 brooks

Since net/net_osdep.c contained only one function that could be
trivially implemented as a macro, do that and remove it. NetBSD did
this quite a while ago.


136185 06-Oct-2004 green

Don't recurse the BPF descriptor lock during the BIOCSDLT operation
(and panic). To try to finish making BPF safe, at the very least,
the BPF descriptor lock really needs to change into a reader/writer
lock that controls access to "settings," and a mutex that controls
access to the selinfo/knote/callout. Also, use of callout_drain()
instead of callout_stop() (which is really a much more widespread
issue).


136155 05-Oct-2004 sam

Add 802.11-specific events that are dispatched through the routing socket.
This really doesn't belong here but is preferred (for the moment) over
adding yet another mechanism for sending msgs from the kernel to user apps.

Reviewed by: imp


136154 05-Oct-2004 sam

add ETHERTYPE_PAE for EAPOL/802.1x


135920 29-Sep-2004 mlaier

Add an additional struct inpcb * argument to pfil(9) in order to enable
passing along socket information. This is required to work around a LOR with
the socket code which results in an easy reproducible hard lockup with
debug.mpsafenet=1. This commit does *not* fix the LOR, but enables us to do
so later. The missing piece is to turn the filter locking into a leaf lock
and will follow in a seperate (later) commit.

This will hopefully be MT5'ed in order to fix the problem for RELENG_5 in
forseeable future.

Suggested by: rwatson
A lot of work by: csjp (he'd be even more helpful w/o mentor-reviews ;)
Reviewed by: rwatson, csjp
Tested by: -pf, -ipfw, LINT, csjp and myself
MFC after: 3 days

LOR IDs: 14 - 17 (not fixed yet)


135588 22-Sep-2004 mlaier

Switch order for mtx_unlock and cv_signal as (condvar(9)) sez:

A thread must hold mp while calling cv_signal(), cv_broadcast(), or
cv_broadcastpri() even though it isn't passed as an argument.

and is right with this claim.

While here remove a "\" from the macro -> __inline conversion.

Found by: csjp
MFC after: 4 days


135577 22-Sep-2004 stefanf

Prefer C99's __func__ over GCC's __FUNCTION__.


135570 22-Sep-2004 green

Call sbuf_finish() before sbuf_data() so as to not panic the system.


135568 22-Sep-2004 brooks

Fix a LOR where ifconf() used copyout while holding a mutex. This LOR
was seen when configuring addresses on interfaces using ifconfig. This
patch has been verified to work with over eight thousand addresses
assigned to an interface.

LOR id: 031


135416 18-Sep-2004 brooks

Log the renaming of an interface. This should make it easier to follow
kernel log files.


135354 17-Sep-2004 rwatson

Destroy global tapmtx when the if_tap module is unloaded.

RELENG_5 candidated.


135256 15-Sep-2004 brooks

Fix a LOR where copyout was called while holding a lock.

Reported by: rwatson


134970 09-Sep-2004 rwatson

Reformulate bpf_dettachd() to acquire the BIF_LOCK() as well as
BPFD_LOCK() when removing a descriptor from an interface descriptor
list. Hold both over the operation, and do a better job at
maintaining the invariant that you can't find partially connected
descriptors on an active interface descriptor list.

This appears to close a race that resulted in the kernel performing
a NULL pointer dereference when BPF sessions are detached during
heavy network activity on SMP systems.

RELENG_5 candidate.


134967 09-Sep-2004 rwatson

Reformulate use of linked lists in 'struct bpf_d' and 'struct bpf_if'
to use queue(3) list macros rather than hand-crafted lists. While
here, move to doubly linked lists to eliminate iterating lists in
order to remove entries. This change simplifies and clarifies the
list logic in the BPF descriptor code as a first step towards revising
the locking strategy.

RELENG_5 candidate.

Reviewed by: fenner


134966 09-Sep-2004 rwatson

Compare/set pointers using NULL not 0.


134933 08-Sep-2004 brooks

Re-add ifi_epoch, to struct if_data, this time replacing ifi_unused
to avoid ABI changes. It is set to the last time the interface
counters were zeroed, currently the time if_attach() was called. It is
intentended to be a valid value for RFC2233's ifCounterDiscontinuityTime
and to make it easier for applications to verify that the interface they
find at a given index is the one that was there last time they looked.

Due to space constraints ifi_epoch is a time_t rather then a struct
timeval. SNMP would prefer higher precision, but this unlikely to be
useful in practice.


134859 06-Sep-2004 jmg

don't call f_detach if the filter has alread removed the knote.. This
happens when a proc exits, but needs to inform the user that this has
happened.. This also means we can remove the check for detached from
proc and sig f_detach functions as this is doing in kqueue now...

MFC after: 5 days


134666 03-Sep-2004 rwatson

Correct a comment typo: s/Note/Not/.

Pointed out by: kensmith


134630 02-Sep-2004 brooks

Back out ifi_epoch. The ABI breakage is too disruptive this close to
5-STABLE. ifi_epoch will shortly be reintroduced with less precistion
using the space currently allocated to ifi_unused.


134614 01-Sep-2004 mlaier

Fix an assertion when if_down()ing a ALTQ managed interface. The lock should
have been in place all the time the mtx_assert in the ALTQ code just
discovered the shortcoming.

PR: i386/71195
Tested by: Bettan (PR originator), myself
MFC after: 5 days


134609 01-Sep-2004 brooks

Use a spare byte in struct if_data to store the structure size without
increasing it. Add code to ifconfig to use this size to find the
sockaddr_dl after the struct if_data in the routing message. This
allows struct if_data to grow (up to 255 bytes) without breaking
ifconfig.

Submitted by: peter


134514 30-Aug-2004 brooks

Add a new variable, ifi_epoch, to struct if_data. It is set to the last
time the interface counters were zeroed, currently the time if_attach()
was called. It is indentended to be a valid value for RFC2233's
ifCounterDiscontinuityTime and to make it easier for applications to
verify that the interface they find at a given index is the one that was
there last time they looked.

An if_epoch "compatability" macro has not been created as ifi_epoch has
never been a member of struct ifnet.

Approved by: andre, bms, wollman


134511 30-Aug-2004 yar

Use an ANSI-style definition for slstart()
in accord with the rest of the file.


134510 30-Aug-2004 yar

Grant the poor old SLIP driver with an if_start handler
so that it becomes happy and no longer panics the system
upon getting the very first packet to transmit.

Reported and tested by: Igor Timkin <ivt@gamma.ru>
Reviewed by: rwatson
MFC after: 5 days


134449 28-Aug-2004 rwatson

Correct typo in printf() warning.

Submitted by: Pawel Worach <pawel.worach at telia.com>


134443 28-Aug-2004 rwatson

Change the default disposition of debug.mpsafenet from 0 to 1, which
will cause the network stack to operate without the Giant lock by
default. This change has the potential to improve performance by
increasing parallelism and decreasing latency in network processing.

Due to the potential exposure of existing or new bugs, the following
compatibility functionality is maintained:

- It is still possible to disable Giant-free operation by setting
debug.mpsafenet to 0 in loader.conf.

- Add "options NET_WITH_GIANT", which will restore the default value of
debug.mpsafenet to 0, and is intended for use on systems compiled with
known unsafe components, or where a more conservative configuration is
desired.

- Add a new declaration, NET_NEEDS_GIANT("componentname"), which permits
kernel components to declare dependence on Giant over the network
stack. If the declaration is made by a preloaded module or a compiled
in component, the disposition of debug.mpsafenet will be set to 0 and
a warning concerning performance degraded operation printed to the
console. If it is declared by a loadable kernel module after boot, a
warning is displayed but the disposition cannot be changed. This is
implemented by defining a new SYSINIT() value, SI_SUB_SETTINGS, which
is intended for the processing of configuration choices after tunables
are read in and the console is available to generate errors, but
before much else gets going.

This compatibility behavior will go away when we've finished the last
of the locking work and are confident that operation is correct.


134399 27-Aug-2004 brooks

When detaching an interface, don't leave an obsolete pointer to the
soon to be deleted struct ifnet around.

PR: kern/52260
MFC After: 3 days


134391 27-Aug-2004 andre

Apply error and success logic consistently to the function netisr_queue() and
its users.

netisr_queue() now returns (0) on success and ERRNO on failure. At the
moment ENXIO (netisr queue not functional) and ENOBUFS (netisr queue full)
are supported.

Previously it would return (1) on success but the return value of IF_HANDOFF()
was interpreted wrongly and (0) was actually returned on success. Due to this
schednetisr() was never called to kick the scheduling of the isr. However this
was masked by other normal packets coming through netisr_dispatch() causing the
dequeueing of waiting packets.

PR: kern/70988
Found by: MOROHOSHI Akihiko <moro@remus.dti.ne.jp>
MFC after: 3 days


134383 27-Aug-2004 andre

Always compile PFIL_HOOKS into the kernel and remove the associated kernel
compile option. All FreeBSD packet filters now use the PFIL_HOOKS API and
thus it becomes a standard part of the network stack.

If no hooks are connected the entire packet filter hooks section and related
activities are jumped over. This removes any performance impact if no hooks
are active.

Both OpenBSD and DragonFlyBSD have integrated PFIL_HOOKS permanently as well.


134246 24-Aug-2004 rwatson

Revert previous revision, 1.7, as removal of GIANT_REQUIRED was made
in the wrong branch (and hence to the wrong function).


134245 24-Aug-2004 rwatson

MT4 if_fwsubr.c:1.6:

date: 2004/08/22 14:48:55; author: rwatson; state: Exp; lines: +0 -2
Don't need to assert Giant in fw_output(), only in the firewire start
routine.

Approved by: re (scottl)


134241 24-Aug-2004 roam

Fix a typo (attacked -> attached).

Approved by: sam


134185 22-Aug-2004 rwatson

Style update: use newer style function prototypes in if_sl.c in
prep for merging locking.


134160 22-Aug-2004 rwatson

Don't need to assert Giant in fw_output(), only in the firewire start
routine.


134138 21-Aug-2004 rwatson

If a tunable for the routing socket netisr queue max is defined, allow it
to override the default value, rather than the default value overriding
the tunable.


134137 21-Aug-2004 rwatson

Allow the size of the routing socket netisr queue to be configured using
the tunable or sysctl 'net.route.netisr_maxqlen'. Default the maximum
depth to 256 rather than IFQ_MAXLEN due to the downsides of dropping
routing messages.

MT5 candidate.

Discussed with: mdodd, mlaier, Vincent Jardin <jardin at 6wind.com>


134122 21-Aug-2004 csjp

When a prison is given the ability to create raw sockets (when the
security.jail.allow_raw_sockets sysctl MIB is set to 1) where privileged
access to jails is given out, it is possible for prison root to manipulate
various network parameters which effect the host environment. This commit
plugs a number of security holes associated with the use of raw sockets
and prisons.

This commit makes the following changes:

- Add a comment to rtioctl warning developers that if they add
any ioctl commands, they should use super-user checks where necessary,
as it is possible for PRISON root to make it this far in execution.
- Add super-user checks for the execution of the SIOCGETVIFCNT
and SIOCGETSGCNT IP multicast ioctl commands.
- Add a super-user check to rip_ctloutput(). If the calling cred
is PRISON root, make sure the socket option name is IP_HDRINCL,
otherwise deny the request.

Although this patch corrects a number of security problems associated
with raw sockets and prisons, the warning in jail(8) should still
apply, and by default we should keep the default value of
security.jail.allow_raw_sockets MIB to 0 (or disabled) until
we are certain that we have tracked down all the problems.

Looking forward, we will probably want to eliminate the
references to curthread.

This may be a MFC candidate for RELENG_5.

Reviewed by: rwatson
Approved by: bmilekic (mentor)


133920 17-Aug-2004 andre

Convert ipfw to use PFIL_HOOKS. This is change is transparent to userland
and preserves the ipfw ABI. The ipfw core packet inspection and filtering
functions have not been changed, only how ipfw is invoked is different.

However there are many changes how ipfw is and its add-on's are handled:

In general ipfw is now called through the PFIL_HOOKS and most associated
magic, that was in ip_input() or ip_output() previously, is now done in
ipfw_check_[in|out]() in the ipfw PFIL handler.

IPDIVERT is entirely handled within the ipfw PFIL handlers. A packet to
be diverted is checked if it is fragmented, if yes, ip_reass() gets in for
reassembly. If not, or all fragments arrived and the packet is complete,
divert_packet is called directly. For 'tee' no reassembly attempt is made
and a copy of the packet is sent to the divert socket unmodified. The
original packet continues its way through ip_input/output().

ipfw 'forward' is done via m_tag's. The ipfw PFIL handlers tag the packet
with the new destination sockaddr_in. A check if the new destination is a
local IP address is made and the m_flags are set appropriately. ip_input()
and ip_output() have some more work to do here. For ip_input() the m_flags
are checked and a packet for us is directly sent to the 'ours' section for
further processing. Destination changes on the input path are only tagged
and the 'srcrt' flag to ip_forward() is set to disable destination checks
and ICMP replies at this stage. The tag is going to be handled on output.
ip_output() again checks for m_flags and the 'ours' tag. If found, the
packet will be dropped back to the IP netisr where it is going to be picked
up by ip_input() again and the directly sent to the 'ours' section. When
only the destination changes, the route's 'dst' is overwritten with the
new destination from the forward m_tag. Then it jumps back at the route
lookup again and skips the firewall check because it has been marked with
M_SKIP_FIREWALL. ipfw 'forward' has to be compiled into the kernel with
'option IPFIREWALL_FORWARD' to enable it.

DUMMYNET is entirely handled within the ipfw PFIL handlers. A packet for
a dummynet pipe or queue is directly sent to dummynet_io(). Dummynet will
then inject it back into ip_input/ip_output() after it has served its time.
Dummynet packets are tagged and will continue from the next rule when they
hit the ipfw PFIL handlers again after re-injection.

BRIDGING and IPFW_ETHER are not changed yet and use ipfw_chk() directly as
they did before. Later this will be changed to dedicated ETHER PFIL_HOOKS.

More detailed changes to the code:

conf/files
Add netinet/ip_fw_pfil.c.

conf/options
Add IPFIREWALL_FORWARD option.

modules/ipfw/Makefile
Add ip_fw_pfil.c.

net/bridge.c
Disable PFIL_HOOKS if ipfw for bridging is active. Bridging ipfw
is still directly invoked to handle layer2 headers and packets would
get a double ipfw when run through PFIL_HOOKS as well.

netinet/ip_divert.c
Removed divert_clone() function. It is no longer used.

netinet/ip_dummynet.[ch]
Neither the route 'ro' nor the destination 'dst' need to be stored
while in dummynet transit. Structure members and associated macros
are removed.

netinet/ip_fastfwd.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.

netinet/ip_fw.h
Removed 'ro' and 'dst' from struct ip_fw_args.

netinet/ip_fw2.c
(Re)moved some global variables and the module handling.

netinet/ip_fw_pfil.c
New file containing the ipfw PFIL handlers and module initialization.

netinet/ip_input.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code. ip_forward() does not longer require
the 'next_hop' struct sockaddr_in argument. Disable early checks
if 'srcrt' is set.

netinet/ip_output.c
Removed all direct ipfw handling code and replace it with the new
'ipfw forward' handling code.

netinet/ip_var.h
Add ip_reass() as general function. (Used from ipfw PFIL handlers
for IPDIVERT.)

netinet/raw_ip.c
Directly check if ipfw and dummynet control pointers are active.

netinet/tcp_input.c
Rework the 'ipfw forward' to local code to work with the new way of
forward tags.

netinet/tcp_sack.c
Remove include 'opt_ipfw.h' which is not needed here.

sys/mbuf.h
Remove m_claim_next() macro which was exclusively for ipfw 'forward'
and is no longer needed.

Approved by: re (scottl)


133741 15-Aug-2004 jmg

Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowlege of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using it's filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
aquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)


133671 13-Aug-2004 rwatson

Use IFQ_SET_MAXLEN() to set the maximum queue depth of the routing
socket netisr queue.

Pointed out by: winter


133603 12-Aug-2004 tackerman

Added two new media types for 10GBASE-SR and 10GBASE-LR


133513 11-Aug-2004 andre

Convert the routing table to use an UMA zone for rtentries. The zone is
called "rtentry".

This saves a considerable amount of kernel memory. R_Zmalloc previously
used 256 byte blocks (plus kmalloc overhead) whereas UMA only needs 132
bytes.

Idea from: OpenBSD


133460 11-Aug-2004 emax

Set IFF_RUNNING flag on the interface as soon as the control device is opened.


133261 07-Aug-2004 mlaier

Add a "void *if_carp" placeholder to struct ifnet with prospect to bring in
the "Common address redundancy protocol" (CARP) during the 5-STABLE cycle.
Hence doing the ABI break now.

Approved by: re (scottl)


133238 06-Aug-2004 rwatson

As SLIP directly accesses the tty code from its if_start() routine,
mark if_sl as IFF_NEEDSGIANT.


133200 06-Aug-2004 roam

Do not attempt to clean up data that has not been initialized yet.
This fixes two kernel panics on boot when the xl driver fails to
allocate bus/port/memory resources.

Reviewed by: silence on -net


133163 05-Aug-2004 sobomax

Set ip_v field properly.

PR: kern/69957


133148 05-Aug-2004 rwatson

Do a lockless read of the BPF interface structure descriptor list head
before grabbing BPF locks to see if there are any entries in order to
avoid the cost of locking if there aren't any. Avoids a mutex lock/
unlock for each packet received if there are no BPF listeners.


132780 28-Jul-2004 kan

Avoid casts as lvalues.


132778 28-Jul-2004 kan

Initialize ; variable eraly to shut up GCC warning.


132712 27-Jul-2004 rwatson

Add a new network interface flag, IFF_NEEDSGIANT, which will allow
device drivers to declare that the ifp->if_start() method implemented
by the driver requires Giant in order to operate correctly.

Add a 'struct task' to 'struct ifnet' that can be used to execute a
deferred ifp->if_start() in the event that if_start needs to be called
in a Giant-free environment. To do this, introduce if_start(), a
wrapper function for ifp->if_start(). If the interface can run MPSAFE,
it directly dispatches into the interface start routine. If it can't
run MPSAFE, we're running with debug.mpsafenet != 0, and Giant isn't
currently held, the task is queued to execute in a swi holding Giant
via if_start_deferred().

Modify if_handoff() to use if_start() instead of direct dispatch.
Modify 802.11 to use if_start() instead of direct dispatch.

This is intended to provide increased compatibility for non-MPSAFE
network device drivers in the presence of Giant-free operation via
asynchronous dispatch. However, this commit does not mark any network
interfaces as IFF_NEEDSGIANT.


132659 26-Jul-2004 yar

Stop tinkering with the parent's VLAN_MTU capability.
Now it is user-controlled through ifconfig(8).

The former ``automagic'' way of operation created more
trouble than good. First, VLAN_MTU consumers other than
vlan(4) had appeared, e.g., ng_vlan(4). Second, there was
no way to disable VLAN_MTU manually if it were causing
trouble, e.g., data corruption.

Dropping the ``automagic'' should be completely invisible
to the user since
a) all the drivers supporting VLAN_MTU
have it enabled by default, and in the first place
b) there is only one driver that can really toggle VLAN_MTU
in the hardware under its control (it's fxp(4), to which
I added VLAN_MTU controls to illustrate the principle.)


132602 24-Jul-2004 rwatson

Prefer NULL to '0' when checking a pointer value.


132557 22-Jul-2004 brooks

Actually free the unit when destroying the interface.

Reported by: la at delfi.lt
Tested by: la at delfi.lt
PR: 68618


132470 20-Jul-2004 mlaier

When removing the last reference to a cloner, do not try to unlock twice -
esp. not since the backing memory was just freed.

Reviewed by: rwatson


132368 18-Jul-2004 rwatson

Comment clarifying debug_mpsafenet.


132362 18-Jul-2004 rwatson

Gratuitous whitespace change to un-wrap a short line.


132226 15-Jul-2004 phk

Preparation commit for the tty cleanups that will follow in the near
future:

rename ttyopen() -> tty_open() and ttyclose() -> tty_close().

We need the ttyopen() and ttyclose() for the new generic cdevsw
functions for tty devices in order to have consistent naming.


132199 15-Jul-2004 phk

Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".


132152 14-Jul-2004 mlaier

Fix a copy-and-paste-o in IFQ_DRV_PREPEND - all pointyhats to me.
While here also fix a (not less stupid) braino in IFQ_DRV_PURGE.

Reported-by: clement
Tested-by: clement (_PREPEND in sis(4))


132131 14-Jul-2004 rwatson

Convert SLIP to using C99 structure initialization for its struct
linesw.


131856 09-Jul-2004 bms

Use ETHER_IS_MULTICAST() consistently in ether_resolvemulti().

Reviewed by: jmallett


131675 06-Jul-2004 bms

Use M_ZERO instead of bzero().


131674 06-Jul-2004 bms

Be consistent and use bzero() instead of memset().


131673 06-Jul-2004 bms

Use M_ZERO instead of memset() (!).


131672 06-Jul-2004 bms

Use M_ZERO instead of bzero().


131670 06-Jul-2004 bms

Replace a bzero() after malloc() with M_ZERO.


131669 06-Jul-2004 bms

Style.


131630 05-Jul-2004 rwatson

In the BPF and ethernet bridging code, don't allow callouts to execute
without Giant if we're not debug.mpsafenet=1.


131586 04-Jul-2004 bms

Workaround a locking problem in vlan(4). vlan_setmulti() may be called
with sleepable locks held from further up in the network stack, and
attempts to allocate memory to hold multicast group membership information
with M_WAITOK.

This panic was triggered specifically when an exiting routing daemon
process closes its raw sockets after joining multicast groups on them.

While we're here, comment some possible locking badness.

PR: kern/48560


131580 04-Jul-2004 bms

style(9)/whitespace cleanup while I'm in this file.


131571 04-Jul-2004 bms

The net.link.ether.bridge.enable sysctl MIB variable enables bridge
functionality by setting to a non-zero value. This is an integer, but
is treated as a boolean by the code, so clamp it to a boolean value
when set so as to avoid unnecessary bridge reinitialization if it's
changed to another value.

PR: kern/61174
Requested by: Bruce Cran


131477 02-Jul-2004 brooks

Don't announce the ethernet address when it's 00:00:00:00:00:00. It's
not of any interest. This primairly happens when vlan(4) interfaces are
created.


131455 02-Jul-2004 mlaier

Bring in the first chunk of altq driver modifications. This covers the
following drivers: bfe(4), em(4), fxp(4), lnc(4), tun(4), de(4) rl(4),
sis(4) and xl(4)

More patches are pending on: http://peoples.freebsd.org/~mlaier/ Please take
a look and tell me if "your" driver is missing, so I can fix this.

Tested-by: many
No-objection: -current, -net


131241 28-Jun-2004 rik

Do not m_free packet since IF_HANDOFF (called from netisr_queue) will
do it for us, just count it.


131178 27-Jun-2004 pjd

Those are unneeded too.


131177 27-Jun-2004 pjd

Add two missing includes and remove two uneeded.
This is quite serious fix, because even with MAC framework compiled in,
MAC entry points in those two files were simply ignored.


131134 26-Jun-2004 phk

Pick the hotchar out of the tty structure instead of caching private
copies.

No current line disciplines have a dynamically changing hotchar, and
expecting to receive anything sensible during a change in ldisc is
insane so no locking of the hotchar field is necessary.


131130 26-Jun-2004 phk

Fix line discipline switching issues: If opening a new ldisc fails,
we have to revert to TTYDISC which we know will successfully open
rather than try the previous ldisc which might also fail to open.

Do not let ldisc implementations muck about with ->t_line, and remove
code which checks for reopens, it should never happen.

Move ldisc->l_hotchar to tty->t_hotchar and have ldisc implementation
initialize it in their open routines. Reset to zero when we enter
TTYDISC. ("no" should really be -1 since zero could be a valid
hotchar for certain old european mainframe protocols.)


131093 25-Jun-2004 rik

Do not count loobacks as other fuilures.
As a result magic will not be rejected any more in case of loopback.

Discussed with: joerg@


131050 24-Jun-2004 joerg

Add a couple of #ifdef DEBUG printf()s in vlan_input() I found to be
useful when debugging the ether_demux() problem (when bridging over
VLANs).


131049 24-Jun-2004 joerg

When considering an ethernet frame that is not destined for us, do not
only allow this to be further processed when bridging is active on
that interface, but also if the current packet has a VLAN tag and
VLANs are active on our interface. This gives the VLAN layers a
chance to also consider the packet (and perhaps drop it instead of the
main dispatcher).

This fixes a situation where bridging was only active on VLAN
interfaces but ether_demux() called on behalf of the main interface
had already thrown the packet away.

MFC after: 4 weeks


131048 24-Jun-2004 des

Make dependencies on the TCP/IP stack conditional on INET / INET6. This
makes it possible to build a kernel with NIC drivers but no TCP/IP stack.

Sponsored by: Teleplan AS


130933 22-Jun-2004 brooks

Major overhaul of pseudo-interface cloning. Highlights include:

- Split the code out into if_clone.[ch].
- Locked struct if_clone. [1]
- Add a per-cloner match function rather then simply matching names of
the form <name><unit> and <name>.
- Use the match function to allow creation of <interface>.<tag>
vlan interfaces. The old way is preserved unchanged!
- Also the match function to allow creation of stf(4) interfaces named
stf0, stf, or 6to4. This is the only major user visible change in
that "ifconfig stf" creates the interface stf rather then stf0 and
does not print "stf0" to stdout.
- Allow destroy functions to fail so they can refuse to delete
interfaces. Currently, we forbid the deletion of interfaces which
were created in the init function, particularly lo0, pflog0, and
pfsync0. In the case of lo0 this was a panic implementation so it
does not count as a user visiable change. :-)
- Since most interfaces do not need the new functionality, an family of
wrapper functions, ifc_simple_*(), were created to wrap old style
cloner functions.
- The IF_CLONE_INITIALIZER macro is replaced with a new incompatible
IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE
instead.

Submitted by: Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1]
Reviewed by: andre, mlaier
Discussed on: net


130799 20-Jun-2004 markm

Give zlib the ability to be a module that can be depended on,
in the MODULE_DEPEND() sense.


130731 19-Jun-2004 bde

Include <sys/_lock.h>'s prerequisite <sys/queue.h> before including the
former, not after.

Don't hide this bug by including <sys/queue.h> in <sys/_lock.h>.


130640 17-Jun-2004 phk

Second half of the dev_t cleanup.

The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()

Various minor adjustments including handling of userland access to kernel
space struct cdev etc.


130585 16-Jun-2004 phk

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


130549 15-Jun-2004 mlaier

Replace IF_HANDOFF with new IFQ_HANDOFF to enqueue with ALTQ once enabled on
the respective drivers.


130514 15-Jun-2004 rwatson

Lock down rawcb_list, a global list of control blocks for raw sockets,
using rawcb_mtx. Hold this mutex while modifying or iterating over
the control list; this means that the mutex is held over calls into
socket delivery code, which no longer causes a lock order reversal as
the routing socket code uses a netisr to avoid recursing socket ->
routing -> socket.

Note: Locking of IPsec consumers of rawcb_list is not included in this
commit.


130512 15-Jun-2004 mlaier

Fix a typeo in IFQ_HANDOFF.


130508 15-Jun-2004 mlaier

Transform tbr_dequeue into a function pointer in order to build drivers with
ALTQ enabled versions of IFQ_* macros by default, as requested by serveral
others. This is a follow-up to the quick fix I committed yesterday which
turned off the ALTQ checks for non-ALTQ kernels.


130456 14-Jun-2004 dfr

Fix big-endian build.


130449 14-Jun-2004 mlaier

Unbreak non-ALTQ kernel linking. I forgot about tbr_dequeue.

In the end drivers should be building with ALTQ checks by default, but for
now build them with the old macros for non-ALTQ kernels.

Note: Check new features w/ LINT *and* w/ LINT minus the new feature.

Found-by: rwatson


130429 13-Jun-2004 dfr

Add MAC framework bits to the output path.


130425 13-Jun-2004 dfr

Remove advertising clause.


130416 13-Jun-2004 mlaier

Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
seperate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by: (i386)LINT


130407 13-Jun-2004 dfr

Add a new driver to support IP over firewire. This driver is intended to
conform to the rfc2734 and rfc3146 standard for IP over firewire and
should eventually supercede the fwe driver. Right now the broadcast
channel number is hardwired and we don't support MCAP for multicast
channel allocation - more infrastructure is required in the firewire
code itself to fix these problems.


130387 12-Jun-2004 rwatson

Extend coverage of SOCK_LOCK(so) to include so_count, the socket
reference count:

- Assert SOCK_LOCK(so) macros that directly manipulate so_count:
soref(), sorele().

- Assert SOCK_LOCK(so) in macros/functions that rely on the state of
so_count: sofree(), sotryfree().

- Acquire SOCK_LOCK(so) before calling these functions or macros in
various contexts in the stack, both at the socket and protocol
layers.

- In some cases, perform soisdisconnected() before sotryfree(), as
this could result in frobbing of a non-present socket if
sotryfree() actually frees the socket.

- Note that sofree()/sotryfree() will release the socket lock even if
they don't free the socket.

Submitted by: sam
Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS


130336 11-Jun-2004 rwatson

Constify raw_sendspace and raw_recvspace, as they're not mutable.


130335 11-Jun-2004 rwatson

Switch to conditionally acquiring and dropping Giant around calls into
ifp->if_output() basedd on debug.mpsafenet. That way once bpfwrite()
can be called without Giant, it will acquire Giant (if desired) before
entering the network stack.


130334 11-Jun-2004 rwatson

Un-staticize 'dst' sockaddr in the stack of bpfwrite() to prevent
the need to synchronize access to the structure. I believe this
should fit into the stack under the necessary circumstances, but
if not we can either add synchronization or use a thread-local
malloc for the duration.


130256 09-Jun-2004 rwatson

Introduce a netisr to deliver kernel-generated routing, avoiding
recursive entering of the socket code from the routing code:

- Modify rt_dispatch() to bundle up the sockaddr family, if any,
associated with a pending mbuf to dispatch to routing sockets, in
an m_tag on the mbuf.

- Allocate NETISR_ROUTE for use by routing sockets.

- Introduce rtsintrq, an ifqueue to be used by the netisr, and
introduce rts_input(), a function to unbundle the tagged sockaddr
and inject the mbuf and address into raw_input(), which previously
occurred in rt_dispatch().

- Introduce rts_init() to initialize rtsintrq, its mutex, and
register the netisr. Perform this at the same point in system
initialization as setup of the domains.

This change introduces asynchrony between the generation of a
pending routing socket message and delivery to sockets for use
by userspace. It avoids socket->routing->rtsock->socket use and
helps to avoid lock order reversals between the routing code and
socket code (in particular, raw socket control blocks), as route
locks are held over calls to rt_dispatch().

Reviewed by: "George V.Neville-Neil" <gnn@neville-neil.com>
Conceptual head nod by: sam


130202 07-Jun-2004 phk

Use ldisc_[de]register() instead of frobbing linesw[] directly.


130015 02-Jun-2004 naddy

Add helper functions to calculate the standard ethernet CRC in
little/big endian fashion, so that network drivers can just reference
the standard implementation and don't have to bring their own.

As discussed on arch@.

Obtained from: NetBSD


129880 30-May-2004 phk

add missing #include <sys/module.h>


129876 30-May-2004 phk

Add some missing <sys/module.h> includes which are masked by the
one on death-row in <sys/kernel.h>


129874 30-May-2004 dwmalone

Make the comment for DLT_NULL slightly more accurate.

PR: 62272
Submitted by: Radim Kolar <hsn@netmag.cz>
MFC after: 1 week


129748 26-May-2004 yar

if_printf() won't emit a newline unless told to.


129734 25-May-2004 rik

Keepalive timer should be added if we does not have any sppp consumers before
and should be deleted if we do not have any anymore.


129717 25-May-2004 yar

After all the relevant drivers have been fixed, fix vlan(4) itself
WRT manipulating capabilities of the parent interface:

- use ioctl(SIOCSIFCAP) to toggle VLAN_MTU (the way that was done
before was just wrong);

- use the right order of conditional clauses to set the MTU fudge
(that is logically independent from toggling VLAN_MTU.)


129648 24-May-2004 mux

Remove another redundant if_output initialization.


129637 23-May-2004 yar

Consult parent's if_capenable for active VLAN-related capabilities.
This change is possible since all the relevant drivers have been
fixed to set if_capenable properly. The field if_capabilities tracks
supported capabilities, which may be disabled administratively.

Inheriting checksum offload support from the parent interface isn't
that easy because the checksumming capabilities of the parent may be
toggled on the fly. Disable the code for now.


129539 21-May-2004 ru

Added dependency on the miibus module.


129089 10-May-2004 csjp

Zero the un-used portions of the struct sockaddr data before sending
it back to userspace, so it does not break bind(2) on raw sockets in jails.

Currently some processes, like traceroute(8) construct a routing request
to determine its source address based on the destination. This sockaddr
data is fed directly to bind(2). When bind calls ifa_ifwithaddr(9) to
make sure the address exists on the interface, the comparison will
fail causing bind(2) to return EADDRNOTAVAIL if the data wasnt zero'ed
before initialization.

Approved by: bmilekic (mentor)


128907 04-May-2004 scottl

Add route.h to pick up the rt_ifmsg() declaration.


128880 03-May-2004 maxim

o Fix misindentation in the previous commit.


128871 03-May-2004 andre

Link state change notification of ethernet media to the routing socket.

o Extend the if_data structure with an ifi_link_state field and
provide the corresponding defines for the valid states.

o The mii_linkchg() callback updates the ifi_link_state field
and calls rt_ifmsg() to notify listeners on the routing socket
in addition to the kqueue KNOTE.

o If vlans are configured on a physical interface notify and update
all vlan pseudo devices as well with the vlan_link_state() callback.

No objections by: sam, wpaul, ru, bms
Brucification by: bde


128664 26-Apr-2004 bmilekic

Give jail(8) the feature to allow raw sockets from within a
jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.

Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.

The patch being committed is not identical to the patch
in the PR. The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely. This change has also been
presented and addressed on the freebsd-hackers mailing
list.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800


128636 25-Apr-2004 luigi

This commit does two things:

1. rt_check() cleanup:
rt_check() is only necessary for some address families to gain access
to the corresponding arp entry, so call it only in/near the *resolve()
routines where it is actually used -- at the moment this is
arpresolve(), nd6_storelladdr() (the call is embedded here),
and atmresolve() (the call is just before atmresolve to reduce
the number of changes).
This change will make it a lot easier to decouple the arp table
from the routing table.

There is an extra call to rt_check() in if_iso88025subr.c to
determine the routing info length. I have left it alone for
the time being.

The interface of arpresolve() and nd6_storelladdr() now changes slightly:
+ the 'rtentry' parameter (really a hint from the upper level layer)
is now passed unchanged from *_output(), so it becomes the route
to the final destination and not to the gateway.
+ the routines will return 0 if resolution is possible, non-zero
otherwise.
+ arpresolve() returns EWOULDBLOCK in case the mbuf is being held
waiting for an arp reply -- in this case the error code is masked
in the caller so the upper layer protocol will not see a failure.

2. arpcom untangling
Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
and use the IFP2AC macro to access arpcom fields.
This mostly affects the netatalk code.

=== Detailed changes: ===
net/if_arcsubr.c
rt_check() cleanup, remove a useless variable

net/if_atmsubr.c
rt_check() cleanup

net/if_ethersubr.c
rt_check() cleanup, arpcom untangling

net/if_fddisubr.c
rt_check() cleanup, arpcom untangling

net/if_iso88025subr.c
rt_check() cleanup

netatalk/aarp.c
arpcom untangling, remove a block of duplicated code

netatalk/at_extern.h
arpcom untangling

netinet/if_ether.c
rt_check() cleanup (change arpresolve)

netinet6/nd6.c
rt_check() cleanup (change nd6_storelladdr)


128626 25-Apr-2004 luigi

fix one typo and remove one wrong line


128622 24-Apr-2004 luigi

Correct and extend the description of the behaviour of rt_check().


128621 24-Apr-2004 luigi

document the locking behaviour of the functions that access
the routing table.


128618 24-Apr-2004 luigi

arpcom untangling:

consistently with the rest of the code, use IFP2AC(ifp) to access
the arpcom structure given the ifp.

In this case also fix a difference in assumptions WRT the rest of
the net/ sources: it is not the 'struct *softc' that starts with a
'struct arpcom', but a 'struct arpcom' that starts with a
'struct ifnet'


128617 24-Apr-2004 luigi

arpcom untangling:
do not use struct arpcom directly, rather use IFP2AC(ifp).


128615 24-Apr-2004 luigi

arpcom untangling:
- use ifp instead if &ac->ac_if in a couple of nd6* calls;
this removes a useless dependency.

- use IFP2AC(ifp) instead of an extra variable to point to the struct arpcom;
this does not remove the nesting dependency between arpcom and ifnet but
makes it more evident.


128583 23-Apr-2004 andre

Add the comment of the previous commit to the source file directly.

Requested by: ru


128580 23-Apr-2004 andre

Call ip_output() with IP_FORWARD flag to prevent it from overwriting the
ip_id again. ip_id is already set to the ip_id of the encapsulated packet.

Make a comment about mbuf allocation failures more realistic.

Reviewed by: sobomax


128525 21-Apr-2004 luigi

Readability fixes:

Clearly comment the assumptions on the structure of keys (addresses)
and masks, and introduce a macro, LEN(p), to extract the size of these
objects instead of using *(u_char *)p which might be confusing.

Comment the confusion in the types used to pass around pointers
to keys and masks, as a reminder to fix that at some point.

Add a few comments on what some functions do.

Comment a probably inefficient (but still correct) section of code
in rn_walktree_from()

The object code generated after this commit is the same as before.

At some point we should also change same variable identifiers such
as "t, tt, ttt" to fancier names such as "root, left, right" (just
in case someone wants to understand the code!), replace misspelling
of NULL as 0, remove 'register' declarations that make little sense
these days.


128524 21-Apr-2004 luigi

Clearly comment the assumptions that allow us to cast a
'struct radix_node *' to a 'struct rtentry *' in this code,
and introduce a macro, RNTORT(), to do this type conversion.


128455 20-Apr-2004 luigi

Fix the initial check for NULL arguments in rtfree (previously
it checked for rt == NULL after dereferencing the pointer).
We never check for those events elsewhere, so probably these checks
might go away here as well.

Slightly simplify (and document) the logic for memory allocation
in rt_setgate().

The rest is mostly style changes -- replace 0 with NULL where appropriate,
remove the macro SA() that was only used once, remove some useless
debugging code in rt_fixchange, explain some odd-looking casts.


128454 20-Apr-2004 luigi

Document an assumption on the structure of 'struct rtentry'


128433 19-Apr-2004 luigi

Add some comments, move a static array of constants in the only place
where it is used, and replace R_Malloc with R_Zalloc in a couple
of places removing the corresponding bzero()'s


128432 19-Apr-2004 luigi

Fix a recently introduced panic in if_detach() by delaying
the invalidation of ifindex_table[] entry. Probably this
code should be moved even further down, but for the time being
let's do it this way.


128420 19-Apr-2004 ru

More style and deobfuscation fixes.

Submitted by: bde


128417 19-Apr-2004 brooks

Use an tempory struct ifnet *ifp instead of sc->sc_if to access the
ifnet in stf_clone_create. Also use if_printf() instead of printf().


128413 19-Apr-2004 rwatson

First pass at softc list locking for if_ppp.c. Many parts of
this patch were submitted by Maurycy Pawlowski-Wieronski. In addition
to Maurycy's change, break out softc tear down from ppp_clone_destroy()
into ppp_destroy() rather than performing a convoluted series of
extraction casts and indirections during tear down at mod unload.

Submitted by: Maurycy Pawlowski-Wieronski <maurycy@fouk.org>


128409 18-Apr-2004 ru

Style and code unobfuscation.


128408 18-Apr-2004 ru

Fixed a bug from rev. 1.42: cast to a correct type.

Submitted by: luigi


128407 18-Apr-2004 mlaier

Make if_(un)route static in if.c as they are called from if_up/if_down only.
This is also cleanup to make locking easier.

Reviewed by: luigi
Approved by: bms(mentor)


128401 18-Apr-2004 luigi

+ move MKGet()/MKFree() into the only file that can use them.

+ remove useless wrappers around bcmp(), bcopy(), bzero().
The code assumes that bcmp() returns 0 if the size is 0, but
this is true for both the libc and the libkern versions.

+ nuke Bcmp, Bzero, Bcopy from radix.h now that nobody uses them anymore.


128400 18-Apr-2004 luigi

+ replace Bcmp/Bzero with 'the real thing' as in the rest of the file.
+ remember to check and fix or explain a strange cast in route_output()


128399 18-Apr-2004 luigi

replace Bcopy with bcopy as in the rest of the file.


128396 18-Apr-2004 luigi

replace Bcmp() with the same bcmp() used in the rest of the file.


128376 18-Apr-2004 luigi

+ rename and document an unused field in struct arpcom (field is still
there so there are no ABI changes);
+ replace 5 redefinitions of the IPF2AC macro with one in if_arp.h

Eventually (but before freezing the ABI) we need to get rid of
struct arpcom (initially with the help of some smart #defines
to avoid having to touch each and every driver, see below).

Apart from the struct ifnet, struct arpcom now only stores a copy
of the MAC address (ac_enaddr, but we already have another copy in
the struct ifnet -- if_addrhead), and a netgraph-specific field
which is _always_ accessed through the ifp, so it might well go
into the struct ifnet too (where, besides, there is already an entry
for AF_NETGRAPH data...)

Too bad ac_enaddr is widely referenced by all drivers. But
this can be fixed as follows:

#define ac_enaddr ac_if.the_original_ac_enaddr_in_struct_ifnet

(note that the right hand side would likely be a pointer rather than
the base address of an array.)


128373 18-Apr-2004 luigi

Minor changes to improve code readability (no actual code changes):
+ replace 0 with NULL where appropriate (not complete)
+ remove register declaration while there
+ add argument names to function prototypes to have a better idea of
what they are used for
+ add 'const' qualifiers in 3 places


128357 17-Apr-2004 luigi

make route_init() static


128356 17-Apr-2004 luigi

misc cleanup in sysctl_ifmalist():
+ remove a partly incorrect comment that i introduced in the last commit;
+ deal with the correct part of the above comment by cleaning up the
updates of 'info' -- rti_addrs needd not to be updated,
rti_info[RTAX_IFP] can be set once outside the loop.
While at it, correct a few misspelling of NULL as 0, but there are
way too many in this file, and i did not want to clutter the
important part of this commit.


128316 16-Apr-2004 luigi

Use if_link instead of the alias if_list, and change a for() into
the TAILQ_FOREACH() form.

Comment the need to store the same info (mac address for ethernet-type
devices) in two different places.

No functional changes. Even the compiler output should be unmodified
by this change.


128315 16-Apr-2004 luigi

Documented the intended usage of if_addrhead and ifaddr_byindex()
This commit only changes comments. Nothing to recompile.


128311 16-Apr-2004 luigi

Consistently use ifaddr_byindex() to access the link-level address
of an interface. No functional change.

On passing, comment a likely bug in net/rtsock.c:sysctl_ifmalist()
which, if confirmed, would deserve to be fixed and MFC'ed


128291 15-Apr-2004 luigi

Document the way if_addrhead and struct ifaddr are used.
Remove a member from 'struct ifaddr' which has been in an
#ifdef notdef block since rev 1.1

No ABI changes -- no need to recompile anything.


128288 15-Apr-2004 rwatson

If IF_HANDOFF() or netisr_queue() fail, they will free the mbuf. When
this happens, set (m) to NULL or we'll try to free it a second time on
return.

Submitted by: Pavel Gulchouck <gul@gul.kiev.ua>


128209 14-Apr-2004 brooks

Staticize <if>_clone_{create,destroy} functions.

Reviewed by: mlaier


128195 13-Apr-2004 fjoe

Add Direct Sequence 354K and 512K (needed for arl(4)).


128185 13-Apr-2004 luigi

route.h: introduce a macro, SA_SIZE(struct sockaddr *) which returns
the space occupied by a struct sockaddr when passed through a
routing socket.
Use it to replace the macro ROUNDUP(int), that does the same but
is redefined by every file which uses it, courtesy of
the School of Cut'n'Paste Programming(TM).

(partial) userland changes to follow.


128168 12-Apr-2004 luigi

remove an almost-duplicate piece of code by setting the loop
limits appropriately.


128167 12-Apr-2004 luigi

in rtinit(), remove one useless variable, and move a few others
within the block where they are used.


128157 12-Apr-2004 ru

Count outgoing link-level broadcast packets in if_omcasts.
I'm not sure this is completely correct but at least this
is consistent with the accounting of incoming broadcasts.

PR: kern/65273
Submitted by: David J Duchscher <daved@tamu.edu>


128124 11-Apr-2004 rwatson

In 4.x, if_ipending is used to track network interrupt state. In 5.x,
it is no longer used, so GC the ifnet.if_ipending field.


128113 11-Apr-2004 ru

Added the new interface capability option for drivers that implement
user-configurable polling(4) support. Make ifconfig(8) aware of it.

Suggested by: luigi


128019 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


127898 05-Apr-2004 ru

Properly detect loops by recording the interface pointer in an mtag.
For now, preserve the gif_called functionality to limit the nesting
level because uncontrolled nesting can easily cause the kernel stack
exhaustion. Rumors are it should be shot to allow people to easily
shoot themselves in the foot, but I have ran out of cartridges. ;)


127836 04-Apr-2004 luigi

whoops, forgot to fix these places where arpresolve() was used

Detected by: tinderbox


127828 04-Apr-2004 luigi

+ arpresolve(): remove an unused argument
+ struct ifnet: remove unused fields, move ipv6-related field close
to each other, add a pointer to l3<->l2 translation tables (arp,nd6,
etc.) for future use.

+ struct route: remove an unused field, move close to each
other some fields that might likely go away in the future


127736 01-Apr-2004 rwatson

For now, restore an splx(s) I removed when introducing slisunitfree().


127692 31-Mar-2004 rwatson

Abstract "is a particular SLIP unit free" check behind slisunitfree(),
and use that instead of manual list searches in a couple of places.


127674 31-Mar-2004 bms

Add more DLT types required by libpcap 0.8.3.
Maintain numeric sort order.


127673 31-Mar-2004 bms

Update system bpf headers for libpcap 0.8.3.
Maintain listing of DLT link types in numeric order.


127591 29-Mar-2004 rwatson

Add per-softc locking to if_tun:

- Add tun_mtx to tun_softc. Annotate what is (and isn't) locked by it.
- Lock down tun_flags, tun_pid.
- In the output path, cache the value of tun_flags so it's consistent
when processing a particular packet rather than re-reading the field.
- In general, use unlocked reads for debugging.
- Annotate a couple of places where additional unlocked reads may be
possible.
- Annotate that tun_pid is used as a bug in tunopen().

if_tun is now largely MPSAFE, although questions remain about some of
the cdevsw fields and how they are synchronized.


127580 29-Mar-2004 rwatson

Lock down if_tun global variables using a new mutex, tunmtx. As with
other pseudo-interfaces, break out tear-down of a softc into a
separate tun_destroy() function, and invoke that from the module
unloader. Hold tunmtx across manipulations of the global softc list.


127541 29-Mar-2004 rwatson

Modify BPF descriptor assertions to assert Giant when a BPF descriptor
lock is asserted and running non-MPSAFE.


127307 22-Mar-2004 rwatson

Lock down global variables in if_gre:

- Add gre_mtx to protect global softc list.
- Hold gre_mtx over various list operations (insert, delete).
- Centralize if_gre interface teardown in gre_destroy(), and call this
from modevent unload and gre_clone_destroy().
- Export gre_mtx to ip_gre.c, which walks the gre list to look up gre
interfaces during encapsulation. Add a wonking comment on how we need
some sort of drain/reference count mechanism to keep gre references
alive while in use and simultaneous destroy.

This commit does not lockdown softc data, which follows in a future
commit.


127305 22-Mar-2004 rwatson

Lock down global variables in if_gif:

- Add gif_mtx, which protects globals.
- Hold gif_mtx around manipulation of gif_softc_list.
- Abstract gif destruction code into gif_destroy(), which tears down
a softc after it's been removed from the global list by either module
unload or clone destroy.
- Lock gif_called, even though we know gif_called is broken with reentrant
network processing.
- Document an event ordering problem in gif_set_tunnel() that will need
to be fixed.

gif_softc fields not locked down in this commit.


127303 22-Mar-2004 rwatson

Move "called", a static function variable used to detect recursive
processing with gif interfaces, to a global variable named "gif_called".
Add an annotation that this approach will not work with a reentrant
network stack, and that we should instead use packet tags to detect
excessive recursive processing.


127290 22-Mar-2004 mdodd

MAC addresses are 8 bits in ARCNET. Adjust bcopy().


127275 21-Mar-2004 mdodd

- Correct variable name.
- Correct unnecessary use of htons().

Reported by: many.


127260 21-Mar-2004 mdodd

Handle AF_ARP.


127170 18-Mar-2004 rwatson

Correct a bug introduced with the recent clone API chang: when the clone
event handler for if_tap fails, make sure to clean up clone state to
prevent a clone memory leak.


127165 18-Mar-2004 rwatson

sAdd a comment indicating why there continues to be a race condition in
the tap driver, even with Giant over the cdev operation vector, due to
a non-atomic test-and-set of the si_drv1 field in the dev_t. This bug
exists with Giant under high memory pressure, as malloc() may sleep
in tapcreate(), but is less likely to occur. The resolution will
probably be to cover si_drv1 using the global tapmtx since no softc is
available, but I need to think about this problem more generally
across a range of drivers using si_drv1 in combination with SI_CHEAPCLONE
to defer expensive allocation to open().

Correct what appears to be a bug in the original if_tap implementation,
in which tapopen() will panic if a tap device instance is opened more
than once due to an incorrect assertion -- only triggered if INVARIANTS
is compiled in (i.e., when built into a kernel). Return EBUSY instead.

Expand mtx_lock() coverage using tp->tap_mtx to include tp->ether_addr.


127099 17-Mar-2004 rwatson

Remove tun_proc; replace with tun_pid. tun_proc pointer may be stale
as the process that opens tun_softc can exit before the file
descriptor is closed.

Taiwan experience provided by: keichii
Crashing breakers provided by: Chia-liang Kao <clkao@clkao.org>


127098 17-Mar-2004 rwatson

Add tap_mtx to tap_softc in order to protect per-softc variables
(tap_pid, tap_flags). if_tap should now be entirely MPSAFE.

Committed from: Bamboo house by ocean in Taiwan
Tropical paradise provided by: Chia-liang Kao <clkao@clkao.org>


127003 15-Mar-2004 rwatson

Lock down global variables in if_tap (primarily, the tap softc list);
add tapmtx, which protects globale variables.

Notes:

- The EBUSY check in MOD_UNLOAD may be subject to a race. Moving the
event handler unregister inside the mutex grab may prevent that race.

- Locking of global variables safely is now possible because tapclones
is only modified when the module is loading or unloading, thanks to
phk's recent chang to clone_setup().

- softc locking to follow.


126966 14-Mar-2004 mdodd

Announce ethernet MAC addresss in ether_ifattach().


126951 14-Mar-2004 mdodd

Handle AF_ARP in *_output()

Obtained from: NetBSD


126939 14-Mar-2004 rwatson

Compare spppq to NULL instead of using spppq as a boolean.


126910 13-Mar-2004 rwatson

Constify interactive_ports, as its value is static, and therefore doesn't
require synchronization.


126908 13-Mar-2004 rwatson

Remove stale (unused) unit variables from if_tun and if_tap softc's.


126907 13-Mar-2004 rwatson

Constify iso88025_broadcastaddr to make it clear no explicit
synchronization is required.


126901 13-Mar-2004 brooks

Don't allow interfaces to be renamed to the empty string.
While I'm here, errors aren't bools.

Pointed out by: hmp


126900 13-Mar-2004 brooks

Remove if_withname. It came in with the KAME import, but never got
used. Should someone need its functionality, it's a really expensive
implementation of:
ifnet_byindex(sdl->sdl_index)

Reviewed by: bde, ume


126845 11-Mar-2004 phk

Add clone_setup() function rather than rely on lazy initialization.

Requested by: rwatson


126796 10-Mar-2004 phk

Fix handling of tap/vmnet flag in relation to cloning and properly enforce
largest supported unit number for this device driver.

Reported by: Kaho Toshikazu <kaho@easy.es.tuat.ac.jp>


126788 09-Mar-2004 rwatson

Const-poison ethernet and FDDI broadcast address constants, as they
are accessed read-only.


126783 09-Mar-2004 rwatson

Introduce stf_mtx to protect global softc list in if_stf. Add
stf_destroy() to handle the common softc destruction path for the
two destruction sources: interface cloning destroy, and module
unload.

NOTE: sc_ro, the cached route for stf conversion, is not synchronized
against concurrent access in this change, that will follow in a future
change.

Reviewed by: pjd


126781 09-Mar-2004 rwatson

Introduce faith_mtx to protect the if_faith global softc list.
Push if_faith softc destruction logic into faith_destroy() so that
it can be called after softc list removal in both the clone destroy
and module unload paths.


126778 09-Mar-2004 rwatson

Introduce lo_mtx to protect the global loopback softc list. I'm not
really sure why we have a softc list for if_loop, given that it
can't be unloaded, but that's an issue to revisit in the future as
corrupting the softc list would still cause panics.

Reviewed by: benno


126777 09-Mar-2004 rwatson

Introduce disc_mtx to protect the global softc list in if_disc.

Since there are two destroy paths for if_disc interfaces --
module unload and cloan interface destroy, create a new utility
function disc_destroy(), which is callded on a softc after it
has been removed from the global softc list; the cloaner and
module unload entry paths will both remove it before calling
disc_destroy().

Reviewed by: pjd


126709 07-Mar-2004 rwatson

Const-poison ip_stf_ttl to make it clear that the variable is not
modified at run-time.


126486 02-Mar-2004 mlaier

Two minor follow-ups on the MT_TAG removal:
ifp is now passed explicitly to ether_demux; no need to look it up again.
Make mtag a global var in ip_input.

Noticed by: rwatson
Approved by: bms(mentor)


126425 01-Mar-2004 rwatson

Rename dup_sockaddr() to sodupsockaddr() for consistency with other
functions in kern_socket.c.

Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT
in from the caller context rather than "1" or "0".

Correct mflags pass into mac_init_socket() from previous commit to not
include M_ZERO.

Submitted by: sam


126406 29-Feb-2004 rwatson

Define BPFD_LOCK_ASSERT() to assert the BPF descriptor lock.

Assert the BPF descriptor lock in the MAC calls referencing live
BPF descriptors.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research


126405 29-Feb-2004 rwatson

Grab Giant after MAC processing on outgoing packets being sent via
BPF. Grab the BPF descriptor lock before entering MAC since the MAC
Framework references BPF descriptor fields, including the BPF
descriptor label.

Submitted by: sam


126264 26-Feb-2004 mlaier

Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dailup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)


126263 26-Feb-2004 mlaier

Tweak existing header and other build infrastructure to be able to build
pf/pflog/pfsync as modules. Do not list them in NOTES or modules/Makefile
(i.e. do not connect it to any (automatic) builds - yet).

Approved by: bms(mentor)


126239 25-Feb-2004 mlaier

Re-remove MT_TAGs. The problems with dummynet have been fixed now.

Tested by: -current, bms(mentor), me
Approved by: bms(mentor), sam


126188 24-Feb-2004 bde

Don't set d_flags twice. The second setting clobbered D_NOGIANT.


126080 21-Feb-2004 phk

Device megapatch 4/6:

Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.


126077 21-Feb-2004 phk

Device megapatch 2/6:

This commit adds a couple of functions for pseudodrivers to use for
implementing cloning in a manner we will be able to lock down (shortly).

Basically what happens is that pseudo drivers get a way to ask for
"give me the dev_t with this unit number" or alternatively "give
me a dev_t with the lowest guaranteed free unit number" (there is
unfortunately a lot of non-POLA in the exact numeric value of this
number, just live with it for now)

Managing the unit number space this way removes the need to use
rman(9) to do so in the drivers this greatly simplifies the code in
the drivers because even using rman(9) they still needed to manage
their dev_t's anyway.

I have taken the if_tun, if_tap, snp and nmdm drivers through the
mill, partly because they (ab)used makedev(), but mostly because
together they represent three different problems for device-cloning:

if_tun and snp is the plain case: just give me a device.

if_tap has two kinds of devices, with a flag for device type.

nmdm has paired devices (ala pty) can you can clone either of them.


126076 21-Feb-2004 phk

Device megapatch 1/6:

Free approx 86 major numbers with a mostly automatically generated patch.

A number of strategic drivers have been left behind by caution, and a few
because they still (ab)use their major number.


126064 21-Feb-2004 yar

Minor beautifications related to style(9) and code consistency.
No functional changes.


126062 21-Feb-2004 yar

Improve the SIOCSIFCAP handler a bit:
- allow for ifp->if_ioctl being NULL, as the rest of ifioctl() does;
- give the interface driver a chance to report a error to the caller;
- don't forget to update ifp->if_lastchange upon successful modification
of interface operation parameters.


125952 18-Feb-2004 mlaier

Backout MT_TAG removal (i.e. bring back MT_TAGs) for now, as dummynet is
not working properly with the patch in place.

Approved by: bms(mentor)


125879 16-Feb-2004 des

Random style fixes and a comment update. No functional changes.


125853 15-Feb-2004 dwmalone

Return EACCES rather than ENOBUFS if ipfw blocks a packet on the
way out at layer 2.

PR: 62385
Submitted by: Oleg Bulyzhin <oleg@rinet.ru>
Approved by: luigi
MFC after: 1 week


125784 13-Feb-2004 mlaier

This set of changes eliminates the use of MT_TAG "pseudo mbufs", replacing
them mostly with packet tags (one case is handled by using an mbuf flag
since the linkage between "caller" and "callee" is direct and there's no
need to incur the overhead of a packet tag).

This is (mostly) work from: sam

Silence from: -arch
Approved by: bms(mentor), sam, rwatson


125680 11-Feb-2004 bms

Initial import of RFC 2385 (TCP-MD5) digest support.

This is the first of two commits; bringing in the kernel support first.
This can be enabled by compiling a kernel with options TCP_SIGNATURE
and FAST_IPSEC.

For the uninitiated, this is a TCP option which provides for a means of
authenticating TCP sessions which came into being before IPSEC. It is
still relevant today, however, as it is used by many commercial router
vendors, particularly with BGP, and as such has become a requirement for
interconnect at many major Internet points of presence.

Several parts of the TCP and IP headers, including the segment payload,
are digested with MD5, including a shared secret. The PF_KEY interface
is used to manage the secrets using security associations in the SADB.

There is a limitation here in that as there is no way to map a TCP flow
per-port back to an SPI without polluting tcpcb or using the SPD; the
code to do the latter is unstable at this time. Therefore this code only
supports per-host keying granularity.

Whilst FAST_IPSEC is mutually exclusive with KAME IPSEC (and thus IPv6),
TCP_SIGNATURE applies only to IPv4. For the vast majority of prospective
users of this feature, this will not pose any problem.

This implementation is output-only; that is, the option is honoured when
responding to a host initiating a TCP session, but no effort is made
[yet] to authenticate inbound traffic. This is, however, sufficient to
interwork with Cisco equipment.

Tested with a Cisco 2501 running IOS 12.0(27), and Quagga 0.96.4 with
local patches. Patches for tcpdump to validate TCP-MD5 sessions are also
available from me upon request.

Sponsored by: sentex.net


125411 04-Feb-2004 brooks

Add the kernel side of network interface renaming support.

The basic process is to send a routing socket announcement that the
interface has departed, change if_xname, update the sockaddr_dl
associated with the interface, and announce the arrival of the interface
on the routing socket.

As part of this change, ifunit() is greatly simplified by testing
if_xname directly. if_clone_destroy() now uses if_dname to look up the
cloner for the interface and if_dunit to identify the unit number.

Reviewed by: ru, sam (concept)
Vincent Jardin <vjardin AT free.fr>
Max Laier <max AT love2party.net>


125345 02-Feb-2004 brooks

More macro cleanup. Use the system roundup2() macro instead of making
our own ROUNDUP() macro.

Suggested by: bde


125226 30-Jan-2004 sobomax

Remove NetBSD'isms (add FreeBSD'isms?), which makes gre(4) working again.


125109 27-Jan-2004 brooks

Cleanup malloc() use in if_attach():
- malloc() returns a void* and does not need a cast
- when called with M_WAITOK, malloc() can not return NULL so don't
check for that case. The result of the check was bogus anyway since
it would leave the interface broken.


125062 27-Jan-2004 brooks

Clean up macro usage in if_attach():
- Use the system offsetof macro rather then making out own.
- undef ROUND after we use it rather then polluting the whole file.


125024 26-Jan-2004 sobomax

Add support for WCCPv2. It should be enablem manually using link2
ifconfig(8) flag since header for version 2 is the same but IP payload
is prepended with additional 4-bytes field.

Inspired by: Roman Synyuk <roman@univ.kiev.ua>
MFC after: 2 weeks


125020 26-Jan-2004 sobomax

(whilespace-only)

Kill trailing spaces.


125017 26-Jan-2004 harti

Add a device type for virtual interfaces.


125015 26-Jan-2004 harti

Add an ATM sub-type for virtual interfaces.


124872 23-Jan-2004 ru

Don't panic if there are more than 255 interfaces in the system.


124823 22-Jan-2004 onoe

Fix definition of IFM_MODE, which should be refrected the change of
IFM_IEEE80211_ mode. Also ifconfig(8) must be recompiled.
Pointed out by Sam Leffeler.


124808 21-Jan-2004 phk

Remove #ifdef for ancient source FreeBSD compat.


124805 21-Jan-2004 brooks

Don't leak softc's when destroying interfaces.

Init the softc list when loaded.

Noticed by: Maurycy Pawlowski-Wieronski <maurycy at fouk dot org>


124683 18-Jan-2004 yar

A network interface driver can support IFCAP_VLAN_MTU only,
without IFCAP_VLAN_HWTAGGING. The previous version of the
leading comment in this file could lead to the opposite conclusion.

Fix some typos in the comment as well.


124543 15-Jan-2004 onoe

Add support for FH phy, which will be used by awi driver.
Also some if_media constants to indicate operational mode are changed
to bitmasks to reduce diffs from NetBSD.


124283 09-Jan-2004 harti

Fix a warning when NATM is not defined. In this case s is not used.


124237 07-Jan-2004 sam

Remove extraneous unlock. This fixes a panic seen when manipulating static
entries in the ARP table.


124096 03-Jan-2004 sam

backout the switch to use a zone for vlan tags; this requires
vlans be present if any driver with h/w vlan tagging is configured


124078 02-Jan-2004 sam

switch vlan packet tag allocation to use a private zone


123992 30-Dec-2003 sobomax

Sync with NetBSD:

if_gre.c rev.1.41-1.49

o Spell output with two ts.
o Remove assigned-to but not used variable.
o fix grammatical error in a diagnostic message.
o u_short -> u_int16_t.
o gi_len is ip_len, so it has to be network byteorder.

if_gre.h rev.1.11-1.13

o prototype must not have variable name.
o u_short -> u_int16_t.
o Spell address with two d's.

ip_gre.c rev.1.22-1.29

o KNF - return is not a function.
o The "osrc" variable in gre_mobile_input() is only ever set but not
referenced; remove it.
o correct (false) assumptions on mbuf chain. not sure if it really helps, but
anyways, it is necessary to perform m_pullup.
o correct arg to m_pullup (need to count IP header size as well).
o remove redundant adjustment of m->m_pkthdr.len.
o clear m_flags just for safety.
o tabify.
o u_short -> u_int16_t.

MFC after: 2 weeks


123957 29-Dec-2003 tjr

Unbreak build of bpf-free kernels.


123922 28-Dec-2003 sam

o eliminate widespread on-stack mbuf use for bpf by introducing
a new bpf_mtap2 routine that does the right thing for an mbuf
and a variable-length chunk of data that should be prepended.
o while we're sweeping the drivers, use u_int32_t uniformly when
when prepending the address family (several places were assuming
sizeof(int) was 4)
o return M_ASSERTVALID to BPF_MTAP* now that all stack-allocated
mbufs have been eliminated; this may better be moved to the bpf
routines

Reviewed by: arch@ and several others


123875 26-Dec-2003 green

Don't truncate the interface name in ifunit(). It's now possible to query
"very long interface names", e.g.:
ndis_atheros0: flags=8847<UP,BROADCAST,DEBUG,RUNNING,SIMPLEX,MULTICAST> mtu 1500


123808 24-Dec-2003 sam

correct bridge_version: replace unexpanded RCS keywords by a fixed string

PR: kern/60251


123338 09-Dec-2003 bms

Declare gre(4) as being of IFT_TUNNEL, Like God Intended.

Suggested by: fenner


123262 07-Dec-2003 sam

bandaid LOR in rt_setgate; a proper fix requires code refactoring


123220 07-Dec-2003 imp

Make the if_broadcastaddr const. All the drivers in the tree which
violated the constness were corrected before the freeze. This was
suggested by mdodd@, I think, and sam@ and others have signed off on
this if I recall my conversations with them correctly.


123033 28-Nov-2003 silby

Remove the call to M_ASSERTVALID from BPF_MTAP; some mbufs passed to
mpf are allocated on the stack, which causes this check to falsely trigger.

A new check which takes on-stack mbufs into account will be reintroduced
after 5.2 is out the door.

Approved by: re (watson)
Requested by: many


122986 25-Nov-2003 sam

workaround LOR in rt_setgate

Reviewed by: andre
Approved by: re (rwatson)


122922 20-Nov-2003 andre

Introduce tcp_hostcache and remove the tcp specific metrics from
the routing table. Move all usage and references in the tcp stack
from the routing table metrics to the tcp hostcache.

It caches measured parameters of past tcp sessions to provide better
initial start values for following connections from or to the same
source or destination. Depending on the network parameters to/from
the remote host this can lead to significant speedups for new tcp
connections after the first one because they inherit and shortcut
the learning curve.

tcp_hostcache is designed for multiple concurrent access in SMP
environments with high contention and is hash indexed by remote
ip address.

It removes significant locking requirements from the tcp stack with
regard to the routing table.

Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)


122921 20-Nov-2003 andre

Remove RTF_PRCLONING from routing table and adjust users of it
accordingly. The define is left intact for ABI compatibility
with userland.

This is a pre-step for the introduction of tcp_hostcache. The
network stack remains fully useable with this change.

Reviewed by: sam (mentor), bms
Reviewed by: -net, -current, core@kame.net (IPv6 parts)
Approved by: re (scottl)


122875 18-Nov-2003 rwatson

Introduce a MAC label reference in 'struct inpcb', which caches
the MAC label referenced from 'struct socket' in the IPv4 and
IPv6-based protocols. This permits MAC labels to be checked during
network delivery operations without dereferencing inp->inp_socket
to get to so->so_label, which will eventually avoid our having to
grab the socket lock during delivery at the network layer.

This change introduces 'struct inpcb' as a labeled object to the
MAC Framework, along with the normal circus of entry points:
initialization, creation from socket, destruction, as well as a
delivery access control check.

For most policies, the inpcb label will simply be a cache of the
socket label, so a new protocol switch method is introduced,
pr_sosetlabel() to notify protocols that the socket layer label
has been updated so that the cache can be updated while holding
appropriate locks. Most protocols implement this using
pru_sosetlabel_null(), but IPv4/IPv6 protocols using inpcbs use
the the worker function in_pcbsosetlabel(), which calls into the
MAC Framework to perform a cache update.

Biba, LOMAC, and MLS implement these entry points, as do the stub
policy, and test policy.

Reviewed by: sam, bms
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


122702 14-Nov-2003 andre

Introduce ip_fastforward and remove ip_flow.

Short description of ip_fastforward:

o adds full direct process-to-completion IPv4 forwarding code
o handles ip fragmentation incl. hw support (ip_flow did not)
o sends icmp needfrag to source if DF is set (ip_flow did not)
o supports ipfw and ipfilter (ip_flow did not)
o supports divert, ipfw fwd and ipfilter nat (ip_flow did not)
o returns anything it can't handle back to normal ip_input

Enable with sysctl -w net.inet.ip.fastforwarding=1

Reviewed by: sam (mentor)


122699 14-Nov-2003 bms

Fix a bug whereby the physical endpoints of a gre(4) tunnel would not
be printed, if the module were loaded into a kernel which had INET6 enabled.

The gre(4) driver does not use INET6, nor is it specified for IPv6. The
tunnel_status() function in ifconfig(8) is somewhat overzealous and assumes
that all tunnel interfaces speak KAME ifioctls.

This fix follows the path of least resistance, by teaching gre(4) about
the two KAME ifioctls concerned.

PR: bin/56341


122685 14-Nov-2003 bms

Add a sysctl MIB, NET_RT_IFMALIST, to retrieve multicast group memberships
in a protocol-independent way.

Submitted by: harti


122683 14-Nov-2003 ume

fix comments.

Obtained from: KAME


122555 12-Nov-2003 ru

- vlan_start(): Increment the correct interface statistics member.

Reviewed by: mdodd

- vlan_input(): Macroize the VLAN tag extraction from mbuf.


122524 12-Nov-2003 rwatson

Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) do to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


122498 11-Nov-2003 silby

Remove the m_defrag call from if_loop; testing with m_fragment
has shown that the IPv6 stack can clearly handle fragmented
mbuf chains without a problem.

MFC after: 1 week


122352 09-Nov-2003 tanimura

- Implement selwakeuppri() which allows raising the priority of a
thread being waken up. The thread waken up can run at a priority as
high as after tsleep().

- Replace selwakeup()s with selwakeuppri()s and pass appropriate
priorities.

- Add cv_broadcastpri() which raises the priority of the broadcast
threads. Used by selwakeuppri() if collision occurs.

Not objected in: -arch, -current


122334 08-Nov-2003 sam

replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF
macros that expand to include assertions when the system is built
with INVARIANTS

Supported by: FreeBSD Foundation


122320 08-Nov-2003 sam

o add a flags parameter to netisr_register that is used to specify
whether or not the isr needs to hold Giant when running; Giant-less
operation is also controlled by the setting of debug_mpsafenet
o mark all netisr's except NETISR_IP as needing Giant
o add a GIANT_REQUIRED assertion to the top of netisr's that need Giant
o pickup Giant (when debug_mpsafenet is 1) inside ip_input before
calling up with a packet
o change netisr handling so swi_net runs w/o Giant; instead we grab
Giant before invoking handlers based on whether the handler needs Giant
o change netisr handling so that netisr's that are marked MPSAFE may
have multiple instances active at a time
o add netisr statistics for packets dropped because the isr is inactive

Supported by: FreeBSD Foundation


122152 05-Nov-2003 sam

o make debug_mpsafenet globally visible
o move it from subr_bus.c to netisr.c where it more properly belongs
o add NET_PICKUP_GIANT and NET_DROP_GIANT macros that will be used to
grab Giant as needed when MPSAFE operation is enabled

Supported by: FreeBSD Foundation


122058 04-Nov-2003 ume

- update comments to refrect recent BSDs.
- nuke unused macro PSUEDO_SET().
- I believe our if_xname stuff is nothing strange against other BSDs.

Obtained from: KAME


121816 31-Oct-2003 brooks

Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration symantics.

Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)


121778 31-Oct-2003 brooks

Make TUNDEBUG use if_printf instead of printf.


121777 31-Oct-2003 brooks

Replace a couple printfs with if_printfs.


121770 30-Oct-2003 sam

Overhaul routing table entry cleanup by introducing a new rtexpunge
routine that takes a locked routing table reference and removes all
references to the entry in the various data structures. This
eliminates instances of recursive locking and also closes races
where the lock on the entry had to be dropped prior to calling
rtrequest(RTM_DELETE). This also cleans up confusion where the
caller held a reference to an entry that might have been reclaimed
(and in some cases used that reference).

Supported by: FreeBSD Foundation


121717 29-Oct-2003 sam

avoid recursive lock panic by unlocking before calling rtrequest;
this is consistent with other places but will be replaced
shortly by a "proper fix"

Supported by: FreeBSD Foundation
Pain felt by: Jiri Mikulas


121698 29-Oct-2003 sam

Always queue looped back packets (rather than potentially using
direct dispatch) to avoid extensive kernel stack usage and to
avoid directly re-entering the network stack. The latter causes
locking problems when, for example, a complete TCP handshake`
happens w/o a context switch.


121645 29-Oct-2003 sam

Introduce the notion of "persistent mbuf tags"; these are tags that stay
with an mbuf until it is reclaimed. This is in contrast to tags that
vanish when an mbuf chain passes through an interface. Persistent tags
are used, for example, by MAC labels.

Add an m_tag_delete_nonpersistent function to strip non-persistent tags
from mbufs and use it to strip such tags from packets as they pass through
the loopback interface and when turned around by icmp. This fixes problems
with "tag leakage".

Pointed out by: Jonathan Stone
Reviewed by: Robert Watson


121637 28-Oct-2003 brooks

Use VLANNAME instead of "vlan".


121596 27-Oct-2003 kan

Delay if_lo module intialization until domain list has been
completely populated. This prevents a system crash on boot.


121574 26-Oct-2003 ume

use official # for IFT_STF
(are there any backward compat issue? i don't think so)

Obtained from: KAME


121470 24-Oct-2003 ume

Since dp->dom_ifattach calls malloc() with M_WAITOK, we cannot
use mutex lock directly here. Protect ifp->if_afdata instead.

Reported by: grehan


121436 23-Oct-2003 imp

Remove unnecessary (caddr_t) casts of if_broadcastaddr.


121431 23-Oct-2003 brooks

Use IF_MAXUNIT instead of rolling our own identical TUN_MAXUNIT.


121428 23-Oct-2003 imp

Merge from p4 (noticed these changes with DES' if_ethersubr.c changes caused
a minor conflict):
o Use ETHER_ADDR_LEN in preference to '6'.
o Remove two unnecessary (caddr_t) casts. One of them causes problems in
my tree where etherbroadcastaddr is const, and (caddr_t) casts the const
away.


121422 23-Oct-2003 des

Clean up whitespace, remove "register" keyword, ANSIfy.
No functional changes.


121358 22-Oct-2003 ume

we have ppsratecheck().


121341 22-Oct-2003 ume

protect by IFNET_RLOCK.


121260 19-Oct-2003 silby

Add a new macro M_ASSERTVALID which ensures that the mbuf in question
is non-free. (More checks can/should be added in the future.)

Use M_ASSERTVALID in BPF_MTAP so that we catch when freed mbufs are
passed in, even if no bpf listeners are active.

Inspired by a bug in if_dc caught by Kenjiro Cho.


121161 17-Oct-2003 ume

- add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from: KAME


121139 16-Oct-2003 sam

Correct handling of cloning loop avoidance: rtalloc1 may return a null
pointer in which case we should not do the unlock.

Supported by: FreeBSD Foundatin


121135 16-Oct-2003 ume

AF_LINK sockaddr has to be attached to ifp->if_addrlist until the
end, as many of the code assumes that TAILQ_FIRST(ifp->if_addrlist)
is non-null.

Submitted by: itojun


121071 13-Oct-2003 ume

- support AES counter mode for ESP.
- use size_t as return type of schedlen(), as there's no error
check needed.
- clear key schedule buffer before freeing.

Obtained from: KAME


121061 13-Oct-2003 ume

- support AES XCBC MAC for AH
- correct SADB_X_AALG_RIPEMD160HMAC to 8

Obtained from: KAME


121048 12-Oct-2003 rwatson

Comment spelling fix.


120993 11-Oct-2003 sam

fix braino: null the pointer who's memory we just free'd, not some other
pointers that are (potentially) used later


120888 07-Oct-2003 sam

insure local variable is initialized prior to use


120885 07-Oct-2003 ume

return(code) -> return (code)

Obtained from: KAME


120820 05-Oct-2003 sam

fix typo that caused a panic when processing an ICMP redirect

Sponsored by: FreeBSD Foundation


120727 04-Oct-2003 sam

Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do-no know about mutex's. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LOR's that are expected to go away with forthcoming
work to eliminate many held references. If not these will be resolved
prior to release.
3. ATM changes are untested.

Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)


120725 04-Oct-2003 sam

add a stub for bpfattach2 so bpf is not required with the 802.11
module or related drivers

Spotted by: Dan Lukes <dan@obluda.cz>


120704 03-Oct-2003 rwatson

When direct dispatching an netisr (net.isr.enable=1), if there are already
any queued packets for the isr, process those packets before the newly
submitted packet, maintaining ordering of all packets being delivered
to the netisr. Remove the bypass counter since we don't bypass anymore.
Leave the comment about possible problems and options since later
performance optimization may change the strategy for addressing ordering
problems here.

Specifically, this maintains the strong isr ordering guarantee; additional
parallelism and lower latency may be possible by moving to weaker
guarantees (per-interface, for example). We will probably at some point
also want to remove the one instance netisr dispatch limit currently
enforced by a mutex, but it's not clear that's 100% safe yet, even in
the netperf branch.

Reviewed by: sam, others


120703 03-Oct-2003 sam

trivial locking rtsock_cb

Sponsored by: FreeBSD Foundation


120701 03-Oct-2003 sam

cleanups prior to adding locking (and in some cases to eliminate locking):

o move route_cb to be private to rtsock.c
o replace global static route_proto by locals
o eliminate global #define shorthands for info references
o remove some register decls
o ansi-fy function decls
o move items to be close in scope to their usage
o add rt_dispatch function for dispatching the actual message
o cleanup tangled logic for doing all-but-me msg send

Support by: FreeBSD Foundation


120656 02-Oct-2003 rwatson

Create a tunable for net.isr.enable so that it may be set from
inception, rather than having to wait for the boot to finish.


120653 01-Oct-2003 rwatson

Temporarily turn net.isr.enable back off again until patches to
correct potential nits in packet ordering are resolved.


120650 01-Oct-2003 rwatson

Enable net.isr.enable by default, causing "delivery to completion"
(direct dispatch) in interrupt threads when the netisr in question
isn't already active. If a netisr is already active, or direct
dispatch is already in progress, we queue the packet for later
delivery. Previously, this option was disabled by default. I have
measured 20%+ performance improvements in IP packet forwarding with
this enabled.

Please report any problems ASAP, especially relating to stack depth or
out-of-order packet processing.

Discussed with: jlemon, peter
Sponsored by: DARPA, Network Associates Laboratories


120626 01-Oct-2003 ru

By popular demand, added the "static ARP" per-interface option.


120593 30-Sep-2003 sam

Correct pfil_run_hooks return handling: if the return value is non-zero
then the mbuf has been consumed by a hook; otherwise beware of a null
mbuf return (gack). In particular the bridge was doing the wrong thing.
While in the ipv6 code make it's handling of pfil_run_hooks identical
to netbsd.

Pointed out by: Pyun YongHyeon <yongari@kt-is.co.kr>


120559 28-Sep-2003 phk

I don't know from where the notion that device driver should or
even could call VOP_REVOKE() on vnodes associated with its dev_t's
has originated, but it stops right here.

If there are things people belive destroy_dev() needs to learn how to
do, please tell me about it, preferably with a reproducible test case.

Include <sys/uio.h> in bluetooth code rather than rely on <sys/vnode.h>
to do so.

The fact that some of the USB code needs to include <sys/vnode.h>
still disturbs me greatly, but I do not have time to chase that.


120527 27-Sep-2003 phk

Correctly name r_unit member tun_unit.
Remove unused tun_wsel member.


120386 23-Sep-2003 sam

o update PFIL_HOOKS support to current API used by netbsd
o revamp IPv4+IPv6+bridge usage to match API changes
o remove pfil_head instances from protosw entries (no longer used)
o add locking
o bump FreeBSD version for 3rd party modules

Heavy lifting by: "Max Laier" <max@love2party.net>
Supported by: FreeBSD Foundation
Obtained from: NetBSD (bits of pfil.h and pfil.c)


120359 22-Sep-2003 peter

While cleaning out my tree, fix another strict alias warning that would
be happening if we didn't stop compiling with -fno-strict-aliasing.


120177 17-Sep-2003 sam

fix build on 64-bit platforms


120139 17-Sep-2003 sam

Minor overhaul and add locking.

o replace magic constants with #defines (e.g. ETHER_ADDR_LEN)
o move mib variables to net.link.ether.bridge with backwards compatible
entries for well-known items maintained under BURN_BRIDGES
o revamp debugging support so it is conditioanlly compiled with BRIDGE_DEBUG
(on currently) and runtime controlled by net.link.ether.bridge.debug
o change timeout to MPSAFE callout
o optimize lookup for common case of two interfaces
o optimize forwarding path to take IFNET lock only when needed
o make boot-time printf dependent on bootverbose
o sundry style changes (ANSI decls, extraneous spaces, etc.)

Sponsored by: FreeBSD Foundation


120049 14-Sep-2003 mdodd

Enable IPv6 for Token Ring.


120048 14-Sep-2003 mdodd

Cosmetic cleanups.


120047 14-Sep-2003 mdodd

Cosmetic adjustment.


119995 11-Sep-2003 ru

Fix a bunch of off-by-one errors in the range checking code.


119780 05-Sep-2003 sam

Add locking. We use a single lock to guard the global vlan list and also
to protect the vlan state in each ifnet (e.g. vlan count). The latter is
probably better handled through an ifnet-centric means but since changes
are infrequent shouldn't matter for now.

Sponsored by: FreeBSD Foundation


119751 04-Sep-2003 sam

Reduce window during which a race can occur when detaching
an interface from each descriptor that references it. This
is just a bandaid; the locking here needs to be redone.


119560 29-Aug-2003 rwatson

Introduce error checking for calls to M_PREPEND():

ether_output() when prepending netatalk AFA_PHASE2 llc headers (TRYWAIT).
ether_output() when prepending ethernet header to a frame (DONTWAIT).


119137 19-Aug-2003 sam

Change instances of callout_init that specify MPSAFE behaviour to
use CALLOUT_MPSAFE instead of "1" for the second parameter. This
does not change the behaviour; it just makes the intent more clear.


119135 19-Aug-2003 sam

add R_Zalloc definition that returns pre-zero'd memory


119131 19-Aug-2003 sam

use ETHER_IS_MULTICAST instead of explicit check


118688 09-Aug-2003 silby

Also ifdef the variable which becomes unused w/o INET6, fixing the build
after the previous commit.

Noticed by: alc


118681 09-Aug-2003 silby

#ifdef INET6 the if_loop packet defrag; since only the ipv6 stack (might)
require this to be done, there's no reason to waste time doing it if
ipv6 isn't compiled in.

MFC after: 1 week


118548 06-Aug-2003 harti

Remove the ATMIOCENA and ATMIOCDIS ioctl. Everyting has been converted
to use the new OPENVCC and CLOSEVCC calls that allow the sepcification
of traffic parameters for the connections.


118547 06-Aug-2003 harti

Remove the last vestiges of ATM raw mode. This has not been useful for a
long time and has already been removed from the only driver that supported
it (en(4)) and from the man page.


118496 05-Aug-2003 harti

Define a flag for asynchronuous VC open/close operations as used
by the NATM stuff.


118471 05-Aug-2003 jmg

add support for using kqueue to watch bpf sockets.

Submitted by: Brian Buchanan of nCircle, Inc.
Tested on: i386 and sparc64


118157 29-Jul-2003 harti

Implement a mechanism by which ATM drivers can inform interested
parts of the system about certain kinds of events, like changes
in the ABR rate, changes in the carrier state, PVC changes. The
main consumers of these events are the harp(4) pseudo-driver
and the ILMI daemon via ng_atm(4).


118072 26-Jul-2003 gj

Use M_WAITOK instead of M_WAIT in sppp_attach().


117817 21-Jul-2003 sam

add monitor mode


117786 19-Jul-2003 ume

Disabling multicast on vlan interface caused kernel panic.

PR: kern/40723
Submitted by: Hideki ONO <ono@kame.net>
MFC after: 1 week


117752 19-Jul-2003 hsu

Add mutex for routing entries.

Reviewed by: bmilekic, silby


117721 18-Jul-2003 harti

Correct the device identifiers for the ProATM cards.


117630 15-Jul-2003 harti

Implement an utility function that can be used by device drivers to
implement the ATMIOCGVCCS ioctls. This routine handles changing
VCC tables (which can occure because we cannot hold the driver mutex
while allocating memory) with a loop and a re-allocation, should the
table not fit in the allocated memory.


117629 15-Jul-2003 harti

The mbuf put on the interface queue contains the 4-byte pseudoheader.
Account for this in the byte count.


117628 15-Jul-2003 harti

Add identifiers for ProSum's and IDT's cards that are based on
the IDT77252 chip. The driver will follow soon.


117627 15-Jul-2003 harti

ATM_PH_LLCSNAP and ATMIO_FLAG_LLCSNAP must have the same value, so
define one in terms of the other.


117625 15-Jul-2003 harti

Protect a kernel structure by _KERNEL.


117518 13-Jul-2003 rwatson

Move the MAC entry point to label ethernet-sourced mbufs with a MAC label
from the network interface earlier in ether_input(). At some point
(no fingers pointed), things were restructured and the labeling operation
moved later. This wasn't a problem as BPF_MTAP() relies on the ifnet
label not the mbuf label, but there might have been other problems.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


117343 08-Jul-2003 wpaul

- In vlan_input(), always mask off all but the VLID bits from tags
extracted from received frames, both in the IFCAP_VLAN_HWTAGGING case
and not. (Some drivers may already do this masking internally, but
doing it here doesn't hurt and insures consistency.)

- In vlan_ioctl(), don't let the user set a VLAN ID value with anything
besides the VLID bits set, otherwise we will have trouble matching
an interface in vlan_input() later.

PR: kern/46405


117273 06-Jul-2003 wpaul

Testing VLANs with the new 8139C+ chip (which does hardware tag
insertion and extraction) has revealed two bugs:

- In vlan_start(), we're supposed to check the underlying interface to
see if it has the IFCAP_VLAN_HWTAGGING cabability set and, if so, set
things up for the VLAN_OUTPUT_TAG() routine. However the code checks
ifp->if_capabilities, which is the vlan pseudo-interface's capabilities
when it should be checking p->if_capabilities, which relates to the
underlying physical interface. Change ifp->if_capabilities to
p->if_capabilities so this works.

- In vlan_input(), we have to extract the 16-bit tag value from the
received frame and use it to figure out which vlan interface gets
the frame. The code that we use to track down the desired vlan
pseudo-interface is:

for (ifv = LIST_FIRST(&ifv_list); ifv != NULL;
ifv = LIST_NEXT(ifv, ifv_list))
if (ifp == ifv->ifv_p && tag == ifv->ifv_tag)
break;

The problem is that 'tag' is not computed consistently. In the case
where the interface supports hardware VLAN tag extraction and calls
VLAN_INPUT_TAG(), we do this:

tag = *(u_int*)(mtag+1);

But in the software emulation case, we do this

tag = EVL_VLANOFTAG(ntohs(evl->evl_tag));

The problem here is the EVL_VLANOFTAG() macro is only ever applied
in this one case. It's never applied to ifv->ifv_tag or anwhere else.
We must be consistent: either it's applied everywhere or nowhere.
To see how this can be a problem, do something like
ifconfig vlan0 vlan 12345 vlandev foo0 and observe the results.

I'm not quite sure what the right thing is to do here. Neither the
vlan(4) nor ifconfig(8) man pages suggest which way to go. For now,
I've removed this use of EVL_VLANOFTAG() so that the tag will match
correctly in all cases. I will not get upset if somebody makes a
compelling argument for using EVL_VLANOFTAG() everywhere instead,
as long as the use is consistent.


116949 28-Jun-2003 sam

remove old 802.11 support; replaced by new code in sys/net80211


116819 25-Jun-2003 sam

add "autoselect" mode and "auto" alias: these let you reset the
"phy mode" to an auto-selecting mode, as opposed to one where
you're locked to a particular one (e.g. 11a for 802.11)


116741 23-Jun-2003 harti

Add the hooks for netgraph and HARP to the NATM code. This allows us
to use one set of drivers for all ATM upper layers.


116720 23-Jun-2003 harti

Apply style(9) to this file. I'm going to touch large parts of this file
so make this beforehand.


116523 18-Jun-2003 harti

Now that most of this file is new, stylify the rest and correct the
style bugs (space/tab) introduced by me.


116480 17-Jun-2003 harti

Add definitions for the ioctls that are used by netgraph and harp to open
and close VCCs.


116441 16-Jun-2003 harti

Fix the breakage introduced by rev. 1.43 of sys/dev/midway.c (don't commit
on friday 13th and without making a universe). This adds struct and
constant definitions for ATM traffic parameters and re-enables the
build of the midway driver.

Tested by: make universe


115690 02-Jun-2003 harti

Fix a typo in an ATM media name. As this name was not use yet, no problems
should occur.


115535 31-May-2003 phk

Wrap macro in do {...} while(0);

Found by: FlexeLint


115534 31-May-2003 phk

Remove break after return.

Found by: FlexeLint


115360 28-May-2003 silby

Replace a handrolled defrag function with m_defrag. The handrolled
function couldn't handle chains of > MCLBYTES, and it had a bug which
caused corruption and panics in certain low mbuf situations.

Additionally, change the failure case so that looutput returns ENOBUFS
rather than attempting to pass on non-defragmented mbuf chains.

Finally, remove the printf which would happen every time the low memory
situation occured. It served no useful purpose other than to clue me
in as to what was causing the panic in question. :)

MFC after: 4 days


114739 05-May-2003 harti

Define a link layer MIB for ATM. Most fields of this MIB are needed by
ILMI daemons. Factor out common softc fields for all ATM interfaces that
need to be externally visible into an ifatm structure and make the midway
driver using this structure and fill the MIB.


114723 05-May-2003 obrien

Back out rev 1.146 -- it broke the LINT build.
We are about to enter the 5.1 code freeze and things must be buildable.


114293 30-Apr-2003 markm

Fix some easy, global, lint warnings. In most cases, this means
making some local variables static. In a couple of cases, this means
removing an unused variable.


114232 29-Apr-2003 harti

Add media types and options for ATM. While on most ATM cards media cannot
be changed, it is very convenient to be able to toggle SDH/Sonet,
idle/unassigned cells and scrambled mode and to see the carrier
state.

Reviewed by: -arch (if_media.h definitions)


114201 29-Apr-2003 harti

Add module data and version to the atm_subr and reference this info from the
(currently) only consumer (en).

Add a sysctl node hw.atm where the atm drivers will hook on their hardware
sysctl sub-trees.

Make atm_ifattach call if_attach and remove the corresponding call to if_attach
from en. Create atm_ifdetach and use that in en.

While the last change actually changes the interface this is not a problem in
practice because the only other consumer of this API is an older LANAI driver
on the net, that is not ready for current anyway.

Reviewed by: -atm


114163 28-Apr-2003 sam

o add support for multi-mode devices like 802.11 wireless cards that support
11a/b/g by adding an optional 3-bit mode field
o correct the spelling of OFDM (was ODFM)
o add an 802.11 subtype option for turbo mode: the phy is clocked at 2x the
normal clock rate; note this can be applied to both OFDM in 11a and OFDM
in 11g mode (and possibly DS11 in 11b for certain phy's)
o add 802.11 CCK aliases for 11b/11g rates--the more common terminology


113950 23-Apr-2003 archie

Tweak to previous commit: increment ifp->if_iqdrops if the m_copy() fails.

Suggested by: Neelkanth Natu <neelnatu@yahoo.com>


113919 23-Apr-2003 archie

Fix a case where the return value from m_copy() was not being checked
for NULL before proceeding, causing a crash if mbufs were exhausted.

MFC after: 3 days
Reported by: Mark Gooderum <mark@verniernetworks.com>


113487 14-Apr-2003 rwatson

Move MAC label storage for mbufs into m_tags from the m_pkthdr structure,
returning some additional room in the first mbuf in a chain, and
avoiding feature-specific contents in the mbuf header. To do this:

- Modify mbuf_to_label() to extract the tag, returning NULL if not
found.

- Introduce mac_init_mbuf_tag() which does most of the work
mac_init_mbuf() used to do, except on an m_tag rather than an
mbuf.

- Scale back mac_init_mbuf() to perform m_tag allocation and invoke
mac_init_mbuf_tag().

- Replace mac_destroy_mbuf() with mac_destroy_mbuf_tag(), since
m_tag's are now GC'd deep in the m_tag/mbuf code rather than
at a higher level when mbufs are directly free()'d.

- Add mac_copy_mbuf_tag() to support m_copy_pkthdr() and related
notions.

- Generally change all references to mbuf labels so that they use
mbuf_to_label() rather than &mbuf->m_pkthdr.label. This
required no changes in the MAC policies (yay!).

- Tweak mbuf release routines to not call mac_destroy_mbuf(),
tag destruction takes care of it for us now.

- Remove MAC magic from m_copy_pkthdr() and m_move_pkthdr() --
the existing m_tag support does all this for us. Note that
we can no longer just zero the m_tag list on the target mbuf,
rather, we have to delete the chain because m_tag's will
already be hung off freshly allocated mbuf's.

- Tweak m_tag copying routines so that if we're copying a MAC
m_tag, we don't do a binary copy, rather, we initialize the
new storage and do a deep copy of the label.

- Remove use of MAC_FLAG_INITIALIZED in a few bizarre places
having to do with mbuf header copies previously.

- When an mbuf is copied in ip_input(), we no longer need to
explicitly copy the label because it will get handled by the
m_tag code now.

- No longer any weird handling of MAC labels in if_loop.c during
header copies.

- Add MPC_LOADTIME_FLAG_LABELMBUFS flag to Biba, MLS, mac_test.
In mac_test, handle the label==NULL case, since it can be
dynamically loaded.

In order to improve performance with this change, introduce the notion
of "lazy MAC label allocation" -- only allocate m_tag storage for MAC
labels if we're running with a policy that uses MAC labels on mbufs.
Policies declare this intent by setting the MPC_LOADTIME_FLAG_LABELMBUFS
flag in their load-time flags field during declaration. Note: this
opens up the possibility of post-boot policy modules getting back NULL
slot entries even though they have policy invariants of non-NULL slot
entries, as the policy might have been loaded after the mbuf was
allocated, leaving the mbuf without label storage. Policies that cannot
handle this case must be declared as NOTLATE, or must be modified.

- mac_labelmbufs holds the current cumulative status as to whether
any policies require mbuf labeling or not. This is updated whenever
the active policy set changes by the function mac_policy_updateflags().
The function iterates the list and checks whether any have the
flag set. Write access to this variable is protected by the policy
list; read access is currently not protected for performance reasons.
This might change if it causes problems.

- Add MAC_POLICY_LIST_ASSERT_EXCLUSIVE() to permit the flags update
function to assert appropriate locks.

- This makes allocation in mac_init_mbuf() conditional on the flag.

Reviewed by: sam
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


113428 13-Apr-2003 hsu

No need to unlock if error detected before locking.

Submitted by: harti


113255 08-Apr-2003 des

Introduce an M_ASSERTPKTHDR() macro which performs the very common task
of asserting that an mbuf has a packet header. Use it instead of hand-
rolled versions wherever applicable.

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


113072 04-Apr-2003 des

Don't use ovbcopy(); use void * instead of char *.


112707 27-Mar-2003 maxim

o netisr_queue() returns 1 on success and 0 on failure,
fix a typo (?) in rev. 1.90.

PR: kern/50163


112469 21-Mar-2003 mdodd

- Use if_broadcastaddr from struct ifnet rather than relying on
extern 'etherbroadcastaddr'.
- Make 'etherbroadcastaddr' static.

Reviewed by: imp


112463 21-Mar-2003 mdodd

Assignment could be NULL, check.


112451 20-Mar-2003 jhb

Use td->td_ucred instead of td->td_proc->p_ucred.


112308 16-Mar-2003 mdodd

- Use IFP2AC().
- Support IFF_MONITOR.
- Borrow some consistency for if_input() routines from if_ethersubr.c.
- Correct comments regarding fddi_input() that no longer apply.


112305 15-Mar-2003 mdodd

Fix whitespace issues.


112299 15-Mar-2003 mdodd

Don't strip header from packets before input routine is called.


112298 15-Mar-2003 mdodd

Use if_printf().


112297 15-Mar-2003 mdodd

iso88025_ifattach() changes:

- Call if_attach().
- Conditionally call bpfattach() based on second function argument.


112296 15-Mar-2003 mdodd

- Style(9) changes.
- Remove unneeded assignment.
- Increment if_oerrors as per if_fddisubr.c.
- Wrap ISO code with conditional.


112295 15-Mar-2003 mdodd

Stray } forgotten by manual merging.


112294 15-Mar-2003 mdodd

- Remove stray ).
- Add missing breaks.
- Add missing if_noproto++.


112291 15-Mar-2003 mdodd

Revert part of 1.37; use bcopy() like if_fddisubr.c.


112289 15-Mar-2003 mdodd

- Increment ifp->if_noproto when appropriate.
- Use 'goto dropanyway' when appropriate.
- Move dropanyway label out of switch for readability.


112287 15-Mar-2003 mdodd

Update interface statistics after MAC and IFF_UP|IFF_RUNNING checks.


112286 15-Mar-2003 mdodd

- Adopt tests for (IFF_UP|IFF_RUNNING) and non local unicast packets
in promiscuous mode from if_fddisubr.c.
- Add comment to reduce diffs.


112285 15-Mar-2003 mdodd

Add MAC support.

This is the same code that was added in 1.70 of if_fddisubr.c


112281 15-Mar-2003 mdodd

Use llc_control rather than llc_snap.control.


112280 15-Mar-2003 mdodd

- Add comment.
- Whitespace fixes.


112279 15-Mar-2003 mdodd

Reduce code differences.


112278 15-Mar-2003 mdodd

Use ISO88025_ADDR_LEN where appropriate.


112277 15-Mar-2003 mdodd

Don't use etherbroadcastaddr; use iso88025_broadcastaddr.


112276 15-Mar-2003 mdodd

- Remove definition of senderr() from iso88025.h.
- Use definition of senderr() from if_ethersubr.c.


112274 15-Mar-2003 mdodd

Some whitespace/style/readability changes.


112273 15-Mar-2003 mdodd

Add iso88025_resolvemulti().

Cribbed from net/if_fddisubr.c


112272 15-Mar-2003 mdodd

Fix formatting of iso88025_ifattach().


112271 15-Mar-2003 mdodd

Re-order and prune includes.


112269 15-Mar-2003 mdodd

Add module data and version.


112268 15-Mar-2003 mdodd

s/llc_un.type_snap/llc_snap/g


112266 15-Mar-2003 mdodd

Formatting and whitespace changes.


112193 13-Mar-2003 harti

This corrects a longstanding endian bug in processing LLC/SNAP encoded
frames. A comment in if_atm.h suggests that both macros, that for extracting
the ethertype and that for inserting it, handle their argument in host
byte order. In fact, the inserting macro treated its argument as an opposite
host order short and the calling code feeds it the result of htons(). This
happens to work on i386, but fails on sparc. Make the macro use real host
endianess.

Reviewed by: kjc, atm@


112168 13-Mar-2003 mux

Pass the correct malloc flags to m_tag_alloc().


112148 12-Mar-2003 sam

correct two more flag misuses; m_tag* use malloc flags


112037 09-Mar-2003 phk

Note that MAJOR_AUTO is now the default if d_maj is not initialized. This
is more robust and prevents the hijacking of /dev/console for the typical
mistake.

Remove unneeded MAJOR_AUTO uses, it is only needed explicitly now if the
driver source has cross-branch compatibility to old releases.


112011 08-Mar-2003 jlemon

Discard the packet if the netisr queue is null instead of panicing, for
the benefit of modules which are compiled differently than the kernel.


111999 08-Mar-2003 jlemon

Revert last change and insure the driver can support other address families.

Pointed out by: ume, matusita


111998 08-Mar-2003 jlemon

The tun driver is INET only. Don't pretend to support other address families.

Sponsored by: DARPA, NAI Labs


111926 05-Mar-2003 peter

Finish driving a stake through the heart of netns and the associated
ifdefs scattered around the place - its dead Jim!

The SMB stuff had stolen AF_NS, make it official.


111889 04-Mar-2003 jlemon

GC unused files.


111888 04-Mar-2003 jlemon

Update netisr handling; Each SWI now registers its queue, and all queue
drain routines are done by swi_net, which allows for better queue control
at some future point. Packets may also be directly dispatched to a netisr
instead of queued, this may be of interest at some installations, but
currently defaults to off.

Reviewed by: hsu, silby, jayanth, sam
Sponsored by: DARPA, NAI Labs


111821 03-Mar-2003 phk

Make nokqfilter() return the correct return value.

Ditch the D_KQFILTER flag which was used to prevent calling NULL pointers.


111815 03-Mar-2003 phk

Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by: re(scottl)


111794 03-Mar-2003 mdodd

Revert last commit. File tracks NetBSD.

Requested by: sam


111790 03-Mar-2003 mdodd

sizeof(struct llc) -> LLC_SNAPFRAMELEN
sizeof(struct ether_header) -> ETHER_HDR_LEN
sizeof(struct fddi_header) -> FDDI_HDR_LEN


111775 03-Mar-2003 mdodd

Use IFP2AC() rather than casting to struct arpcom *


111774 03-Mar-2003 mdodd

De-register.


111767 02-Mar-2003 mdodd

Reduce code duplication. This adds the function rt_check() to route.c.

Approved by: sam (in principle)


111748 02-Mar-2003 des

More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).


111742 02-Mar-2003 des

Clean up whitespace, s/register //, refrain from strong urge to ANSIfy.


111741 02-Mar-2003 des

uiomove-related caddr_t -> void * (just the low-hanging fruit)


111678 28-Feb-2003 mux

Make the network /dev entries use MAJOR_AUTO.


111568 26-Feb-2003 phk

NODEVFS cleanup: remove calls to cdevsw_remove()


111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


111071 18-Feb-2003 sam

remove stray debugging printf

Noted by: Kasper Steensgaard <steensgaard@person.dk>


111038 17-Feb-2003 maxim

o Restore an interrupt priority level before return.

Submitted by: Roman Kurakin <rik@cronyx.ru>
Reviewed by: joerg
MFC after: 5 days


111002 16-Feb-2003 phk

Remove #include <sys/dkstat.h>


110768 12-Feb-2003 peter

Do not do an assignment in a truth test (previous commit) or gcc gives a
warning which breaks builds.

cc1: warnings being treated as errors
src/sys/net/bridge.c: In function `bdg_forward':
sys/net/bridge.c:931: warning: suggest parentheses around assignment used as truth value
*** Error code 1


110733 11-Feb-2003 sam

PFIL_HOOKS optimization: check if at least one hook is present before
munging the IP header to pass to the hooks


110527 08-Feb-2003 hsu

Make the radix tree code compilable in userland. Requested by ru.
Some style fixes requested by bde.


110238 02-Feb-2003 phk

A minor stylistic change to make it more clear to lint-like tools.


110235 02-Feb-2003 alfred

chase more of the MIN/MAX mess. *sigh*


110232 02-Feb-2003 alfred

Consolidate MIN/MAX macros into one place (param.h).

Submitted by: Hiten Pandya <hiten@unixdaemons.com>


110106 30-Jan-2003 fjoe

- bpf is now working (tested with tcpdump)
- fix promiscious mode

MFC after: 3 days


110097 30-Jan-2003 phk

NODEVFS cleanup: unifdef


109996 28-Jan-2003 hsu

Avoid lock order reversal by expanding the scope of the
AF_INET radix tree lock to cover the ARP data structures.


109771 24-Jan-2003 fjoe

- add support for IPX (tested with mount -t nwfs and mars_nwe),
IP fast forwarding, SIOCGIFADDR, setting hardware address (not currently
enabled in cm driver), multicasts (experimental)
- add ARC_MAX_DATA, use IF_HANDOFF, remove arc_sprintf() and some unused
variables
- if_simloop logic is made more similar to ethernet
- drop not ours packets early (if we are not in promiscous mode)

Submitted by: mark tinguely (partially)


109711 22-Jan-2003 fenner

Implement SIOCGIFMEDIA for vlan devices by passing the request to the
parent device, if there is a parent configured. Modify the result
returned by the parent to indicate that the only supported media
is the currently configured one.

Reviewed by: brooks


109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


109580 20-Jan-2003 sam

o add BIOCGDLTLIST and BIOCSDLT ioctls to get the data link type list
and set the link type for use by libpcap and tcpdump
o move mtx unlock in bpfdetach up; it doesn't need to be held so long
o change printf in bpf_detach to distinguish it from the same one in bpfsetdlt

Note there are locking issues here related to ioctl processing; they
have not been addressed here.

Submitted by: Guy Harris <guy@alum.mit.edu>
Obtained from: NetBSD (w/ locking modifications)


109538 19-Jan-2003 sam

accept short WEP keys for backward compatibility


109526 19-Jan-2003 phk

Originally when DEVFS was added, a global variable "devfs_present"
was used to control code which were conditional on DEVFS' precense
since this avoided the need for large-scale source pollution with
#include "opt_geom.h"

Now that we approach making DEVFS standard, replace these tests
with an #ifdef to facilitate mechanical removal once DEVFS becomes
non-optional.

No functional change by this commit.


109522 19-Jan-2003 sam

fix ioctl handling for setting wep keys


109322 15-Jan-2003 suz

sync with KAME to simplify rev 1.28's patch (no functional changes)

Obtained from: KAME
Reviewd by: fenner
Approved by: re (jhb)


109319 15-Jan-2003 sam

802.11 link layer support. This code implements the basic 802.11
state machine to provide station and host ap functionality for drivers.

More work will follow to split out the state machine and protocol
support from the ioctl interfaces to ease portability/sharing with
NetBSD and forthcoming ports to other systems.

Reviewed by: imp
Obtained from: NetBSD (originally)


108825 06-Jan-2003 sam

don't reference a pkthdr after M_MOVE_PKTHDR has "remove it"; instead
reference the pkthdr now in the destination of the move

Sponsored by: Vernier Networks


108710 05-Jan-2003 fenner

Fix alignment problems -- the embedded v4 address is guaranteed to
be only 16-bit aligned, so only do byte operations to compare with it.


108533 01-Jan-2003 schweikh

Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.


108470 30-Dec-2002 schweikh

Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.


108466 30-Dec-2002 sam

Correct mbuf packet header propagation. Previously, packet headers
were sometimes propagated using M_COPY_PKTHDR which actually did
something between a "move" and a "copy" operation. This is replaced
by M_MOVE_PKTHDR (which copies the pkthdr contents and "removes" it
from the source mbuf) and m_dup_pkthdr which copies the packet
header contents including any m_tag chain. This corrects numerous
problems whereby mbuf tags could be lost during packet manipulations.

These changes also introduce arguments to m_tag_copy and m_tag_copy_chain
to specify if the tag copy work should potentially block. This
introduces an incompatibility with openbsd which we may want to revisit.

Note that move/dup of packet headers does not handle target mbufs
that have a cluster bound to them. We may want to support this;
for now we watch for it with an assert.

Finally, M_COPYFLAGS was updated to include M_FIRSTFRAG|M_LASTFRAG.

Supported by: Vernier Networks
Reviewed by: Robert Watson <rwatson@FreeBSD.org>


108364 28-Dec-2002 phk

Remove cdevw_add() calls, they are deprecated.


108339 28-Dec-2002 kbyanc

Remove useless local variable from raw_input().

Sponsored by: NTT Multimedia Communications Labs
MFC after: 3 days


108326 27-Dec-2002 iedowse

Oops, I misread the purpose of the NULL check in EH_RESTORE() in
revision 1.62. It was checking for M_PREPEND() failing, not for the
case of a NULL mbuf pointer being supplied to the macro. Back out
that revision, and fix the NULL dereference by not calling EH_RESTORE()
in the case where the mbuf pointer is NULL because the firewall
rejected the packet.


108319 27-Dec-2002 iedowse

Fix a bug introduced by revision 1.59 that would cause an immediate
NULL dereference if a bridged packet was rejected by ipfw.


108298 27-Dec-2002 hsu

Long chain of calls starting with bridge_on(), going through IPv6, and
ending up at ifa_ifwithdstaddr() could lead to a recursive lock of
the ifnet list mutex.


108277 26-Dec-2002 hsu

Disable radix node locking for sysctl until we fix the sysctl infrastructure
to not sleep.


108273 25-Dec-2002 ru

Typo in function name.


108272 25-Dec-2002 ru

I'm not sure what was the problem at the time of revision 1.37
when julian@ added it, but the commented out code had at least
one bug -- not freeing the allocated mbuf.

Anyway, this comment no longer applies as of revision 1.67, so
remove it.


108271 25-Dec-2002 hsu

Range-check the address family parameter passed in to the sysctl handler.

Submitted by: ru


108270 25-Dec-2002 ru

Revision 1.67 changes correspond to CSRG revision 8.3.1.1 changes.


108269 25-Dec-2002 ru

If the caller of rtrequest*(RTM_DELETE, ...) asked for a copy of
the entry being removed (ret_nrt != NULL), increment the entry's
rt_refcnt like we do it for RTM_ADD and RTM_RESOLVE, rather than
messing around with 1->0 transitions for rtfree() all over.


108268 25-Dec-2002 ru

A month after pst@ has committed his revision 1.8, it was
incorporated by UCB as revision 8.5. Do a diff reduction.


108250 24-Dec-2002 hsu

SMP locking for radix nodes.


108206 23-Dec-2002 ru

rn_walktree*() compute the next leaf before applying a function
to current leaves because function may vanish the current node.

If parent RTA_GENMASK route has a clone (a "cloning clone"), an
rn_walktree_from() starting from parent will cause another walk
starting from clone. If a function is either rt_fixdelete() or
rt_fixchange(), this recursive walk may vanish the leaf that is
remembered by an outer walk (the "next leaf" above), panicing a
system when it resumes with an outer walk.

The following script paniced my single-user mode booted system:

: sysctl net.inet.ip.forwarding=1
: ipfw add 1 allow ip from any to any
: ifconfig lo0 127.1
: route add -net 10 -genmask 255.255.255.0 127.1
: telnet 10.1 # rt_fixchange() panic
: telnet 10.2
: telnet 10.1
: route delete -net 10 # rt_fixdelete() panic

For the time being, avoid these races by disallowing recursive
walks in rt_fixchange() and rt_fixdelete().

Also, make a slight optimization in the rtrequest(RTM_RESOLVE)
case: there is no reason to call rt_fixchange() in this case.

PR: kern/37606
MFC after: 5 days


108172 22-Dec-2002 hsu

SMP locking for ifnet list.


108124 20-Dec-2002 hsu

Swap the order of a free and a use of an ifaddr structure.


108107 19-Dec-2002 bmilekic

o Untangle the confusion with the malloc flags {M_WAITOK, M_NOWAIT} and
the mbuf allocator flags {M_TRYWAIT, M_DONTWAIT}.
o Fix a bpf_compat issue where malloc() was defined to just call
bpf_alloc() and pass the 'canwait' flag(s) along. It's been changed
to call bpf_alloc() but pass the corresponding M_TRYWAIT or M_DONTWAIT
flag (and only one of those two).

Submitted by: Hiten Pandya <hiten@unixdaemons.com> (hiten->commit_count++)


108041 18-Dec-2002 rwatson

Under some circumstances, the loopback interface will allocate a new
mbuf for a packet looping back to provide alignment guarantees for
KAME. Unfortunately, this code performs a direct copy of the header
rather than using a header copying primitive (largely because we have
sucky header copying primitives). This results in a multiple free
of the MAC label in the header when the same label data is freed
twice when the two mbufs with that header are freed. As a temporary
work-around, clear the initialized flag on the label to prevent the
duplicate free, which prevents panics on large unaligned loopback
IP and IPv6 data. The real fix is to improve and make use of proper
packet header copying routines here.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


108036 18-Dec-2002 hsu

Switch to the conventional reference counting scheme.


108033 18-Dec-2002 hsu

Lock up ifaddr reference counts.


107670 07-Dec-2002 sobomax

MFS: recognize gre packets used in the WCCP protocol.

Approved by: re


107114 20-Nov-2002 luigi

Move fw_one_pass from ip_fw2.c to ip_input.c so that neither
bridge.c nor if_ethersubr.c depend on IPFIREWALL.
Restore the use of fw_one_pass in if_ethersubr.c

ipfw.8 will be updated with a separate commit.

Approved by: re


107113 20-Nov-2002 luigi

Back out some style changes. They are not urgent,
I will put them back in after 5.0 is out.

Requested by: sam
Approved by: re


107080 19-Nov-2002 sam

correct function declarations of stubs used for building w/o device bpf


107024 17-Nov-2002 luigi

Replace m_copy() with m_copypacket() where applicable.
Replace 0 with NULL where appropriate.
Fix indentation and function headers.


107023 17-Nov-2002 luigi

Fix function headers, remove 'register' from variable declarations.


106968 15-Nov-2002 luigi

Massive cleanup of the ip_mroute code.

No functional changes, but:

+ the mrouting module now should behave the same as the compiled-in
version (it did not before, some of the rsvp code was not loaded
properly);
+ netinet/ip_mroute.c is now truly optional;
+ removed some redundant/unused code;
+ changed many instances of '0' to NULL and INADDR_ANY as appropriate;
+ removed several static variables to make the code more SMP-friendly;
+ fixed some minor bugs in the mrouting code (mostly, incorrect return
values from functions).

This commit is also a prerequisite to the addition of support for PIM,
which i would like to put in before DP2 (it does not change any of
the existing APIs, anyways).

Note, in the process we found out that some device drivers fail to
properly handle changes in IFF_ALLMULTI, leading to interesting
behaviour when a multicast router is started. This bug is not
corrected by this commit, and will be fixed with a separate commit.

Detailed changes:
--------------------
netinet/ip_mroute.c all the above.
conf/files make ip_mroute.c optional
net/route.c fix mrt_ioctl hook
netinet/ip_input.c fix ip_mforward hook, move rsvp_input() here
together with other rsvp code, and a couple
of indentation fixes.
netinet/ip_output.c fix ip_mforward and ip_mcast_src hooks
netinet/ip_var.h rsvp function hooks
netinet/raw_ip.c hooks for mrouting and rsvp functions, plus
interface cleanup.
netinet/ip_mroute.h remove an unused and optional field from a struct

Most of the code is from Pavlin Radoslavov and the XORP project

Reviewed by: sam
MFC after: 1 week


106957 15-Nov-2002 sam

Back out rev 1.150; things are more complicated than this.


106955 15-Nov-2002 sam

if_attach should not sleep; change malloc's M_WAITOK to M_NOWAIT


106939 15-Nov-2002 sam

network interface and link layer changes:

o on input don't strip the Ethernet header from packets
o input packet handling is now done with if_input
o track changes to ether_ifattach/ether_ifdetach API
o track changes to bpf tapping
o call ether_ioctl for default handling of ioctl's
o use constants from net/ethernet.h where possible

Reviewed by: many
Approved by: re


106938 14-Nov-2002 sam

track changes to ethernet input handling to no longer strip the Ethernet header

Reviewed by: many
Approved by: re


106932 14-Nov-2002 sam

o eliminate separate callback interface for h/w tagged input packets; instead
drivers "tag packets" with an m_tag and the input packet handling recognizes
such packets and does the right thing
o track the number of active vlans on an interface; this lets lots of places
only do vlan-specific processing when needed
o track changes to ether_ifdetach/ether_ifattach
o track bpf changes
o eliminate the use of M_PROTO1 for communicating to drivers about tagged
packets
o eliminate the use of IFF_LINK0 for drivers communicating to the vlan code
that they support h/w tagging; replaced by explicit interface capabilities
o add ifnet capabilities for h/w tagging and support of "large mtu's"
o use new interface capabilities to auto-configure use of large mtu's and h/w
tagging
o add support for proper handling of promiscuous mode
o document driver/vlan communication conventions

Reviewed by: many
Approved by: re


106931 14-Nov-2002 sam

o add if_nvlans member to track the number of vlans active on an interface
o add if_input member for interface drivers to call through to pass packets "up"
o remove ethernet-specific function decls (moved to ethernet.h)

Reviewed by: many
Approved by: re


106930 14-Nov-2002 sam

o change input packet handling to eliminate the pointer to the struct
ether_header; instead drivers are to leave the Ethernet header at the
front of the packet
o add declarations for netgraph and vlan hooks that were removed from ethernet.h
o change various in-file calling conventions to track change in input API
o fixup bridge support to handle Ethernet header no longer being stripped
o add consistency checks to ether_input to catch problems with the change
in the API; some of these may want to be moved to #ifdef DIAGNOSTIC at a
later time (though they are not too expensive to leave as is)
o change ether_demux to eliminate the passing of the Ethernet header; it is
now expected at the front of the packet a la ether_input
o add ether_sprintf compatibility shim
o change ether_ifattach API to remove "bpf supported param" and add a pointer
to the MAC address to be installed for the LL address (this is for future
changes to divest struct arpcom from struct ifnet)
o change ether_ifdetach API to remove "bpf support param"

Reviewed by: many
Approved by: re


106929 14-Nov-2002 sam

general cleanups mostly aimed at improving portability of drivers

o ETHER_* (ETHER_ALIGN, ETHER_MAX_FRAME, ETHER_CRC_LEN, etc.)
o M_HASFCS for drivers to indicate packets include FCS
o remove global declarations for ng_ether* and vlan_* since these
represent a private contract between the if_ethersubr.c code and
certain parts of the system that should not normally be abused
o add ether_* declarations that were elsewhere
o remove ETHER_BPF_* since they are no longer used with the parameter
no longer passed to ether_ifattach and ether_ifdetach

Reviewed by: many
Approved by: re


106927 14-Nov-2002 sam

o add support for multiple link types per interface (e.g. 802.11 and Ethernet)
o introduce BPF_TAP and BPF_MTAP macros to hide implementation details and
ease code portability
o use m_getcl where appropriate

Reviewed by: many
Approved by: re
Obtained from: NetBSD (multiple link type support)


106925 14-Nov-2002 sam

o add IF_*bps macros for netbsd compatibility
o add interface capabilities for vlan use and to signal jumbo frame support

Reviewed by: many
Approved by: re


106696 09-Nov-2002 alfred

Fix instances of macros with improperly parenthasized arguments.

Verified by: md5


106601 07-Nov-2002 jhb

Add a cast to quiet a warning.


105944 25-Oct-2002 simokawa

Don't check IFF_RUNNING in previous change.
The flag is sometimes unset if the interface has IPv6 link-local
address only.


105804 23-Oct-2002 simokawa

Don't send/recieve packets when the interface is down.


105603 21-Oct-2002 brooks

Use if_printf(ifp, "blah") instead of printf("ppp%d: blah", ifp->if_unit).


105602 21-Oct-2002 brooks

Use if_printf(ifp, "blah") instead of printf("vlan%d: blah", ifp->if_unit).


105601 21-Oct-2002 brooks

Use if_printf(ifp, "blah") instead of printf("sl%d: blah", sc->sc_if.if_unit).


105598 21-Oct-2002 brooks

Use if_printf(ifp, "blah") instead of
printf("%s%d: blah", ifp->if_name, ifp->if_xname).


105580 20-Oct-2002 rwatson

When packets pass in and out of six-to-four (STF) tunnels, perform
labeling checks and operations as with other network interfaces.
Eventually, if it proves desirable, we might want to offer special
casing of this or other tunnel interfaces where we have an existing
label of interest, rather than treating it as though it's an
entirely fresh mbuf in the incoming/outgoing encapsulation directions.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105579 20-Oct-2002 phk

We have mem{cpy,cmp,set} functions in the kernel, don't #define them to
b{copy,zero,cmp} functions anymore.

Spotted by: FlexeLint.


105577 20-Oct-2002 rwatson

When a packet is sent via a FDDI interface, perform appropriate MAC
transmission checks; when it is received, label the packet appropriately.
Although we don't have a local FDDI setup to test this with, the
labeling and checks are identical to other interface classes.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105576 20-Oct-2002 rwatson

When a packet is destined for delivery via an ATM medium, perform
appropriate interface transmission checks and delivery labeling. While
we don't have a local ATM configuration, this code is almost identical
to all other interface classes.

Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


105556 20-Oct-2002 phk

Don't us an array[1], it just hides where '&' isn't used right.

Be consistent about functions being static.

Verified by: md5 hash of generated .o file.


105340 17-Oct-2002 ume

last arg of in6?_gif_output() is not used any more.

Obtained from: KAME
MFC after: 3 weeks


105339 17-Oct-2002 ume

- drop too short IPv6 frame
- NULL != 0

Obtained from: KAME
MFC after: 3 weeks


105338 17-Oct-2002 ume

s/gifp/ifp/

Obtained from: KAME
MFC after: 3 weeks


105300 16-Oct-2002 alfred

de-__P()


105293 16-Oct-2002 ume

- after gif_set_tunnel(), psrc/pdst may be null. set IFF_RUNNING accordingly.
- set IFF_UP on SIOCSIFADDR. be consistent with others.
- set if_addrlen explicitly (just in case)
- multi destination mode is long gone.
- missing break statement
- add gif_set_tunnel(), so that we can set tunnel address from within the
kernel at ease.
- encap_attach/detach dynamically on ioctls
- move encap_attach() to dedicated function in in*_gif.c

Obtained from: KAME
MFC after: 3 weeks


105228 16-Oct-2002 phk

Be consistent about functions being static.

Spotted by: FlexeLint


105217 16-Oct-2002 phk

FIx misindentation.

Spotted by: FlexeLint.


105198 16-Oct-2002 sam

add definitions for RIPEMD-160 HMAC and Skipjack encryption algorithms,
for use by "Fast IPsec"


105194 16-Oct-2002 sam

Replace aux mbufs with packet tags:

o instead of a list of mbufs use a list of m_tag structures a la openbsd
o for netgraph et. al. extend the stock openbsd m_tag to include a 32-bit
ABI/module number cookie
o for openbsd compatibility define a well-known cookie MTAG_ABI_COMPAT and
use this in defining openbsd-compatible m_tag_find and m_tag_get routines
o rewrite KAME use of aux mbufs in terms of packet tags
o eliminate the most heavily used aux mbufs by adding an additional struct
inpcb parameter to ip_output and ip6_output to allow the IPsec code to
locate the security policy to apply to outbound packets
o bump __FreeBSD_version so code can be conditionalized
o fixup ipfilter's call to ip_output based on __FreeBSD_version

Reviewed by: julian, luigi (silent), -arch, -net, darren
Approved by: julian, silence from everyone else
Obtained from: openbsd (mostly)
MFC after: 1 month


105176 15-Oct-2002 ume

Correct the definitions of SADB_* to be compatible with
RFC2407/IANA assignment. This change breaks binary
compatibility. So, you need to recompile IPsec related
applications.


105078 14-Oct-2002 cjc

Unconditionally restore the pointer to the saved Ethernet header after
going to bridge.c:bdg_forward(). The header can be munged even if the
mbuf does not /appear/ to change.

PR: kern/42465
MFC after: 4 days


104570 06-Oct-2002 mux

When reusing a pointer as a number, at least cast it
to uintptr_t rather than u_int to avoid warnings on
64 bits architectures.


104393 03-Oct-2002 truckman

In an SMP environment post-Giant it is no longer safe to blindly
dereference the struct sigio pointer without any locking. Change
fgetown() to take a reference to the pointer instead of a copy of the
pointer and call SIGIO_LOCK() before copying the pointer and
dereferencing it.

Reviewed by: rwatson


104366 02-Oct-2002 sobomax

Since bpf is no longer an optional component, remove associated ifdef's.

Submitted by: don't quite remember - the name of the sender disappeared
with the rest of my inbox. :(


104360 02-Oct-2002 mike

style(9):
o Align members of struct if_nameindex.
o Align and sort function prototypes.


104355 02-Oct-2002 mike

Use standards visibility conditionals to conditionalize most of this
header (details on how the visibility conditionals work are available
in <sys/cdefs.h>). Use standard types instead of BSD specific ones,
so that this header compiles in the standards case (specifically this
means changing `u_int' to `unsigned int').


104302 01-Oct-2002 phk

Fix some harmless mis-indents.

Spotted by: FlexeLint


104140 29-Sep-2002 bde

Fixed some of the namespace pollution in rev.1.33. <sys/systm.h> was
included here because it was once a prerequisite of <sys/mutex.h>
although that bug was fixed long ago.


104094 28-Sep-2002 phk

Be consistent about "static" functions: if the function is marked
static in its prototype, mark it static at the definition too.

Inspired by: FlexeLint warning #512


104090 28-Sep-2002 phk

Don't return(foo(bla)) when foo returns void.


104044 27-Sep-2002 phk

Add the "Monitor" interface flag.

Setting this flag on an ethernet interface blocks transmission of packets
and discards incoming packets after BPF processing.

This is useful if you want to monitor network trafic but not interact
with the network in question.

Sponsored by: http://www.babeltech.dk


104002 26-Sep-2002 phk

Be a bit more technical:
Technically junk may have low entropy.


103994 26-Sep-2002 sobomax

Revert 1.27, as it breaks IPv6 over IPv4 tunnels.

Submitted by: Mark Huizer <xaa@timewasters.nl>, ume


103901 24-Sep-2002 brooks

Convert most printf()s to if_printf()s.


103900 24-Sep-2002 brooks

Add a new helper function if_printf() modeled on device_printf(). The
function takes a struct ifnet pointer followed by the usual printf
arguments and prints "<interfacename>: " before the results of printf.
Since this is the primary form of printf calls in network device drivers
and accounts for most uses of the ifnet menber if_unit, this
significantly simplifies many printf()s.


103844 23-Sep-2002 alfred

use __packed/__aligned rather than GCC-specific __attribute__.


103842 23-Sep-2002 alfred

s/__attribute__((__packed__))/__packed/g


103781 22-Sep-2002 jake

Moved netisr code from kern/kern_intr.c to net/netisr.c as threatened in a
comment.


103725 21-Sep-2002 rwatson

Insert a missing call to MAC protection check for delivering an
mbuf to a bpf device.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
Submitted by: phk


103709 20-Sep-2002 ume

mistakenly set IFF_UP by SIOCSIFPHYADDR.

Obtained from: KAME


103556 18-Sep-2002 phk

Optimize the way we call BPF a tiny bit: If we chop the ether-header off
ourselves, call bpf before we do so, rather than re-construct the entire
thing afterwards.

Sponsored: http://www.babeltech.dk/


103555 18-Sep-2002 phk

Use m_length() instead of home-rolled.

In bpf_mtap(), if the entire packet is in one mbuf, call bpf_tap()
instead since it is a tad faster.

Sponsored by: http://www.babeltech.dk/


103554 18-Sep-2002 phk

Use m_length() instead of home-rolled versions.


103487 17-Sep-2002 ume

- increment interface output counter. sync w/ netbsd-current
- increase if_oerrors. sync w/netbsd

Obtained from: KAME


103481 17-Sep-2002 sobomax

Remove __RCSID().

Submitted by: bde


103475 17-Sep-2002 ume

- reject SIOCSIFADDR if embedded address is in private address range
- reject packets from private address range. from hitachi

Obtained from: KAME


103394 16-Sep-2002 bde

Include include "opt_atalk.h" so that the NETATALK support can work.

Removed unused includes.

Removed used includes of <sys/queue.h> and <sys/time.h>, since these are
standard pollution (especially the latter).

Reviewed by: sobomax


103344 15-Sep-2002 bde

Include <sys/systm.h> instead of depending on namespace pollution 2
layers deep in <sys/malloc.h> or 1 layer deep in <net/if_var.h>.


103273 13-Sep-2002 sobomax

Restore original behaviour of recursion preventer.

Submitted by: sumikawa


103256 12-Sep-2002 obrien

Fix the GENERIC build. Don't refer to the non-existant fw_one_pass.


103242 12-Sep-2002 luigi

Make bridging and layer2-ipfw obey net.inet.ip.fw.one_pass.
I should have committed this ages ago.

The MFC for if_ethersubr.c could be done in the usual few days (only
ipfw2 uses it), the one for bridge.c should probably wait until
after 4.7 because it changes an existing though mostly undocumented
behaviour (on which i hope nobody relies). All in all, i'll wait for
both things unless there is demand.

MFC after: 35 days


103124 09-Sep-2002 sobomax

Since from now on encap_input() also catches IPPROTO_MOBILE and IPPROTO_GRE
packets in addition to IPPROTO_IPV4 and IPPROTO_IPV6, explicitly specify
IPPROTO_IPV4 or IPPROTO_IPV6 instead of -1 when calling encap_attach().

MFC after: 28 days
(along with other if_gre changes)


103120 09-Sep-2002 sobomax

Prevent namespace pollution in use-land by putting everything used only in
kernel (softc and such) under #ifdef _KERNEL.

Submitted by: bde


103070 07-Sep-2002 sobomax

Remove #include <netinet/ip.h>.

Submitted by: bde


103041 06-Sep-2002 sobomax

Include <netinet/ip.h> to unbreak kdump. I don't know why does kdump
includes if_gre.h at all, but it does, without including ip.h before
that.

Poked by: peter
Pointy hat to: kdump(1)


103032 06-Sep-2002 sobomax

Reduce namespace pollution by staticizing everything, which doesn't need to
be visible from outside of the module.


103026 06-Sep-2002 sobomax

Add a new gre(4) driver, which could be used to create GRE (RFC1701)
and MOBILE (RFC2004) IP tunnels.

Obrained from: NetBSD


103024 06-Sep-2002 sobomax

Add more ethernet types and move AppleTalk types into proper location.

Obtained from: NetBSD (syssrc/sys/net/ethertypes.h, rev.1.13)


102968 05-Sep-2002 sobomax

Make recursion prevention variable per-instance and remove XXX comment
about thread-unsafety.

MFC after: 2 weeks


102618 30-Aug-2002 sobomax

Fix a silly typo in user-setable promisc mode code.

Pointed out by: Yann Berthier <yb@sainte-barbe.org>
MFC after: 1 day


102526 28-Aug-2002 sobomax

Add IFF_POLLING into the list of flags which are protected from changing via
ioctl(SIOCSIFFLAGS).

MFC after: 1 day


102412 25-Aug-2002 charnier

Replace various spelling with FALLTHROUGH which is lint()able


102291 22-Aug-2002 archie

Replace (ab)uses of "NULL" where "0" is really meant.


102130 19-Aug-2002 brooks

Fix a couple of bogus return values in previous commit.

Submitted by: "Vladimir B. " Grebenschikov <vova@sw.ru>
Pointy hat to: brooks


102118 19-Aug-2002 jmallett

Clean up a comment talking about C strings, which are terminated with the
ASCII NUL character (0, or '\0' in C).


102099 19-Aug-2002 sobomax

Implement user-setable promiscuous mode (a new `promisc' flag for ifconfig(8)).
Also, for all interfaces in this mode pass all ethernet frames to upper layer,
even those not addressed to our own MAC, which allows packets encapsulated
in those frames be processed with packet filters (ipfw(8) et al).

Emphatically requested by: Anton Turygin <pa3op@ukr-link.net>
Valuable suggestions by: fenner


102052 18-Aug-2002 sobomax

Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by: -hackers, -net


101938 15-Aug-2002 rwatson

Move mac.h include to match the MAC tree location. Both locations
are about equally as alphabetized.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101849 14-Aug-2002 rwatson

Move to nested include of _label.h instead of mac.h, reducing namespace
pollution.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
Suggested by: bde


101739 12-Aug-2002 rwatson

Correct error handling during MAC transmission check for if_gif.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101588 09-Aug-2002 brooks

Make ppp(4) devices clonable and unloadable.


101345 04-Aug-2002 luigi

Extend the interface to ether_input(): a NULL eh pointer means that
the mbuf contains the ethernet header (eh) as well, which ether_input()
will strip off as needed.

This permits the removal (in a backward compatible way) of the
header removal code which right now is replicated in all drivers,
sometimes in an inconsistent way. Also, because many functions
called after ether_input() require the eh in the mbuf, eventually
we can propagate the interface and handle outdated drivers just
in ether_input().

Individual driver changes to use the new interface will follow as
we have a chance to touch them.

NOTE THAT THIS CHANGE IS FULLY BACKWARD COMPATIBLE AND DOES NOT BREAK
BINARY COMPATIBILITY FOR DRIVERS.

MFC after: 3 days


101184 01-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Introduce two ioctls, SIOCGIFMAC, SIOCSIFMAC, which permit user
processes to manage the MAC labels on network interfaces. Note
that this is part of the user process API/ABI that will be revised
prior to 5.0-RELEASE.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101183 01-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Add MAC support for if_ppp. Label packets as they are removed from
the raw PPP mbuf queue. Preserve the mbuf MAC label across various
PPP data-munging and reconstitution operations. Perform access
control checks on mbufs to be transmitted via the interface.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101182 01-Aug-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Label packets generated by the gif virtual interface.

Perform access control on packets delivered to gif virtual interfaces.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101083 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Label mbufs received via kernel tunnel device interfaces by invoking
appropriate MAC framework entry points.

Perform access control checks on out-going mbufs delivered via tunnel
interfaces by invoking appropriate MAC entry points:

NOTE: Currently the label for a tunnel interface is not derived from
the label of the process that opened the tunnel interface. It
probably should be.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101081 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Label mbufs received via ethernet-based interfaces by invoking
appropriate MAC framework entry points.

Perform access control checks on out-going mbufs delivered via
ethernet-based interfaces by invoking appropriate MAC entry
points.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101079 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument the interface management code so that MAC labels are
properly maintained on network interfaces (struct ifnet). In
particular, invoke entry points when interfaces are created and
removed. MAC policies may initialized the label interface based
on a variety of factors, including the interface name.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101077 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

When decompressing data from one mbuf into another mbuf, preserve the
mbuf label by copying it to the new mbuf.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101075 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke a MAC framework entry point to authorize reception of an
incoming mbuf by the BPF descriptor, permitting MAC policies to
limit the visibility of packets delivered to particular BPF
descriptors.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


101074 31-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument BPF so that MAC labels are properly maintained on BPF
descriptors. MAC framework entry points are invoked at BPF
instantiation and allocation, permitting the MAC framework to
derive the BPF descriptor label from the credential authorizing
the device open. Also enter the MAC framework to label mbufs
created using the BPF device.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


100992 30-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Label network interface structures, permitting security features to
be maintained on those objects. if_label will be used to authorize
data flow using the network interface. if_label will be protected
using the same synchronization primitives as other mutable entries
in struct ifnet.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


100991 30-Jul-2002 rwatson

Introduce support for Mandatory Access Control and extensible
kernel access control.

Label BPF descriptor objects, permitting security features to be
maintained on those objects. bd_label will be used to authorize
data flow from network interfaces to user processes. BPF
labels are protected using the same synchronization model as other
mutable data in the BPF descriptor.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


100763 27-Jul-2002 rwatson

Slight whitespace cleanup. Whitespace sync to MAC tree.


99994 14-Jul-2002 kbyanc

Add some additional 802.11 media definitions.

Reviewed by: imp


99555 07-Jul-2002 luigi

Remove 0 initializers for global/static variables, so they end up in
BSS instead of DATA. This marginally reduces the kernel image size, though
the difference is almost irrelevant for compressed kernels.


99419 05-Jul-2002 peter

Turn on BPF_ALIGN for all non-i386 platforms, instead of having an
ifdef list that currently lists all the non-i386 platforms that bpf
currently works on.


99340 03-Jul-2002 maxim

Remove trailing whitespaces.

Approved by: luigi


99339 03-Jul-2002 maxim

o Strict interface names comparison. The old code assumed "fxp1" == "fxp11".
o Use an appropriate constant for interface name buffer.

Reviewed by: luigi
Approved by: luigi
MFC after: 1 month


99250 02-Jul-2002 mini

Check retifma for NULL before using it.

PR: kern/9391
Submitted by: Assar Westerlund <assar@sics.se>
MFC after: 3 days


99126 30-Jun-2002 luigi

Remove one useless variable.


98849 26-Jun-2002 ken

At long last, commit the zero copy sockets code.

MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes.

ti.4: Update the ti(4) man page to include information on the
TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options,
and also include information about the new character
device interface and the associated ioctls.

man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated
links.

jumbo.9: New man page describing the jumbo buffer allocator
interface and operation.

zero_copy.9: New man page describing the general characteristics of
the zero copy send and receive code, and what an
application author should do to take advantage of the
zero copy functionality.

NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.

conf/files: Add uipc_jumbo.c and uipc_cow.c.

conf/options: Add the 5 options mentioned above.

kern_subr.c: Receive side zero copy implementation. This takes
"disposable" pages attached to an mbuf, gives them to
a user process, and then recycles the user's page.
This is only active when ZERO_COPY_SOCKETS is turned on
and the kern.ipc.zero_copy.receive sysctl variable is
set to 1.

uipc_cow.c: Send side zero copy functions. Takes a page written
by the user and maps it copy on write and assigns it
kernel virtual address space. Removes copy on write
mapping once the buffer has been freed by the network
stack.

uipc_jumbo.c: Jumbo disposable page allocator code. This allocates
(optionally) disposable pages for network drivers that
want to give the user the option of doing zero copy
receive.

uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are
enabled if ZERO_COPY_SOCKETS is turned on.

Add zero copy send support to sosend() -- pages get
mapped into the kernel instead of getting copied if
they meet size and alignment restrictions.

uipc_syscalls.c:Un-staticize some of the sf* functions so that they
can be used elsewhere. (uipc_cow.c)

if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid
calling malloc() with M_WAITOK. Return an error if
the M_NOWAIT malloc fails.

The ti(4) driver and the wi(4) driver, at least, call
this with a mutex held. This causes witness warnings
for 'ifconfig -a' with a wi(4) or ti(4) board in the
system. (I've only verified for ti(4)).

ip_output.c: Fragment large datagrams so that each segment contains
a multiple of PAGE_SIZE amount of data plus headers.
This allows the receiver to potentially do page
flipping on receives.

if_ti.c: Add zero copy receive support to the ti(4) driver. If
TI_PRIVATE_JUMBOS is not defined, it now uses the
jumbo(9) buffer allocator for jumbo receive buffers.

Add a new character device interface for the ti(4)
driver for the new debugging interface. This allows
(a patched version of) gdb to talk to the Tigon board
and debug the firmware. There are also a few additional
debugging ioctls available through this interface.

Add header splitting support to the ti(4) driver.

Tweak some of the default interrupt coalescing
parameters to more useful defaults.

Add hooks for supporting transmit flow control, but
leave it turned off with a comment describing why it
is turned off.

if_tireg.h: Change the firmware rev to 12.4.11, since we're really
at 12.4.11 plus fixes from 12.4.13.

Add defines needed for debugging.

Remove the ti_stats structure, it is now defined in
sys/tiio.h.

ti_fw.h: 12.4.11 firmware.

ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13,
and my header splitting patches. Revision 12.4.13
doesn't handle 10/100 negotiation properly. (This
firmware is the same as what was in the tree previously,
with the addition of header splitting support.)

sys/jumbo.h: Jumbo buffer allocator interface.

sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to
indicate that the payload buffer can be thrown away /
flipped to a userland process.

socketvar.h: Add prototype for socow_setup.

tiio.h: ioctl interface to the character portion of the ti(4)
driver, plus associated structure/type definitions.

uio.h: Change prototype for uiomoveco() so that we'll know
whether the source page is disposable.

ufs_readwrite.c:Update for new prototype of uiomoveco().

vm_fault.c: In vm_fault(), check to see whether we need to do a page
based copy on write fault.

vm_object.c: Add a new function, vm_object_allocate_wait(). This
does the same thing that vm_object allocate does, except
that it gives the caller the opportunity to specify whether
it should wait on the uma_zalloc() of the object structre.

This allows vm objects to be allocated while holding a
mutex. (Without generating WITNESS warnings.)

vm_object_allocate() is implemented as a call to
vm_object_allocate_wait() with the malloc flag set to
M_WAITOK.

vm_object.h: Add prototype for vm_object_allocate_wait().

vm_page.c: Add page-based copy on write setup, clear and fault
routines.

vm_page.h: Add page based COW function prototypes and variable in
the vm_page structure.

Many thanks to Drew Gallatin, who wrote the zero copy send and receive
code, and to all the other folks who have tested and reviewed this code
over the years.


98718 24-Jun-2002 imp

Add kernel print bits #define for the IEEE80211_CAPINFO bits.


98669 23-Jun-2002 luigi

fix indentation, whitespace and a few comments.


98613 22-Jun-2002 luigi

Remove (almost all) global variables that were used to hold
packet forwarding state ("annotations") during ip processing.
The code is considerably cleaner now.

The variables removed by this change are:

ip_divert_cookie used by divert sockets
ip_fw_fwd_addr used for transparent ip redirection
last_pkt used by dynamic pipes in dummynet

Removal of the first two has been done by carrying the annotations
into volatile structs prepended to the mbuf chains, and adding
appropriate code to add/remove annotations in the routines which
make use of them, i.e. ip_input(), ip_output(), tcp_input(),
bdg_forward(), ether_demux(), ether_output_frame(), div_output().

On passing, remove a bug in divert handling of fragmented packet.
Now it is the fragment at offset 0 which sets the divert status of
the whole packet, whereas formerly it was the last incoming fragment
to decide.

Removal of last_pkt required a change in the interface of ip_fw_chk()
and dummynet_io(). On passing, use the same mechanism for dummynet
annotations and for divert/forward annotations.

option IPFIREWALL_FORWARD is effectively useless, the code to
implement it is very small and is now in by default to avoid the
obfuscation of conditionally compiled code.

NOTES:
* there is at least one global variable left, sro_fwd, in ip_output().
I am not sure if/how this can be removed.

* I have deliberately avoided gratuitous style changes in this commit
to avoid cluttering the diffs. Minor stule cleanup will likely be
necessary

* this commit only focused on the IP layer. I am sure there is a
number of global variables used in the TCP and maybe UDP stack.

* despite the number of files touched, there are absolutely no API's
or data structures changed by this commit (except the interfaces of
ip_fw_chk() and dummynet_io(), which are internal anyways), so
an MFC is quite safe and unintrusive (and desirable, given the
improved readability of the code).

MFC after: 10 days


98540 21-Jun-2002 fenner

Update for libpcap 0.7.1

Originally-committed-to-wrong-repository by: fenner


98385 18-Jun-2002 tanimura

Remove so*_locked(), which were backed out by mistake.


97658 31-May-2002 tanimura

Back out my lats commit of locking down a socket, it conflicts with hsu's work.

Requested by: hsu


97649 31-May-2002 silby

Ensure that packet counts are always reset to 0 when
a route is cloned. Previously, they took on the count
of their parent route (which was sometimes nonzero.)

Submitted by: Andre Oppermann <oppermann@pipeline.ch>
MFC after: 5 days


97512 29-May-2002 phk

Add one copy of crc32() and crc32_tab[] in libkern, and remove it two other
places.

Comment out crc32 related definitions in zlib.h, we don't seem to have the
corresponding code in our kernel.


97290 25-May-2002 brooks

Make discard devices clonable and unloadable. Also, change the
interface name from ds# to disc#.


97289 25-May-2002 brooks

Move all unit number management cloned interfaces into the cloning
code. The reverts the API change which made the <if>_clone_destory()
functions return an int instead of void bringing us into closer
alignment with NetBSD.

Reviewed by: net (a long time ago)


97220 24-May-2002 peter

Fix warning; remove unused arg that was passed through uninitialized.


97093 22-May-2002 bde

Include <sys.systm.h> for the declaration of some atomic functions -- don't
depend on namespace pollution in <sys/mutex.h>.


97024 20-May-2002 iedowse

Avoid exposing struct if_clone and the sys/queue.h macros to userland
programs by restricting these to the case where _KERNEL is defined.

Reviewed by: brooks (ages ago)


96972 20-May-2002 tanimura

Lock down a socket, milestone 1.

o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a
socket buffer. The mutex in the receive buffer also protects the data
in struct socket.

o Determine the lock strategy for each members in struct socket.

o Lock down the following members:

- so_count
- so_options
- so_linger
- so_state

o Remove *_locked() socket APIs. Make the following socket APIs
touching the members above now require a locked socket:

- sodisconnect()
- soisconnected()
- soisconnecting()
- soisdisconnected()
- soisdisconnecting()
- sofree()
- soref()
- sorele()
- sorwakeup()
- sotryfree()
- sowakeup()
- sowwakeup()

Reviewed by: alfred


96755 16-May-2002 trhodes

More s/file system/filesystem/g


96511 13-May-2002 luigi

Add ipfw hooks to ether_demux() and ether_output_frame().
Ipfw processing of frames at layer 2 can be enabled by the sysctl variable

net.link.ether.ipfw=1

Consider this feature experimental, because right now, the firewall
is invoked in the places indicated below, and controlled by the
sysctl variables listed on the right. As a consequence, a packet
can be filtered from 1 to 4 times depending on the path it follows,
which might make a ruleset a bit hard to follow.

I will add an ipfw option to tell if we want a given rule to apply
to ether_demux() and ether_output_frame(), but we have run out of
flags in the struct ip_fw so i need to think a bit on how to implement
this.

to upper layers
| |
+----------->-----------+
^ V
[ip_input] [ip_output] net.inet.ip.fw.enable=1
| |
^ V
[ether_demux] [ether_output_frame] net.link.ether.ipfw=1
| |
+->- [bdg_forward]-->---+ net.link.ether.bridge_ipfw=1
^ V
| |
to devices


96400 11-May-2002 kbyanc

Fix logic inversion bug.


96349 10-May-2002 joerg

Fix a misplaced break statement within a switch that accidentally made
it into an "#ifdef INET6" block. This caused a (harmless but annoying)
EINVAL return value to be sent even though the operation completed
successfully.

PR: kern/37786
Submitted by: Ari Suutari <ari.suutari@syncrontech.com>,David Malone <dwmalone@maths.tcd.ie>
MFC after: 1 day


96245 09-May-2002 luigi

Cleanup the interface to ip_fw_chk, two of the input arguments
were totally useless and have been removed.

ip_input.c, ip_output.c:
Properly initialize the "ip" pointer in case the firewall does an
m_pullup() on the packet.

Remove some debugging code forgotten long ago.

ip_fw.[ch], bridge.c:
Prepare the grounds for matching MAC header fields in bridged packets,
so we can have 'etherfw' functionality without a lot of kernel and
userland bloat.


96203 08-May-2002 kbyanc

Roll my own min() (named ISO88025_MIN() so as to not cause conflicts) so
that this header may be included from userland where min() may not be
declared (or worse, declared differently). I open to alternative
solutions.


96184 07-May-2002 kbyanc

Move ISO88025 source routing information into sockaddr_dl's sdl_data
field. This returns the sdl_data field to a variable-length field. More
importantly, this prevents a easily-reproduceable data-corruption bug when
the interface name plus the hardware address exceed the sdl_data field's
original 12 byte limit. However, token-ring interfaces may still overflow
the new sdl_data field's 46 byte limit if the interface name exceeds 6
characters (since 6 characters for interface name plus 6 for hardware
address plus 34 for source routing = the size of sdl_data). Further
refinements could overcome this limitation but would break binary
compatibility; this commit only addresses fixing the bug for
commonly-occuring cases without breaking binary compatibility with the
intention that the functionality can be MFC'ed to -stable.

See message ID's (both send to -arch):
20020421013332.F87395-100000@gateway.posi.net
20020430181359.G11009-300000@gateway.posi.net
for a more thorough description of the bug addressed and how to
reproduce it.

Approved by: silence on -arch and -net
Sponsored by: NTT Multimedia Communications Labs
MFC after: 1 week


96174 07-May-2002 imp

MFOpenBSD: ibss and ibss-master.

ibss is the modern ad-hoc mode. ibss-master is the same, except that
it creates the ibss network. This distinction is necessary because
some supported cards (symbol) support the former without supporting
the latter.

A seprate commit will introduce a demo-adhoc mode so that we can
disentwingle the multiple, mutually exclusive meandings of adhoc in
the present state of affairs.

Submitted by: jhay


96173 07-May-2002 imp

Minor style nit


96122 06-May-2002 alfred

Make funsetown() take a 'struct sigio **' so that the locking can
be done internally.

Ensure that no one can fsetown() to a dying process/pgrp. We need
to check the process for P_WEXIT to see if it's exiting. Process
groups are already safe because there is no such thing as a pgrp
zombie, therefore the proctree lock completely protects the pgrp
from having sigio structures associated with it after it runs
funsetownlst.

Add sigio lock to witness list under proctree and allproc, but over
proc and pgrp.

Seigo Tanimura helped with this.


95883 01-May-2002 alfred

Redo the sigio locking.

Turn the sigio sx into a mutex.

Sigio lock is really only needed to protect interrupts from dereferencing
the sigio pointer in an object when the sigio itself is being destroyed.

In order to do this in the most unintrusive manner change pgsigio's
sigio * argument into a **, that way we can lock internally to the
function.


95848 01-May-2002 obrien

"pointers are not permitted as case values", so force the macros to ints.


95759 30-Apr-2002 tanimura

Revert the change of #includes in sys/filedesc.h and sys/socketvar.h.

Requested by: bde

Since locking sigio_lock is usually followed by calling pgsigio(),
move the declaration of sigio_lock and the definitions of SIGIO_*() to
sys/signalvar.h.

While I am here, sort include files alphabetically, where possible.


95702 29-Apr-2002 phk

Move us yet closer to IFM_* definitions in NetBSD.


95673 28-Apr-2002 phk

Follow NetBSD and s/IFM_1000_TX/IFM_1000_T/


95552 27-Apr-2002 tanimura

Add a global sx sigio_lock to protect the pointer to the sigio object
of a socket. This avoids lock order reversal caused by locking a
process in pgsigio().

sowakeup() and the callers of it (sowwakeup, soisconnected, etc.) now
require sigio_lock to be locked. Provide sowwakeup_locked(),
soisconnected_locked(), and so on in case where we have to modify a
socket and wake up a process atomically.


95023 19-Apr-2002 suz

just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by: ume
MFC after: 1 week


94660 14-Apr-2002 fjoe

Cosmetical change: remove empty line to reduce diffs to RELENG_4


94489 12-Apr-2002 imp

Add hostap 802.11 media type.

From wi_hostap stuff by Thomas Skibo


94398 11-Apr-2002 imp

Add two more IEEE80211 defines for status.


94385 10-Apr-2002 dwmalone

Swap a bzero for an M_ZERO. Borris approved this ages ago, but
the hard drive with the patch on it went south before I committed
it.

Approved by: bp


94348 10-Apr-2002 peter

Add missing 'struct ifreq ifr;' that was forgotten in the last commit.


94344 10-Apr-2002 suz

fixed a kernel crash when enabling multicast on vlan interface
owing to a NULL argument to vlan_ioctl() at if_allmulti().

Reviewed by: ume
MFC after: 1 week


93818 04-Apr-2002 jhb

Change callers of mtx_init() to pass in an appropriate lock type name. In
most cases NULL is passed, but in some cases such as network driver locks
(which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used.

Tested on: i386, alpha, sparc64


93752 04-Apr-2002 luigi

Replace (deprecated ?) FREE() macro with direct calls to free()


93750 04-Apr-2002 luigi

Fix incorrect m_free - m_freem() usage.


93748 04-Apr-2002 luigi

Fix a couple of incorrect m_free() vs. m_freem() usages and related issues.

Reviewed-by: brooks


93593 01-Apr-2002 jhb

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


93546 01-Apr-2002 ume

Make `route add -inet6 default ::1 -ifp gif0' work actually.
The change between 1.13 and 1.14 is specific to AF_INET.

MFC after: 1 week


93383 29-Mar-2002 mdodd

- Merge the pdq driver (if_fpa and if_fea) from NetBSD.
Among other things this gets us ifmedia support.
- Update fddi_ifattach() to take an additional argument.


93382 29-Mar-2002 mdodd

- Define fddibroadcastaddr in if_fddisubr.c.
- Add fddi_ifdetach() and fddi_ioctl().


93381 29-Mar-2002 mdodd

- Use ifp->if_broadcastaddr when possible.
- Remove unnecessary preprocessor conditional.


93380 29-Mar-2002 mdodd

- Add a comment.
- Whitespace.
- Remove forgotten duplicate assignments in fddi_ifattach().


93379 29-Mar-2002 mdodd

- Update interface statistics on error conditions.
- Make sure the interface is UP and RUNNING in fddi_input().
- Reorder and comment packet tests in fddi_input().
- Call if_attach() in fddi_ifattach().
- Test for a valid return from ifaddr_byindex().


93377 29-Mar-2002 mdodd

- Whitespace changes.
- Formatting.
- Use macro, not magic numbers.
- Move a dropanyway label in fddi_input() to end of function.


93376 29-Mar-2002 mdodd

Back a small part of the last patch.


93375 29-Mar-2002 mdodd

- Simplify first arg of nd6_storelladdr().
- Use struct fddi_header where appropriate.
- Use bcopy() rather than memcpy().
- Use FDDI_ADDR_LEN macro instead of ETHER_ADDR_LEN macro.
- Add loadable module support.


93373 29-Mar-2002 mdodd

- Use net/fddi.h rather than netinet/if_fddi.h.
- Use FDDI_ADDR_LEN rather than a magic number or a sizeof().
- Hide distracting sizeof() behind FDDI_HDR_LEN macro.
- Don't use sizeof(struct llc) in areas where we mean LLC_SNAPFRAMELEN.


93372 29-Mar-2002 mdodd

Sync defines with NetBSD.
Define FDDI_ADDR_LEN and use it.


93371 29-Mar-2002 mdodd

Remove unnecessary LLC defines and use the standard ones.


93369 29-Mar-2002 mdodd

- style(9) fixes for 'return'.
- retire RTALLOC1 and ARPRESOLVE macros.
- use IFP2AC to hide discracting casts.


93368 29-Mar-2002 mdodd

Un-ifdef.


93367 29-Mar-2002 mdodd

De-register.


93366 29-Mar-2002 mdodd

Sync with NetBSD.


93084 24-Mar-2002 bde

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


93013 23-Mar-2002 jedgar

Work around zlib bug where using a deflate window size of 8 will
cause memory corruption.


92749 20-Mar-2002 dillon

Fix a bug introduced in 1.11 (and also MFCd to stable AND the security branch)
that causes a machine to panic when the kernel PPP / DEFLATE code is used.
1.11 moved a ZFREE to a point after the structural members were clobbered
by stores into a union'd structure.

This commit fixes the bug and adds a big whopping comment to make sure
the code isn't 'cleaned up' again :-)

Ian Dowse came up with the same patch independantly 68 seconds before I
did, talk about Karma!

I would also like to thank Eugene Grosbein for marathon work in tracking the
problem down by udpating his -stable based on date over and over again
to close in on the commit that caused his crashes.

PR: kern/35969
Reviewed by: Ian Dowse <iedowse@maths.tcd.ie>
X-MFC after: immediately


92725 19-Mar-2002 alfred

Remove __P.


92522 18-Mar-2002 cjc

Add hooks for very basic IPFilter support in bridging. Set,

# sysctl net.link.ether.bdg_ipf=1

To enable. Just like ipfw(8) bridging, only input packets are filtered
in the bridge. Filtering works just like in the IP layer, ipf(8)
first, then ipfw(8). And just like in the IP layer, both are
independent, one need not be run to use the other. (Note: This will
not work in, but doesn't break, the bridge.ko module. The ipl.ko
module would need to be fixed before that is worth worrying about.)

Reviewed by: luigi


92260 14-Mar-2002 alfred

Missed this file for select SMP fixes associated with rev 1.93 of
kern/sys_generic.c


92081 11-Mar-2002 mux

Simplify the interface cloning framework by handling unit
unit allocation with a bitmap in the generic layer. This
allows us to get rid of the duplicated rman code in every
clonable interface.

Reviewed by: brooks
Approved by: phk


91699 05-Mar-2002 green

Use revoke_and_destroy_dev() instead of destroy_dev() when removing /dev/net
pseudo-devices when an interface goes away. Otherwise, an open /dev/net/foo0
when the interface is removed can cause a crash.

Not objected to by: jlemon


91674 05-Mar-2002 maxim

Remove duplicated and wrong sc->sc_last_recv setting. It unbreaks
active-filter in pppd(8).

PR: kern/12281
Submitted by: Tim Moore <moore@bricoworks.com>
Not objected by: peter
Reviewed by: ru
Approved by: ru
MFC after: 1 week


91650 05-Mar-2002 cjc

Unbreak bridge.ko. Replace an unresolved symbol with the actions it
was meant to take.

Submitted by: luigi
Approved by: luigi
MFC after: 3 days


91648 04-Mar-2002 brooks

Add cloning support to the loopback interface.

Submitted by: mux


91647 04-Mar-2002 brooks

Change the network interface cloning API so the destroy function returns
an int errorcode instead of void in preperation for merging cloning of
the loopback device.

Submitted by: mux
MFC after: 2 weeks


91452 28-Feb-2002 peter

Fix warnings.


91406 27-Feb-2002 jhb

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


91327 26-Feb-2002 brooks

Fix warnings in the gif(4) driver so it compiles with -Werror.


91317 26-Feb-2002 dillon

Did someone turn on -Werror or something?

Fix kernel breakage.


91275 26-Feb-2002 imp

minor style(9) fix: return (foo); The file was mostly style(9) before.


91272 26-Feb-2002 brooks

When using hardware decoding, reconstruct the wire form of the ethernet
header and push it up any attached bpf devices on the parent interface.
This makes hardware vlan decoding more like the normal software path.

Tested by: cjtt@employees.org
MFC after: 2 weeks


91270 26-Feb-2002 brooks

Make gif(4) nesting level and parallel tunnel support tunable at runtime
via sysctl's. The old #defines, MAX_GIF_NEST and XBONEHACK are
currently supported for backwards compatability, but will probably be
removed at some point in the future.


91266 26-Feb-2002 peter

Fix a warning by pulling prototype for arp_ifinit() into scope.
Then fix cast the correct value into an incorrect value, which was not
detected due to the missing prototype (but was harmless anyway).


91140 23-Feb-2002 tanimura

Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)


90875 18-Feb-2002 luigi

When the local link address is changed, send out gratuitous ARPs
to notify other nodes about the address change. Otherwise, they
might try and keep using the old address until their arp table
entry times out and the address is refreshed.

Maybe this ought to be done for INET6 addresses as well but i have
no idea how to do it. It should be pretty straightforward though.

MFC-after: 10 days


90868 18-Feb-2002 mike

o Move NTOHL() and associated macros into <sys/param.h>. These are
deprecated in favor of the POSIX-defined lowercase variants.
o Change all occurrences of NTOHL() and associated marcros in the
source tree to use the lowercase function variants.
o Add missing license bits to sparc64's <machine/endian.h>.
Approved by: jake
o Clean up <machine/endian.h> files.
o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>.
o Remove prototypes for non-existent bswapXX() functions.
o Include <machine/endian.h> in <arpa/inet.h> to define the
POSIX-required ntohl() family of functions.
o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>,
and <sys/param.h>.
o Prepend underscores to the ntohl() family to help deal with
complexities associated with having MD (asm and inline) versions, and
having to prevent exposure of these functions in other headers that
happen to make use of endian-specific defines.
o Create weak aliases to the canonical function name to help deal with
third-party software forgetting to include an appropriate header.
o Remove some now unneeded pollution from <sys/types.h>.
o Add missing <arpa/inet.h> includes in userland.

Tested on: alpha, i386
Reviewed by: bde, jake, tmm


90775 17-Feb-2002 jedgar

Error handling fixes for inflate.


90678 15-Feb-2002 luigi

Lots of improvement to the bridging code.
In order of importance:

+ each cluster now uses private data structures (filtering and
local address tables) so you can treat them as fully independent
switches. This part of the work was supported by:
Cisco Systems, Inc. - NSITE lab, RTP, NC.

+ cleaned up the handling of configuration, so the system will behave
much better when real or pseudo devices are dynamically attached
or detached. It should also not panic anymore on systems with large
number of devices, closing a few existings PRs on the topic.

+ while at it, add support for VLAN. This means that a FreeBSD box
can now work as a real VLAN switch, with trunk interfaces etc.
As an example:
ifconfig vlan0 vlan 3 vlandev dc0
ifconfig vlan1 vlan 4 vlandev dc0
net.link.ether.bridge_cfg="vlan0:3,dc1:3,vlan1:4,dc1:4"
uses dc0 as a trunk interface, and dc1 and dc3 as ports on vlans 3 and 4
You get the idea...
NOTA BENE: by default bridge_cfg is initialised to "" so even if
you enable bridging, no packets will be bridged until you set the
list of interfaces on which you want this to happen.

+ large restructuring of the code, moving private vars and types from
bridge.h to bridge.c.

+ added a lot of comments to the code to explain how to use it.


90677 15-Feb-2002 luigi

Remove useless initialization to 0 of a couple of global variables.


90631 13-Feb-2002 fjoe

remove superflous empty line (in preparation to MFC)


90227 05-Feb-2002 dillon

Get rid of the twisted MFREE() macro entirely.

Reviewed by: dg, bmilekic
MFC after: 3 days


89883 27-Jan-2002 gallatin

Prevent the kernel from generating an unaligned sysctl data buffer on
64-bit platforms. The unaligned access is caused by struct ifa_msghdr
not being a multiple of 8-bytes in size. If an interface has an odd
number of addresses, this causes the next interface to generate an
unaligned access in the user-level app walking the interfaces (ifconfig).

Submitted by: Bernd Walter <ticso@cicely8.cicely.de>


89768 25-Jan-2002 cjc

Have sysctl() return the correct errno(2) as documented in the
sysctl(3) manpage.

Submitted by: ru
Obtained from: BSD/OS


89498 18-Jan-2002 ru

Introduce an interface announcement message for the routing
socket so that routing daemons and other interested parties
know when an interface is attached/detached.

PR: kern/33747
Obtained from: NetBSD
MFC after: 2 weeks


89263 11-Jan-2002 jesper

It turns out that when a broadcast packet is looped back, the checksums
are checked on the way in even if they were not calculated on the
way out.

This fixes rwhod

PR: 31954
Submitted by: fenner
Approved by: fenner
MFC after: 1 week


89099 08-Jan-2002 fjoe

- generic Arcnet framework
- device driver for SMC COM90cx6 Arcnet network adapters

Obtained from: NetBSD


89069 08-Jan-2002 msmith

Initialise the intrq_present fields at runtime, not link time. This allows
us to load protocols at runtime, and avoids the use of common variables.

Also fix the ip6_intrq assignment so that it works at all.


89065 08-Jan-2002 msmith

Staticise private interface lists.


88723 30-Dec-2001 joerg

Implement an option to administratively disable the negotiation of
IPv6 on an sppp interface. In an IPv6-enabled kernel, every IPv6
interface automatically gets an IPv6 address assigned (and IPv6
multicast packets sent at initialization time). For sppp links where
we know our remote peer wouldn't support IPv6 at all, there's no point
in attempting to negotiate IPV6CP (or to even dial out for an IPv6
packet at all for dial-on-demand interfaces).

I wish there were a more generic way to administratively disable IPv6
on an interface instead. ume told me there isn't.

While i was at it, converted both, enable_vj and enable_ipv6 into flag
bits in struct sppp (enable_vj used to be an int of its own).

MFC after: 1 month


88716 30-Dec-2001 joerg

Merge last-minute fix from the i4b file made by gj:

Protect mtx_init() invocations with mtx_intialized() checks to avoid a
reported panic.

MFC after: 1 month


88711 30-Dec-2001 joerg

Bump AUTHNAMELEN to 64. Should probably be made dynamic instead.


88710 30-Dec-2001 joerg

We explicitly close LCP when going to state CLOSED, so we better open
it again when going from INITIAL to STARTING. This has been done for
passive or auto-conecting interfaces always, but not for permanent
ones.

Obtained from: NetBSD (rev 1.32)


88709 30-Dec-2001 joerg

run IPCP only if we have IPv4 in kernel

Obtained from: NetBSD (rev 1.19)
MFC after: 1 month


88706 30-Dec-2001 joerg

Fix a long-standing blatant bug where the operator precedence between
& and && has been botched. This was likely the cause for some havoc
with various negotiation cases of sppp in the past.

Obtained from: NetBSD (rev 1.13)
MFC after: 1 week


88705 30-Dec-2001 joerg

Fix compilation without INET (though not really tested yet without
INET).

Obtained from: NetBSD (rev 1.12)
MFC after: 1 month


88704 30-Dec-2001 joerg

Add the `packed' attribute to structures which describe wire protocol
data formats.

Obtained from: NetBSD (rev 1.6)
MFC after: 1 month


88702 30-Dec-2001 joerg

Extend the hack where 0.0.0.1 meant `any address for remote is
acceptable' to addresses 0.0.0.*. This allows for multiple such
interfaces.

MFC after: 1 month


88700 30-Dec-2001 joerg

Fix the handling of VJ uncompression. Unfortunately, tcp_uncompress()
makes the implied assumption there were another 128 bytes of space in
front of the packet handed off to it... which is not the case for
sppp. This could easily end up in corrupting random memory.

This fix is about the same as revs 1.6, 1.8, and 1.9 from our
i4b_ispppsubr.c.

Also fixed IPCP option negotiation to zero out the options when
starting IPCP. Otherwise, if negotiation parameters change between
various IPCP startups, it could happen that old options would still be
requested (this happened if VJ was turned off, and ended up in half
off the link still negotiating for VJ compression).

IMHO, the base system's sppp is now feature-wise up to date with the
one in the i4b part of the tree, so the latter can be disabled.

MFC after: 1 month


88660 29-Dec-2001 jake

sparc64 needs the same alingment fixes that alpha and ia64 do.

Submitted by: tmm


88659 29-Dec-2001 jake

sparc64 needs the same alignment fixes that ia64 and alpha need.

Submitted by: tmm


88600 28-Dec-2001 joerg

Convert sppp_params() to use a malloced structure in order to reduce
kernel stack usage.

This effectively merges rev 1.3 of i4b's i4b_ispppsubr.c.

MFC after: 1 month


88599 28-Dec-2001 joerg

Fix my breakage to the low-level hardware sync drivers brought by the
inclusion of VJ compression into sppp.

Now, instead of the need to include this and that and everything plus
the kitchensink in each of those drivers, struct sppp uses struct
slcompress as an opaque structure only referenced by a pointer. The
actual structure is then malloced at initialization time.

While i was at it, also fixed a bug where received VJ packets would only
be recognized if INET6 was defined.


88577 28-Dec-2001 joerg

Implement timestamps so i4b/driver/i4b_isppp.c can derive the idle
time from the PPP packets sent. This effectively merges rev 1.2 of
the old i4b_ispppsubr.c, with the exception that i eventually ended up
in debugging and fixing it so the idle time is now really
detected. ;-) (The version in i4b simply doesn't work right since it
still accounts for incoming LCP echo packets which it is supposed to
ignore for idle time considerations...)

Obtained from: i4b
MFC after: 1 month


88558 27-Dec-2001 joerg

Break out the relevant fields from struct sppp into a struct
sppp_parms that are needed for the SPPPIO[GS]DEFS ioctl commands.
This allows it to keep struct sppp inside #ifdef _KERNEL (where it
belongs), and prevents userland programs that wish to include
<net/if_sppp.h> from including the earth, the hell, and the universe
before the are able to resolve all the kernel-internal stuff that's in
struct sppp.

Discussed with: hm
MFC after: 1 month


88550 27-Dec-2001 joerg

Make the LCP restart timer configurable.

This (effectively) merges rev 1.36 of i4b's old if_spppsubr.c, albeit
in a slightly different manner (we export the timer in millisecond
values as exposed to tick values from/to userland).

Obtained from: i4b
MFC after: 1 month


88534 27-Dec-2001 joerg

Implement VJ header compression for sppp.

This is the logical merge of rev 1.32 of i4b's old if_spppsubr.c (which
was based on PR misc/11767), plus (i4b) rev 1.6 of i4b's if_ispppsubr.c,
albeit with numerous stylistic and cosmetic changes.

PR: misc/11767
Submitted by: i4b, Joachim Kuebart
MFC after: 1 month


88508 26-Dec-2001 joerg

Don't log RXJ+ protocol rejects unless we are in debug mode. (RXJ-
events are always logged.) This stops sppp from spamming the syslog
files in case the remote peer is not configured to negotiate IPv6.


88507 26-Dec-2001 joerg

Fix some pseudo-enumeration constants in the IPv6 implementation so
they are unique and thus actually usable as flagbits. I wonder how
it even worked so far...

MFC after: 1 week


88506 26-Dec-2001 joerg

Ignore (and silently conf-ack) conf-reqs for an Async-Control-
Character-Map. RFC 1662 demands it for the sake of async to sync
PPP protocol converters (like Win9* :).

This merges rev 1.26/1.27 of the old i4b sppp changes.


88503 26-Dec-2001 joerg

For SIOCSIFADDR, don't call if_up() since it would attempt to add the
route to the destination twice. Now that brian has fixed route.c to no
longer accept this second route, this long-standing nuisance became a
showstopper bug for sppp users.

In retrospect, this is the same fix as the one in rev 1.78 of if_sl.c;
most likely the original version of sppp has been cloned from SLIP. ;-)


88198 19-Dec-2001 brian

It's no longer necessary to ensure that ``gate'' is set when RTF_GATEWAY
is passed, as subsequent code does that check now anyway.

Submitted by: ru


88196 19-Dec-2001 brian

Only call rt_getifa() if we've either been passed a gateway or
if we've been given an RTA_IFP or changed RTA_IFA sockaddr.

This fixes the following bug:
>/dev/tun100
>/dev/tun101
ifconfig tun100 1.2.3.4 5.6.7.8
ifconfig tun101 1.2.3.4 6.7.8.9
route change 6.7.8.9 -ifa 1.2.3.4 -iface -mtu 500
which erroneously changed tun101's host route to have an ifp of tun100
(rt_getifa() sets the ifp after calling ifa_ifwithnet(1.2.3.4))

This incarnation submitted by: ru


88034 17-Dec-2001 brooks

Initalize ifq_maxlen to prevent a harmless warning message.

MFC After: 1 day
Pointed out by: jacks@sage-american.com, bmah


87955 14-Dec-2001 jdp

Make bpf's read timeout feature work more correctly with
select/poll, and therefore with pthreads. I doubt there is any way
to make this 100% semantically identical to the way it behaves in
unthreaded programs with blocking reads, but the solution here
should do the right thing for all reasonable usage patterns.

The basic idea is to schedule a callout for the read timeout when a
select/poll is done. When the callout fires, it ends the select if
it is still in progress, or marks the state as "timed out" if the
select has already ended for some other reason. Additional logic in
bpfread then does the right thing in the case where the timeout has
fired.

Note, I co-opted the bd_state member of the bpf_d structure. It has
been present in the structure since the initial import of 4.4-lite,
but as far as I can tell it has never been used.

PR: kern/22063 and bin/31649
MFC after: 3 days


87914 14-Dec-2001 jlemon

whitespace fixes.


87912 14-Dec-2001 jlemon

minor style fix.


87902 14-Dec-2001 luigi

Device Polling code for -current.

Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

options DEVICE_POLLING

and at runtime enable polling with

sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
new file kern/kern_poll.c
sys/conf/options.i386
new option
sys/i386/i386/trap.c
poll in trap (disabled by default)
sys/kern/kern_clock.c
initialization and hardclock hooks.
sys/kern/kern_intr.c
minor swi_net changes
sys/kern/kern_poll.c
the bulk of the code.
sys/net/if.h
new flag
sys/net/if_var.h
declaration for functions used in device drivers.
sys/net/netisr.h
NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
device driver modifications


87843 14-Dec-2001 dg

Moved the updating of if_ibytes from ether_demux() to ether_input() to fix
a bug where the interface input bytes count wasn't updated when bridging
is enabled.

MFC after: 3 days


87599 10-Dec-2001 obrien

Update to C99, s/__FUNCTION__/__func__/,
also don't use ANSI string concatenation.


87473 07-Dec-2001 arr

- malloc should be passed M_WAITOK, not M_WAIT (a mbuf flag)
- make use of M_ZERO to remove a call to bzero()


87276 03-Dec-2001 brooks

Don't pass an interface pointer to VLAN_INPUT{,_TAG}. Get it from the
mbuf instead.

Suggested by: fenner


87060 28-Nov-2001 brian

Fix a typo in a comment


86843 24-Nov-2001 luigi

Whitespace change - replace leading spaces with tabs.


86797 22-Nov-2001 luigi

Expand the comment on the layout of softc, arpcom and ifnet structures,
and list the places where the assumption is used.


86764 22-Nov-2001 jlemon

Introduce a syncache, which enables FreeBSD to withstand a SYN flood
DoS in an improved fashion over the existing code.

Reviewed by: silby (in a previous iteration)
Sponsored by: DARPA, NAI Labs


86749 21-Nov-2001 arr

- Utilize the great M_ZERO flag rather than allocating memory then do
a call to memset.


86526 18-Nov-2001 arr

- M_ZERO already sets bif_dlist to zero; there is no need to
do it again.


86487 17-Nov-2001 dillon

Give struct socket structures a ref counting interface similar to
vnodes. This will hopefully serve as a base from which we can
expand the MP code. We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.


86364 14-Nov-2001 jhb

Remove ifnet.if_mpsafe for now. If this is needed, it won't be needed
until much later when the network stack locking is farther along.

Approved by: jlemon


86106 05-Nov-2001 phk

3.5 years ago Wollman wrote:
"[...] and removes the hostcache code from standard kernels---the
code that depends on it is not going to happen any time soon,
I'm afraid."
Time to clean up.


86047 04-Nov-2001 luigi

MFS: sync the ipfw/dummynet/bridge code with the one recently merged
into stable (mostly , but not only, formatting and comments changes).


85812 01-Nov-2001 luigi

Remove an extra splimp() call.

Spotted-by: diff(1)


85608 27-Oct-2001 dillon

sc_lasttime and sc_starttime are time_t's, not long's.


85305 22-Oct-2001 ru

Remove extra memory region kept by "struct pfil_head pfil_head_t;".

Seems to be a typo for typedef, but we don't want this non-style(9)
typedef anyway.

PR: kern/31356


85181 19-Oct-2001 mjacob

Fix this so it compiles cleanly for alpha. Tried to do some minimal testing.

Reviewed by: freebsd-net


85079 17-Oct-2001 jlemon

Add a SIOCGIFINDEX ioctl, which returns the index of a named interface.
This will be used to more efficiently support if_nametoindex(3).


85077 17-Oct-2001 jlemon

Cleanup ifunit(), so it uses the dev_named() function to map an interface
name into a device.


85074 17-Oct-2001 ru

Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.

Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument. Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost noone is
using it anyways).

Benefit: the following command now works. Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''. It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from: BSD/OS, NetBSD
MFC after: 1 month
PR: kern/28360


85053 17-Oct-2001 ru

Bring in latest CSRG revisions to this file:

- Report destination address of a P2P link when servicing
routing socket messages.

- Report interface name, address, and destination address
of a P2P link when servicing NET_RT_{DUMP,FLAGS} sysctls.

Part of CSRG revision 8.6 coresponds to revision 1.12.
CSRG revision 8.7 corresponds to revision 1.15.


85052 17-Oct-2001 ru

64-bit fixes from CSRG.


85051 17-Oct-2001 ru

Revision 1.32 corresponded to CSRG revision 8.2.


85050 17-Oct-2001 ru

Revision 1.13 corresponded to CSRG revision 8.4.
Revision 1.59 corresponded to CSRG revision 8.5.


85049 17-Oct-2001 ru

Record the fact that revision 1.39 corresponded to CSRG revision 8.4,
and first hunk of revision 1.76 corresponded to CSRG revision 8.3.


85042 17-Oct-2001 fenner

if_index is the highest interface index in the system, not the next
available index.


85040 17-Oct-2001 fenner

The interface index space may be sparsely populated (e.g. when an
interface in the middle is if_detach()'d). Return (and handle)
ENOENT when the ifmib(4) is accessed for a nonexistent interface.

MFC after: 14 days


85005 15-Oct-2001 fenner

Set the interface speed back to zero, after ether_ifattach() set it
to 10Mbps. RFC 2863 says: "For a sub-layer which has no concept
of bandwidth, [ifSpeed] should be zero."


84971 15-Oct-2001 ru

Don't even attempt to clone host routes.

MFC after: 1 week


84931 14-Oct-2001 fjoe

bring in ARP support for variable length link level addresses

Reviewed by: jdp
Approved by: jdp
Obtained from: NetBSD
MFC after: 6 weeks


84853 12-Oct-2001 mjacob

Traverse the list of network interfaces rather than use if_index- if_index is
not guaranteed to be dense with respect to the actual list of interfaces.


84817 11-Oct-2001 jlemon

Fix the ``WARNING: Driver mistake: repeat make_dev'', caused by using
the wrong index variable within a loop. I have no idea how this managed
to work on my test box.

Spotted by: fenner


84787 11-Oct-2001 jlemon

Move device nodes into a /dev/net/ directory, to avoid conflict with
existing devices (e.g.: tunX). This may need a little more thought.

Create a /dev/netX alias for devices. net0 is reserved.

Allow wiring of net aliases in /boot/device.hints of the form:
hint.net.1.dev="lo0"
hint.net.12.ether="00:a0:c9:c9:9d:63"


84785 11-Oct-2001 jlemon

Set if_type and if_addrlen before calling if_attach(), so the values are
available for the routine to use.


84781 10-Oct-2001 jhb

Malloc mutexes pre-zero'd as random garbage (including 0xdeadcode) my
trigget the check to make sure we don't initalize a mutex twice.


84576 06-Oct-2001 fenner

- Fix typo in "didn't find tag in list" code -- != should have been ==.
This fixes the panic when receiving a packet with an unknown tag, and
also allows reception of packets with known tags.
- Allow overlapping tag number spaces when using multiple hardware-assisted
VLAN parent devices (by comparing the parent interface in
vlan_input_tag() just as in vlan_input() ).
- fix typo in comment

MFC after: 1 week


84558 05-Oct-2001 dfr

Add ia64 to the list of machines which don't do unaligned reads.


84516 05-Oct-2001 ps

Make it so dummynet and bridge can be loaded as modules.

Submitted by: billf


84380 02-Oct-2001 mjacob

Documentation comment: note that the each NIC's softc is assumed to start
with an ifnet structure.

MFC after: 1 week


84318 01-Oct-2001 jlemon

Update the hash table when sppp mucks directly with the interface address.


84139 29-Sep-2001 jlemon

Add ability to attach knotes to network devices.
Introduce EVFILT_NETDEV to report network device changes.


84106 29-Sep-2001 jlemon

Introduce network device nodes. Network devices will now automatically
appear in /dev. Interface hardware ioctls (not protocol or routing) can
be performed on the descriptor. The SIOCGIFCONF ioctl may be performed
on the special /dev/network node.


84105 29-Sep-2001 jlemon

Change sysctl_iflist() so it has a single point of return. This will
assist any future locking efforts.


84104 29-Sep-2001 jlemon

Use in_ifaddrhashtbl instead of in_ifaddrhead to look up IP address.


84058 27-Sep-2001 luigi

Two main changes here:
+ implement "limit" rules, which permit to limit the number of sessions
between certain host pairs (according to masks). These are a special
type of stateful rules, which might be of interest in some cases.
See the ipfw manpage for details.

+ merge the list pointers and ipfw rule descriptors in the kernel, so
the code is smaller, faster and more readable. This patch basically
consists in replacing "foo->rule->bar" with "rule->bar" all over
the place.
I have been willing to do this for ages!

MFC after: 1 week


83998 26-Sep-2001 brooks

/home/brooks/ng_gif.message


83997 26-Sep-2001 brooks

Use LIST_ macros instead of TAILQ_ macros to be more like NetBSD.

Obtained from: NetBSD


83934 25-Sep-2001 brooks

Make faith loadable, unloadable, and clonable.


83805 21-Sep-2001 jhb

Use the passed in thread to selrecord() instead of curthread.


83711 20-Sep-2001 ru

Use the current process's credentials rather than socket's cached.
If the process drops its super-user privileges, we certainly don't
want to allow it to modify routing tables.

Discussed with: rwatson


83655 19-Sep-2001 brooks

Make stf a clonable device.

Yes this really is rather silly and the implementation is overkill given
that you are only allowed one of them, but NetBSD implements cloning on
this device and it's a less cluttered example of cloning then most.


83636 18-Sep-2001 jlemon

Split HWCSUM into two components: RX and TX, for the benefit of drivers
which can only do checksum offloading in one direction.


83624 18-Sep-2001 jlemon

Add two fields to the ifnet structure indicating what extra capabilities
a network device has, and which ones are enabled.


83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


83291 10-Sep-2001 kris

Fix some signed/unsigned integer confusion, and add bounds checking of
arguments to some functions.

Obtained from: NetBSD
Reviewed by: peter
MFC after: 2 weeks


83268 10-Sep-2001 peter

Remove/comment tokens after #endif (#endif NETATALK)


83187 07-Sep-2001 julian

Patches from KAME to remove usage of Varargs in existing
IPV4 code. For now they will still have some in the developing stuff (IPv6)

Submitted by: Keiichi SHIMA / <keiichi@iij.ad.jp>
Obtained from: KAME


83185 07-Sep-2001 jlemon

Fix another shortcircuit return() statement that I missed.


83184 07-Sep-2001 jlemon

Fix sense of comparison in space test. Also eliminate a compile
warning and remove a previously existing off-by-one error.


83130 06-Sep-2001 jlemon

Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.


83129 06-Sep-2001 jlemon

Cosmetic cleanups and rearrangement for code to come. There should be
no functional change in this commit.


83115 05-Sep-2001 brooks

Make vlan(4) loadable, unloadable, and clonable. As a side effect,
interfaces must now always enable VLAN support.

Reviewed by: jlemon
MFC after: 3 weeks


83043 05-Sep-2001 brooks

Add cloning support for the tap(4) device similar to that in the tun(4)
device.

Submitted by: Maksim Yevmenkin <myevmenk@digisle.net>


82884 03-Sep-2001 julian

Patches from Keiichi SHIMA <keiichi@iij.ad.jp>
to make ip use the standard protosw structure again.

Obtained from: Well, KAME I guess.


82651 31-Aug-2001 ru

Synch with NetBSD and OpenBSD.

Allow non-superuser to open, listen to, and send safe commands on the
routing socket. Superuser priviledge is required for all commands
but RTM_GET.

Lose `setuid root' bit of route(8).

Reviewed by: wollman, dd


82319 25-Aug-2001 brian

TUNSIFINFO now expects IFF_MULTICAST to be OR'd with either IFF_POINTOPOINT
or IFF_BROADCAST. If it's not, the IFF_MULTICAST is removed.

This is in line with how NetBSD & OpenBSD do it.


82239 23-Aug-2001 dd

Correct the comment about bpfattach() to match reality.

PR: 29967
Submitted by: Joseph Mallett <jmallett@xMach.org>


81788 16-Aug-2001 julian

Fix typo

Submitted by: BDE
MFC after: 2 weeks


81787 16-Aug-2001 julian

Only allocate teh 1540 byte buffer if we need it..
(lazy allocation)

MFC after: 13 days


81736 15-Aug-2001 julian

Don't allocate an entire 1500 byte buffer on the stack.
May need more review in light of SMP.

MFC after: 2 weeks


81215 06-Aug-2001 ume

printed current sequence number of the SA. accordingly, changed
into sadb_x_sa2_sequence from sadb_x_sa2_reserved3 in the sadb_x_sa2
structure. Also the output of setkey is changed. sequence number
of the sadb is replaced to the end of the output.

Obtained from: KAME


81106 03-Aug-2001 fenner

Don't terminate the uiomove() loop on a zero-length mbuf. It's not
particularly nice that IPSEC inserts a zero-length mbuf into the
chain, and that bug should be fixed too, but interfaces should be
robust to bad input.
Print the interface name when TUNDEBUG()ing about dropping an mbuf.


81065 02-Aug-2001 jon

fix memory leak when error during opening of routing socket

PR: kern/29336
Submitted by: Richard Andrades <richard@xebeo.com>
MFC after: 1 month


80767 31-Jul-2001 fenner

Update our bpf.h with tcpdump.org's new DLT_ types.
Use our bpf.h instead of tcpdump.org's to build libpcap.


80715 31-Jul-2001 ume

If LCP proto-rej is received, drop the protocol mentioned by the message.
This is to be friendly with non-IPv6 peer (If the peer complains due to
lack of IPv6CP, drop IPv6CP). This basically implements "RXJ+" state
transition in the RFC.

Obtained from: NetBSD


80405 26-Jul-2001 itojun

incorrect bounds-check on snprintf.

Submitted by: fenner


80353 25-Jul-2001 fenner

Don't bother passing p to rtioctl just so it can fail to pass it to mrt_ioctl


80350 25-Jul-2001 ume

As commented in defined in sys/net/route.c, rt_fixchange() has a bad
effect, which would cause unnecessary route deletion:

* Unfortunately, this has the obnoxious
* property of also triggering for insertion /above/ a pre-existing network
* route and clones. Sigh. This may be fixed some day.

The effect has been even worse, because recent versions of route.c set
the parent rtentry for cloned routes from an interface-direct route.
For example, suppose that we have an interface "ne0" that has an IPv4
subnet "10.0.0.0/24". Then we may have a cloned route like 10.0.0.1
on the interface, whose parent route is 10.0.0.0/24 (to the interface
ne0). Now, when we add the default route (i.e. 0.0.0.0/0),
rt_fixchange() will remove the cloned route 10.0.0.1. The (bad) effect
also prevents rt_setgate from configuring rt_gwroute, which would not
be an intended behavior.

As suggested in the comments to rt_fixchange(), we need stricter check
in the function, to prevent unintentional route deletion.

This fix also solve the "IPV6 panic?" problem in nd6_timer().

Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>
MFC after: 4 days


80296 24-Jul-2001 fenner

Eliminate the panic, reported by Daniel Sobral, which occurs when
vlan_unconfig()-ing an interface on which multicast groups have been
joined. Instead, keep the list of groups around (and, in fact, allow
changing of the membership list) and re-join them when the vlan interface
is reassociated with a lower level interface.


80238 24-Jul-2001 fenner

Use the IANA assignment IFT_L2VLAN directly instead of indirecting through
a privately #defined IFT_8021_VLAN.

MFC after: 3 days


79326 05-Jul-2001 ume

unbreak building kernel without option INET6

Reported by: markp


79198 04-Jul-2001 ume

adjust mbuf length right in route_output().

Obtained from: KAME
MFC after: 1 week


79106 02-Jul-2001 brooks

gif(4) and stf(4) modernization:

- Remove gif dependencies from stf.
- Make gif and stf into modules
- Make gif cloneable.

PR: kern/27983
Reviewed by: ru, ume
Obtained from: NetBSD
MFC after: 1 week


79103 02-Jul-2001 brooks

Add kernel infrastructure for network device cloning.

Reviewed by: ru, ume
Obtained from: NetBSD
MFC after: 1 week


78701 24-Jun-2001 ume

inject outbound packet to BPF.

Submitted by: itojun
Obtained from: KAME
MFC after: 10 days


78491 20-Jun-2001 brian

Close a race where we were releasing the unit resource at the start
of tunclose() rather than the end, and tunopen() grabbed that unit
before tunclose() finished (one process is allocating it while another
is freeing it!).

It may be worth hanging some sort of rw mutex around all specinfo
calls where d_close and the detach handler get a write lock and all
other functions get a read lock. This would guarantee certain levels
of ``atomicity'' (is that a word?) that people may expect (I believe
Solaris does something like this).


78470 19-Jun-2001 sumikawa

Suppress update ifnet.iflastchange when processing packets for SNMP
requirements(RFC1573, interface MIB). This change for 4.4BSD was
first introduced in if_ethersubr.c:1.17->1.18.

BTW, iflastchange on all of IFs are inconsistent. e.g.
ether, tun: update
fddi, tokenring, ppp: not update
I'll make patch later.

Obtained from: KAME
MFC after: 2 weeks


78404 18-Jun-2001 brian

Remove the SI_CHEAPCLONE flag when hanging resources off the dev_t


78351 16-Jun-2001 markm

This file was a horrible mixture of styles old and new.

Apply style(9).


78295 15-Jun-2001 jlemon

Do not perform arp send/resolve on an interface marked NOARP.

PR: 25006
MFC after: 2 weeks


78251 15-Jun-2001 peter

Fix warning. s/char/unsigned char/ in "(char *)eth"
294: warning: ethernet address is not type unsigned char *


78250 15-Jun-2001 peter

Fix warning: 848: warning: label `nosupport' defined but not used


78249 15-Jun-2001 peter

Fix warning; remove unused variable


78248 15-Jun-2001 peter

Remove unused variable


78176 13-Jun-2001 ume

Make compilable. addlog(...) was replaced with log(-1, ...)

Reported by: peter


78134 12-Jun-2001 ume

Restore the code wrongly nuked by previous commit.

Following changed was made by previous commit:

- IPV6CP supporting in kernel level ppp from NetBSD.

Submitted by: y.shirasaki@ntt.com


78064 11-Jun-2001 ume

Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks


77900 08-Jun-2001 peter

"Fix" the previous initial attempt at fixing TUNABLE_INT(). This time
around, use a common function for looking up and extracting the tunables
from the kernel environment. This saves duplicating the same function
over and over again. This way typically has an overhead of 8 bytes + the
path string, versus about 26 bytes + the path string.


77853 07-Jun-2001 peter

Back out part of my previous commit. This was a last minute change
and I botched testing. This is a perfect example of how NOT to do
this sort of thing. :-(


77843 06-Jun-2001 peter

Make the TUNABLE_*() macros look and behave more consistantly like the
SYSCTL_*() macros. TUNABLE_INT_DECL() was an odd name because it didn't
actually declare the int, which is what the name suggests it would do.


77689 04-Jun-2001 ru

When looking for an interface appropriate for the (new or changing)
route in ifa_ifwithroute(), as the last resort, look up the route to
the gateway, not destination (to derive the interface from).

PR: kern/27852
Submitted by: Iasen Kostoff <tbyte@tbyte.org>
MFC after: 2 weeks


77658 03-Jun-2001 yar

First, wrap the if_up() call into splimp()/splx() because
if_up() must be called at splnet or higher.
Second, set the IFF_RUNNING flag on an interface after its
resources (i.e. tunnel source and destination addresses)
have been set. Note that we don't set IFF_UP because it is
if_up()'s job to do that.

PR: kern/27851
Submitted by: Horacio J. PeÓa <horape@compendium.com.ar>


77589 01-Jun-2001 brian

Support /dev/tun cloning. Ansify if_tun.c while I'm there.

Only tun0 -> tun32767 may now be opened as struct ifnet's if_unit
is a short.

It's now possible to open /dev/tun and get a handle back for an available
tun device (use devname to find out what you got).

The implementation uses rman by popular demand (and against my judgement)
to track opened devices and uses the new dev_depends() to ensure that
all make_dev()d devices go away before the module is unloaded.

Reviewed by: phk


77217 26-May-2001 phk

Currently, each wireless networking driver has it's own control program
despite the fact that most people want to set exactly the same settings
regardless of which card they have. It has been repeatidly suggested
that this configuration should be done via ifconfig. This patch
implements the required functionality in ifconfig and add support to the
wi and an drivers. It also provides partial, untested support for the
awi driver.

PR: 25577
Submitted by: Brooks Davis <brooks@one-eyed-alien.net>


77178 25-May-2001 phk

Make if_tun's clone create SI_CHEAPCLONE devices.


76762 17-May-2001 dmlb

Add a couple more codes for upcoming raylink driver additions.

MFC after: 3 days


76213 02-May-2001 fenner

Get IP multicast working on VLAN devices:

- Allocate zeroed memory in ether_resolvemulti() to prevent equal() from
comparing garbage and determining that two otherwise-equal sockaddr_dls
are different.
- Fill in all required fields of the sockaddr_dl
- Actually copy the multicast address into the sockaddr_dl when calling
if_addmulti()
- Don't claim that we don't have a way to resolve layer 3 addresses into
layer 2 addresses; use the ethernet way.


76166 01-May-2001 markm

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h form kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


76083 27-Apr-2001 fenner

Better handling of ioctl(SIOCSIFFLAGS) failing in ifpromisc():
- Don't print the "promiscuous mode (enabled|disabled)" on failure
- Restore the reference count on failure


75582 17-Apr-2001 brian

Dont (ab)use drv2 to know if (si_flags & SI_NAMED) (pointed out by dd)
Call cdevsw_remove when we unload.


75321 08-Apr-2001 joerg

Move the decision whether we want to request authentication from our
peer out from sppp_lcp_open() to sppp_lcp_up(). For one, this makes
things look more symmetrical to sppp_lcp_close(), and somehow it also
just occurred to me that an Up event following the open caused the
value of the authentication option to be clobbered.


75204 04-Apr-2001 gad

Fix bpf devices so select() recognizes that they are always writable.

PR: 9355
Submitted by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Garrett Rooney <rooneg@electricjellyfish.net> (see pr :-)


75179 04-Apr-2001 yar

Change the type of the VLAN interface from IFT_PROPVIRTUAL,
which was a temporary hack, to IFT_L2VLAN, which is the type
assigned by IANA.


75177 04-Apr-2001 yar

Add recently assigned interface types.

Obtained from: ftp://ftp.isi.edu/in-notes/iana/assignments/smi-numbers


75176 04-Apr-2001 yar

Sync up to NetBSD, Step 2:

Add the interface types 0x37 through 0xbd.

Obtained from: NetBSD


75175 04-Apr-2001 yar

Sync up to NetBSD again, Step 1:

* Set the CSRG SCCS ID to the revision this file is actually based on
(the file itself has been updated to Lite2 in rev. 1.4).

* Fix some typos in comments.

* Add a comment to the trailing #endif according to style(9)


75103 03-Apr-2001 brian

Allow MOD_UNLOADs of if_tun, and handle event handler registration
failures in MOD_LOAD.

Dodge duplicate make_dev() calls by (ab)using dev->si_drv2 to
remember if we created the device node via a dev_clone callback
before the d_open call.


75096 02-Apr-2001 brian

If ifpromisc() fails the SIOCSIFFLAGS ioctl, put ifp->if_flags
back the way we found them.


75095 02-Apr-2001 brian

Return 0 and do nothing when we get a SIOCSIFFLAGS.

Without this, ifpromisc() always fails (after setting the IFF_PROMISC
bit in ifp->if_flags) and bpf never bothers to turn promiscuous mode off.

PR: 20188


74943 28-Mar-2001 yar

Fix a number of minor bugs in the VLAN code:

* Initialize the "struct sockaddr_dl sdl" correctly in vlan_setmulti().

PR: kern/22181

* The driver used to call malloc(..., M_NOWAIT), but to not check the
return value. Change malloc(..., M_NOWAIT) to malloc(..., M_WAITOK)
because the corresponding part of code is called from the upper
half of the kernel only.

PR: kern/22181

* Make sure a parent interface is up and running before invoking
its if_start() routine in order to avoid system panic.

PR: kern/22179 kern/24741 i386/25478

* Do not copy all the flags from a parent mindlessly.

PR: kern/22179

* Do not call if_down() on a parent interface if it's already down.
Call if_down() at splimp because if_down() needs that.

PR: kern/22179

Reviewed by: wollman


74914 28-Mar-2001 jhb

Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>


74913 28-Mar-2001 jhb

Use mtx_initiaalized() rather than violating the internals of the mutex
structure.


74852 27-Mar-2001 yar

Don't bypass notifying a corresponding interface
when leaving a link-layer multicast group.

PR: kern/22176
Reviewed by: wollman


74810 26-Mar-2001 phk

Send the remains (such as I have located) of "block major numbers" to
the bit-bucket.


74774 25-Mar-2001 joerg

This is another MFC candidate.

Fix a serious bug in sppp where anyone could obtain a successful PAP
authentication by supplying a null password. I've only stumpled across
the PR while browsing for all sppp-related PRs.

Should we also file a security advisory for this?

PR: 21592
Submitted by: <dli@3bc.de> Dirk Liebke


74703 23-Mar-2001 joerg

(MFC candidate, see below).

When we get an Open event in stopped state, experience shows that this
is usually means we've somehow missed a previous Down event. This has
occasionally bitten people for the IPCP layer with ISDN, apparently a
previously aborted IPCP negotiation must have caused this. As a
bandaid, we quickly pretent a Down event by advancing to starting
state; this effectively implements the `restart' option mentioned in
RFC 1663.

While i'm not yet fully convinced this is the best thing to do (and is
fully compliant with RFC 1661), i've seen a number of reports here on
the German mailing lists where people have been bitten by the previous
behaviour which usually causes quickly looping ISDN reconnects (thus
loss of money...), and where just this patch fixes the problem.

For this, i'd even like to see it MFC'd if possible.

Submitted by: Helmut Kreft <kreft@zeus.ai-lab.fh-furtwangen.de>


74408 18-Mar-2001 mdodd

- Add iso88025_ifdetach().
- Add support for 802.2 type IPX frames.
- Cleanup iso88025_output() and iso88025_output() a bit.


74407 18-Mar-2001 mdodd

- Define payload length constants for 4Mbps and 16Mbps.
- Use explicit sizes for header structure fields.
- Use __attribute__ ((__packed__)) for header structures.
- Define struct iso88025_rif; for future use.
- Prototype upcoming iso88025_ifdetach()
- Get rid of __P() constructs in prototypes.


74299 15-Mar-2001 ru

net/route.c:

A route generated from an RTF_CLONING route had the RTF_WASCLONED flag
set but did not have a reference to the parent route, as documented in
the rtentry(9) manpage. This prevented such routes from being deleted
when their parent route is deleted.

Now, for example, if you delete an IP address from a network interface,
all ARP entries that were cloned from this interface route are flushed.

This also has an impact on netstat(1) output. Previously, dynamically
created ARP cache entries (RTF_STATIC flag is unset) were displayed as
part of the routing table display (-r). Now, they are only printed if
the -a option is given.

netinet/in.c, netinet/in_rmx.c:

When address is removed from an interface, also delete all routes that
point to this interface and address. Previously, for example, if you
changed the address on an interface, outgoing IP datagrams might still
use the old address. The only solution was to delete and re-add some
routes. (The problem is easily observed with the route(8) command.)

Note, that if the socket was already bound to the local address before
this address is removed, new datagrams generated from this socket will
still be sent from the old address.

PR: kern/20785, kern/21914
Reviewed by: wollman (the idea)


74279 15-Mar-2001 mdodd

This include file has no business being here.


74093 11-Mar-2001 bmilekic

Plug several mbuf leaks in error cases (in nd6)

Submitted by: jhay


73085 26-Feb-2001 alfred

Protect against negative numbers as well


73079 26-Feb-2001 alfred

fix typo in comment


73078 26-Feb-2001 alfred

Santize a size variable passed to kernel malloc.

Since we know there's always an upper bound we force that bound,
otherwise users can cause a panic via malloc getting hit with a
odd (huge or negative) amount of memory to allocate.

Tested by: kris
Pointed out by: Andrey Valyaev <dron@infosec.ru>


72786 21-Feb-2001 rwatson

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


72784 21-Feb-2001 rwatson

o Remove unnecessary jail() check in bpfopen() -- we limit device access
in jail using /dev namespace limits and mknod() limits, not by explicit
checks in the device open code.


72669 18-Feb-2001 markm

Insert entropy harvesting calls for network traffic. By
default, no entropy will be harvested.


72544 16-Feb-2001 jlemon

Add mutexes to the entire bpf subsystem to make it MPSAFE.

Previously reviewed by: jhb, bde


72484 14-Feb-2001 asmodai

Fix another typo I missed on first reading:
insersion -> insertion


72482 14-Feb-2001 asmodai

Fix typo and comma placement.


72270 10-Feb-2001 luigi

Sync with the bridge/dummynet/ipfw code already tested in stable.

In ip_fw.[ch] change a couple of variable and field names to
avoid having types, variables and fields with the same name.


72200 09-Feb-2001 bmilekic

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


72175 08-Feb-2001 archie

When we receive an incoming Ethernet frame that was unicast to a
different hardware address, we should drop it (this should only
happen in promiscuous mode). Relocate the code for this check
from before ng_ether(4) processing to after ng_ether(4) processing.
Also fix a compiler warning.

PR: kern/24465


72093 06-Feb-2001 asmodai

Fix typo: compatability -> compatibility.

Compatability is not an existing english word.


72084 06-Feb-2001 phk

Convert if_multiaddrs from LIST to TAILQ so that it can be traversed
backwards in the three drivers which want to do that.

Reviewed by: mikeh


72012 04-Feb-2001 phk

Another round of the <sys/queue.h> FOREACH transmogriffer.

Created with: sed(1)
Reviewed by: md5(1)


71999 04-Feb-2001 phk

Mechanical change to use <sys/queue.h> macro API instead of
fondling implementation details.

Created with: sed(1)
Reviewed by: md5(1)


71991 04-Feb-2001 peter

Pull the rug from under the 'LKM Compatability' macro - PSEUDO_SET().
There are two 3rd party code chunks using this still - the IPv6 stuff and
i4b. Give them a private copy as an alternative to changing them too much.

XXX sys/kernel.h still has a #include <sys/module.h> in it. I will be
taking this out shortly - this affects a number of drivers.


71959 03-Feb-2001 phk

Use <sys/queue.h> macro api rather than fondle its implementation detals.

Created with: /usr/bin/sed
Reviewed by: /sbin/md5


71946 03-Feb-2001 brian

o Allow non-root users to open /dev/tun* (remove suser()
in tunopen())
o Change the default device permissions to 0600 root:wheel
(were uucp:dialer)
o Only let root (suser()) change the MTU

This makes it possible for an administrator to open up the
permissions on /dev/tun*, letting non-root programs service
a tun interface. Co-operation is still required with a
priviledged program that will configure the interface side
of things.


71921 02-Feb-2001 brian

Pass the minor number rather than the unit number to make_dev()
from the clone handler.


71910 02-Feb-2001 luigi

MFS: bridge/ipfw/dummynet fixes


71909 02-Feb-2001 luigi

MFS: bridge/ipfw/dummynet fixes (bridge.c will be committed separately)


71891 01-Feb-2001 bp

Fix breakage caused by incomplete transition to IF_HANDOFF().
Remove unused variable.


71864 31-Jan-2001 peter

Quieten gcc.


71862 31-Jan-2001 peter

Exterminate the use of PSEUDO_SET() with extreme prejudice.


71859 31-Jan-2001 bp

Let M_PANIC go back to the private tree as its intention isn't understood well
for now.


71853 30-Jan-2001 jasone

Revert mutex initialization check to look at mtx_description.

Pointed out by: jlemon, jhb


71802 29-Jan-2001 peter

Supply a stub bpf_validate() (always returning false - the script is not
valid) if BPF is missing.
The netgraph_bpf node forced bpf to be present, reflect that in the
options.
Stop doing a 'count bpf' - we provide stubs.
Since a handful of drivers still refer to "bpf.h", provide a more accurate
indication that the API is present always. (eg: netinet6)


71801 29-Jan-2001 peter

Use M_PANIC instead of if (sc == NULL) panic();


71791 29-Jan-2001 peter

Make the number of loopback interfaces dynamically tunable. Why one
would *want* to is a different story, but it used to be able to be done
statically. Get rid of #include "loop.h" and struct ifnet loif[NLOOP];
This could be used as an example of how to do this in other drivers,
for example: ccd.


71686 26-Jan-2001 luigi

Minor cleanups after yesterday's patch.
The code (bridging and dummynet) actually worked fine!


71666 26-Jan-2001 luigi

Bring bridging code in line with the one which works on -STABLE.
It compiles on -CURRENT, but I can not test functionality yet.


71655 25-Jan-2001 luigi

Comment the interface to ether_input() and the way is normally
used by most ethernet drivers.


71602 24-Jan-2001 phk

DEVFS cloning for if_tap.

Submitted by: Maksim Yevmenkin <m_evmenkin@yahoo.com>


71392 22-Jan-2001 luigi

Assorted bugfixes:
+ configuration: make sure that the NUL at the end of the config
string is properly detected and handled, and the stats passed
up via sysctl properly reflect which interfaces do bridging.
(The whole config support might make good use of some cleanup
in the future).

+ fixed some bugs related to the corruption of multicast and
broadcast packets: make sure that for those packets the entire
IP + ethernet header is in the mbuf, not in a cluster, so
that writes performed in that area by the upper layers do
not affect us.

+ performance: when calling m_pullup, make room for the ethernet header
as well, we are going to add it in right after. Also, change an m_dup
back to m_copypacket. The former is not necessary anymore now, and
it did not help, anyways.

I will do a fast MFC because 95% of this patch is fixing bad bugs
and i doubt anyone would test the fix in CURRENT. Plus the last
two items mostly bring back some code which was already there in 4.0
times.


71352 21-Jan-2001 jasone

Move most of sys/mutex.h into kern/kern_mutex.c, thereby making the mutex
inline functions non-inlined. Hide parts of the mutex implementation that
should not be exposed.

Make sure that WITNESS code is not executed during boot until the mutexes
are fully initialized by SI_SUB_MUTEX (the original motivation for this
commit).

Submitted by: peter


70834 09-Jan-2001 wollman

select() DKI is now in <sys/selinfo.h>.


70414 27-Dec-2000 bmilekic

Small fix for bpf compat:
Make malloc() use M_NOWAIT istead of M_DONTWAIT and in the
bpf_compat case, define M_NOWAIT to be M_DONTWAIT.


70254 21-Dec-2000 bmilekic

* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.


70199 19-Dec-2000 jhay

Various fixes to make leased line operation more robust. On lcp_up, start
to negotiate from scratch. Make leased lines survive being put into
loopback mode. Bits and pieces and ideas taken from PRs 11238 and 21771.
Make it a module so that it can be kldloaded. Whitespace cleanup. (Can be
ignored with "cvs diff -b".)

PR: 11238 and 21771 (bits and pieces)


70127 17-Dec-2000 jdp

Fix bug: a read() on a bpf device which was in non-blocking mode
and had no data available returned 0. Now it returns -1 with errno
set to EWOULDBLOCK (== EAGAIN) as it should. This fix makes the bpf
device usable in threaded programs.

Reviewed by: bde


69781 08-Dec-2000 dwmalone

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


69774 08-Dec-2000 phk

Staticize some malloc M_ instances.


69621 05-Dec-2000 jlemon

Move the wakeup/signaling of the reader side of the tun device into
a tunstart function, which is called when a packet is sucessfully
placed on the queue. This allows us to properly do output byte accounting
within the handoff routine.


69586 05-Dec-2000 jake

Remove the last of the MD netisr code. It is now all MI. Remove
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.

Reviewed by: jhb


69224 26-Nov-2000 jlemon

Unbreak world; #include <sys/mutex.h> instead of <machine/mutex.h>
Only include <sys/mbuf.h> when building kernel sources. This should
probably be changed to require callers to include it themselves.


69211 26-Nov-2000 phk

Make log(-1, ...) do what addlog(...) did.

Replace all uses of addlog(...) with log(-1, ...)

Remove bogus "register" keywords in subr_prf.c

Make log() return void.


69153 25-Nov-2000 jlemon

Remove unused variable, spl() manipulation isn't done for the ifq now.


69152 25-Nov-2000 jlemon

Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue. Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue. An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.


69099 23-Nov-2000 bmilekic

Fixup (hopefully) bridging + ipfw + dummynet together...

* Some dummynet code incorrectly handled a malloc()-allocated pseudo-mbuf
header structure, called "pkt," and could consequently pollute the mbuf
free list if it was ever passed to m_freem(). The fix involved passing not
pkt, but essentially pkt->m_next (which is a real mbuf) to the mbuf
utility routines.

* Also, for dummynet, in bdg_forward(), made the code copy the ethernet header
back into the mbuf (prepended) because the dummynet code that follows expects
it to be there but it is, unfortunately for dummynet, passed to bdg_forward
as a seperate argument.

PRs: kern/19551 ; misc/21534 ; kern/23010
Submitted by: Thomas Moestl <tmoestl@gmx.net>
Reviewed by: bmilekic
Approved by: luigi


68315 04-Nov-2000 ume

Make compilable. if_fddisubr.c depended on sys/malloc.h by my
previous commit.

Reported by: Jim Bryant <jbryant@A010-0935.KSCY.splitrock.net>


68271 03-Nov-2000 jhb

Fix an order of operations buglet. ! has higher precedence than &. This
should fix the warnings about bpf not calling make_dev().


68250 02-Nov-2000 jlemon

Have tuninit() return an error if an interface address is NULL.
SIOCGIFSTATUS was returning at splimp(); fix this. (to be MFC'd)

Submitted by: Marius Bendiksen


68180 01-Nov-2000 ume

IPv6 was not work on FDDI.

Reported by: Akihiro IIJIMA <aki@noc.titech.ac.jp>
Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>
Reviewed by: Akihiro IIJIMA <aki@noc.titech.ac.jp>


67934 30-Oct-2000 ru

Add pfil.9 manpage to build after a repository copy.


67927 30-Oct-2000 imp

Add some additional message types for coming raylan driver from Duncan
Barclay.


67893 29-Oct-2000 phk

Move suser() and suser_xxx() prototypes and a related #define from
<sys/proc.h> to <sys/systm.h>.

Correctly document the #includes needed in the manpage.

Add one now needed #include of <sys/systm.h>.
Remove the consequent 48 unused #includes of <sys/proc.h>.


67882 29-Oct-2000 phk

Remove unneeded #include <sys/proc.h> lines.


67727 27-Oct-2000 wollman

Initialize rn_mklist in rn_newpair(). The undocumented assumption
seems to be that the nodes are bzero'd beforehand, but the submitter
found that this was not always the case, and in any event defensive
programming here costs epsilon squared.

PR: 22244
Submitted by: Dave Gillam <daveg@chiaro.com>


67708 27-Oct-2000 phk

Convert all users of fldoff() to offsetof(). fldoff() is bad
because it only takes a struct tag which makes it impossible to
use unions, typedefs etc.

Define __offsetof() in <machine/ansi.h>

Define offsetof() in terms of __offsetof() in <stddef.h> and <sys/types.h>

Remove myriad of local offsetof() definitions.

Remove includes of <stddef.h> in kernel code.

NB: Kernelcode should *never* include from /usr/include !

Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

Deprecate <struct.h> with a warning. The warning turns into an error on
01-12-2000 and the file gets removed entirely on 01-01-2001.

Paritials reviews by: various.
Significant brucifications by: bde


67696 27-Oct-2000 phk

Remove bogus undocumented macros used to control conditional assembly.


67695 27-Oct-2000 phk

Remove #if DO_DEFLATE
Remove #if DO_BSD_COMPRESS

They are the wrong way to enable/disable features and undocumented to boot.


67551 25-Oct-2000 jhb

- Overhaul the software interrupt code to use interrupt threads for each
type of software interrupt. Roughly, what used to be a bit in spending
now maps to a swi thread. Each thread can have multiple handlers, just
like a hardware interrupt thread.
- Instead of using a bitmask of pending interrupts, we schedule the specific
software interrupt thread to run, so spending, NSWI, and the shandlers
array are no longer needed. We can now have an arbitrary number of
software interrupt threads. When you register a software interrupt
thread via sinthand_add(), you get back a struct intrhand that you pass
to sched_swi() when you wish to schedule your swi thread to run.
- Convert the name of 'struct intrec' to 'struct intrhand' as it is a bit
more intuitive. Also, prefix all the members of struct intrhand with
'ih_'.
- Make swi_net() a MI function since there is now no point in it being
MD.

Submitted by: cp


67334 19-Oct-2000 joe

Augment the 'ifaddr' structure with a 'struct if_data' to keep
statistics on a per network address basis.

Teach the IPv4 and IPv6 input/output routines to log packets/bytes
against the network address connected to the flow.

Teach netstat to display the per-address stats for IP protocols
when 'netstat -i' is evoked, instead of displaying the per-interface
stats.


67169 15-Oct-2000 brian

BPF wants packets in host byte order whereas TUN_IFHEAD wants them
in network byte order.
When we've got TUN_IFHEAD set, swap the AF byte order before passing
a packet to bpf_mtap().


67164 15-Oct-2000 phk

Remove unneeded #include <machine/clock.h>


66988 12-Oct-2000 phk

Do some cleanups of the HARP atm codes interface into the system:

Define the NETISR just like all the other NETISRs.

unifdef -Usun -D__FreeBSD__ we will probably never support sun4c
and if we do we can't use the solaris code anyway and I doubt
anybody will be running Fore ATM cards in then in the first place.


66878 09-Oct-2000 phk

Don't make_dev() in bpfopen() unless we need to.


66640 04-Oct-2000 itojun

make sure we have root priv on SIOCSIFPHY*. from thorpej@netbsd


66479 30-Sep-2000 bp

Properly setup link level header length for 802.2 and SNAP frames.


66393 26-Sep-2000 bde

Handle slip options in the usual way (generate a dummy options file in
the module Makefile and don't clutter the sources with ifdefs).

Fixed nearby formatting bugs.


66390 26-Sep-2000 bde

Removed unused includes (garbage left over/created by the SMPng megacommit).


66359 25-Sep-2000 nsayer

In theory, m_dup should not be necessary, as m_copypacket should be
sifficient. But somewhere (I believe in the UDP stuff), someone is
overwriting an mbuf without calling m_pullup() first. This results in
broad- and multi-cast traffic that is passed through the bridge getting
corrupted.

This should be backed out when there is some assurance that the upper
layers (and I suppose all of the device drivers) are fixed.

Suggested by: archie


66316 24-Sep-2000 bmilekic

Get rid of a panic that occurs in ether_demux() by dereferencing a NULL mbuf
pointer, when bridging and bridge_ipfw are enabled, and when bdg_forward()
happens to free the packet and make our pointer NULL. There may be
more similar problems like this one with calls to bdg_forward().

PR: Related to kern/19551
Reviewed by: jlemon


66067 19-Sep-2000 phk

Rename lminor() to dev2unit(). This function gives a linear unit number
which hides the 'hole' in the minor bits.

Introduce unit2minor() to do the reverse operation.

Fix some some make_dev() calls which didn't use UID_* or GID_* macros.

Kill the v_hashchain alias macro, it hides the real relationship.

Introduce experimental SI_CHEAPCLONE flag set it on cloned bpfs.


65922 16-Sep-2000 brian

Call bpfattach() correctly from if_ppp.c

Submitted by: Andy Adams <ala@merit.edu>
PR: 18506


65837 14-Sep-2000 ru

Follow BSD/OS and NetBSD, keep the ip_id field in network order all the time.

Requested by: wollman


65557 07-Sep-2000 jasone

Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
preempted (i386 only).

Partially contributed by: BSDi (BSD/OS)
Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh


65454 04-Sep-2000 rwatson

o Add missing "\n" to warning output in netinet/if_loop.c, when an
unsupported address family is used on localhost interface.

looutput: af=0 unexpected

Speculation as to the reasons for my seeing this error are welcome, of
course. :-)


65374 02-Sep-2000 phk

Avoid the modules madness I inadvertently introduced by making the
cloning infrastructure standard in kern_conf. Modules are now
the same with or without devfs support.

If you need to detect if devfs is present, in modules or elsewhere,
check the integer variable "devfs_present".

This happily removes an ugly hack from kern/vfs_conf.c.

This forces a rename of the eventhandler and the standard clone
helper function.

Include <sys/eventhandler.h> in <sys/conf.h>: it's a helper #include
like <sys/queue.h>

Remove all #includes of opt_devfs.h they no longer matter.


64880 20-Aug-2000 phk

Remove all traces of Julians DEVFS (incl from kern/subr_diskslice.c)

Remove old DEVFS support fields from dev_t.

Make uid, gid & mode members of dev_t and set them in make_dev().

Use correct uid, gid & mode in make_dev in disk minilayer.

Add support for registering alias names for a dev_t using the
new function make_dev_alias(). These will show up as symlinks
in DEVFS.

Use makedev() rather than make_dev() for MFSs magic devices to prevent
DEVFS from noticing this abuse.

Add a field for DEVFS inode number in dev_t.

Add new DEVFS in fs/devfs.

Add devfs cloning to:
disk minilayer (ie: ad(4), sd(4), cd(4) etc etc)
md(4), tun(4), bpf(4), fd(4)

If DEVFS add -d flag to /sbin/inits args to make it mount devfs.

Add commented out DEVFS to GENERIC


64808 18-Aug-2000 dwmalone

The slip driver used to allocate a mbuf cluster without attaching
it to a mbuf. This patch makes it attach it to mbuf. This patch
is in preperation for Bosko Milekic's mbuf external reference
counting patches.

PR: 19866 (first stage)
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by: alfred


64658 15-Aug-2000 itojun

repair endianness issue in IN_MULTICAST().
again, *BSD difference...

From: Nick Sayer <nsayer@quack.kfu.com>


64651 15-Aug-2000 archie

Export the functionality of SIOCSIFLLADDR with if_setlladdr()
and add some more rigorous sanity checking in the process.

Reviewed by: freebsd-net


64639 14-Aug-2000 onoe

Change the argument for SIOCG80211NWID/SIOCS80211NWID to include the
length of NWID. This breaks binary compatibility but only the awi driver
refers this ioctl; no userland tools refers it.
Add WEP stuff.
Obtained from: NetBSD current


64081 01-Aug-2000 ache

Replace nonexistent !defined(_LKM) by !defined(KLD_MODULE)


64080 01-Aug-2000 ache

Check IPFILTER (options IPFILTER generates) instead of NIPFILTER


64073 31-Jul-2000 ache

Nonexistent "ipfilter.h" -> "opt_ipfilter.h"
Kernel 'make depend' fails otherwise


63992 29-Jul-2000 nsayer

Make the bridge_refresh operation automatic when ethernet interfaces
are attached or detached.


63954 28-Jul-2000 asmodai

Fix if_types.h as per the IANA assignments with regard to IPv6.
gif/faith/stf moved to 0xfN entries, since their previous location
is allocated to some other interfaces.
Also add the IFT_PVC, which is the ATM PVC subinterface from ALTQ.

This also syncs us up a bit to NetBSD again.

This change requires a total recompilation of all kmem users, as
itojun told me.

Next in line is synching to the IANI SMI list.

Approved by: itojun


63861 25-Jul-2000 nsayer

Change to support vmware... SIOCSIFADDR on the character device
sets the (notional) "remote" ethernet address.

Submitted by: vsilyaev@mindspring.com


63841 25-Jul-2000 ume

Workaround to avoid panic during detach pccard nic.


63803 24-Jul-2000 nsayer

Sundry changes to debugging code.
Add spl/splx to various sensitive spots
Change semantics of the vmnet version of the device to keep VMware happy
(don't junk state when the device is closed)

Submitted by: vsilyaev@mindspring.com


63745 21-Jul-2000 jayanth

When a connection is being dropped due to a listen queue overflow,
delete the cloned route that is associated with the connection.
This does not exhaust the routing table memory when the system
is under a SYN flood attack. The route entry is not deleted if there
is any prior information cached in it.

Reviewed by: Peter Wemm,asmodai


63679 20-Jul-2000 nsayer

Oops. SYSCTL_HANDLER_ARGS -> (SYSCTL_HANDLER_ARGS)


63672 20-Jul-2000 nsayer

Add sysctl to perform bridge refresh. This is required if bridged
configurations include loadable interfaces. After loading new
interface drivers, perform a 'sysctl -w net.link.ether.bridge_refresh=1'
and the bridge code will reinitialize itself.

Submitted by: <vsilyaev@mindspring.com>


63670 20-Jul-2000 nsayer

Add the tap driver.

The tap driver is used to present a virtual Ethernet interface to the
system. Packets presented by the network stack to the interface are
made available to a character device in /dev. With tap and the bridge
code, you can make remote bridge configurations where both sides of
the bridge are separated by userland daemons.

This driver also has a special naming hack to allow it to serve a similar
purpose to the vmware port.

Submitted by: myevmenkin@att.com, vsilyaev@mindspring.com


63577 20-Jul-2000 kris

Temporary hack for the benefit of the X-Bone project
(http://www.isi.edu/xbone). I expect this to go away in due course.

Submitted by: Lars Eggert <larse@ISI.EDU>


63474 18-Jul-2000 archie

Const'ify parameters to ethers(3) routines as appropriate.


63358 17-Jul-2000 brian

Initialise ifnet::if_type

PR: 17873
Submitted by: Kensaku Masuda <greg@greg.rim.or.jp>


63241 16-Jul-2000 itojun

improve route/nd cache cleanup on interface removal.
CAVEAT: haven't really tested it yet, please report


63090 13-Jul-2000 archie

Make all Ethernet drivers attach using ether_ifattach() and detach using
ether_ifdetach().

The former consolidates the operations of if_attach(), ng_ether_attach(),
and bpfattach(). The latter consolidates the corresponding detach operations.

Reviewed by: julian, freebsd-net


62838 09-Jul-2000 itojun

repair IPV6_JOIN_GROUP to IPv6 all multi.
From: ume


62587 04-Jul-2000 itojun

sync with kame tree as of july00. tons of bug fixes/improvements.

API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
(also syntax change)


62573 04-Jul-2000 phk

Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.

Pointed out by: bde


62454 03-Jul-2000 phk

Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:

Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
sources:

-sysctl_vm_zone SYSCTL_HANDLER_ARGS
+sysctl_vm_zone (SYSCTL_HANDLER_ARGS)


62290 30-Jun-2000 archie

Previous commit didn't work; this time really fix it.


62267 29-Jun-2000 archie

Provide forward declarations for struct ifnet and struct mbuf
to avoid compiler warnings.


62264 29-Jun-2000 archie

Fix kernel build breakage when 'device ether' was not included.


62143 26-Jun-2000 archie

Make the ng_ether(4) node type dynamically loadable like the rest.
This means 'options NETGRAPH' is no longer necessary in order to get
netgraph-enabled Ethernet interfaces. This supports loading/unloading
the ng_ether.ko and attaching/detaching the Ethernet interface in any
order.

Add two new hooks 'upper' and 'lower' to allow access to the protocol
demux engine and the raw device, respectively. This enables bridging
to be defined as a netgraph node, if so desired.

Reviewed by: freebsd-net@freebsd.org


61734 16-Jun-2000 wpaul

Implement SIOCSIFLLADDR, which allows you to change the link-level
address on an interface. This basically allows you to do what my
little setmac module/utility does via ifconfig. This involves the
following changes:

socket.h: define SIOCSIFLLADDR
if.c: add support for SIOCSIFLLADDR, which resets the values in
the arpcom struct and sockaddr_dl for the specified interface.
Note that if the interface is already up, we need to down/up
it in order to program the underlying hardware's receive filter.
ifconfig.c: add lladdr command
ifconfig.8: document lladdr command

You can now force the MAC address on any ethernet interface to be
whatever you want. (The change is not sticky across reboots of course:
we don't actually reprogram the EEPROM or anything.) Actually, you
can reprogram the MAC address on other kinds of interfaces too; this
shouldn't be ethernet-specific (though at the moment it's limited to
6 bytes of address data).

Nobody ran up to me and said "this is the politically correct way to
do this!" so I don't want to hear any complaints from people who think
I could have done it more elegantly. Consider yourselves lucky I didn't
do it by having ifconfig tread all over /dev/kmem.


61648 14-Jun-2000 bp

Do not perform any opeartion with mbuf after it placed into
interface queue.

Tested by: Bosko Milekic <bmilekic@dsuper.net>


61491 10-Jun-2000 peter

Unused include: #include "sl.h" - NSL is no longer used.


61192 02-Jun-2000 archie

Don't try to apply ipfw filtering to non-IP packets.

Reported-by: "Lachlan O'Dea" <lodea@vet.com.au>


61181 02-Jun-2000 mjacob

fix KASSERT usage


61153 01-Jun-2000 phk

Don't panic if ifpromisc() returnes ENXIO, it's probably just an pccard
which have been pulled.


61090 30-May-2000 green

Make sl(4) SLIP devices dynamically expansible. Yay! =)

PR: kern/17758
Submitted by: David Malone <dwmalone@maths.tcd.ie>


60952 26-May-2000 gallatin

Rather than checking for hlen causing misalignment, we should do the
m_adj() and then check the resulting mbuf for misalignment, copying
backwards to align the mbuf if required.

This fixes a longstanding problem where an mbuf which would have been
properly aligned after an m_adj() was being misaligned and causing an
unaligned access trap in ip_input(). This bug only triggered when booting
diskless.

Reviewed by: dfr


60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


60889 24-May-2000 archie

Just need to pass the address family to if_simloop(), not the whole sockaddr.


60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


60536 14-May-2000 archie

Move code to handle BPF and bridging for incoming Ethernet packets out
of the individual drivers and into the common routine ether_input().
Also, remove the (incomplete) hack for matching ethernet headers
in the ip_fw code.

The good news: net result of 1016 lines removed, and this should make
bridging now work with *all* Ethernet drivers.

The bad news: it's nearly impossible to test every driver, especially
for bridging, and I was unable to get much testing help on the mailing
lists.

Reviewed by: freebsd-net


60342 11-May-2000 darrenr

patch from Alexey Zelkin


60317 10-May-2000 darrenr

Add pfil(9) subroutines and manpage from NetBSD.


59760 29-Apr-2000 phk

Remove unneeded #include <sys/kernel.h>


59731 28-Apr-2000 julian

OOps forgot to check in this one...
API chage for netgraph.


59696 27-Apr-2000 wpaul

Add a bpfdetach() stub routine to bpf.c. Without this, you'll get an
unresolved symbol error if you try to load a network driver into a kernel
which doesn't have bpf enabled.

Forgotten by: rwatson
Found by: peter


59681 27-Apr-2000 bp

Fix support for 802.2 and SNAP frames. Bug was introduced during
initial import.

Tested by: Jorge P Vasquez <jorge@acron.ind.br>


59633 26-Apr-2000 kjc

remove "register" specifiers to supress compiler warning.


59604 24-Apr-2000 obrien

* Use sys/sys/random.h rather than a i386 specific one.
* There was nothing that should be machine dependant about
i386/isa/random_machdep.c, so it is now sys/kern/kern_random.c.


59529 23-Apr-2000 wollman

A couple months ago, Kirk and I were doing a walkthrough of the radix-tree
search routine, and scratching our heads over why it was so obfuscated.
This delta fixes a number of confusing style bugs and renames several
structure members to have more meaningful names. There remain a number
of odd control-flow structures. These changes do not affect the generated
code.


59468 21-Apr-2000 guido

IOCGIFCONF once and for all. Sometimes the ifc_len variable
would be returned with a wrong value.
While we're here, get rid of unnecessary panic call.

PR: 17311, 12996, 14457
Submitted by: Patrick Bihan-Faou <patrick@mindstep.com>,
Kris Kennaway <kris@FreeBSD.org>


59391 19-Apr-2000 phk

Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>


59058 06-Apr-2000 imp

Awi driver, ported from NetBSD from Atsushi Once-san.

From the README:
Any IEEE 802.11 cards use AMD Am79C930 and Harris (Intersil) Chipset
with PCnetMobile firmware by AMD.
BayStack 650 1Mbps Frequency Hopping PCCARD adapter
BayStack 660 2Mbps Direct Sequence PCCARD adapter
Icom SL-200 2Mbps Direct Sequence PCCARD adapter
Melco WLI-PCM 2Mbps Direct Sequence PCCARD adapter
NEL SSMagic 2Mbps Direct Sequence PCCARD adapter
Netwave AirSurfer Plus
1Mbps Frequency Hopping PCCARD adapter
Netwave AirSurfer Pro
2Mbps Direct Sequence PCCARD adapter

Known Problems:
WEP is not supported.
Does not create IBSS itself.
Cannot configure the following on FreeBSD:
selection of infrastructure/adhoc mode
ESSID
...

Submitted by: Atsushi Onoe <onoe@sm.sony.co.jp>


59005 04-Apr-2000 gj

Pass me the pointy hat.

It was not a good idea to remove csu_header from struct cspace, it had
ramifications which I didn't notice.

Restore src/usr.sbin/ppp/slcompress.h to the way it was, since MAX_HDR
was already defined as 128 there and it's a user program anyway.

In sys/net/slcompress.h make MAX_HDR 128 intead of MLEN to avoid
bloat.

My apologies for any inconvenience.


58982 03-Apr-2000 gj

Nuke csu_hdr from struct cspace. csu_hdr is not used anywhere in the
tree. This considerably reduces unnecessary bloat in struct slcompress.

I'm running with this change right now and have seen no negative
side-effects.

On my sytem this reduced kernel BSS by about 25KB.

Submitted by: bde
Approved by: brian for user-ppp


58698 27-Mar-2000 jlemon

Add support for offloading IP/TCP/UDP checksums to NIC hardware which
supports them.


58635 26-Mar-2000 charnier

Remove duplicate word


58313 19-Mar-2000 lile

o Replace most magic numbers related to token ring with #defines
from iso88025.h.

o Add minimal llc support to iso88025_input.

o Clean up most of the source routing code.

* Submitted by: Nikolai Saoukh <nms@otdel-1.org>


58273 19-Mar-2000 rwatson

The advent of if_detach, allowing interface removal at runtime, makes it
possible for a panic to occur if BPF is in use on the interface at the
time of the call to if_detach. This happens because BPF maintains pointers
to the struct ifnet describing the interface, which is freed by if_detach.

To correct this problem, a new call, bpfdetach, is introduced. bpfdetach
locates BPF descriptor references to the interface, and NULLs them. Other
BPF code is modified so that discovery of a NULL interface results in
ENXIO (already implemented for some calls). Processes blocked on a BPF
call will also be woken up so that they can receive ENXIO.

Interface drivers that invoke bpfattach and if_detach must be modified to
also call bpfattach(ifp) before calling if_detach(ifp). This is relevant
for buses that support hot removal, such as pccard and usb. Patches to
all effected devices will not be committed, only to if_wi.c, due to
testing limitations. To reproduce the crash, load up tcpdump on you
favorite pccard ethernet card, and then eject the card. As some pccard
drivers do not invoke if_detach(ifp), this bug will not manifest itself
for those drivers.

Reviewed by: wes


58192 18-Mar-2000 rwatson

Introduce a new bd_seesent flag to the BPF descriptor, indicating whether or
not the current BPF device should report locally generated packets or not.
This allows sniffing applications to see only packets that are not generated
locally, which can be useful for debugging bridging problems, or other
situations where MAC addresses are not sufficient to identify locally
sourced packets. Default to true for this flag, so as to provide existing
behavior by default.

Introduce two new ioctls, BIOCGSEESENT and BIOCSSEESENT, which may be used
to manipulate this flag from userland, given appropriate privilege.

Modify bpf.4 to document these two new ioctl arguments.

Reviewed by: asmodai


57903 11-Mar-2000 shin

IPv6 6to4 support.

Now most big problem of IPv6 is getting IPv6 address
assignment.
6to4 solve the problem. 6to4 addr is defined like below,

2002: 4byte v4 addr : 2byte SLA ID : 8byte interface ID

The most important point of the address format is that an IPv4 addr
is embeded in it. So any user who has IPv4 addr can get IPv6 address
block with 2byte subnet space. Also, the IPv4 addr is used for
semi-automatic IPv6 over IPv4 tunneling.

With 6to4, getting IPv6 addr become dramatically easy.
The attached patch enable 6to4 extension, and confirmed to work,
between "Richard Seaman, Jr." <dick@tar.com> and me.

Approved by: jkh

Reviewed by: itojun


57637 01-Mar-2000 archie

The "sdl_family" field in a "struct sockaddr_dl" will be equal
to AF_LINK, not AF_DLI, as stated in the comment. Fix the comment.

Reviewed by: wollman


57570 28-Feb-2000 guido

This fixes a problem where the SIOCGIFCONF ioctl goes wrong. This
is triggered when qmail is used with INET6 enabled. The bug
manifests itself in that the space variable can become negative
and that in the comparison in the guards of the 2 loops, this was
not noticed because sizeof() returns an unsigned and thus the signed
variable gets promoted to unsigned. I decided not to make space
unsigned because I think we should guard against this from happening.
Thus panic() in case space becomes negative.

Approved by: jkh


57536 27-Feb-2000 shin

Wrap if_up() by splnet.

Approved by: jkh

Submitted by: peter


57363 20-Feb-2000 ache

Fix possible SLIOCSUNIT panic
PR: 16564
Submitted by: ru
Approved by: jkh


57250 16-Feb-2000 mdodd

Track if_i{bytes,packets,errors}.

Approved by: jkh


57178 13-Feb-2000 peter

Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs. Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by: jkh


57052 08-Feb-2000 luigi

Update bridging code to the one already in -stable (this was
forgotten some time ago...).

Approved-by: jordan


57020 07-Feb-2000 mdodd

m_pullup() frees the supplied mbuf on failure; we don't need to try
and do this ourselves.

Approved by: jkh
Noticed by: Mike Spengler <mks@networkcs.com>


56970 03-Feb-2000 mdodd

Make sure that the entire header is in the first mbuf before we
attempt to copy the ethernet header forward and otherwise encapsulate
a packet for output.

This fixes the panic when using VLAN devices on hardware that doesn't
do 802.1Q tagging onboard. (That is to say, all drivers except the Tigon.)

My tests consisted of telnet, ttcp, and a pingflood of packets
between 1 and 1600 (plus headers) bytes.

MFC to follow in 1 week.

Approved by: jkh


56938 01-Feb-2000 shin

Add workaround for fxp issue at interface initialization with IPv6.

Some LAN card chip for fxp is known to cause problem at
interface initialization with IPv6 enabled. It happens at
some delicate timing.
And also, just adding some DELAY before IPv6 address
autoconfiguration is known to avoid the problem.

Complete fix is changing the driver not to use interrupt at
multicast filter initialization, but trying such change in
this stage will be dangerous.

So I add some DELAY() only inside #ifdef INET6 part,
as temporal workaround only for 4.0.

Approbed by: jkh

Noticed by: Mattias Pantzare <pantzer@ludd.luth.se>

Obtained from: openbsd-tech mailing list


56868 29-Jan-2000 peter

Remove #if NGIF > 0 and #if NFAITH > 0 as config already checks this.


56856 29-Jan-2000 peter

Remove some #if NFOO > 0 that are always true because of config rules.


56844 29-Jan-2000 peter

Fix this so LINT compiles. There is no way this could have worked if
tested with LINT. I've put back netatm/kern_include.h and maked it
with a fixme!, otherwise NETISR_ATM isn't defined as ATM_KERNEL isn't
defined. Defining that exposes a whole bunch of other dependencies.. :-(


56777 29-Jan-2000 brian

Remove unused includes


56761 28-Jan-2000 shin

Count AF_INET6 attachement to routing socket.

Obtained from: KAME project


56703 27-Jan-2000 brian

Redo the intrq.c idea as

int family_enqueue(sa_family_t, struct mbuf *);


56555 24-Jan-2000 brian

Move the *intrq variables into net/intrq.c and unconditionally
include this in all kernels. Declare some const *intrq_present
variables that can be checked by a module prior to using *intrq
to queue data.

Make the if_tun module capable of processing atm, ip, ip6, ipx,
natm and netatalk packets when TUNSIFHEAD is ioctl()d on.

Review not required by: freebsd-hackers


56517 24-Jan-2000 ru

Notify user processes about interface's MTU change.

Reviewed by: wollman, freebsd-net


56424 23-Jan-2000 bp

Allow if_ef driver to be compiled into kernel.


56410 23-Jan-2000 brian

Implement TUN[GS]IFHEAD ioctls. Passing a non-zero int to TUNSIFHEAD
tells that tun unit to prepend a four byte address family to packets
queued for tunread() and to expect a four byte address family at the
front of data received by tunwrite().

We queue any protocol received from the interface for tunread(), but
only accept INET, INET6, IPX and NETATALK from tunwrite(). There is
support for Xerox's NS stuff, but AFAICT config(8) doesn't ever
define NS.


56349 21-Jan-2000 brian

Add a new TUNSIFPID ioctl to update the tun_pid (recorded in
tunopen) with the current pid.


56057 15-Jan-2000 phk

|The hard limit for the BPF buffer size is 32KB, which appears too low
|for high speed networks (even at 100Mbit/s this corresponds to 1/300th
|of a second). The default buffer size is 4KB, but libpcap and ipfilter
|both override this (using the BIOCSBLEN ioctl) and allocate 32KB.
|
|The following patch adds an sysctl for bpf_maxbufsize, similar to the
|one for bpf_bufsize that you added back in December 1995. I choose to
|make the default for this limit 512KB (the value suggested by NFR).

Submitted by: se
Reviewed by: phk


56030 15-Jan-2000 shin

Clear ro->ro_rt just after RTFREE().
Pleases let me make sure that no one touch the invalid ro_rt pointer,
after splx(s) and before next ro_rt initialization.
Though usually this seems to be already called at splnet,
I still sometime experience kernel crash at rtfree() in my
INET6 enabled environment where IPv6 connection is frequently used.
(Off-course, it might be just due to another bug.)


56014 15-Jan-2000 shin

cosmetic change: sort function prototypes

Specified by: bde


56013 15-Jan-2000 shin

-K&R fix for some prototype declaration
-fix some comments for #endif to match them with their #ifndef

Submitted by: bde


55789 10-Jan-2000 wpaul

Attempt to fix a problem with receiving packets on USB ethernet interfaces.
Packets are received inside USB bulk transfer callbacks, which run at
splusb() (actually splbio()). The packet input queues are meant to be
manipulated at splimp(). However the locking apparently breaks down under
certain circumstances and the input queues can get trampled.

There's a similar problem with if_ppp, which is driven by hardware/tty
interrupts from the serial driver, but which must also manipulate the
packet input queues at splimp(). The fix there is to use a netisr, and
that's the fix I used here. (I can hear you groaning back there. Hush up.)

The usb_ethersubr module maintains a single queue of its own. When a
packet is received in the USB callback routine, it's placed on this
queue with usb_ether_input(). This routine also schedules a soft net
interrupt with schednetisr(). The ISR routine then runs later, at
splnet, outside of the USB callback/interrupt context, and passes the
packet to ether_input(), hopefully in a safe manner.

The reason this is implemented as a separate module is that there are
a limited number of NETISRs that we can use, and snarfing one up for
each driver that needs it is wasteful (there will be three once I get
the CATC driver done). It also reduces code duplication to a certain
small extent. Unfortunately, it also needs to be linked in with the
usb.ko module in order for the USB ethernet drivers to share it.

Also removed some uneeded includes from if_aue.c and if_kue.c

Fix suggested by: peter
Not rejected as a hairbrained idea by: n_hibma


55633 09-Jan-2000 shin

Remove BROADCAST flag from faith interface,
-it not seems to be necessary
-to avoid dhcp messages or something like that sent to faith interface

The problem reported by: Jim Bloom <bloom@acm.org>


55276 30-Dec-1999 shin

Prevent kernel panic at ifconfig up after Note PC resume.

Submitted by: imp, kuriyama
Reviewed by: imp


55205 29-Dec-1999 peter

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


55178 28-Dec-1999 ru

Make cloning mask sockaddr (genmask) possible.

PR: kern/3061
Reviewed by: wollman


55009 22-Dec-1999 shin

IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


54799 19-Dec-1999 green

M_PREPEND-related cleanups (unregisterifying struct mbuf *s).


54796 18-Dec-1999 green

Fix a broken macro usage. It had no semicolon.

Noticed by: eivind


54728 17-Dec-1999 imp

Two more fixes to if_detach. These are generic to all interfaces and
do not pollute the interface further.

o Run if_detach at splnet().
o Creatively swipe the relevant parts of the netatm atm_nif_detach
which will delete the relevant references to the interface going
away.


54558 13-Dec-1999 bp

Bring up an if_ef driver which allows support for four ethernet
frame types. Currently it supports only IPX protocol and doesn't
affect existing functionality when not loaded.

Reviewed by: Ollivier Robert <roberto@keltia.freenix.fr>


54557 13-Dec-1999 bp

Allow ifunit() routine to understand names like ed0f2. Also
fix a bug caused by using bcmp() instead of strcmp().

Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>


54531 13-Dec-1999 jkh

The current code incorrectly assumes that all vlans
are configured, and/or associated with a parent device. If you
receive a frame for a VLAN that's not in the list, you walk off
the end of the list. Boom.

Submitted by: C. Stephen Gunn <csg@waterspout.com>
PR: 15291


54530 13-Dec-1999 jkh

sys/net/if_vlan.c fails to maintain the IFF_RUNNING flag on the
vlan interfaces it manages. This prevents the interface from
actually sending or receiving data.

Submitted by: C. Stephen Gunn <csg@waterspout.com>
PR: 15290


54410 10-Dec-1999 imp

Add some gross ad-hock hacks to increase stability of if_detach:
o be more careful about clearing addresses (this isn't a kludge)
o For AF_INET interfaces, call SIOCDIFFADDR to remove last(?) bit
of cruft.

Special cases for AF_INET shouldn't be here, but I didn't see a good
generic way of doing this. If I missed something, please let me know.

This gross hack makes pccard ejection stable for ethernet cards.

Submitted by: Atushi Onoe-san


54369 09-Dec-1999 jdp

Fix a route table leak in rtalloc() and rtalloc_ign(). It is
possible for ro->ro_rt to be non-NULL even though the RTF_UP flag
is cleared. (Example: a routing daemon or the "route" command
deletes a cloned route in active use by a TCP connection.) In that
case, the code was clobbering a reference to the routing table
entry without decrementing the entry's reference count.

The splnet() call probably isn't needed, but I haven't been able
to prove that yet. It isn't significant from a performance standpoint
since it is executed very rarely.

Reviewed by: wollman and others in the freebsd-current mailing list


54350 09-Dec-1999 shin

rtcalloc() is removed because it turned out not to be necessary for FreeBSD.
(It was added as a part of KAME patch)

Specified by: jdp@polstra.com


54263 07-Dec-1999 shin

udp IPv6 support, IPv6/IPv4 tunneling support in kernel,
packet divert at kernel for IPv6/IPv4 translater daemon

This includes queue related patch submitted by jburkhol@home.com.

Submitted by: queue related patch from jburkhol@home.com
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


54075 03-Dec-1999 julian

Make the stub routines have the same prototypes as the real bpf
routines.


54038 02-Dec-1999 archie

Add 'const' to the bpf_filter() and bpf_validate() prototypes.
Remove a stale comment from bpf_validate().


53913 30-Nov-1999 archie

Add two new generic control messages, NGM_ASCII2BINARY and
NGM_BINARY2ASCII, which convert control messages to ASCII and back.
This allows control messages to be sent and received in ASCII form
using ngctl(8), which makes ngctl a lot more useful.

This also allows all the type-specific debugging code in libnetgraph
to go away -- instead, we just ask the node itself to do the ASCII
translation for us.

Currently, all generic control messages are supported, as well as
messages associated with the following node types: async, cisco,
ksocket, and ppp.

See /usr/share/examples/netgraph/ngctl for an example of using this.

Also give ngctl(8) the ability to print out incoming data and
control messages at any time. Eventually nghook(8) may be subsumed.

Several other misc. bug fixes.

Reviewed by: julian


53649 24-Nov-1999 julian

Expand the field width for subtypes. We had already overflowed it
by 2 with people just adding numbers on the end of the ethernet subtypes.
We now have an additional 14 subtypes available in ethernet.
Use one of them immediatly for homePNA.

Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>


53647 23-Nov-1999 brian

Only emit the ``wrong ifa'' message if the matching interface
is neither IFF_LOOPBACK or IFF_POINTOPOINT. It's quite common
(and probably more correct) to route local IP numbers via lo0
and it makes configuration easier to assign the hostname address
to local POINTOPOINT links too.

This message usually remains hidden because the loopback interface
gets the highest interface number at boot time, but when the
ethernet interface is added later, the message can get pretty
annoying.

Also, fix a typo.

Not objected to by: freebsd-net


53541 22-Nov-1999 shin

KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assigne IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project


53192 15-Nov-1999 archie

Add some more comments to the sl_compress_tcp() function.


53171 15-Nov-1999 julian

YUCK!
m_prepend doesn't fix m_pkthdr.len, use M_PREPEND instead, which does..
(Netgraph only)


53144 14-Nov-1999 julian

Fix screwup on synthesising incoming ethernet header in Netgraph mode.

Submitted by: brian@freebsd.org


53115 12-Nov-1999 phk

Set the queue length.


53099 11-Nov-1999 julian

Oops forgot to put the source MAC address on outgoing packets!


52904 05-Nov-1999 shin

KAME related header files additions and merges.
(only those which don't affect c source files so much)

Reviewed by: cvs-committers
Obtained from: KAME project


52852 03-Nov-1999 archie

Fix bug in BIOCGETIF ioctl() where it would return a bogus interface
name if the interface unit number was greater than 9.


52754 01-Nov-1999 julian

Use typedefs for node methods.


52633 29-Oct-1999 joerg

When getting a RCN event in state ACK_RCVD, RFC 1661 demands that we
go to REQ_SENT (and we probably should also log this since it should
only happen in a cross-linked connection).

Submitted by: Mark Tinguely <tinguely@plains.NoDak.edu>


52631 29-Oct-1999 archie

Add a comment before sl_compress_tcp() regarding mbuf assumptions.


52598 28-Oct-1999 ru

Re-allocate cblocks after changing the slip unit number.


52525 26-Oct-1999 julian

Minor hack in the netgraph interface to ethernets.


52419 21-Oct-1999 julian

Whistle's Netgraph link-layer (sometimes more) networking infrastructure.
Been in production for 3 years now. Gives Instant Frame relay to if_sr
and if_ar drivers, and PPPOE support soon. See:
ftp://ftp.whistle.com/pub/archie/netgraph/index.html
for on-line manual pages.

Reviewed by: Doug Rabson (dfr@freebsd.org)
Obtained from: Whistle CVS tree


52248 15-Oct-1999 msmith

Implement pseudo_AF_HDRCMPLT, which controls the state of the 'header
completion' flag. If set, the interface output routine will assume that
the packet already has a valid link-level source address. This defaults
to off (the address is overwritten)

PR: kern/10680
Submitted by: "Christopher N . Harrell" <cnh@mindspring.net>
Obtained from: NetBSD


51709 27-Sep-1999 peter

Zap #include "tun.h" (for NTUN) - which isn't used anymore.


51683 26-Sep-1999 peter

Minor tidy up of PPP_FILTER and NBPF stuff. Don't generate bpf.h in the
module and don't #include "bpf.h".


51658 25-Sep-1999 phk

Remove five now unused fields from struct cdevsw. They should never
have been there in the first place. A GENERIC kernel shrinks almost 1k.

Add a slightly different safetybelt under nostop for tty drivers.

Add some missing FreeBSD tags


51654 25-Sep-1999 phk

This patch clears the way for removing a number of tty related
fields in struct cdevsw:

d_stop moved to struct tty.
d_reset already unused.
d_devtotty linkage now provided by dev_t->si_tty.

These fields will be removed from struct cdevsw together with
d_params and d_maxio Real Soon Now.

The changes in this patch consist of:

initialize dev->si_tty in *_open()
initialize tty->t_stop
remove devtotty functions
rename ttpoll to ttypoll
a few adjustments to these changes in the generic code
a bump of __FreeBSD_version
add a couple of FreeBSD tags


51646 25-Sep-1999 phk

Remove NBPF conditionality of bpf calls in most of our network drivers.

This means that we will not have to have a bpf and a non-bpf version
of our driver modules.

This does not open any security hole, because the bpf core isn't loadable

The drivers left unchanged are the "cross platform" drivers where the respective
maintainers are urged to DTRT, whatever that may be.

Add a couple of missing FreeBSD tags.


51254 14-Sep-1999 ru

Don't call if_up() here, just set IFF_UP.

PR: 12251
Reviewed by: wollman


51252 14-Sep-1999 ru

Add comments, fix typos.

Reviewed by: wollman


51172 11-Sep-1999 nsayer

Fix kernel compile with BRIDGE, but without DUMMYNET


50655 30-Aug-1999 sheldonh

For every "promiscuous mode enabled" message printed for an interface,
print a matching "disabled" message when we drop out of promiscuous
mode for that interface.

Discussed on the freebsd-hackers mailing list.


50477 28-Aug-1999 peter

$Id$ -> $FreeBSD$


50013 18-Aug-1999 peter

Hopefully make IFMEDIA_DEBUG compile. if_xname[] is a NetBSD addition,
we need if_name, if_unit. (maybe we should pick up if_xname[] ?)

Pointed out by: jkb@yahoo-inc.com


49829 15-Aug-1999 phk

Give if_tun the "almost clone" makeover.


49827 15-Aug-1999 phk

Give BPF the "almost-clone" update. If you need more of them, make
more entries in /dev and be happy you don't need to recompile your
kernel.


49469 06-Aug-1999 brian

Back out redundant check, and remove the MAXMTU comparison as it's
outside of the (bogus) tuninfo mtu range.
Pointed out by: bde


49468 06-Aug-1999 brian

Back out redundant checks
Pointed out by: bde


49459 06-Aug-1999 brian

Define IF_MAXMTU and IF_MINMTU and don't allow MTUs with
out-of-range values.

``comparison is always 0'' warnings are silly !

Ok'd by: wollman, dg
Advised by: bde


49116 26-Jul-1999 brian

Don't complain if 0 bytes are written to the tun device, simply
do nothing.


49038 23-Jul-1999 jmg

fix a problem w/ zero byte writes to the tunnel device. It would bypass
the loop and not set an error, so we would then try to access an invalid
mbuf...

PR: 12780
Submitted by: bright@rush.net aka zb^3

a new record in length a pr was open... only about a half hour...


48645 06-Jul-1999 des

Rename bpfilter to bpf.


48589 05-Jul-1999 bde

Fixed English errors, spelling errors and formatting errors in rev.1.51
and rev.1.53.


48548 04-Jul-1999 bde

Quick fix for breakage of bounds checking in rev.1.12. Only one
of the additional checks in rev.1.12 was wrong. The others are a
bit inconsistent and are probably unnecessarily pessimal. Checking
for overflow of addition, if necessary at all, should be done in
bpf_validate().

PR: 12484


48426 01-Jul-1999 peter

Fix a printf int/long problem on the Alpha


48400 01-Jul-1999 peter

Fix two easy warnings when using BRIDGE without IPFIREWALL.


48381 30-Jun-1999 msmith

Increase the size of the route reference count from 15 bits to 31 bits.

This doesn't change the size or alignment of the structure on either i386
or Alpha, and thus should be binary-compatible (modulo problems with old
applications and routes with more than 2^15 references).

Reviewed by: peter


48215 25-Jun-1999 pb

Never return the root node itself from rn_match(); return NULL instead.

This caused a panic in rtfreee() called with a root node from the
routing socket code (when processing a RTM_GET message looking for
the default route while there is none).

Since no existing code seems to have any use getting the root node
from rn_match(), it seems cleaner never to return it rather than
check for this condition at the caller's.

PR: kern/12265


48021 19-Jun-1999 phk

Add a new interface ioctl, to return "aux status".

This is inteded for to allow ifconfig to print various unstructured
information from an interface.

The data is returned from the kernel in ASCII form, see the comment in
if.h for some technicalities.

Canonical cut&paste example to be found in if_tun.c

Initial use:
Now tun* interfaces tell the PID of the process which opened them.

Future uses could be (volounteers welcome!):
Have ppp/slip interfaces tell which tty they use.
Make sync interfaces return their media state: red/yellow/blue
alarm, timeslot assignment and so on.
Make ethernets warn about missing heartbeats and/or cables


47778 06-Jun-1999 phk

typo in previous commit


47777 06-Jun-1999 phk

Introduce IFF_SMART bit.

This means that the driver will add/delete routes when it knows it is
up/down, rather than have the generic code belive it is up if configured.

This is probably most useful for serial lines, although many PHY chips
could probably tell us if we're connected to the cable/hub as well.


47640 31-May-1999 phk

Simplify cdevsw registration.

The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it. cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables. Most places they were used
bogusly. Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
72 bogus makedev() calls
26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed. Patches emailed to authors. LINT
probably broken until they catch up.


47625 30-May-1999 phk

This commit should be a extensive NO-OP:

Reformat and initialize correctly all "struct cdevsw".

Initialize the d_maj and d_bmaj fields.

The d_reset field was not removed, although it is never used.

I used a program to do most of this, so all the files now use the
same consistent format. Please keep it that way.

Vinum and i4b not modified, patches emailed to respective authors.


47550 27-May-1999 brian

In tunclose():
Delete all routes if IFF_RUNNING is set, irrespective of IFF_UP
Unset IFF_RUNNING.


47254 16-May-1999 pb

PR: kern/10570
Submitted by: adrian@freebsd.org

Change reference count in struct ifaddr to a u_int, to be able
to handle more than 2^16 routes to the same interface.

Fix suggested by Andrew Bangs <andrewb@demon.net> in PR kern/10570.
Tested by <adrian@freebsd.org> and me under -current.


46695 08-May-1999 kjc

clean up en atm driver
o fix DDB support
- include "opt_ddb.h"
- fix Debugger() arg
pointed out by bde

o back out pvc shadow interface support
- it is currently not used
- to make it easier to merge another implementation

o misc minor cleanup


46678 08-May-1999 phk

Fix some disordering I introduced with the jail code.


46676 08-May-1999 phk

I got tired of seeing all the cdevsw[major(foo)] all over the place.

Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.


46568 06-May-1999 peter

Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.


46420 04-May-1999 luigi

Free the dummynet descriptor in ip_dummynet, not in the called
routines. The descriptor contains parameters which could be used
within those routines (eg. ip_output() ).

On passing, add IPPROTO_PGM entry to netinet/in.h


46161 29-Apr-1999 luoqi

Postpone route_init() until all domains are attached.


46155 28-Apr-1999 phk

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


46130 28-Apr-1999 msmith

Allow loadable interface drivers with BPF support to be loaded into a kernel
that doesn't have it. This is achieved by having minimal do-nothing stubs
enabled when there are no bpfilter devices configured.

Driver modules should be built with BPF enabled for maximum
convenience (but can be built without it for maximum performance).


46112 27-Apr-1999 phk

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


46092 26-Apr-1999 peter

Temporary hack. The radix code shouldn't need this, it should be
able to expand the zeros, ones etc masks on the fly. It seems a good
number of domains don't set the rn_maxkey variable anyway, and because
this is a domain itself, there is no guarantee we've been called after
a protocol that actually has set it (ie: inet), so start with a maxkey
of a relatively sane size as a base point until it can adapt on the fly.


46091 26-Apr-1999 peter

Protect the ifinit() function's internals with splimp() for safety since
it used to be that way. I'm not sure that it's needed, but it does
walk the ifp list..

Incidently, there's nothing to sanity check the ifq_maxlen on loaded
interfaces..


46090 26-Apr-1999 peter

Minor seatbelt tweak. The init code used to be splimp() protected,
maintain that in case.


46082 26-Apr-1999 peter

Make NETISR_SET use a SYSINIT() rather than a linker set.


45923 21-Apr-1999 peter

Fix my breakage of BRIDGE compiling option without IPFIREWALL..
(Note that if you have bridge compiled in and then kldload ipfw, bridge
won't automatically use it - knowledge of ipfw/dummynet is compiled in)


45869 20-Apr-1999 peter

Tidy up some stray / unused stuff in the IPFW package and friends.
- unifdef -DCOMPAT_IPFW (this was on by default already)
- remove traces of in-kernel ip_nat package, it was never committed.
- Make IPFW and DUMMYNET initialize themselves rather than depend on
compiled-in hooks in ip_init(). This means they initialize the same
way both in-kernel and as kld modules. (IPFW initializes now :-)


45720 16-Apr-1999 peter

Bring the 'new-bus' to the i386. This extensively changes the way the
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatability
shims' to enable a smoother transition. eisa, isapnp and pccard* are
not yet using the new resource manager. Once fully converted, all drivers
will be loadable, including PCI and ISA.

(Some other changes appear to have snuck in, including a port of Soren's
ATA driver to the Alpha. Soren, back this out if you need to.)

This is a checkpoint of work-in-progress, but is quite functional.

The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.

Approved by: core


45574 11-Apr-1999 eivind

Break long lines that I introduced in a previous commit.


45451 07-Apr-1999 wpaul

Add missing SYSCTL_DECL(_net_link); required by newer sysctl implementation.

Noticed by: Matthew Dodd <winter@jurai.net>


45272 03-Apr-1999 jdp

Add a missing declaration that broke the compilation of this file.


45164 30-Mar-1999 nsayer

Merge from RELENG_2_2, per luigi. Fixes the ntoh?() issue for the
firewall code when called from the bridge code.

PR: 10818
Submitted by: nsayer
Obtained from: luigi


45152 30-Mar-1999 phk

rganize the various modes (CISCO/AUTO/DEMAND/LEASED) a little bit better,
centralize the code.

Remember to call TLF/TLS on the hardware in CISCO mode.


45014 24-Mar-1999 des

Implement TUNSIFMODE and TUNSLMODE.

Submitted by: Alfred Perlstein <bright@cygnus.rush.net>


44764 15-Mar-1999 wpaul

Grrr... botched remote commit. Let's try this again: vlan updates,
take two.


44763 15-Mar-1999 wpaul

Updates for vlan stuff:

- add support for devices that do vlan tag insertion/deletion in firmware
- add multicast support
- add vlan_unconfig() to complement vlan_config()
- update ifconfig(8) to configure vlan interfaces (vlan tag and
parent device)

Also fix a small bug in ifconfig; sometimes sa_family is overwritten
by ioctls.

Reviewed by: wollman


44627 10-Mar-1999 julian

Submitted by: Larry Lile
Move the Olicom token ring driver to the officially sanctionned location of
/sys/contrib. Also fix some brokenness in the generic token ring support.

Be warned that if_dl.h has been changed and SOME programs might
like recompilation.


44542 07-Mar-1999 wpaul

Also add 1000baseSX, 1000baseLX, 1000baseCX and 1000baseTX media types. At
this point I don't know if there are any actual gigabit ethernet devices
that support media other than 1000baseSX (multi-mode fiber) but who knows.


44521 06-Mar-1999 wpaul

Add 1000baseFX, 10baseSTP and 10baseFL media types. The 1000baseFX
type may become necessary soon. :)

Also add a couple of additional macros that NetBSD has which we don't.
Nothing in FreeBSD uses these (yet) so adding them in shouldn't hurt
anything.


44254 25-Feb-1999 kato

The fe driver supports bridging, so added it to lists.


44235 23-Feb-1999 phk

Misplaces brace puts important code into debug section.

Reviewed by: phk
Submitted by: Stefan Bethke <stefan.bethke@hanse.de>


44169 20-Feb-1999 dt

Set ifq_maxlen.


44165 20-Feb-1999 julian

World, I'd like you to meet the first FreeBSD token Ring driver.
This is for various Olicom cards. An IBM driver is following.
This patch also adds support to tcpdump to decode packets on tokenring.
Congratulations to the proud father.. (below)

Submitted by: Larry Lile <lile@stdio.com>


44145 19-Feb-1999 phk

Remove all the #ifdef notyet stuff, it is probably never going to happen
in the first place.

Use 3sec timeout as recommended.

Reorder some debug messages.

Label som of the 0x%x in debug messages

Make sppp_print_bytes() use %*D and handle zero length.

If we don't have MAGIC numbers, don't yell loopback if 0 == 0


44144 19-Feb-1999 phk

Since ifru_flags is a short, we can fit in a copy of the flags
before they got changed. This can help eliminate much of the
gymnastics drivers do in their ioctl routines to figure this out.

Remove commented out IFF_NOTRAILERS


44078 16-Feb-1999 dfr

* Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)


43518 02-Feb-1999 dillon

Get rid of IFF_BROADCAST from default IFF_ slip options. This accidently
snuck in during the big -Wall commit and wasn't supposed to be in there.


43508 01-Feb-1999 phk

Print a message if the driver didn't initialize ifq_maxlen.
Drivers should be updated if they get flagged by this message.

(The reason this is important is because we do not have a way
to catch this mistake for interfaces added after ifinit() runs.)


43457 31-Jan-1999 julian

Slight cleanups. There were 2 ways of getting the arpcom from the ifp.
Both equally bogus. Make it a macro so that we can pretend it's not
bogus and maybe make it less so some time in the future.


43305 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


42769 17-Jan-1999 peter

Undo #undef KERNEL hack for vnode.h to avoid vnode_if.h.
XXX It probably makes sense to have a flag for bsd.kern.mk to avoid these
rules.
XXX IO_NDELAY seems to be the main reason for it, when used in a cdevsw
read or write "flag" context. Perhaps a redundant declaration
somewhere like sys/conf.h might help remove the need for vnode.h in
these device drivers in the first place.


42570 12-Jan-1999 eivind

Remove unused variable & clean up a couple of style issues.


42195 31-Dec-1998 luigi

Remove one unused variable.


42104 27-Dec-1998 phk

Update sppp support to i4b level. This includes the new spppcontrol
program to set PPP options like authentication with.


42066 26-Dec-1998 phk

More isdn4bsd convergence: cleanup log messages.


42065 26-Dec-1998 phk

Converge further on the isdn4bsd version of this file.


42064 26-Dec-1998 phk

clean up more timeout/untimeout portability stuff.
make sure flags and stuff are set sensibly.


41963 20-Dec-1998 phk

Add two fields for the lower layers convenience.


41881 16-Dec-1998 phk

Straigthen out the use of the tls and tlf callbacks.

Not tested on the if_sr, if_cx and if_ar drivers, but
expected to work just the same as it used to.

Any users of these drivers (or even better: donors
of hardware for them) please contact phk@freebsd.org
so we can test the next batch of changes to if_sppp.


41879 16-Dec-1998 phk

Generalize the if_up() and if_down() functions under the names
if_route() and if_unroute().

This is first step towards sanitizing IFF_UP and IFF_RUNNING


41792 14-Dec-1998 luigi

Bridging support. Wait for LINT to be updated before trying it.


41757 14-Dec-1998 eivind

Make the use of 'disc' vs 'ds' as prefix consistent by making all 'disc'.
This fix the conflict of having two functions called 'dsioctl()'.


41687 11-Dec-1998 phk

(almost) null commit, recording message for previous commit:

s/_NET_IF_HDLC_H_/_NET_IF_SPPP_H_/

Unfold almost correct and hideous beyond reason, boolean expression,
making it more correct at the same time.


41686 11-Dec-1998 phk

*** empty log message ***


41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


41588 07-Dec-1998 eivind

Propagate unsignedness to all variants of 'k', and reorganize the
conditionals to be fully resistent against overflow in unsigned
computations.

Potential problem pointed out by: bde
Reviewed by: bde


41571 07-Dec-1998 eivind

Remove guard for < 0 on an unsigned variable.


41514 04-Dec-1998 archie

Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by: Mike Spengler <mks@networkcs.com>


41087 11-Nov-1998 truckman

I got another batch of suggestions for cosmetic changes from bde.


41086 11-Nov-1998 truckman

Installed the second patch attached to kern/7899 with some changes suggested
by bde, a few other tweaks to get the patch to apply cleanly again and
some improvements to the comments.

This change closes some fairly minor security holes associated with
F_SETOWN, fixes a few bugs, and removes some limitations that F_SETOWN
had on tty devices. For more details, see the description on the PR.

Because this patch increases the size of the proc and pgrp structures,
it is necessary to re-install the includes and recompile libkvm,
the vinum lkm, fstat, gcore, gdb, ipfilter, ps, top, and w.

PR: kern/7899
Reviewed by: bde, elvind


40779 31-Oct-1998 dfr

* Use explicitly sized types for grovelling around inside packets.
* On the alpha, make sure memory accesses are only made to aligned boundaries.

Submitted by: Alex Nash <nash@mcs.net>


40049 08-Oct-1998 alex

Check the timeval passed to BIOCSRTIMEOUT with itimerfix. Use tvtohz()
to convert the timeval into a tick count.

Suggested by: bde
Reviewed by: bde

Handle hz > 1000 in BIOCGRTIMEOUT.

Pointed out by: bde
Reviewed by: bde
Obtained from: OpenBSD


40010 06-Oct-1998 joerg

Minor cleanup: kill a couple of unused variables, and a couple of
uninitialized variables.

Obtained from: The isdn4bsd project (partially)


40008 06-Oct-1998 joerg

In an attempt to reduce the huge number of differences between the
FreeBSD repository version of this file and the isdn4bsd version,
adopt those changes from the i4b version that make this file
BSD-version independent. I attempted to avoid uglifying this file too
much, thus deviated a little from the i4b version (and hope they will
adopt the changes, too).

The diffs mostly concentrate on:

. #include differences between the systems
. different callout handling between FreeBSD vs. Net/OpenBSD
. interface naming (Net/OpenBSD store the ASCII name including the
unit # in struct ifnet, FreeBSD only the name)
. use of random() in FreeBSD vs. time-based pseudo-randomization in
Net/OpenBSD (for loopback detection ad CHAP challenges -- i
assume at least OpenBSD could also benefit from random(), but that's
the way i've got this file)
. interface address list elements are named a little differently
between FreeBSD and Net/OpenBSD

I attempted to segregate those compat fixes from other code fixes and
enhancements.

Obtained from: The isdn4bsd project


39981 05-Oct-1998 joerg

Fix a =/== confusion that caused the CHAP type renegotiation to
completely fail.

Obtained from: The isdn4bsd project (original author unknown right now)


39964 04-Oct-1998 alex

The length argument for bcopy is a size_t, not u_int. Adjust
bpf_mcopy() and catchpacket() prototypes accordingly.


39963 04-Oct-1998 alex

Change BPF_ALIGNMENT to long, necessary for correct alignment on Alpha.


39955 04-Oct-1998 alex

Support hz > 1000 (Alpha) in BIOCSRTIMEOUT.

Obtained from: OpenBSD


39464 18-Sep-1998 luigi

fix an mbuf leak when using ipfw to filger bridged packets
(from -stable, since this code is not yet active in -current)


39296 15-Sep-1998 fenner

Add DLT_{SLIP,PPP}_BSDOS from libpcap 0.4


39120 12-Sep-1998 luigi

Bring in files for bridging support.


38482 23-Aug-1998 wollman

Yow! Completely change the way socket options are handled, eliminating
another specialized mbuf type in the process. Also clean up some
of the cruft surrounding IPFW, multicast routing, RSVP, and other
ill-explored corners.


38423 18-Aug-1998 ache

Implement DLT_RAW from libpcap


38372 17-Aug-1998 bde

Fixed printf format errors. sppp_dotted_quad() was yet another private,
broken, version of inet_ntoa(). It should go away.


38343 15-Aug-1998 bde

Fixed yet more ioctl breakage due to the type of the `cmd' arg changing
from int to u_long but not changing here.


38293 12-Aug-1998 wpaul

One-liner: add a call to the underlying device driver's SIOCDELMULTI
ioctl() routine at the end of if_delmulti() so that interfaces with
hardware multicast filtering can update their filters in a timely
manner.

If the interface doesn't support hardware multicast filtering, then
reception of multicast frames is done using 'promiscious mode' or
'capture all multicast frames' mode and software filtering in the
kernel. In this case, it doesn't matter if if_delmulti() ever does
an SCIODELMULTI on the interface or not: if MULTICAST support is
enabled, then we join the 'all hosts' group when the interface is
configured, and remain in it until the interface is brought down.
Without hardware filtering, joining one group means joining all
groups, so it makes no difference if we call the SIOCDELMULTI
routine.

If the interface does support hardware multicast filtering, then
by not reprogramming the hardware filter in if_delmulti(), we have
to wait until somebody calls if_setmulti(), during which time the
interface is receiving frames for multicast groups in which we are
no longer interested.


38114 04-Aug-1998 julian

fix broken loopback code for ddp (again)
Submitted by: Stefan Bethke <stb@hanse.de>


37939 29-Jul-1998 kjc

update ATM driver. (base version: midway.c 1.67 --> 1.68)

several new features are added:
- support vc/vp shaping
- support pvc shadow interface

code cleanup:
- remove WMAYBE related code. ENI WMAYBE DMA doen't work.
- remove updating if_lastchange for every packet.
- BPF related code is moved to midway.c as it should be.
(bpfwrite should work if atm_pseudohdr and LLC/SNAP are
prepended.)
- BPF link type is changed to DLT_ATM_RFC1483.
BPF now understands only LLC/SNAP!! (because bpf can't
handle variable link header length.)
It is recommended to use LLC/SNAP instead of NULL
encapsulation for various reasons. (BPF, IPv6,
interoperability, etc.)

the code has been used for months in ALTQ and KAME IPv6.

OKed by phk long time ago.


37778 20-Jul-1998 dfr

Make sure the link level sockaddr size is rounded up correctly on alpha.


37649 15-Jul-1998 bde

Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively. Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.


37619 13-Jul-1998 bde

Don't attempt to optimize the space allocated for bpf headers if
sizeof(struct bpf_hdr) > 20. 20 is normal on 32-bit systems with
32-bit alignment, but we still assume that the last 2 bytes of the
struct are unnecessary padding on such systems. On systems with
64-bit longs, struct timeval is bloated to 16 bytes, so bpf headers
certainly don't fit in 18 bytes.


37600 12-Jul-1998 dfr

Make sure the packet is aligned correctly for the alpha in if_simloop.


37560 11-Jul-1998 bde

Fixed printf format errors.


37094 21-Jun-1998 bde

Removed unused includes.


37067 20-Jun-1998 peter

Zap what appears to be a relic of the older version of zlib. The other
maintained mbuf based ppp-deflate.c's have removed this.


37066 20-Jun-1998 peter

Missing splx().


37065 20-Jun-1998 peter

Merge ppp changes from 2.3.3 -> 2.3.5. I have spotted some more
problems, which I'll have a go at shortly.


36994 14-Jun-1998 julian

Oops
left a "break;" out of the last patch
it complains for every loopback packet..


36992 14-Jun-1998 julian

Try narrow down the culprit sending undefined packet types through the loopback


36940 13-Jun-1998 julian

Allow a protocol to specify that it does NOT want to be looped back
even if it looks like it should (backwards compatibility with
old broken code) should get rid of some annoying messags.


36933 12-Jun-1998 julian

Remove 3 occurances of __FUNCTION__


36908 12-Jun-1998 julian

Go through the loopback code with a broom..
Remove lots'o'hacks.
looutput is now static.

Other callers who want to use loopback to allow shortcutting
should call the special entrypoint for this, if_simloop(), which is
specifically designed for this purpose. Using looutput for this purpose
was problematic, particularly with bpf and trying to keep track
of whether one should be using the charateristics of the loopback interface
or the interface (e.g. if_ethersubr.c) that was requesting the loopback.
There was a whole class of errors due to this mis-use each of which had
hacks to cover them up.

Consists largly of hack removal :-)


36775 08-Jun-1998 julian

Don't let ifunit() modify the string passed as an argument.
it may be in the text segment and write protected.


36735 07-Jun-1998 dfr

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


36726 07-Jun-1998 bde

Added a used include (in ifdefed code).


36724 07-Jun-1998 bde

Fixed pedantic syntax errors caused by a trailing semicolon in a macro
definition.


36265 21-May-1998 dg

Backed out last fix and fixed my typo:
ipflow(fastforward -> ipflow_fastforward


36256 20-May-1998 dufault

Add missing close paren


36192 19-May-1998 dg

Added fast IP forwarding code by Matt Thomas <matt@3am-software.com> via
NetBSD, ported to FreeBSD by Pierre Beyssac <pb@fasterix.freenix.org> and
minorly tweaked by me.
This is a standard part of FreeBSD, but must be enabled with:
"sysctl -w net.inet.ip.fastforwarding=1" ...and of course forwarding must
also be enabled. This should probably be modified to use the zone
allocator for speed and space efficiency. The current algorithm also
appears to lose if the number of active paths exceeds IPFLOW_MAX (256),
in which case it wastes lots of time trying to figure out which cache
entry to drop.


36119 17-May-1998 phk

s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by: bde


36078 15-May-1998 wollman

Fix an obvious parameter-order bogon. (Don't know what happened to
the warning message before.)


35596 01-May-1998 bde

Oops, the previous commit should have changed `i386' to `__i386__',
not `__i386'.


35563 30-Apr-1998 phk

Loopback network interface driver (net/if_loop.c) has no SIOCSIFFLAGS
ioctl handler.

PR: 6466
Reviewed by: phk
Submitted by: Ruslan Ermilov <ru@ucb.crimea.ua>


35472 27-Apr-1998 brian

Support more than 256 tun devices:

$ ls -l /dev/tun25[4-7]
crw------- 1 fax dialer 52, 254 Apr 27 02:27 /dev/tun254
crw------- 1 fax dialer 52, 255 Apr 27 02:27 /dev/tun255
crw------- 1 fax dialer 52, 0x00010000 Apr 27 02:31 /dev/tun256
crw------- 1 fax dialer 52, 0x00010001 Apr 27 02:31 /dev/tun257


35256 17-Apr-1998 des

Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108.


35210 15-Apr-1998 bde

Support compiling with `gcc -ansi'.


35067 06-Apr-1998 phk

Use getmicrotime() for if_lastchange, 10msec is plenty precision.


35064 06-Apr-1998 phk

Use random() for seq numbers and read_random for CHAP challenge.


35060 06-Apr-1998 phk

Make read_random() take a (void *) argument instead of (char *)


35029 04-Apr-1998 phk

Time changes mark 2:

* Figure out UTC relative to boottime. Four new functions provide
time relative to boottime.

* move "runtime" into struct proc. This helps fix the calcru()
problem in SMP.

* kill mono_time.

* add timespec{add|sub|cmp} macros to time.h. (XXX: These may change!)

* nanosleep, select & poll takes long sleeps one day at a time

Reviewed by: bde
Tested by: ache and others


34961 30-Mar-1998 phk

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde


34924 28-Mar-1998 bde

Moved some #includes from <sys/param.h> nearer to where they are actually
used.


34885 25-Mar-1998 peter

Quieten a debug message.. This happens under "normal" operation by 4 bytes
on a frequent enough rate to be annoying. There is a real bug somewhere,
but it looks harmless enough.


34823 23-Mar-1998 bde

Added a forward struct declaration so that this file is less
self-insufficient.


34774 22-Mar-1998 peter

ppp-2.3.x ships with a bad compression number for deflate. It uses number
24 (which is magnalink!) rather than the correct 26.

Initial attempt at a compatability kludge that will negotiate for either
but will prefer to use the correct deflate compression type.


34768 21-Mar-1998 peter

Update kernel parts of ppp to ppp-2.3.3. Not much has changed except
that the deflate components use zlib 1.0.4 instead of zlib 0.95.


34750 21-Mar-1998 peter

On most other systems "out there", <net/if.h> does not require the caller
to #include <sys/time.h> first. I've lost count of the number of times
I've had to patch this in porting code. The problem is the
"struct timeval ifi_lastchange" in the mib stats. (most other systems don't
have this, until 4.4bsd anyway).


34649 18-Mar-1998 wollman

Add preliminary support for IEEE 802.1Q VLAN tagging. It doesn't actually
work reliably yet (I've had panics), but it does seem to occasionally
be able to transmit and receive syntactically-correct packets.
Also fixes one of if_ethersubr.c's legion style bugs, and removes
the hostcache code from standard kernels---the code that depends on it
is not going to happen any time soon, I'm afraid.


33939 01-Mar-1998 bde

Fixed syntax error in previous commit.


33928 28-Feb-1998 phk

Make it possible to indicate that we don't care about the remote
sides IP address, as long as it isn't 0.0.0.0


33679 20-Feb-1998 bde

Don't depend on "implicit int" or bloat the data section in the
declaration of xxx_devsw_installed.


33676 20-Feb-1998 bde

Removed unused #includes.


33322 13-Feb-1998 phk

Implement the spirit but not the letter of Terrys hot-char patch.

The differences Terrys patch and this patch are:
* Remove a lot of un-needed comments.
* Don't put l_hotchar at the front of stuct linesw, there is no need to.
* Use the #defines for the hotchar in the SLIP and PPP line disciplines


33181 09-Feb-1998 eivind

Staticize.


33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


33058 03-Feb-1998 bde

Added #include of <sys/queue.h> so that this file is more "self"-sufficent.


32957 01-Feb-1998 steve

Revert previous commit. Remove all ifp->if_* = 0 initializations,
since pkh made tunctl static in revision 1.17 these are already
guaranteed to be zero'd and tunattach will only be called once.

Pointed out by: Bruce Evans and Bill Fenner


32929 31-Jan-1998 eivind

Make the debug options new-style.

This also zaps a DPT option from lint; it wasn't referenced from
anywhere.


32925 31-Jan-1998 eivind

Make POWERFAIL_NMI, PPS_SYNC and NATM new style options.

This also fixes a couple of defunct options; submitted by bde.


32809 26-Jan-1998 brian

Correct $Id$


32776 25-Jan-1998 steve

Initialize if_ibytes and if_obytes to zero.

PR: 1376
Submitted by: risner@stdio.com


32726 24-Jan-1998 eivind

Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related option new-style.

This introduce an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)

LFS is temporarily disabled, and will be re-enabled tomorrow.


32491 13-Jan-1998 wollman

Add a macro to accurately calculate the length of a struct ifreq when
it contains an address. This can replace all the myriad (wrong) ways
in which this task is performed in the current system. As an added
bonus, since it's a macro, then third-party software vendors have an easy
way to tell whether it's there or not. (This will become necessary
when sizeof(struct sockaddr) is increaased, and also when additional
fields are added to struct ifreq.)


32441 11-Jan-1998 brian

Move softc stuff into if_tunvar.h
Suggested by: Peter Wemm <peter@netplex.com.au>
Hinted at by: Bruce Evans <bde@FreeBSD.org>
?\005
?\005


32440 11-Jan-1998 brian

Move softc stuff into if_tunvar.h
Suggested by: Peter Wemm <peter@netplex.com.au>
Hinted at by: Bruce Evans <bde@FreeBSD.org>
À³?\005


32384 10-Jan-1998 bde

Fixed change prerequisites for <net/if_arp.h>:
- don't declare `struct arpcom' except in the kernel, so that there is no
dependency on <net/if.h> except in the kernel. This may break something
else.
- spell ETHER_ADDR_LEN as 6 again, so that there is no dependency on
<net/ethernet.h> even in the kernel.


32356 09-Jan-1998 eivind

NETATALK -> opt_atalk.h


32350 08-Jan-1998 eivind

Make INET a proper option.

This will not make any of object files that LINT create change; there
might be differences with INET disabled, but hardly anything compiled
before without INET anyway. Now the 'obvious' things will give a
proper error if compiled without inet - ipx_ip, ipfw, tcp_debug. The
only thing that _should_ work (but can't be made to compile reasonably
easily) is sppp :-(

This commit move struct arpcom from <netinet/if_ether.h> to
<net/if_arp.h>.


32169 01-Jan-1998 gj

Reviewed by: Joerg Wunsch
In sppp_chap_input:
1) in the CHAP_CHALLENGE case don't output the peer's name if it is not
what we expected (DEBUG) since it will be printed out in the course
of events anyway.
2) in the CHAP_SUCCESS case test whether the peer is required to
authenticate himself [(sp->lcp.opts & (1 << LCP_OPT_AUTH_PROTO))],
otherwise the state machine may never switch into the network state.
I saw this case against 2 different ISPs; they never bothered to
authenticate themselves to me.

In sppp_pap_input:
in the PAP_ACK case do the same as in 2) above for the same reason.


31896 20-Dec-1997 eivind

Remove bogus #ifdef INET - SLIP doesn't compile without INET.


31884 20-Dec-1997 bde

Fixed gratuitous ANSIisms.


31861 19-Dec-1997 ache

MALLOC->malloc
Suggested-by: bde


31854 19-Dec-1997 ache

SUNIT: use MALLOC/FREE and M_NOWAIT


31778 16-Dec-1997 eivind

Make COMPAT_43 and COMPAT_SUNOS new-style options.


31742 15-Dec-1997 eivind

Throw options IPX, IPXIP and IPTUNNEL into opt_ipx.h.

The #ifdef IPXIP in netipx/ipx_if.h is OK (used from ipx_usrreq.c and
ifconfig.c only).

I also fixed a typo IPXTUNNEL -> IPTUNNEL (and #ifdef'ed out the code
inside, as it never could have compiled - doh.)


31577 06-Dec-1997 bde

Use ENOIOCTL instead of -1 (= ERESTART) for tty ioctls that are
not handled at a particular level. This fixes mainly restarting
of interrupted TIOCDRAINs and TIOCSETA{W,F}s.


31390 24-Nov-1997 bde

Unstaticized rn_delete() and rn_lookup(). They are used in dark corners
of netatalk (if NETATALKDEBUG is configured).

Removed stray semicolons.


31284 18-Nov-1997 bde

Removed now-unused blocking mode flag.


31283 18-Nov-1997 bde

Removed unused #includes.

Fixed gratuitous ANSIisms.

Fixed nonblocking mode. It was per-device instead of per-file.


31282 18-Nov-1997 bde

Removed unused #includes.

Fixed nonblocking mode. It was per-device instead of per-file. This
also fixes clobbering of bd_rtout by overloading it to hold a wrong
version of the blocking flag. I hope nothing depends on the bugs.


31266 18-Nov-1997 bde

Don't test for conflicting combinations of PPP_FILTER/BPFILTER here.
Testing in if_ppp.c is good enough.

Added comments about bogus #includes and #defines.

Removed unused #includes.

Don't depend on gcc's misfeature of rewriting short args in old-style
function definitions to match wrong prototypes. I just changed the
function definition to match the prototype, since this is easy to
verify automatically (it causes no changes in the object code), but
it breaks K&R1 support and doesn't fix the pessimal type.


31265 18-Nov-1997 bde

Cleaned up PPP_FILTER/NBPFILTER ifdefs.

Use gettime() instead of microtime() to set if_lastchange for i/o's.
microtime() is probably too expensive. However, setting if_lastchange
for i/o's may be wrong.


31264 18-Nov-1997 bde

Use gettime() instead of assignment from `time'. (`time' is too
volatile to use outside of splclock(). microtime() is probably too
expensive to use for every i/o. However, setting ifi_lastchange for
every i/o is just wrong according to the comment about ifi_lastchange
in <net/if.h>. It is set then for atm, fddi and the latest version
of ppp.)


31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


30868 31-Oct-1997 dg

Fixed bug in RTM_ADD where rmx_locks weren't being set on the new route,
preventing "route add default 1.2.3.4 -lock -mtu 1500" from working as
expected (which is, BTW, to disable Path MTU Discovery).


30834 29-Oct-1997 julian

didn't even know fddi had the atalk support.
fix it here too. (really needs more of the fixes from the ethernet)


30822 29-Oct-1997 julian

Fix various problems with netatalk kernel support.
Some of these changes are a bit rough and will become
more polished later. the changes to if_ethersubr should largely be moved
to within the appletalk code, but that will happen later.
A few of these were related to network-byteorder problems,
and more were related to loopback failures.


30813 28-Oct-1997 bde

Removed unused #includes.


30535 18-Oct-1997 peter

Braino on my part.. a #define isn't a reference to a structure, so the
struct only needs to be defined if the macro is used.

Pointed out by: bde


30527 18-Oct-1997 peter

Convert PPP_FILTER to an option, like PPP_BSDCOMP and PPP_DEFLATE.
It requires bpf, I'll note this in LINT.


30525 18-Oct-1997 peter

Better fix for the bpf dependency that doesn't have such a large impact
on the code and pppd in userland. PPP_FILTER is meant to be an option (or
negatable option).


30524 18-Oct-1997 peter

Back out the `PPP_FILTER => NBPFILTER' changes.


30523 18-Oct-1997 peter

Back out the `PPP_FILTER => NBPFILTER' changes


30521 17-Oct-1997 roberto

A better fix for both kernel and LKM compilation.


30520 17-Oct-1997 roberto

Change PPP_FILTER into NBPFILTER to fix kernel compilation.
It should probably be changed in ppp_tty.c for consistency but I'll let
Brian deal with this.

Forgotten by: brian


30499 17-Oct-1997 brian

PPP_FILTER => NBPFILTER


30354 12-Oct-1997 phk

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


30300 11-Oct-1997 joerg

Jumbo patch to implement PAP and CHAP for sppp(4). Partially based on
Serge's (Cronyx's) code in the vendor branch. (FR support not yet
merged.)


30270 10-Oct-1997 peter

Try out PPP_FILTER


30201 07-Oct-1997 ache

Preserve old SC_STATIC value after units exchange


30199 07-Oct-1997 joerg

Ooops, this should have made it into the same commit, but didn't.

Introduce the SIOC[SG]IFGENERIC hooks that can be used to pass an
arbritrary ioctl subcommand into an interface driver. Surprisingly
enough, there was no provision for this already present (except of the
option of abusing SIOC[SG]IFMEDIA for this).

The idea is that an interface driver can establish ioctl subcommands
of its own that can't be meaningfully interpreted by the upper layer
interface ioctl function. Something like this is required to
implement a clean solution of passing down things like CHAP secrets or
PPP options to the /sys/net/if_sppp* files. (Yes, my CHAP is now
finally working with it, but i gotta update my kernel to the new
callout interface before being able to commit _that_.)

Reviewed by: peter [long ago, actually]


30090 03-Oct-1997 julian

Allow interfaces to be attached to bpf at times other than boot.
doing so without this patch leads to an infinite loop in the kernel.


29691 21-Sep-1997 dyson

Remove an unfortunate name clash with the zalloc/zfree routines. Since the
ppp_deflate code uses the names locally - it looses.


29681 21-Sep-1997 gibbs

Update for new callout interface.


29506 16-Sep-1997 bde

Fixed gratuitous ANSIisms.


29366 14-Sep-1997 peter

Update network code to use poll support.


29365 14-Sep-1997 peter

Update select -> poll


29364 14-Sep-1997 peter

select -> poll

Obtained from: NetBSD (I think)


29194 07-Sep-1997 joerg

Fix a typo that becomes apparent when compiling without COMPAT_443.

Submitted by: Tony Kimball <Anthony.Kimball@East.Sun.COM>


29189 07-Sep-1997 bde

Some staticized variables were still declared to be extern.


29179 07-Sep-1997 bde

Some staticized variables were still declared to be extern.


29024 02-Sep-1997 bde

Added used #include - don't depend on <sys/mbuf.h> including
<sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).


28989 01-Sep-1997 bde

Removed unused #includes.


28845 28-Aug-1997 julian

Add a per-interface-address pointer to a function that can be supplied
by a protocol, to detirmine if an address matches the net this address
is part of. This is needed by protocols for which netmasks
"just don't work", for example appletalk.

Also add the code in appletalk to make use of this new feature.
Thsi fixes one of the longest standing bugs in appletalk.
The inability to talk to machines to which the path is via a router
which is on a different net, but the same netrange, as your interface.
Protocols that do not supply this function (e.g. IP) should not be affected.


28608 22-Aug-1997 julian

add some comments while trying to understand why appletalk
gets some things wrong.
(part of my continuing "comment it as you understand it" effort :)


28583 22-Aug-1997 peter

Some fixes from Bruce:
- don't access time (a volatile) via struct copy.
- merge botches
- note risk of CCOUNT accessing *tp outside spltty().

Submitted by: bde


28426 19-Aug-1997 peter

Remove some stray extra prototypes


28425 19-Aug-1997 peter

Use two NetBSD-style options (PPP_DEFLATE and PPP_BSDCOMP) to control
whether or not to compile the two ppp compression methods.


28415 19-Aug-1997 peter

Update kernel parts of pppd from 2.2.0 to 2.3.0. I've yet to look at the
2.3.0 -> 2.3.1 changes, but I seem to recall that there are certain
"issues" with 2.3.1 (I'm not sure if it's just pppd or the whole lot, I
am not quite that far). The present pppd seems to work with it just fine
for the time being.

Among the changes are that zlib (aka LZ77 aka deflate aka gzip) compression
is implemented as well as the original compress(1) LZW style.


28270 16-Aug-1997 wollman

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


28167 13-Aug-1997 ache

SUNIT: exchange up/down states too


28088 12-Aug-1997 kjc

Fix a traceroute problem in the CISCO HDLC mode. (cisco routers not
returning ICMP_TIMXCEED)

use CISCO_UNICAST instead of CISCO_MULTICAST to send normal packets.
this is needed for packets to get processed by a cisco router,
but doesn't matter if a packet is just forwarded.

Reviewed by:itojun@itojun.org


28036 10-Aug-1997 joerg

Implement the LCP fail_counter: if an option has been NAK'ed for more
than max_failures attempts, we are going to REJ it, to prevent endless
NAK loops.

(This is actually part of a larger local set of modifications i'm
running with, but the remainder (PAP & CHAP) ain't ready for prime-
time yet.)


27929 06-Aug-1997 itojun

PR: kern/4117
Reviewed by: ishii@csl.sony.co.jp, kjc@csl.sony.co.jp

checked with FreeBSD+Riscom - cisco4500 configuration.


27845 02-Aug-1997 bde

Removed unused #includes.


27743 28-Jul-1997 ache

Use malloc to save space for temp SUNIT variable
Submitted by: bde


27719 27-Jul-1997 ache

Move tmpnc struct out of stack, too large
Suggested by: bde


27711 26-Jul-1997 ache

SUNIT: exchange back whole ifnet structures since they are in the linked
list, not device numbers only


27707 26-Jul-1997 ache

Forget to change units in prev. SUNIT commit. Move variales to local
section for SUNIT.


27706 26-Jul-1997 ache

Exchange whole structures on SUNIT, not unit+flags fields only.
It is needed because if_attach() assumes fixed units order
and pass it to ifconfig


27504 18-Jul-1997 julian

An actual fix for the routing default crashes that
1/ is compatible with the old route(1) in case needed.
2/ actually fixes the problem while vetting bad user input.
note: I have already fixed route(1) so the problem shouldn't occur.
if it does. use 0.0.0.0/0 instead of the word 'default' :)


27476 17-Jul-1997 msmith

Fix Julian's fixed fix. Routing is weird.

We need to accept at least one sockaddr with zero length, in order
to be able to set the default route.

Suggested by: Phone conversation with Julian (sleep well!)


27458 16-Jul-1997 julian

Bungled cut/paste leaves kernel with page faults..
(read all about it!)


27431 15-Jul-1997 julian

Finally track down the reason for some of my occasional kernel crashes.
Route(1) has a bug that sends a bad message to the kernel. The kernel
trusts it and crashes. Add some sanity checks so that
we don't trust the user quite as much any more.
(also add a comment in if_ethersubr.c)


27265 07-Jul-1997 julian

Don't add an item to the multicast linked list if it's already
on the list.


27209 05-Jul-1997 peter

Send these files to the attic until they are in use for several reasons.
1: cvs and cvsup don't really support vendor branches other than 1.1.1.x,
this is on 1.1.2.x and causing problems in cvsup 'checkout mode', just the
same as cvs has problems interpreting dates. (cvs has "1.1.1" hard coded)
2: cvs 'rm'ing them takes them off the vendor branch and should hide the
above problems.
3: it's just clutter until the merge is done.
4: if the problem isn't sufficiently resolved by taking these off the
vendor branch, the files will have to be nuked and re-imported.


27154 01-Jul-1997 peter

Initial revision


26778 22-Jun-1997 brian

Fix this damn mbuf with a negative m_len. It turns
out to be a problem with VJ header compression.
davidg spotted this in usr.sbin/ppp/slcompress.c
a while ago, but I believe gave the wrong reasons -
it's too easy to reproduce ! The only scenario that
I've been able to reproduce the problem under is when
m_len is *exactly* 40 ! So go figure !

PR: 3749
Submitted elsewhere by: davidg
Obtained from: usr.sbin/ppp/slcompress.c


26709 18-Jun-1997 brian

Protect against garbage mbufs in pppstart.
Remove previous hack in pppfcs().

This is still not the correct solution. We shouldn't
have any incorrect mbufs. This patch does however make
pppd/natd work (rather than jamming the interface).


26706 18-Jun-1997 wollman

Add for public examination the beginnings of the per-host cache support
which will for the basis of RTF_PRCLONING's more efficient, better-
designed replacement.


26566 11-Jun-1997 julian

As the Tunnel device has no real inherrent MRU limit,
so don't enforce the MTU as an MRU. Allow bidirectional ppp MTU
negotiation, by checking against a differnt figure for MRU.
Make it large enough for ATM frames at least.

Submitted by: archie@whistle.com (archie cobbs)


26517 09-Jun-1997 brian

Prevent panic with garbage mbuf.

Submitted by: Lenzi, Sergio <lenzi@bsi.com.br>


26373 02-Jun-1997 dfr

Move interrupt handling code from isa.c to a new file. This should make
isa.c (slightly) more portable and will make my life developing the really
portable version much easier.

Reviewed by: peter, fsmp


26314 31-May-1997 peter

Bruce mentioned to me that Paul Traina had noticed that the ppp_tty
interrupt mask hackery wasn't happening when being modloaded via the
if_ppp lkm. It seems that the lkm system doesn't particularly like having
two sets of load/unload/etc routines. :-] This really should be fixed
by having a seperate if_ppp and ppp_tty lkm, but that requires that ppp_tty
is loaded after if_ppp, and needs to be able to link with symbols in
if_ppp. This gets messy, it is a better task for the in-kernel linker.
(if_ppp is generic, ppp_tty is a tty-specific bottom end for if_ppp, it's
not _too_ hard to have another "provider" (such as a hdlc sync card)
connected to if_ppp)


26313 31-May-1997 peter

don't refer to SWI_*_MASK, it's not SMP/UP kernel portable for the lkm.


26077 23-May-1997 joerg

Fix a couple of log()'s that came out with the wrong (default)
log level, as opposed to LOG_DEBUG.


26018 22-May-1997 joerg

Introduce a third queue per interface, serving only PPP control
protocol packets. This queue is the only one being enabled until
network phase has been reached.


25955 20-May-1997 joerg

Major nit: i've confused link0 and link1 in my brain and/or in either
the man page or the source file. Fix this.

Minor problem: don't choke with ENETDOWN early. As long as our output
queue has space, put the IP packets there even if IPCP ain't up yet.
We will eventually be able delivering them once the PPP state machine
came up.


25944 19-May-1997 joerg

Major overhaul of the SyncPPP layer. Basically, this comprises now a
full implementation of the sate machine as described in RFC1661, and
provides support for plugging in various control protocols. I needed
this to provide PPP support for the BISDN project (right now).

Unfortunatley, while the existing API was almost up to the point, i
needed one minor API change in order to decouple the this-layer-
started and this-layer-finished actions from the respective Up and
Down events of the lower layer. This requires two additional lines in
the attach routines of all existing lower layer interface drivers that
are using syncPPP (shortcutting these actions and events). Apart from
this, i believe i didn't change the API of all this, so everything
should plug in without too many hassles. Please report if i broke
something in the existing drivers.

For a list of features (including new ones like dial-on-demand), and
things still to be done, please refer to the man page i'll commit asap.

Encouraged by: Serge Vakulenko <vak@cronyx.ru>


25706 11-May-1997 joerg

Make sppp's logging human-readable. Also, use log(9), as opposed to
printf(9), so the log output doesn't clutter the console.

While i was at it, KNFified some function definitions. This file was
very inconsistent in this respect.


25653 10-May-1997 jhay

Use the MAC address of an interface for the host part of an IPX address
and not the MAC address of the first interface for every IPX address.
This is more inline with the way others like Novell do it.
Originally Submitted by: "Serge A. Babkin" <babkin@hq.icb.chel.su>


25609 09-May-1997 kjc

merge ATM driver


25604 09-May-1997 kjc

This commit was generated by cvs2svn to compensate for changes in r25603,
which included commits to RCS files with non-trunk default branches.


25434 03-May-1997 peter

add SIOC{S,G}IFMEDIA ioctl support


25431 03-May-1997 peter

Make it compile on FreeBSD, add $Id$


25429 03-May-1997 peter

This commit was generated by cvs2svn to compensate for changes in r25428,
which included commits to RCS files with non-trunk default branches.


25201 27-Apr-1997 wollman

The long-awaited mega-massive-network-code- cleanup. Part I.

This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.

2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.

3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.

4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.

5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.

As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.


24936 14-Apr-1997 phk

Use LIST macros instead of insque/remque


24208 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 6: include
<sys/filio.h>, <sys/sockio.h> and <sys/ttycom.h> instead of
<sys/ioctl.h> in a couple of files. This is still only 1/3
as spammish as <sys/ioctl.h> - 5 or 6 old tty ioctl headers
aren't needed.


24206 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 4: include
<sys/ttycom.h> and sometimes <sys/filio.h> instead of <sys/ioctl.h>
in miscellaneous files. Most of these files have nothing to do
with ttys but need to include <sys/ttycom.h> to get the definitions
of TIOC[SG]PGRP which are (ab)used to convert F[SG]ETOWN fcntls into
ioctls.


24204 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 2: include
<sys/sockio.h> instead of <sys/ioctl.h> in network files.


24203 24-Mar-1997 bde

Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include
it when it is not used. In most cases, the reasons for including it
went away when the special ioctl headers became self-sufficient.


24131 23-Mar-1997 bde

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


23910 15-Mar-1997 joerg

Fix from Matt for the problem described in PR # kern/2990: ``DEC FDDI
is a little *too* promiscuous''

Also a 2.2 candidate, again, after testing.

Submitted by: Matt Thomas <matt@lkg.dec.com>


23738 11-Mar-1997 bde

Fixed clist limits. I got them wrong several years ago in rev.1.9
(1994/11/26). Packets with more than approximately 128 0xc0's or
0xdb's in them were untransmittable.


23392 05-Mar-1997 julian

add a bunch of comments to describe what's going on.
This is some of the worst code I've had to wade through in
ages and I don't want to have to start from scratch again next time.

(I have a 2.2 version of these comments, can I commit them?)


22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


22718 14-Feb-1997 wollman

Send RTM_IFINFO messages whenever promiscuous and all-multicast
modes are enabled or disabled.


22614 12-Feb-1997 wollman

Implement PRC_IFUP a la PRC_IFDOWN so that protocols know when an interface
has come bacl up (and can referse actions taken as a result of downing).


22250 04-Feb-1997 fenner

Make sure we have arguments to pass before calling ifaof_ifpforaddr
and ifa_ifwithroute.

This eliminates the panic seen in kern/2647, although it doesn't
address the fact that RTM_CHANGE can't change flags.


22137 30-Jan-1997 joerg

Fix yet another breakage i've missed when committing rev 1.14. It was
non-obvious to me since my test kernel didn't run NETATALK. Sorry.

LINT should compile again now.


22010 26-Jan-1997 julian

fix mixleading comment (my error.. I wrote the comment)


21831 17-Jan-1997 joerg

Round #2. This basically brings back the changes from rev 1.12.

I have only separated both to make it more convenient merging all this
into 2.2.


21830 17-Jan-1997 joerg

This mega-merge brings Matt Thomas' 960801 FDDI driver (almost) up
to -current.

Thanks goes to Ulrike Nitzsche <ulrike@ifw-dresden.de> for giving me
a chance to test this. Only the PCI driver is tested though.

One final patch will follow in a separate commit. This is so that
everything up to here can be dragged into 2.2, if we decide so.

Reviewed by: joerg
Submitted by: Matt Thomas <matt@3am-software.com>


21818 17-Jan-1997 wollman

Don't try to do anything with the `ifr' parameter for SIOCADDMULTI
and SIOCDELMULTI; it is guaranteed to be null in the new system.


21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


21666 13-Jan-1997 wollman

Use the new if_multiaddrs list for multicast addresses rather than the
previous hackery involving struct in_ifaddr and arpcom. Get rid of the
abominable multi_kludge. Update all network interfaces to use the
new machanism. Distressingly few Ethernet drivers program the multicast
filter properly (assuming the hardware has one, which it usually does).


21437 08-Jan-1997 wollman

Fix typo. I hate waking up at 4:45 in the morning...


21436 08-Jan-1997 wollman

Correctly account for header length in m_pkthdr.len when sending
packets through BPF.

Submitted by: seki@sysrap.cs.fujitsu.co.jp in PR#2415


21434 08-Jan-1997 wollman

Fix a few oversights in the new multicast membership interface.


21404 07-Jan-1997 wollman

Checkpoint the beginnings of the new kernel interface for
multicast group memberships. This is not actually operative
at the moment (a lot of other code still needs to be changed), but
this seemed like a useful reference point to check in so that
others (i.e. Bill Fenner) have fair warning of where we are going.


21260 03-Jan-1997 wollman

Move the ethertypes from <netinet/if_ether.h> to <net/ethernet.h>.
Many programs need the numbers but don't need the internals of ARP.

More commits to follow...


21259 03-Jan-1997 wollman

Separate kernel-internal data structures from exposed user interface
to interfaces. (Amazing nobody had done this!)

More commits to fix up user-land to follow.


20686 19-Dec-1996 bde

More cleanups to satisfy the following rules:
- C++ should be supported for application functions (use __BEGIN_DECLS,
etc.).
- prototypes should be sorted.
- comments on #endif's should spell identifiers the same as the code.
- comments on #endif's should have the same sense as the code (use `!'
to match ifndef, etc.).


20681 19-Dec-1996 wollman

Clean up Bill's additions.


20661 18-Dec-1996 wpaul

Add prototypes for ethers.3 functions as per wollman:

> wollman 96/12/10 09:19:15
>
> Modified: lib/libc/net ether_addr.c ethers.3
> Log:
> Get struct ether_addr directly from <net/ethernet.h> rather than pulling
> in lots of unrelated junk from <net/if.h> and <net/if_ether.h>. These
> functions still aren't prototyped anywhere (but should be in
> <net/ethernet.h>---got that, Bill?).

(Note that this file has no copyright header; one should probably
be added.)


20653 18-Dec-1996 bde

Fixed pedantic syntax error.


20559 16-Dec-1996 fenner

Change default tun MTU back to 1500.
Use the interface MTU instead of the constant when deciding what
packets to accept.
Allow using the SIOCSIFMTU ioctl (e.g. "ifconfig tun0 mtu XXX") to
set the MTU.


20407 13-Dec-1996 wollman

Convert the interface address and IP interface address structures
to TAILQs. Fix places which referenced these for no good reason
that I can see (the references remain, but were fixed to compile
again; they are still questionable).


20337 11-Dec-1996 wollman

Use queue macros for the list of interfaces. Next stop: ifaddrs!


20330 11-Dec-1996 wollman

Include <net/if_arp.h> in the one header that requires it,
<netinet/if_ether.h>, rather than in <net/if.h>, most of whose callers
have no need of it.

Pointed-out-by: bde


20292 10-Dec-1996 wollman

Finally, after six years, remove the ``quick hack for SNMP'' that was
``going away soon''.


20276 10-Dec-1996 dg

1) Implement SIOCSIFMTU in ether_ioctl(), and change ether_ioctl's return
type to be int so that errors can be returned.
2) Use the new SIOCSIFMTU ether_ioctl support in the few drivers that are
using ether_ioctl().
3) In if_fxp.c: treat if_bpf as a token, not as a pointer. Don't bother
testing for FXP_NTXSEG being reached in fxp_start()...just check for
non-NULL 'm'. Change fxp_ioctl() to use ether_ioctl().


20098 02-Dec-1996 julian

2 small changes:
1/ increase the tun MTU from 1500 to 1600 to allow it to be used with
packets formatted according to RFC1490 and RFC1717

2/ allow the tsleep() when reading, to be interruptable by signals
so that one can now do:
od -xc </dev/tun0
to dump packets for debugging without getting hung.

Passed on by: Archie@whistle.com (archie Cobbs)

Nice but not neccessary in 2.2


19846 18-Nov-1996 dg

Fixed broken SIOCGIFADDR. It was copying out garbage as the ethernet
address.


19665 12-Nov-1996 dg

Killed "unknown protocol" printf.


19079 21-Oct-1996 fenner

Fix comments, which appear to have been mangled long ago and far away.


19016 18-Oct-1996 jkh

ns_nettype should be declared, not externed.


18991 17-Oct-1996 jkh

Netcon's changes for their extended NS support. This only effects
people compiling with NS, so the effects on everyone else are nil.


18892 12-Oct-1996 bde

Removed nested include if <sys/socket.h> from <net/if.h> and
<net/if_arp.h> and fixed the things that depended on it. The nested
include just allowed unportable programs to compile and made my
simple #include checking program report that networking code doesn't
need to include <sys/socket.h>.


18872 11-Oct-1996 wollman

Add primitive link MIB support.


18839 09-Oct-1996 wollman

Get rid of obsolete RTF_MASK and RTF_CHAINDELETE flags.


18796 07-Oct-1996 wollman

Remove some historical cruft inherited from the loopback driver in which
there were three possible different code paths through which we could
discard a packet (which, after all, is the entire function of this interface).


18206 10-Sep-1996 julian

No code changes what so ever, but added about 150 lines of comments
Sorry if this makes it harder to merge in lite2 stuff but hey..
At least I can figure out what is going on whenever I end up going through those
files again..

do we have a policy regarding commenting existing code?


18010 03-Sep-1996 asami

Second phase of merge, get rid of more machine-independent-dependencies.
Get rid of pc98/pc98/pc98_device.h.

Submitted by: The FreeBSD(98) Development Team


17997 02-Sep-1996 fenner

Bugfix and simplification for rev 1.34: make sure that the route
is non-null before trying to delete it in rt_setgate(), which then
allows removal of the special-case code from the RTM_ADD case.
This should fix the panics that joerg and Phil Karn have been seeing.


17949 30-Aug-1996 jhay

Get rid of the ifdef MULTICAST's. I think the rest of the kernel got rid
of them 2 years ago.


17947 30-Aug-1996 asami

Re-sync with the state of PC98 world. This will be the last commit before
we start merging things in earnest...

Submitted by: The FreeBSD(98) Development Team


17871 28-Aug-1996 wollman

Add some padding to struct ifmibdata and move the `struct ifdata' to
the end of that sstructure to make evolution easier.

Add definitions for the 802.3/Ethernet MIB. To implement this, simply
add a `struct ifmib_iso_8802_3' somewhere in your interface's softc,
point if_linkmib to it, set if_linkmiblen, and fill in the statistics
with appropriate values. (I didn't want to create Yet Another Ethernet-
related header file, otherwise this would have been separated out.)


17837 26-Aug-1996 julian

correct a field comment that someone must have accidentally spammed
as it's still used for what the original BSD4.4 comment says it's for.


17835 26-Aug-1996 julian

change a comment to match what the BSD4.4 book says.


17802 24-Aug-1996 peter

route.c:RTM_ADD does not check for a netmask before doing a tree walk
like it does elsewhere. This is probably only happens when incorrect
args are given to route(8), or when running with non-IPv4 stacks but
incorrect args to the route command is no excuse for panicing!

Submitted by: Michael Clay <mclay@weareb.org>, PR#1532


17679 19-Aug-1996 pst

Update to match definitions in LBL June 96 release


17487 09-Aug-1996 julian

Submitted by: archie@whistle.com
allow a tunnel interface to be openned even if it has no remote address yet.
this may be needed if you have used
route add default -interface tun0
where the remote end might not even HAVE a number (e.g. netcom links)


17462 07-Aug-1996 julian

Submitted by: archie@whistle.com
This is a patch to sys/net/if.c. What it does is patch the algorithm
for finding an IP address on an interface which most closely matches
a given IP address. The problem with it is when no address matches,
and you have to just pick one at random. Then the code ends up picking
the last IP address in the list. This patch changes things so it
picks up the first address instead.
Usually the first address is more useful as the later ones are aliases.


17455 06-Aug-1996 phk

Megacommit to straigthen out ETHER_ mess.

I'm pretty convinced after looking at this that the majority of our
drivers are confused about the in/exclusion of ETHER_CRC_LEN :-(


17444 05-Aug-1996 phk

use <net/ethernet.h>


17443 05-Aug-1996 phk

This file is the (intended) definitive source of the ETHER_ macros.


17418 04-Aug-1996 phk

Make the NS and IPX cases compile again.


17415 04-Aug-1996 phk

Add a callback pointer to the interfaces "init" routine.
Add ether_ioctl() which can take care of the SIOC[SG]IFADDR cases for
most (ethernet) drivers.


17352 30-Jul-1996 wollman

Add better support for retrieving management information from network
interfaces. This creates two new tables in the net.link.generic branch
of the MIB; one contains (essentially) `ifdata' structures, and the other
contains a blob provided by the interface (and presumably used to
implement link-layer-specific MIB variables). A number of things
have been moved around in the `ifnet' and `ifdata' structures, so
NEW VERSIONS OF ifconfig(8) AND routed(8) ARE REQUIRED. (A simple
recompile is all that's necessary.)

I have a sample program which uses this interface for those interested
in making use of it.


17270 24-Jul-1996 wollman

Fix a bug in ifa_ifwithnet() which caused a page fault in bcmp()
when attepmting to add certain types of routes. This problem
only manifested itself in the presence of unconfigured point-to-point
interfaces.

Noticed by: Chuck Cranor <chuck@maria.wustl.edu>


17258 23-Jul-1996 wollman

Fix a spelling error I forgot to bring over from my personal version
of the last (IF_ENQ_DROP) commit.


17247 22-Jul-1996 wollman

Add a new, better mechanism for sticking packets onto ifqueues.
The old system had the misfeature that the only policy it could implement
was tail-drop; the new IF_ENQ_DROP macro/function makes it possible
to implement more sophisticated queueing policies on a system-wide
basis. No code actually uses this yet (although on my machine
I have converted the ethernet and (polled) loopback to use it).


17241 21-Jul-1996 peter

Don't dereference sc->sc_setmtu if it's NULL (such as when it's not running)
as discussed on current. (bde pointed out the cause of the problem)

Reported by: dev@fgate.flevel.co.uk


17096 11-Jul-1996 wollman

Modify the kernel to use the new pr_usrreqs interface rather than the old
pr_usrreq mechanism which was poorly designed and error-prone. This
commit renames pr_usrreq to pr_ousrreq so that old code which depended on it
would break in an obvious manner. This commit also implements the new
interface for TCP, although the old function is left as an example
(#ifdef'ed out). This commit ALSO fixes a longstanding bug in the
TCP timer processing (introduced by davidg on 1995/04/12) which caused
timer processing on a TCB to always stop after a single timer had
expired (because it misinterpreted the return value from tcp_usrreq()
to indicate that the TCB had been deleted). Finally, some code
related to polling has been deleted from if.c because it is not
relevant t -current and doesn't look at all like my current code.


17052 10-Jul-1996 fenner

Disallow host routes that point to themselves. These routes serve no
purpose, other than to get in the way of the ARP table and cause
"can't allocate llinfo" errors.

This change may cause gated or routed to start complaining when adding
such routes. If so, these programs will need to be fixed to not try
to add these routes.

Reviewed by: wollman


16674 24-Jun-1996 gpalmer

Remove another extraneous setting of if_lastchange


16604 23-Jun-1996 gpalmer

Remove an un-necessary call to microtime() to set if_lastchange
as it is set in the call to if_down in the line above


16512 19-Jun-1996 wollman

Set IFF_RUNNING on the loopback interface.


16498 19-Jun-1996 julian

Submitted by: archie@whistle.com

gary went a little overboard on commenting out unused variables.
Variables needed for ISO, LLC and NETATALK
were only enabled for ISO & LLC.. so NETATALK bombed.


16363 14-Jun-1996 asami

The Great PC98 Merge.

All new code is "#ifdef PC98"ed so this should make no difference to
PC/AT (and its clones) users.

Ok'd by: core
Submitted by: FreeBSD(98) development team


16341 13-Jun-1996 dg

Keep ether_type in network order for BPF to be consistent with other
systems.

Submitted by: Ted Lemon, Matt Thomas, and others. Retrofitted for
-current by me.


16334 12-Jun-1996 nate

Only print out the new masks if bootverbose is set.


16332 12-Jun-1996 gpalmer

Since the updates to ifnet.if_lastchange are so rare (relatively
speaking), go for the extra accuracy and call microtime() to get
the current time.

Pointed Out By: bde


16322 12-Jun-1996 gpalmer

Clean up -Wunused warnings.

Reviewed by: bde


16288 10-Jun-1996 gpalmer

Add $Id$


16287 10-Jun-1996 gpalmer

Change the use if ifnet.if_lastchange to be more in line with
SNMP requirements. Update description of ifnet.if_lastchange in if.h
to indicate this.


16258 09-Jun-1996 phk

Also count bytes in if_tun. kern/1253
Reviewed by: phk
Submitted by: John Capo <jc@irbs.com>


16206 08-Jun-1996 bde

Changed some memcpy()'s back to bcopy()'s.

gcc only inlines memcpy()'s whose count is constant and didn't inline
these. I want memcpy() in the kernel go away so that it's obvious that
it doesn't need to be optimized. Now it is only used for one struct
copy in si.c.


16194 08-Jun-1996 dg

Fix bug in bpf_ifname() where the unit didn't get added correctly to the
name string. This function should be rewritten to deal with more than
10 units of a given type.

Pointed out by: jmf@free-gate.com (Jean-Marc Frailong)
(I fixed it slightly differently)


16142 05-Jun-1996 wollman

Don't allow trailing garbage after the unit number in ifunit().


16063 01-Jun-1996 gpalmer

Set ifnet.baudrate for ethernet / FDDI interfaces too. Makes
SNMP slightly more informative

Reviewed by: Garrett Wollman


15915 26-May-1996 scrappy

added missing semicolon

Submitted by: Jeffrey Hsu <hsu@freefall.freebsd.org>


15906 26-May-1996 phk

If tunnel is busy we return EBUSY, not ENXIO.


15885 24-May-1996 julian

Obtained from: netatalk distribution netatalk@itd.umich.edu

Kernel Appletalk protocol support
both CAP and netatalk can make use of this..
still needs some owrk but it seemd the right tiime to commit it
so other can experiment.


15769 13-May-1996 ache

When two units swapped, copy SC_STATIC flag too, not clear it


15768 13-May-1996 ache

Prevent mixing of static and dynamic unit allocation strategies


15680 08-May-1996 gpalmer

Clean up various compiler warnings. Most (if not all) were benign

Reviewed by: bde


15652 06-May-1996 wollman

Add three new route flags to help determine what sort of address
the destination represents. For IP:

- Iff it is a host route, RTF_LOCAL and RTF_BROADCAST indicate local
(belongs to this host) and broadcast addresses, respectively.

- For all routes, RTF_MULTICAST is set if the destination is multicast.

The RTF_BROADCAST flag is used by ip_output() to eliminate a call to
in_broadcast() in a common case; this gives about 1% in our packet-generation
experiments. All three flags might be used (although they aren't now)
to determine whether a packet can be forwarded; a given host route can
represent a forwardable address if:

(rt->rt_flags & (RTF_HOST | RTF_LOCAL | RTF_BROADCAST | RTF_MULTICAST))
== RTF_HOST

Obviously, one still has to do all the work if a host route is not present,
but this code allows one to cache the results of such a lookup if rtalloc1()
is called without masking RTF_PRCLONING.


15377 25-Apr-1996 dg

Regardless of whether or not the check for IPv4 is useful, we certainly
don't need to assign the "ip" pointer twice.


15370 24-Apr-1996 phk

Reject all IP versions but 4.


15238 13-Apr-1996 bde

Eliminated sloppy common-style declarations. Now there are no duplicated
common labels for LINT. There are still some common declarations for the
!KERNEL case in tcp_debug.h and spx_debug.h. trpt depends on the ones in
tcp_debug.h.


15185 11-Apr-1996 dg

When cslip gets an uncompressed packet, it attempts to save off the TCP/IP
header for use in decompressing subsequant packets. If cslip gets garbage
(such as what happens when there is a port speed mismatch or modem line
noise), it will occasionally mistake the packet as a valid uncompressed
packet. When it tries to save off the header, it doesn't bother to check
for the validity of the header length and will happily clobber not only
the cslip data structure, but parts of other kernel memory that happens
to follow it...causing, ahem, undesired behavior.


15117 07-Apr-1996 bde

Removed never-used #includes of <machine/cpu.h>. Many were apparently
copied from bad examples.


15116 07-Apr-1996 bde

Removed now-unused #includes of <machine/cpu.h>. They were for bootverbose
being declared in the wrong place.


14904 29-Mar-1996 fenner

Eliminate panic("rtfree") caused by double-freeing the route
when rt == rt->rt_gwroute . rt == rt->gwroute shouldn't happen
in the first place, but that's another problem.

(try "route add -host <hostonmynet> <hostonmynet>; ping <hostonmynet>;
route delete <hostonmynet>")


14877 28-Mar-1996 scrappy

Using devfs_add_devswf() instead of devfs_add_devsw()

Reviewed by: julian@freebsd.org


14852 27-Mar-1996 bde

Fixed ownerships of callout devices.


14546 11-Mar-1996 dg

Move or add #include <queue.h> in preparation for upcoming struct socket
changes.


14421 08-Mar-1996 ache

Make user-level PPP on-demand with dynamic IP actually work.
Story so fr:
1) PPP on-demand with static IP works.
2) PPP on-demand with dynamic IP says "Host is down" on any IP request
The problem is that tun driver check its READY state by *first* ifconfig address.
i.e.:
set ifaddr <addr> <addr2>
works (static IP) and
set ifaddr 0 <addr2>
not works (dynamic IP) because first address is equal 0.
Since tun is always POINTOPOINT interface, dst address is more meaningfull.
I change checking to second (dst) address in READY test.
PPP on-demand finally works.


14328 02-Mar-1996 peter

Add more options into the conf/options and i386/conf/options.i386 files
and the #include hooks so that 'make depend' is more useful. This
covers most of the options I regularly use (but not all) and some other
easy ones.


13993 09-Feb-1996 phk

Make tundebug sysctl writable.


13981 08-Feb-1996 wollman

If a slow input queue was defined by the driver, initialize it.


13937 06-Feb-1996 wollman

Clean up Ethernet drivers:
- fill in and use ifp->if_softc
- use if_bpf rather than private cookie variables
- change bpf interface to take advantage of this
- call ether_ifattach() directly from Ethernet drivers
- delete kludge in if_attach() that did this indirectly


13928 05-Feb-1996 wollman

Make me feel a little better by filling in reasonable values for rmx_sendpipe
and rmx_recvpipe. This has no demonstrable effect on performance.
(ttcp reports about 44 Mbit/s for all the buffer sizes I tried between
16384 and 65536.)


13765 30-Jan-1996 mpp

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


13641 26-Jan-1996 wollman

Delete the if_private[] array in struct ifnet; this turned out to be
of limited utility. In their place, add bunch of pointers
which will eventually be needed by the polled-interrupt scheme we're working
here. (It will probably be a while before the code is written and
committed here.) At the same time, a `void *if_softc' field
was added to the beginning of the structure to make certain driver
writers happier.

The practical upshot of all this is that you need to
recompile utilities such as netstat which manipulate struct ifnet.


13638 26-Jan-1996 phk

The last part of the ether_sprint -> %6D change.
Sorry for the delay.
(%D is for hexdumping.)


13619 24-Jan-1996 phk

Use new printf features rather than local kludges.


13616 24-Jan-1996 wollman

Fix memory leak in case of adding a host route on top of another one.

Pointed-out-by: Bill Fenner <fenner@parc.xerox.com>


12942 20-Dec-1995 wollman

in_proto.c: spell ``Internet'' right and put whitespace after commas.

others: start to populate the link-layer branch of the net mib, by
moving ARP to its proper place. (ARP is not a protocol family, it's an
interface layer between a medium-access layer and a protocol family.)
sysctl(8) needs to be taught about the structure of this branch, unless
Poul-Henning implements dynamic MIB exploration soon.


12881 16-Dec-1995 bde

Uniformized pr_ctlinput protosw functions. The third arg is now `void
*' instead of caddr_t and it isn't optional (it never was). Most of the
netipx (and netns) pr_ctlinput functions abuse the second arg instead of
using the third arg but fixing this is beyond the scope of this round
of changes.


12820 14-Dec-1995 phk

Another mega commit to staticize things.


12773 11-Dec-1995 peter

Make FIONREAD return the actual that a read() would return, not just the
amount of data in the first mbuf.

Obtained from: Bob Smart <smart@mel.dit.csiro.au> (for NetBSD & SunOS)


12708 10-Dec-1995 bde

Restored used variable `name[32]' (used by DEVFS).

Removed an `#ifdef __FreeBSD__'. Hundreds, if not thousands of other
FreeBSD-specific things aren't ifdefed.


12706 09-Dec-1995 phk

Staticize, clean lint.


12678 08-Dec-1995 phk

Julian forgot to make the *devsw structures static.


12675 08-Dec-1995 julian

Pass 3 of the great devsw changes
most devsw referenced functions are now static, as they are
in the same file as their devsw structure. I've also added DEVFS
support for nearly every device in the system, however
many of the devices have 'incorrect' names under DEVFS
because I couldn't quickly work out the correct naming conventions.
(but devfs won't be coming on line for a month or so anyhow so that doesn't
matter)

If you "OWN" a device which would normally have an entry in /dev
then search for the devfs_add_devsw() entries and munge to make them right..
check out similar devices to see what I might have done in them in you
can't see what's going on..
for a laugh compare conf.c conf.h defore and after... :)
I have not doen DEVFS entries for any DISKSLICE devices yet as that will be
a much more complicated job.. (pass 5 :)

pass 4 will be to make the devsw tables of type (cdevsw * )
rather than (cdevsw)
seems to work here..
complaints to the usual places.. :)


12659 06-Dec-1995 bde

Replaced #includes of <sys/user.h> by less gross headers, usually
<sys/vm.h>. Many device drivers need only the definition of vtophys()
from vm.

Added nearby #includes of <sys/conf.h> where appropriate.


12628 05-Dec-1995 dg

all:
Removed ifnet.if_init and ifnet.if_reset as they are generally unused.
Change the parameter passed to if_watchdog to be a ifnet * rather than
a unit number. All of this is an attempt to move toward not needing an
array of softc pointers (which is usually static in size) to point to
the driver softc.

if_ed.c:
Changed some of the argument passing to some functions to make a little
more sense.

if_ep.c, if_vx.c:
Killed completely bogus use of if_timer. It was being set in such a way
that the interface was being reset once per second (blech!).


12611 03-Dec-1995 bde

Added a prototype.

Declared dsioctl() as static consistently. Note that both if_disc.c
and subr_diskslice.c use the same prefix `ds' and there is a name
conflict for dsioctl().


12592 03-Dec-1995 bde

Moved inline functions for insque() and remque() to <sys/queue.h>.
Protected them with `#ifdef KERNEL' so that <sys/queue.h> is valid C++.
Added the necessary #includes of <sys/queue.h>.

These functions are bogus and should be replaced by the queue macros.


12579 02-Dec-1995 bde

Completed function declarations and/or added prototypes.


12578 02-Dec-1995 bde

Fixed call to mrt_ioctl(). mrt_ioctl() for some reason has different
number of args when MROUTING is defined.


12569 02-Dec-1995 bde

Finished (?) cleaning up sysinit stuff.


12521 29-Nov-1995 julian

If you're going to mechanically replicate something in 50 files
it's best to not have a (compiles cleanly) typo in it! (sigh)


12517 29-Nov-1995 julian

OK, that's it..
That's EVERY SINGLE driver that has an entry in conf.c..
my next trick will be to define cdevsw[] and bdevsw[]
as empty arrays and remove all those DAMNED defines as well..

Each of these drivers has a SYSINIT linker set entry
that comes in very early.. and asks teh driver to add it's own
entry to the two devsw[] tables.

some slight reworking of the commits from yesterday (added the SYSINIT
stuff and some usually wrong but token DEVFS entries to all these
devices.

BTW does anyone know where the 'ata' entries in conf.c actually reside?
seems we don't actually have a 'ataopen() etc...

If you want to add a new device in conf.c
please make sure I know
so I can keep it up to date too..

as before, this is all dependent on #if defined(JREMOD)
(and #ifdef DEVFS in parts)


12495 28-Nov-1995 peter

Implement some rudimentry IPX support...


12436 21-Nov-1995 peter

If a lcp configure request is received in the lcp opened state and it
is acknowledged, it should go to the lcp ack sent state.

Don't reply on lcp echo requests when not in the lcp opened state.

If the interface is set to CISCO mode, it should still be marked
running when ifconfiged.

Fixed a few indentations that had gone wrong somewhere.

Submitted-by: John.Hay@csir.co.za


12427 20-Nov-1995 phk

Fix #includes.


12375 18-Nov-1995 bde

Fixed a comment.


12374 18-Nov-1995 bde

Added bogus casts to avoid warnings.

Continued cleaning up sysinit stuff.


12340 16-Nov-1995 phk

All net.* sysctl converted now.


12271 13-Nov-1995 peter

Enhance the likelyhood that IPX over ppp will actually work.. :-)
Note that pppd doesn't have an ipxcp.c module for negotiating and confuguring
IPX at startup, but after these mods, you can manually ifconfig IPX addresses
on the interface and it will probably work.. :-)


12118 06-Nov-1995 bde

Replaced bogus macros for dummy devswitch entries by functions.
These functions went away:

enosys (hasn't been used for some time)
enxio
enodev
enoioctl (was used only once, actually for a vop)

if_tun.c:
Continued cleaning up...

conf.h:
Probably fixed the type of d_reset_t. It is hard to tell the correct
type because there are no non-dummy device reset functions.

Removed last vestige of ambiguous sleep message strings.


12109 05-Nov-1995 bde

Replaced bogus macros for entry points to unconfigured line disciplines
by functions.

tty_conf.c:
Cleaned up formatting of tables.

Removed another ARGSUSED for consistency.

conf.h:
Introduced typedefs for line discipline functions.

Backed out most of previous revision (it is done elsewhere).


12071 04-Nov-1995 bde

Moved prototypes for devswitch functions from conf.c and driver sources
to <machine/conf.h>. conf.h was mechanically generated by
`grep ^d_ conf.c >conf.h'. This accounts for part of its ugliness. The
prototypes should be moved back to the driver sources when the functions
are staticalized.


12020 03-Nov-1995 peter

Fix the incomplete merge for the IPX code - the internals are different.
Note, the IPX in pppd support is not really there. I suspect that
it may work if you ifconfig it up manually.


11994 01-Nov-1995 peter

Re-Zap unused variables in their new location.. :-)


11975 31-Oct-1995 peter

Remove the old pppcompress files (btw: these look net-2 derived)
commit merge for bsd_comp.c - I missed this with a *ppp* wildcard.


11974 31-Oct-1995 peter

Drat.. Missed this one, which #includes ppp-comp.h, not ppp_comp.h


11970 31-Oct-1995 peter

Merge/update ppp-2.2 kernel parts onto mainline.

Note that the old if_ppp.c has been split in half into if_ppp.c and
ppp_tty.c


11967 31-Oct-1995 peter

Initial revision


11964 31-Oct-1995 peter

slcompress: split one of the functions into two parts, to allow use by both
if_sl and if_ppp (from ppp-2.2), eliminating the nearly identical
pppcompress.[ch] code. Add maximum VJ compression states argument to
sl_compress_init().
if_sl: call sl_compress_init() with the extra argument.


11963 31-Oct-1995 peter

Add a simplistic netisr register routine - I need this now for ppp-2.2.


11921 29-Oct-1995 phk

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


11819 26-Oct-1995 julian

Reviewed by: julian and jhay@mikom.csir.co.za
Submitted by: Mike Mitchell, supervisor@alb.asctmd.com

This is a bulk mport of Mike's IPX/SPX protocol stacks and all the
related gunf that goes with it..
it is not guaranteed to work 100% correctly at this time
but as we had several people trying to work on it
I figured it would be better to get it checked in so
they could all get teh same thing to work on..

Mikes been using it for a year or so
but on 2.0

more changes and stuff will be merged in from other developers now that this is in.

Mike Mitchell, Network Engineer
AMTECH Systems Corporation, Technology and Manufacturing
8600 Jefferson Street, Albuquerque, New Mexico 87113 (505) 856-8000
supervisor@alb.asctmd.com


11541 16-Oct-1995 wollman

Change signature of rt->rt_output() so that it is compatible with
ifp->if_output() functions. This way, initial implementations of
rt_output functionality can just lazily use if_output until customized
versions are written.


11539 16-Oct-1995 wollman

When adding a route fails because there is already a route with the same
(mask,value) in the tree, don't immediately return EEXIST. Instead, check
to see if the pre-existing route was generated by protcol-cloning. If so,
then it is OK to simply blow away the old route and re-attempt the insertion.
If not, then fall back to the same error code as before.


11460 13-Oct-1995 wollman

Say goodbye to IFF_NOTRAILERS. Support for trailers was officially
dropped for 4.4, but for some reason this flag lived on. (Until
today, that is.)


11459 13-Oct-1995 wollman

Protect against routing socket messages with way-too-big address families.

Submitted by: Keith Sklower by way of Paul Traina


11341 09-Oct-1995 bde

Fix types of sysctl functions. Add prototypes. Cosmetic.


11193 05-Oct-1995 bde

Don't wait for output to drain in pppclose(). Discard output immediately
for the same reasons as in slclose().

Free the cblock in the canonical queue in pppclose(). This is a no-op in
the usual cases where the tty is being closed or the line discipline is
being switched back to the standard discipline, but it saves a cblock if
the line discipline is being switched to one that doesn't use the canonical
queue.

Add prototypes. I use `extern' in prototypes for functions with bogus
linkage. This should be fixed someday.

Continue cleaning up new init stuff yet again.


11189 04-Oct-1995 jkh

This upgrades the driver for Cronyx-Sigma multiplexor boards
from version 1.2 to version 1.9.
Submitted by: Serge Vakulenko, <vak@cronyx.ru>


11029 27-Sep-1995 wollman

Add newline at end of log message and reduce log level to INFO from NOTICE.


11004 25-Sep-1995 wollman

Add BPF and IP multicast capabilities to the `tun' and `lp' network
interfaces.

Submitted by: Bill Fenner <fenner@parc.xerox.com>


10957 22-Sep-1995 wollman

Fix BPf to generate a header mbuf for writes.
Fix loopback and discard interfaces to understand BPF writes.
(These two from Bill Fenner to fix PR 512.)

Move ifpromisc() from bpf.c to if.c as suggested by comment in BPF.
Send a notice to the log when promiscuous mode is enabled.


10929 20-Sep-1995 wollman

Only print `bpf: foo0 attached' if bootverbose.


10861 17-Sep-1995 ache

Clear SC_OUTWAIT after checking of free clists, not before


10659 10-Sep-1995 bde

Call output process in slstart() whether or not there is any output. As
in pppstart(), the output process may be overloaded to handle hardware
flow control and hardware output completions.

Don't wait for output to drain in slclose(). Discard output immediately.
New output is not prevented while processes are waiting for output to
drain (this is a bug), so the wait was sometimes forever. Infinite
waits are also possible when CCTS_OFLOW is enabled and CTS is down.
Infinite waits were also caused by the above bug in slstart().

Start changing new init stuff yet again: rename unused arg `dummy'.

Reviewed by: davidg


10653 09-Sep-1995 dg

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


10624 08-Sep-1995 bde

Fix benign type mismatches in devsw functions. 82 out of 299 devsw
functions were wrong.


10496 31-Aug-1995 wollman

Add a few hooks (in the form of an array of four void *'s) to allow
various bits of software to save some data in the ifnet structure without
having to constantly change the declaration thereof.


10429 30-Aug-1995 bde

Fix several sysinit functions that had the wrong type and unnecessarily
external linkage.

Remove useless comments saying that SYSINIT() does system initialization.


10358 28-Aug-1995 julian

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


10080 16-Aug-1995 bde

Make everything except the unsupported network sources compile cleanly
with -Wnested-externs.


9830 31-Jul-1995 bde

Obtained from: an ancient patch of mine via 1.1.5

Call output process whether or not there is any output. The output
process may be overloaded to handle hardware flow control and
hardware output completions.


9824 31-Jul-1995 bde

Obtained from: partly from ancient patches of mine via 1.1.5

Introduce TS_CONNECTED and TS_ZOMBIE states. TS_CONNECTED is set
while a connection is established. It is set while (TS_CARR_ON or
CLOCAL is set) and TS_ZOMBIE is clear. TS_ZOMBIE is set for on to
off transitions of TS_CARR_ON that occur when CLOCAL is clear and
is cleared for off to on transitions of CLOCAL. I/o can only occur
while TS_CONNECTED is set. TS_ZOMBIE prevents further i/o.

Split the input-event sleep address TSA_CARR_ON(tp) into TSA_CARR_ON(tp)
and TSA_HUP_OR_INPUT(tp). The former address is now used only for
off to on carrier transitions and equivalent CLOCAL transitions.
The latter is used for all input events, all carrier transitions
and certain CLOCAL transitions. There are some harmless extra
wakeups for rare connection- related events. Previously there were
too many extra wakeups for non-rare input events.

Drivers now call l_modem() instead of setting TS_CARR_ON directly
to handle even the initial off to on transition of carrier. They
should always have done this. l_modem() now handles TS_CONNECTED
and TS_ZOMBIE as well as TS_CARR_ON.

gnu/isdn/iitty.c:
Set TS_CONNECTED for first open ourself to go with bogusly setting
CLOCAL.

i386/isa/syscons.c, i386/isa/pcvt/pcvt_drv.c:
We fake carrier, so don't also fake CLOCAL.

kern/tty.c:
Testing TS_CONNECTED instead of TS_CARR_ON fixes TIOCCONS forgetting to
test CLOCAL. TS_ISOPEN was tested instead, but that broke when we disabled
the clearing of TS_ISOPEN for certain transitions of CLOCAL.

Testing TS_CONNECTED fixes ttyselect() returning false success for output
to devices in state !TS_CARR_ON && !CLOCAL.

Optimize the other selwakeup() call (this is not related to the other
changes).

kern/tty_pty.c:
ptcopen() can be declared in traditional C now that dev_t isn't short.


9819 31-Jul-1995 peter

Fix panic("ifpromisc failed") when shutting down a bpf tap when the attached
interface is no longer IFF_UP.
The test for IFF_UP in ifpromisc is only useful while enabling IFF_PROMISC
and the higher levels of the bpf code do not allow for the possibility of
failure while shutting down. This is a trivial change.
Also, fixes PR#522.


9763 29-Jul-1995 bde

Obtained from: partly from ancient patches by ache and me via 1.1.5

Remove nullmodem().

It may be useful to have a null modem routine, but nullmodem()
wasn't one. nullmodem() was identical to ttymodem() except it
didn't implement MDMBUF (carrier) flow control, didn't do any
wakeups for off to on carrier transitions, and didn't flush the
i/o queues for on to off carrier transitions (flushing has the side
effect of waking up readers and writers) although it did generate
SIGHUPs. The wakeups must normally be done even if nullmodem() is
null in case something is sleeping waiting for a carrier transition.
In any case, the wakeups should be harmless. They may cause bogus
results for select(), but select() is already bogus for nonstandard
line disciplines.


9759 29-Jul-1995 bde

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuation.


9624 21-Jul-1995 bde

Obtained from: partly from ancient patches by ache and me via 1.1.5

Nuke `symbolic sleep message strings'. Use unique literal messages so that
`ps l' shows unambiguously where processes are sleeping.


9540 16-Jul-1995 bde

Don't include <sys/tty.h> in drivers that aren't tty drivers or in general
files that don't depend on the internals of <sys/tty.h>


9469 10-Jul-1995 wollman

When adding a route, set rt_ifa and rt_ifp a little earlier so that
the protocol-specific add routine can examine it if desired.


9457 09-Jul-1995 joerg

Move some struct definitions outside of struct's, so their scopes for
C++ will match the scopes for C.

Submitted by: Warner Losh


9443 08-Jul-1995 joerg

PR #kern/600: PPP does not pay attention to IPTOS_LOWDELAY

Kernel PPP doesn't pay attention to IPTOS_LOWDELAY, but uses
a table of port numbers, which isn't a generic method. The following
patch fixes this (the table is still used, but in addition
PPP queues the packet in fastq if IPTOS_LOWDELAY is set.

Obtained from: Tatu Ylonen <ylo@cs.hut.fi>
Submitted by: Heikki Suonsivu <hsu@clinet.fi>


9415 07-Jul-1995 dg

Worked around a bug with if.c setting the interface up even when we don't
want it to.


9412 06-Jul-1995 dg

Modified joerg's last change to only set the interface "up" when setting
the address if the device is a SLIP device (i.e. "attached").


9385 02-Jul-1995 joerg

Revision 1.21 of if_sl.c broke the traditional behaviour that
assigning an address to an interface automatically marks this
interface IFF_UP. The fix corrects this (and closes PR sys/577).
This is consistent with the way ethernet interfaces are being handled.


9348 28-Jun-1995 dg

Don't skip point-to-point interfaces if the netmask==0 (the netmask
should be completely ignored for point-to-point interfaces).
For point-to-point interfaces, route based on the destination address,
not the local address.

Submitted by: Peter Wemm


9276 21-Jun-1995 dg

Killed a couple lines of redundant code.


9275 21-Jun-1995 dg

Protect the call to if_up() with an splnet().


9274 21-Jun-1995 dg

1) Set interface up/down correctly as a function of open and close of the
SLIP device.
2) Don't directly frob the IFF_UP flag - use if_up/if_down as it was
intended.
3) Return ENETDOWN if IFF_UP isn't set when outputing, drop the packet if
if IFF_UP isn't set when inputing.


9235 15-Jun-1995 pst

Give the BPF the ability to generate signals when a packet is available.

Reviewed by: pst & wollman
Submitted by: grossman@cygnus.com


9234 15-Jun-1995 dg

Took out P2P_LOCALADDR_SHARE option and made it standard.


9202 11-Jun-1995 rgrimes

Merge RELENG_2_0_5 into HEAD


8876 30-May-1995 rgrimes

Remove trailing whitespace.


8788 27-May-1995 dg

Added a fix for a bug which caused the wrong interface to be selected
for broadcasts if point-to-point links shared the same IP address as
the ethernet. The fix must be enabled with P2P_LOCALADDR_SHARE option
in the kernel config file. This will someday likely be standard, but
there isn't sufficient time before release to determine if there are
any interoperability problems with routed and/or gated.

Reviewed by: Garrett Wollman, and me
Submitted by: Peter Wemm


8544 15-May-1995 dg

Fixed route reference count bug that squirmed in during the the
routing-socket code upgrade from Berkeley..

Submitted by: Garrett Wollman via Peter Wemm via Cornell


8456 11-May-1995 rgrimes

Fix -Wformat warnings from LINT kernel.


8426 11-May-1995 wollman

Make networking domains drop-ins, through the magic of GNU ld. (Some day,
there may even be LKMs.) Also, change the internal name of `unixdomain'
to `localdomain' since AF_LOCAL is now the preferred name of this family.
Declare netisr correctly and in the right place.


8412 10-May-1995 wollman

Updated routing-socket code from Berkeley

Obtained from: Keith Bostic by way of Paul Traina


8384 09-May-1995 dg

Replaced some bcopy()'s with memcpy()'s so that gcc while inline/optimize.


8171 29-Apr-1995 bde

Fix misplaced idempotency #endif.

Fix tabs and spaces in the wrong places.


8152 28-Apr-1995 pst

Incorporate new radix code from UCB. This fixes the orphaned mask bugs.
This submission was done by hand-applying FreeBSD local modifications on
top of the UCB code, rather than trying to patch the UCB code in on top
of the FreeBSD code due to the extensive changes.

Reviewed by: pst (been handling 30k routes for 4+ months)
Obtained from: Sklower/Woody/Honing/Traina (8.4 UCB release)


8143 28-Apr-1995 ache

Implement SLIOCSUNIT (set slip unit number)


8090 26-Apr-1995 pst

Cleanup loopback interface support.
Reviewed by: wollman


8070 25-Apr-1995 wollman

Finally finish the cloning cleanup work by making sure that clones
go away whenever a clone's parent is changed, or a route is added in a
certain set of circumstances.

This also includes code to forbid setting a route's gateway to an
address which can only be reached through that route, thus (hopefully)
eliminating one class of cloning bottomless-recursion bugs.


7747 10-Apr-1995 wollman

Tunnel driver is nmow capable of installing its own cdevsw[] entry,
with a little help from conf.c. While e're at it, actually declare the
tunnel entry points to have the correct types. This fixes PR #306.


7570 02-Apr-1995 bde

Fix slioctl(). It has to return -1 for ioctls that it doesn't know about
so that these ioctls can be handled by the calling layer(s).

Clean up the recently added code:
- include the appropriate header to declare an implicitly declared function.
- declare timeout functions correctly and remove numerous bogus casts that
hid (but didn't fix) their incorrectness.


7567 01-Apr-1995 ache

slopen() never sets t_line to SLIPDISC, but uses slip-specific queue allocation


7543 01-Apr-1995 dg

Patch from Greg Ansley:

In rare cases, when the filter specified accesses an multi-byte value that
is split across mbuf's, the value loaded is incorrect. And if you are very
unlucky (like me) it will index off the end of the mbuf and into an
unallocated page and panic the system.

If you look at the code you will discover the the index *k* is added to
the pointer *cp* and the used AGAIN as a subscript.


7524 31-Mar-1995 ache

Fix typing error sneaked somehow in prev. commit


7503 30-Mar-1995 ache

This sl enhancement helps to keep serial line (modem) connection alive.
It is common case when modem hangs with carier on but don't
receive anything from another side.
This thing commonly healed with hangup and redialing.
Enhancements below allows to determine when such action
is needed and inform attach program with SIGURG signal.
There two ioctls set: outfill and keepalive, used from both
sides of connection. Outfill repeatedly sends FRAME_END with
specified timeout (i.e. 40 seconds). It is needed to get input on
other side even if no user activity on slip line currently.
Keepalive checks FRAME_ENDs from other side, and if no one
got in specified timeout (i.e. 60 seconds, max modem retrain time),
send SIGURG to attach program.
I plan to add code to slattach to handle this thing too.

Reviewed by: wollman


7474 29-Mar-1995 ache

pppinput:
Fix serial errors handling
Add no carrier check


7335 24-Mar-1995 wollman

Don't delete clones if they are PINNED.


7279 23-Mar-1995 wollman

radix.c: correct exit condition in rn_walktree_from()
route.c: be a little more careful when running deleting children of dying
. routes


7224 21-Mar-1995 wollman

Protocol-cloned routes should gain a reference to their parents to make
sure that rt->rt_parent values can never be re-used harmfully.


7199 20-Mar-1995 dg

Made minor readability tweak.


7197 20-Mar-1995 wollman

Better fix for the deletion of parents of cloned routes problem,
superseding the `nextchild' hack. This also provides a way
forward to fix RTM_CHANGE and RTM_ADD as well.


7193 20-Mar-1995 wollman

Support for pseudo-device LKMs. Note that this is restricted to only
one pseudo per module (a restriction which will eventually be lifted) and
isthus not in its final form.


7117 17-Mar-1995 wollman

Beginnings of support for loadable pseudo-devices. bsd.kmod.mk support
and Makefiles for the more interesting ones to come on Monday.


7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


7061 14-Mar-1995 dg

Added $Id$


7055 14-Mar-1995 dg

Added support for generic FDDI and the DEC DEFEA and DEFPA FDDI adapters.

Submitted by: Matt Thomas


6876 04-Mar-1995 dg

Reduced loopback MTU to 16k to work around a miriad of problems with it
set near or above 32k (likely caused by 16bit signed word overflow). 16k
actually (surprizingly) has higher performance than other values I tested.


6735 26-Feb-1995 amurai

New user Process PPP based on iij-ppp0.94beta2.

o Supporting SYNC SIO device (But need a device driver)
- add "set speed sync"
o Fixing bug for Predictor-1 function.
o Add new parameter that re-sent interval for set timeout commands.
o Improving RTT (Round Trip Time) and reducing processor time.
- Previous Timer service was using polling, and now using
SIGALRM ;-)
- A 0.94beta2 will not work correctly....

-- Follows are additinal feature not including 0.94beta2
o Support Proxy ARP
- add "enable/disable proxy" commands
o Marging common routine in CHAP/PAP.
o Enhancing LCP/IPCP log information.
o Support local Authfication connection on port 300x and tty.
- You can set up pair of your "hostname -s" and
password in ppp.secret. if either ppp.secret file nor
your hostname line don't exist, It will notify a message
and working as same as previous version.(Backword compatibility)
- If you did set up them, It's allow connection but nothing to do
except help and passwd command.
- add "passwd yourpasswd" commands
o Support afilter - keep Alive filter that a packet can send/receiving
according to ifilter/ofilter but doesn't count it as preventing idle
timer expires.
- Same syntax of other filters.
o Fixing bugs reported by current user for previous one. Thanks !!

Reviewed by: Atsushi Murai (amurai@spec.co.jp)


6699 25-Feb-1995 dg

Fixed comment - IFT_P80 is 80mbit.

Submitted by: frank@fwi.uva.nl (Frank van der Linden)


6687 24-Feb-1995 dg

In ifa_ifwithdstaddr() when walking through ifa structs associated with
a point-to-point link, don't attempt a comparison if the pointer to the
destination sockaddr is NULL (i.e. it has not been set/initialized).


6477 16-Feb-1995 wollman

Attempting to bind() or connect() a routing socket, while meaningless,
shouldn't cause a panic.

Obtained from: Stevens, vol. 2, p. 667


6429 15-Feb-1995 jkh

Nuke some more compiler warnings, while I'm at it..


6335 13-Feb-1995 ache

*close: just purge tty queues if we can't drain them


6245 08-Feb-1995 wollman

Define RTF_PINNED for future use.


6235 07-Feb-1995 wollman

Correct fix for merge conflicts: RTM_VERSION is always 5. Header files
included by user code must never depend on kernel compile options.


6228 07-Feb-1995 dg

Fixed unresolved CVS conflict on RTM_VERSION.


6223 07-Feb-1995 wollman

Merge in the socket-level support for Transaction TCP from the OLAH_TTCP
branch.

Submitted by: Andras Olah <olah@cs.utwente.nl>


6054 31-Jan-1995 amurai

This commit was generated by cvs2svn to compensate for changes in r6053,
which included commits to RCS files with non-trunk default branches.


5833 24-Jan-1995 bde

Declare `struct mbuf' with the correct scope to avoid lots of warnings
for compiling routed... Previously a kernel function pointer that is
bogusly visible to applications was incompletely declared to hide the
problem.


5801 23-Jan-1995 dg

Added back the missing last few bytes of the file.


5791 23-Jan-1995 wollman

route.c: keep track of where cloned routes come from, and make sure to
delete them when the ``parent'' goes away

route.h: add glue to track this to rtentry structure. WARNING WILL ROBINSON!
This will be yet another incompatible change in your route-using binaries.
I apologize, but this was the only way to do it. I took this opportunity
to increase the size of the metrics to what I believe will be the final
length for 2.1, so that when the T/TCP stuff is done, this won't happen
again.


5413 05-Jan-1995 se

Submitted by: Wolfgang Stanglmeier <wolf@dentaro.GUN.de>
Reviewed by: <wollman>
First hooks and defines for the ISDN driver,
that soon will see the light ...


5280 30-Dec-1994 dg

Moved declaration of ifnet pointer out of the header file and into the
.c file where it belongs. Bezeroed some uninitialized malloc data.


5191 22-Dec-1994 wollman

Added `ds', a black-hole network interface.


5187 22-Dec-1994 dg

Removed bogus semicolon at end of a #define line.


5184 21-Dec-1994 wollman

Add generic part of generic multiple-physical-interface support (the
successor of IFF_ALTPHYS).


5181 21-Dec-1994 wollman

Add a #define for if_rawoutput(), which isn't used now, but eventually will
be.


5104 13-Dec-1994 wollman

Implemented rtalloc_ign().


5099 13-Dec-1994 wollman

Add support for two separate cloning flags, one set by the lower layers,
and one set by the protocol family. Also add another parameter to
rtalloc1() to allow for any interface flags to be ignored; currently
this is only useful for RTF_PRCLONING. Get rid of rt_prflags and re-unite
with rt_flags. Add T/TCP ``route metrics''.

NB: YOU MUST RECOMPILE `route' AND OTHER RELATED PROGRAMS AS A RESULT OF
THIS CHANGE.

This also adds a new interface parameter, `ifi_physical', which will
eventually replace IFF_ALTPHYS as the mechanism for specifying the
particular physical connection desired on a multiple-connection card.

NB: YOU MUST RECOMPILE `ifconfig' AND OTHER RELATED PROGRAMS AS A RESULT OF
THIS CHANGE.


4952 04-Dec-1994 bde

Fix bogus include paths:

<systm.h> is <sys/systm.h>.
<kernel.h> is <sys/kernel.h>.


4911 02-Dec-1994 wollman

This commit was generated by cvs2svn to compensate for changes in r4910,
which included commits to RCS files with non-trunk default branches.


4838 27-Nov-1994 bde

Fix previous change: don't attempt to reserve cblocks if the tty is null.


4825 26-Nov-1994 bde

Fix cblock starvation bugs by reserving enough cblocks for minimal
operation of each clist. Limit the growth of each clist. Clists
can only grow larger than the reserved minimum if there are free
cblocks in a shared pool. The size of this pool is now fixed
(this could be improved). The reserved and maximum sizes are more
carefully allocated for slip and ppp, depending on the mtu. A maximum
MTU of 16384 is now enforced for ppp.


4796 24-Nov-1994 dg

Moved conversion of ether_type to host byte order out of ethernet drivers
and into ether_input(). It was silly to have bpf want this one way and
ether_input want it another way. Ripped out trailer support from the few
remaining drivers that still had it.


4783 23-Nov-1994 ugen

The long-time-waited-for patch for PPP.
See Gene's mail for explanation..
Submitted by: Gene Stark


4518 16-Nov-1994 phk

#include <socket.h> -> <sys/socket.h>


4507 15-Nov-1994 bde

Include <sys/socket.h> for declaration of struct sockaddr. This helps
genassym compile when KERNEL is not defined.

Uniformize idempotency ifdef.


4469 14-Nov-1994 bde

if.h:
Declare a complete prototype for the function pointer *ifa_rtrequest.

radix.h:
Declare a complete prototype for the function pointer *rnh_walktree
and for the function rn_walktree.

Uniformize idempotency ifdef.


4345 10-Nov-1994 guido

Remove redundant stuff. Amazing that they actually solved a bug found in
1.1.5.1, and oversaw this thang.


4104 03-Nov-1994 wollman

Collapse two fields so that we have space for another 32 flags.
NB: You will have to recompile programs which use the `rt_use' member in
order to get the correct values. This should not cause incorrect operation,
but the statistics may look a little confusing.


4073 02-Nov-1994 wollman

Add code to be a bit smarter about IP routes, conditioned on the option
IN_RMX. (Eventually this will be standard, but I just wrote the code today
and don't want to break anyone.)


4060 01-Nov-1994 wollman

Make it compile again. XXX - this is not really the right way to fix
this problem.


4041 01-Nov-1994 pst

Make PPP compile cleanly


3628 15-Oct-1994 phk

moved a message into a #ifdef DEBUG. This message comes out if a kernel
doesn't have any networking in it. For instance the new "MINI" install-
kernel.


3514 11-Oct-1994 wollman

Fix a bug which caused panics when attempting to change just the flags of
a route. (This still doesn't work, but it doesn't panic now.) It looks
like there may be a number of incipient bugs in this code.

Also, get ready for the time when all IP gateway routes are cloning, which
is necessary to keep proper TCP statistics.


3451 09-Oct-1994 dg

Got rid of map.h. It's a leftover from the rmap code, and we use rlists.
Changed swapmap into swaplist.


3443 08-Oct-1994 phk

Cosmetics: to silence gcc -wall.


3419 08-Oct-1994 phk

Mostly Cosmetics. Some of the procedures in if_sl.c was void, but should
be int. I made them int, and let them return 0. Will have to find out
what the return-val is used for.


3379 05-Oct-1994 wollman

Install line discipline the new way.


3377 05-Oct-1994 wollman

A number of bug-fixes inspired by Mark Treacy:
- Allow PPP to run multicasts natively.
- Deal properly with lots of similarly-named interfaces.
- Don't sign-extend if_flags.

NB: the last fix (to rtsock.c) must be reversed when we expand if_flags to a
reasonable size.

Submitted by: Mark Treacy


3352 04-Oct-1994 phk

Moved m_copyback into uipc_mbuf.c


3311 02-Oct-1994 phk

GCC cleanup.
Reviewed by:
Submitted by:
Obtained from:


3274 01-Oct-1994 wollman

Define IFF_ALTPHYS to be IFF_LINK2. Gross, but effective. (There aren't any
more bits left in if_flags and I don't want to make it a long this late in
the release cycle.)


3014 23-Sep-1994 wollman

Make the kernel side of PPP compile.


3009 23-Sep-1994 wollman

Initial revision


2822 16-Sep-1994 phk

Made the kernel compile even without "ether".


2754 14-Sep-1994 wollman

Shuffle some functions and variables around to make it possible for
multicast routing to be implemented as an LKM. (There's still a bit of
work to do in this area.)


2735 13-Sep-1994 dg

Made SLIP MTU configurable via ifconfig(8). Changed default MTU to 552
as it is a better choice in a day with BTLZ compression modems.


2691 12-Sep-1994 dg

Re-enabled check for low clist condition.


2608 09-Sep-1994 dg

Made SLMTU kernel config'able.


2554 08-Sep-1994 wollman

The mrt_ioctl goop properly depends on MROUTING, not MULTICAST. (Oof!)


2544 07-Sep-1994 se

Reviewed by: Stefan Esser
Submitted by:
rtioctl(): changed parameter to mrt_ioctl from "cmd" to "req" to make
it compile with MULTICAST defined.


2531 06-Sep-1994 wollman

Initial get-the-easy-case-working upgrade of the multicast code
to something more recent than the ancient 1.2 release contained in
4.4. This code has the following advantages as compared to
previous versions (culled from the README file for the SunOS release):

- True multicast delivery
- Configurable rate-limiting of forwarded multicast traffic on each
physical interface or tunnel, using a token-bucket limiter.
- Simplistic classification of packets for prioritized dropping.
- Administrative scoping of multicast address ranges.
- Faster detection of hosts leaving groups.
- Support for multicast traceroute (code not yet available).
- Support for RSVP, the Resource Reservation Protocol.

What still needs to be done:

- The multicast forwarder needs testing.
- The multicast routing daemon needs to be ported.
- Network interface drivers need to have the `#ifdef MULTICAST' goop ripped
out of them.
- The IGMP code should probably be bogon-tested.

Some notes about the porting process:

In some cases, the Berkeley people decided to incorporate functionality from
later releases of the multicast code, but then had to do things differently.
As a result, if you look at Deering's patches, and then look at
our code, it is not always obvious whether the patch even applies. Let
the reader beware.

I ran ip_mroute.c through several passes of `unifdef' to get rid of
useless grot, and to permanently enable the RSVP support, which we will
include as standard.

Ported by: Garrett Wollman
Submitted by: Steve Deering and Ajit Thyagarajan (among others)


2214 22-Aug-1994 bde

Idempotency #endif was not at end of file or commented.


2192 21-Aug-1994 paul

Fix typo (define -> ifndef)
Reviewed by:
Submitted by:


2168 21-Aug-1994 paul

Make idempotent.

Submitted by: Paul


2142 20-Aug-1994 dg

1) cleaned up after Garrett - fixed more redundant declarations, changed
use of timeout_t -> timeout_func_t in aha1542 and aha1742 drivers.
2) fix a bug in the portalfs that was uncovered by better prototyping -
specifically, the time must be converted from timeval to timespec
before storing in va_atime.
3) fixed/added some miscellaneous prototypes


2112 18-Aug-1994 wollman

Fix up some sloppy coding practices:

- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in
header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.

NB: ioconf.c will still generate a redundant-declaration warning, which
is unavoidable unless somebody volunteers to make `config' smarter.


1944 08-Aug-1994 dg

Added ioctl support for SIOCSIFMTU.


1943 08-Aug-1994 dg

On second thought, better restrict the mtu to between 72-65535...strange
things happen otherwise.


1942 08-Aug-1994 dg

Enforce the mtu to between the range 1-65535 before calling the driver
ioctl routine.


1941 08-Aug-1994 dg

Added ioctl support for SIOCGIFMTU and SIOCSIFMTU. These set the per-
interface MTU.


1817 02-Aug-1994 dg

Added $Id$


1811 01-Aug-1994 dg

Reduced loopback MTU from 65535 to 65532 because some things like NFS
really like it to be rounded to a longword.


1622 29-May-1994 dg

Changed loopback MTU to 65535.


1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


1542 24-May-1994 rgrimes

This commit was generated by cvs2svn to compensate for changes in r1541,
which included commits to RCS files with non-trunk default branches.