#
a254d687 |
|
29-Apr-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
carp: isolate VRRP from CARP There is only one functional change here - we don't allow SIOCSVH (or netlink request) to change sc->sc_version. I'm convinced that allowing such a change doesn't brings any practical value, but creates enless minefields in front of both developers and end users (sysadmins). If you want to switch from VRRP to CARP or vice versa, you'd need to recreate the VHID. Oh, one tiny funtional change: carp_ioctl_set() won't modify any fields if it returns EINVAL. Previously you could provide valid advbase with invalid advskew - that used to modify advbase and return EINVAL. All other changes is a sweep around not ever using CARP fields when we are in VRRP mode and vice versa. Also adding assertions on sc_version where necessary. Do not send VRRP vars in CARP mode via NetLink and vice versa. However in compat ioctl SIOCGVH for VRRP mode the CARP fields would be zeroes. This allows to declare softc as union and thus prevent any future logic deterioration wrt to mixing VRRP and CARP. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D45039
|
#
601438fb |
|
29-Apr-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
carp: refactor packet tagging for ether_output() - Separate HMAC preparation (CARP specific) from tagging. - In unicast mode (CARP specific) don't put tag at all. - Don't put pointer to software context into the tag. Putting just vhid, an integer value, is a safer design. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D45038
|
#
cda57d95 |
|
29-Apr-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
carp: assert that we are calling correct input function. We are. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D45037
|
#
5ee92cbd |
|
29-Apr-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
carp: don't chain call vrrp_send_ad via carp_send_ad Provide inline send_ad_locked() that switches between protocol specific sending function. Rename carp_send_ad() to carp_callout() to avoid getting lost in all these multiple foo_send_ad. No functional change intended. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D45036
|
#
37115154 |
|
02-Apr-2024 |
Kristof Provost <kp@FreeBSD.org> |
carp: support VRRPv3 Allow carp(4) to use the VRRPv3 protocol (RFC 5798). We can distinguish carp and VRRP based on the protocol version number (carp is 2, VRRPv3 is 3), and support both from the carp(4) code. Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D44774
|
#
6bce41a3 |
|
27-Feb-2024 |
Gordon Bergling <gbe@FreeBSD.org> |
carp(4): Fix a typo in a source code comment - s/successfull/successful/ MFC after: 3 days
|
#
ab393e95 |
|
12-Oct-2023 |
Kristof Provost <kp@FreeBSD.org> |
netlink: move NETLINK define to opt_global.h Move the NETLINK define into opt_global.h so we can rely on it being set correctly, without having to remember to include opt_netlink.h. This ensures that the NETLINK define is correctly set. If not we may end up with unloadable modules, due to missing symbols (such as nlmsg_get_group_writer). PR: 274306 Reviewed by: imp, markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D42179
|
#
242fa308 |
|
09-Sep-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
carp: Explicitly mark tunnable net.inet.carp.allow with CTLFLAG_NOFETCH With recent change 110113bc086f, a vnet tunable can be initialized when there is a corresponding kernel environment variable unless it is marked with the flag CTLFLAG_NOFETCH. The initialization may happen during early boot(linker preload), at that time vnet0 has not been created. The hander carp_allow_sysctl() for the tunable net.inet.carp.allow requires vnet, thus invoking it during early boot will cause kernel panic. The tunnable is initialized by vnet sysinit routine ipcarp_sysinit() so let's just mark it with flag CTLFLAG_NOFETCH. No functional change intended. Fixes: 110113bc086f sysctl(9): Enable vnet sysctl variables to be loader tunable MFC after: 2 week Differential Revision: https://reviews.freebsd.org/D41525
|
#
685dc743 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
600bf006 |
|
02-Aug-2023 |
Andrey V. Elsukov <ae@FreeBSD.org> |
carp: delete interface routes on link loss. Obtained from: Yandex LLC MFC after: 10 days Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D41290
|
#
c2c28c0f |
|
18-May-2023 |
Kristof Provost <kp@FreeBSD.org> |
carp: fix unicast link-local If the peer6 address is a link-local address we have to embed the scopeid, much like we have to for IPv6 multicast as well. Sponsored by: Rubicon Communications, LLC ("Netgate")
|
#
4d846d26 |
|
10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
|
#
28921c4f |
|
30-Mar-2023 |
Kristof Provost <kp@FreeBSD.org> |
carp: allow commands to use interface name rather than index Get/set commands can now choose to provide the interface name rather than the interface index. This allows userspace to avoid a call to if_nametoindex(). Suggested by: melifaro Reviewed by: melifaro Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D39359
|
#
ccff2078 |
|
27-Mar-2023 |
Kristof Provost <kp@FreeBSD.org> |
carp: fix source MAC When we're not in unicast mode we need to change the source MAC address. The check for this was wrong, because IN_MULTICAST() assumes host endianness and the address in sc_carpaddr is in network endianness. Sponsored by: Rubicon Communications, LLC ("Netgate")
|
#
19e43c16 |
|
27-Mar-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add netlink KPI to the kernel by default This change does the following: Base Netlink KPIs (ability to register the family, parse and/or write a Netlink message) are always present in the kernel. Specifically, * Implementation of genetlink family/group registration/removal, some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in unconditionally. * Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are compiled in unconditionally. * Glue functions (netlink<>rtsock), malloc/core sysctl definitions (netlink_glue.c, 259 LoC) are compiled in unconditionally. * The rest of the KPI _functions_ are defined in the netlink_glue.c, but their implementation calls a pointer to either the stub function or the actual function, depending on whether the module is loaded or not. This approach allows to have only 1k LoC out of ~3.7k LoC (current sys/netlink implementation) in the kernel, which will not grow further. It also allows for the generic netlink kernel customers to load successfully without requiring Netlink module and operate correctly once Netlink module is loaded. Reviewed by: imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D39269
|
#
511a6d5e |
|
20-Mar-2023 |
Kristof Provost <kp@FreeBSD.org> |
carp: use if_name() Reported by: melifaro Sponsored by: Rubicon Communications, LLC ("Netgate")
|
#
13781800 |
|
15-Mar-2023 |
Kristof Provost <kp@FreeBSD.org> |
carp: support unicast Allow users to configure the address to send carp messages to. This allows carp to be used in unicast mode, which is useful in certain virtual configurations (e.g. AWS, VMWare ESXi, ...) Reviewed by: melifaro Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D38940
|
#
40e04359 |
|
07-Mar-2023 |
Kristof Provost <kp@FreeBSD.org> |
carp: add netlink interface Allow carp configuration information to be supplied and retrieved via netlink. Reviewed by: melifaro Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D39048
|
#
49cad3da |
|
16-Mar-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
carp: carp_master_down_locked() requires net epoch Reviewed by: kp Fixes: 1d126e9b9474 carp: Widen epoch coverage MFC after: 1 day Differential Revision: https://reviews.freebsd.org/D39113
|
#
3d0d5b21 |
|
23-Jan-2023 |
Justin Hibbits <jhibbits@FreeBSD.org> |
IfAPI: Explicitly include <net/if_private.h> in netstack Summary: In preparation of making if_t completely opaque outside of the netstack, explicitly include the header. <net/if_var.h> will stop including the header in the future. Sponsored by: Juniper Networks, Inc. Reviewed by: glebius, melifaro Differential Revision: https://reviews.freebsd.org/D38200
|
#
ee49c5d3 |
|
29-Jan-2023 |
Boris Lytochkin <lytboris@gmail.com> |
carp: turn net.inet.carp.allow into a RW tunable Currently CARP starts announcing its state when initialised, regardless of the state of the other services provided by the server. As a result, the device can become master while still loading the firewall ruleset or initialising long-starting service. This change adds the way to request delayed CARP start by setting the net.inet.carp.allow=0 in the loader.conf. Differential Revision: https://reviews.freebsd.org/D38167 MFC after: 2 weeks
|
#
78b1fc05 |
|
17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: separate pr_input and pr_ctlinput out of protosw The protosw KPI historically has implemented two quite orthogonal things: protocols that implement a certain kind of socket, and protocols that are IPv4/IPv6 protocol. These two things do not make one-to-one correspondence. The pr_input and pr_ctlinput methods were utilized only in IP protocols. This strange duality required IP protocols that doesn't have a socket to declare protosw, e.g. carp(4). On the other hand developers of socket protocols thought that they need to define pr_input/pr_ctlinput always, which lead to strange dead code, e.g. div_input() or sdp_ctlinput(). With this change pr_input and pr_ctlinput as part of protosw disappear and IPv4/IPv6 get their private single level protocol switch table ip_protox[] and ip6_protox[] respectively, pointing at array of ipproto_input_t functions. The pr_ctlinput that was used for control input coming from the network (ICMP, ICMPv6) is now represented by ip_ctlprotox[] and ip6_ctlprotox[]. ipproto_register() becomes the only official way to register in the table. Those protocols that were always static and unlikely anybody is interested in making them loadable, are now registered by ip_init(), ip6_init(). An IP protocol that considers itself unloadable shall register itself within its own private SYSINIT(). Reviewed by: tuexen, melifaro Differential revision: https://reviews.freebsd.org/D36157
|
#
8c77967e |
|
11-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: retire pr_output method The only place to execute this method was raw_usend(). Only those protocols that used raw socket were able to actually enter that method. All pr_output assignments being deleted by this commit were a dead code for many years. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36126
|
#
9a8cf950 |
|
18-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
carp: fix send error demotion recovery The problem is that carp(4) would clear the error counter on first successful send, and stop counting successes after that. Fix this logic and document it in human language. PR: 260499 Differential revision: https://reviews.freebsd.org/D33536
|
#
1019354b |
|
31-Oct-2021 |
Marius Halden <marius.halden@modirum.com> |
carp: deal with negative net.inet.carp.demotion Given nodes 1 and 2, where node 1 has an advskew of 0 and node 2 has an advskew of 100, making them master and backup respectively. If net.inet.carp.demotion is set to a negative value on node 1, node 2 might become master while node 1 still retains it master status. Wether or not node 2 becomes master seems to depend on the nodes advskew and what the demotion sysctl was set to on node 1. The reason for node 2 becoming master seems to be that the calculated advskew taking demotion into account is truncated to a single unsigned byte when copied into the carp header for sending, and node 1 stays master since it takes uses the whole non-truncated calculated advskew when deciding wether to stay master. PR: 259528 Reviewed by: donner, glebius MFC after: 3 weeks Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D32759
|
#
130aebba |
|
19-Jan-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Further refactor IPv4 interface route creation. * Fix bug with /32 aliases introduced in 81728a538d24. * Explicitly document business logic for IPv4 ifa routes. * Remove remnants of rtinit() * Deduplicate ifa->route prefix code by moving it into ia_getrtprefix() * Deduplicate conditional check for ifa_maintain_loopback_route() by moving into ia_need_loopback_route() * Remove now-unused flags argument from in_addprefix(). Reviewed by: donner PR: 252883 Differential Revision: https://reviews.freebsd.org/D28246
|
#
662c1305 |
|
01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
net: clean up empty lines in .c and .h files
|
#
773e541e |
|
20-Aug-2020 |
Warner Losh <imp@FreeBSD.org> |
Use devctl.h instead of bus.h to reduce newbus pollution. There's no need for these parts of the kernel to know about newbus, so narrow what is included to devctl.h for device_notify_*. Suggested by: kib@
|
#
1d126e9b |
|
12-Apr-2020 |
Kristof Provost <kp@FreeBSD.org> |
carp: Widen epoch coverage Fix panics related to calling code which expects to be running inside the NET_EPOCH from outside that epoch. This leads to panics (with INVARIANTS) such as this one: panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/if_ether.c:373 cpuid = 7 time = 1586095719 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0090819700 vpanic() at vpanic+0x182/frame 0xfffffe0090819750 panic() at panic+0x43/frame 0xfffffe00908197b0 arprequest_internal() at arprequest_internal+0x59e/frame 0xfffffe00908198c0 arp_announce_ifaddr() at arp_announce_ifaddr+0x20/frame 0xfffffe00908198e0 carp_master_down_locked() at carp_master_down_locked+0x10d/frame 0xfffffe0090819910 carp_master_down() at carp_master_down+0x79/frame 0xfffffe0090819940 softclock_call_cc() at softclock_call_cc+0x13f/frame 0xfffffe00908199f0 softclock() at softclock+0x7c/frame 0xfffffe0090819a20 ithread_loop() at ithread_loop+0x279/frame 0xfffffe0090819ab0 fork_exit() at fork_exit+0x80/frame 0xfffffe0090819af0 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0090819af0 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Widen the NET_EPOCH to cover the relevant (callback / task) code. Differential Revision: https://reviews.freebsd.org/D24302
|
#
10b49b23 |
|
21-Feb-2020 |
Pawel Biernacki <kaktus@FreeBSD.org> |
Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (6 of many) r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. Mark all nodes in pf, pfsync and carp as MPSAFE. Reviewed by: kp Approved by: kib (mentor, blanket) Differential Revision: https://reviews.freebsd.org/D23634
|
#
b9555453 |
|
21-Jan-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Make ip6_output() and ip_output() require network epoch. All callers that before may called into these functions without network epoch now must enter it.
|
#
8c3fbf3c |
|
31-Dec-2019 |
Alexander Motin <mav@FreeBSD.org> |
Relax locking of carp_forus(). This fixes deadlock between CARP and bridge. Bridge calls this function taking CARP lock while holding bridge lock. Same time CARP tries to send its announcements via the bridge while holding CARP lock. Use of CARP_LOCK() here does not solve anything, since sc_addr is constant while race on sc_state is harmless and use of the lock does not close it. Reviewed by: glebius MFC after: 2 weeks Sponsored by: iXsystems, Inc.
|
#
646e3488 |
|
06-Dec-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Remove the extra epoch tracker change sneaked into r355449 and was not part of the originally reviewed or described change. Pointyhat to: bz Reported by: glebius
|
#
dad68fc3 |
|
06-Dec-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
carp: replace caddr_t with char * Change the remaining caddr_t usages to char * following the removal of the KAME macros No functional change. Requested by: glebius Reviewed by: glebius MFC after: 2 weeks Sponsored by: Netflix (originally) Differential Revision: https://reviews.freebsd.org/D22399
|
#
a4adf6cc |
|
30-Nov-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Fix m_pullup() problem after removing PULLDOWN_TESTs and KAME EXT_*macros. r354748-354750 replaced the KAME macros with m_pulldown() calls. Contrary to the rest of the network stack m_len checks before m_pulldown() were not put in placed (see r354748). Put these m_len checks in place for now (to go along with the style of the network stack since the initial commits). These are not put in for performance but to avoid an error scenario (even though it also will help performance at the moment as it avoid allocating an extra mbuf; not because of the unconditional function call). The observed error case went like this: (1) an mbuf with M_EXT arrives and we call m_pullup() unconditionally on it. (2) m_pullup() will call m_get() unless the requested length is larger than MHLEN (in which case it'll m_freem() the perfectly fine mbuf) and migrate the requested length of data and pkthdr into the new mbuf. (3) If m_get() succeeds, a further m_pullup() call going over MHLEN will fail. This was observed with failing auto-configuration as an RA packet of 200 bytes exceeded MHLEN and the m_pullup() called from nd6_ra_input() dropped the mbuf. (Re-)adding the m_len checks before m_pullup() calls avoids this problems with mbufs using external storage for now. MFC after: 3 weeks Sponsored by: Netflix
|
#
63abacc2 |
|
15-Nov-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
netinet*: replace IP6_EXTHDR_GET() In a few places we have IP6_EXTHDR_GET() left in upper layer protocols. The IP6_EXTHDR_GET() macro might perform an m_pulldown() in case the data fragment is not contiguous. Convert these last remaining instances into m_pullup()s instead. In CARP, for example, we will a few lines later call m_pullup() anyway, the IPsec code coming from OpenBSD would otherwise have done the m_pullup() and are copying the data a bit later anyway, so pulling it in seems no better or worse. Note: this leaves very few m_pulldown() cases behind in the tree and we might want to consider removing them as well to make mbuf management easier again on a path to variable size mbufs, especially given m_pulldown() still has an issue not re-checking M_WRITEABLE(). Reviewed by: gallatin MFC after: 8 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D22335
|
#
b8a6e03f |
|
07-Oct-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Widen NET_EPOCH coverage. When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas. However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win. Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier. On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch. Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed. This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources. Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111
|
#
59854ecf |
|
25-Jun-2019 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Convert all IPv4 and IPv6 multicast memberships into using a STAILQ instead of a linear array. The multicast memberships for the inpcb structure are protected by a non-sleepable lock, INP_WLOCK(), which needs to be dropped when calling the underlying possibly sleeping if_ioctl() method. When using a linear array to keep track of multicast memberships, the computed memory location of the multicast filter may suddenly change, due to concurrent insertion or removal of elements in the linear array. This in turn leads to various invalid memory access issues and kernel panics. To avoid this problem, put all multicast memberships on a STAILQ based list. Then the memory location of the IPv4 and IPv6 multicast filters become fixed during their lifetime and use after free and memory leak issues are easier to track, for example by: vmstat -m | grep multi All list manipulation has been factored into inline functions including some macros, to easily allow for a future hash-list implementation, if needed. This patch has been tested by pho@ . Differential Revision: https://reviews.freebsd.org/D20080 Reviewed by: markj @ MFC after: 1 week Sponsored by: Mellanox Technologies
|
#
4c62bffe |
|
08-Jun-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Fix dpcpu and vnet panics with complex types at the end of the section. Apply a linker script when linking i386 kernel modules to apply padding to a set_pcpu or set_vnet section. The padding value is kind-of random and is used to catch modules not compiled with the linker-script, so possibly still having problems leading to kernel panics. This is needed as the code generated on certain architectures for non-simple-types, e.g., an array can generate an absolute relocation on the edge (just outside) the section and thus will not be properly relocated. Adding the padding to the end of the section will ensure that even absolute relocations of complex types will be inside the section, if they are the last object in there and hence relocation will work properly and avoid panics such as observed with carp.ko or ipsec.ko. There is a rather lengthy discussion of various options to apply in the mentioned PRs and their depends/blocks, and the review. There seems no best solution working across multiple toolchains and multiple version of them, so I took the liberty of taking one, as currently our users (and our CI system) are hitting this on just i386 and we need some solution. I wish we would have a proper fix rather than another "hack". Also backout r340009 which manually, temporarily fixed CARP before 12.0-R "by chance" after a lead-up of various other link-elf.c and related fixes. PR: 230857,238012 With suggestions from: arichardson (originally last year) Tested by: lwhsu Event: Waterloo Hackathon 2019 Reported by: lwhsu, olivier MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D17512
|
#
a68cc388 |
|
08-Jan-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Mechanical cleanup of epoch(9) usage in network stack. - Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always. - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit. This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively. Discussed with: jtl, gallatin
|
#
e2c532f1 |
|
01-Nov-2018 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
carpstats are the last virtualised variable in the file and end up at the end of the vnet_set. The generated code uses an absolute relocation at one byte beyond the end of the carpstats array. This means the relocation for the vnet does not happen for carpstats initialisation and as a result the kernel panics on module load. This problem has only been observed with carp and only on i386. We considered various possible solutions including using linker scripts to add padding to all kernel modules for pcpu and vnet sections. While the symbols (by chance) stay in the order of appearance in the file adding an unused non-file-local variable at the end of the file will extend the size of set_vnet and hence make the absolute relocation for carpstats work (think of this as a single-module set_vnet padding). This is a (tmporary) hack. It is the least intrusive one as we need a timely solution for the upcoming release. We will revisit the problem in HEAD. For a lot more information and the possible alternate solutions please see the PR and the references therein. PR: 230857 MFC after: 3 days
|
#
f9be0386 |
|
15-Aug-2018 |
Matt Macy <mmacy@FreeBSD.org> |
Fix in6_multi double free This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast address has to be removed from interface lists when the refcount goes to zero OR when the interface goes away - decouple list disconnect from refcount (v6 only for now) - ifmultiaddr can exist past being on interface lists - add flag for tracking whether or not it's enqueued - deferring freeing moptions makes the incpb cleanup code simpler but opens the door wider still to races - call inp_gcmoptions synchronously after dropping the the inpcb lock Fundamentally multicast needs a rewrite - but keep applying band-aids for now. Tested by: kp Reported by: novel, kp, lwhsu
|
#
5f901c92 |
|
24-Jul-2018 |
Andrew Turner <andrew@FreeBSD.org> |
Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. Reviewed by: bz Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D16147
|
#
0d3d234c |
|
01-Jul-2018 |
Kristof Provost <kp@FreeBSD.org> |
carp: Set DSCP value CS7 Update carp to set DSCP value CS7(Network Traffic) in the flowlabel field of packets by default. Currently carp only sets TOS_LOWDELAY in IPv4 which was deprecated in 1998. This also implements sysctl that can revert carp back to it's old behavior if desired. This will allow implementation of QOS on modern network devices to make sure carp packets aren't dropped during interface contention. Submitted by: Nick Wolff <darkfiberiru AT gmail.com> Reviewed by: kp, mav (earlier version) Differential Revision: https://reviews.freebsd.org/D14536
|
#
d7c5a620 |
|
18-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
ifnet: Replace if_addr_lock rwlock with epoch + mutex Run on LLNW canaries and tested by pho@ gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace. When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch: InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32 After the patch InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52 Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366
|
#
167a3440 |
|
07-May-2018 |
Alexander Motin <mav@FreeBSD.org> |
Keep CARP state as INIT when net.inet.carp.allow=0. Currently when net.inet.carp.allow=0 CARP state remains as MASTER, which is not very useful (if there are other masters -- it can lead to split brain, if there are none -- it makes no sense). Having it as INIT makes it clear that carp packets are disabled. Submitted by: wg MFC after: 1 month Relnotes: yes Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D14477
|
#
f3e1324b |
|
02-May-2018 |
Stephen Hurd <shurd@FreeBSD.org> |
Separate list manipulation locking from state change in multicast Multicast incorrectly calls in to drivers with a mutex held causing drivers to have to go through all manner of contortions to use a non sleepable lock. Serialize multicast updates instead. Submitted by: mmacy <mmacy@mattmacy.io> Reviewed by: shurd, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14969
|
#
0437c8e3 |
|
11-Apr-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Remove support for FDDI networks. Defines in net/if_media.h remain in case code copied from ifconfig is in use elsewere (supporting non-existant media type is harmless). Reviewed by: kib, jhb Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D15017
|
#
541d96aa |
|
30-Mar-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Use an accessor function to access ifr_data. This fixes 32-bit compat (no ioctl command defintions are required as struct ifreq is the same size). This is believed to be sufficent to fully support ifconfig on 32-bit systems. Reviewed by: kib Obtained from: CheriBSD MFC after: 1 week Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14900
|
#
69f0fecb |
|
28-Mar-2018 |
Brooks Davis <brooks@FreeBSD.org> |
Remove infrastructure for token-ring networks. Reviewed by: cem, imp, jhb, jmallett Relnotes: yes Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14875
|
#
fe267a55 |
|
27-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: general adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. No functional change intended.
|
#
81098a01 |
|
19-Oct-2017 |
Alexander Motin <mav@FreeBSD.org> |
Relax per-ifnet cif_vrs list double locking in carp(4). In all cases where cif_vrs list is modified, two locks are held: per-ifnet CIF_LOCK and global carp_sx. It means to read that list only one of them is enough to be held, so we can skip CIF_LOCK when we already have carp_sx. This fixes kernel panic, caused by attempts of copyout() to sleep while holding non-sleepable CIF_LOCK mutex. Discussed with: glebius MFC after: 2 weeks Sponsored by: iXsystems, Inc.
|
#
338e227a |
|
25-Jan-2017 |
Luiz Otavio O Souza <loos@FreeBSD.org> |
After the in_control() changes in r257692, an existing address is (intentionally) deleted first and then completely added again (so all the events, announces and hooks are given a chance to run). This cause an issue with CARP where the existing CARP data structure is removed together with the last address for a given VHID, which will cause a subsequent fail when the address is later re-added. This change fixes this issue by adding a new flag to keep the CARP data structure when an address is not being removed. There was an additional issue with IPv6 CARP addresses, where the CARP data structure would never be removed after a change and lead to VHIDs which cannot be destroyed. Reviewed by: glebius Obtained from: pfSense MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC (Netgate)
|
#
cfff8d3d |
|
30-Dec-2016 |
Enji Cooper <ngie@FreeBSD.org> |
Unbreak ip_carp with WITHOUT_INET6 enabled by conditionalizing all IPv6 structs under the INET6 #ifdef. Similarly (even though it doesn't seem to affect the build), conditionalize all IPv4 structs under the INET #ifdef This also unbreaks the LINT-NOINET6 tinderbox target on amd64; I have not verified other MACHINE/TARGET pairs (e.g. armv6/arm). MFC after: 2 weeks X-MFC with: r310847 Pointyhat to: jpaetzel Reported by: O. Hartmann <o.hartmann@walstatt.org>
|
#
8151740c |
|
30-Dec-2016 |
Josh Paetzel <jpaetzel@FreeBSD.org> |
Harden CARP against network loops. If there is a loop in the network a CARP that is in MASTER state will see it's own broadcasts, which will then cause it to assume BACKUP state. When it assumes BACKUP it will stop sending advertisements. In that state it will no longer see advertisements and will assume MASTER... We can't catch all the cases where we are seeing our own CARP broadcast, but we can catch the obvious case. Submitted by: torek Obtained from: FreeNAS MFC after: 2 weeks Sponsored by: iXsystems
|
#
d6e82913 |
|
17-Dec-2015 |
Steven Hartland <smh@FreeBSD.org> |
Revert r292275 & r292379 glebius has concerns about these changes so reverting those can be discussed and addressed. Sponsored by: Multiplay
|
#
3a909afe |
|
16-Dec-2015 |
Steven Hartland <smh@FreeBSD.org> |
Fix issues introduced by r292275 * Fix panic for etherswitches which don't have a LLADDR. * Disabled DELAY in unsolicited NDA, which needs further work. * Fixed missing DELAY in carp_send_na. * style(9) fix. Reported by: kp & melifaro X-MFC-With: r292275 MFC after: 1 month Sponsored by: Multiplay
|
#
52e53e2d |
|
15-Dec-2015 |
Steven Hartland <smh@FreeBSD.org> |
Fix lagg failover due to missing notifications When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved. This results is slow failover, dropped packets and network outages for the lagg interface when the primary link goes down. We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire a ifnet_link_event which are now monitored by rip and nd6. Upon receiving these events each protocol trigger the relevant notifications: * inet4 => Gratuitous ARP * inet6 => Unsolicited Neighbour Announce This also fixes the carp IPv6 NA's that stopped working after r251584 which added the ipv6_route__llma route. The new behavour can be controlled using the sysctls: * net.link.ether.inet.arp_on_link * net.inet6.icmp6.nd6_on_link Also removed unused param from lagg_port_state and added descriptions for the sysctls while here. PR: 156226 MFC after: 1 month Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4111
|
#
9925ac11 |
|
13-Nov-2015 |
Steven Hartland <smh@FreeBSD.org> |
Revert r290403 CARP rework invalidated this change.
|
#
1c302b58 |
|
09-Nov-2015 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Decompose arp_ifinit() into arp_add_ifa_lle() and arp_announce_ifaddr(). Rename arp_ifinit2() into arp_announce_ifaddr(). Eliminate zeroing ifa_rtrequest: it was used for calling arp_rtrequest() which was responsible for handling route cloning requests. It became obsolete since r186119 (L2/L3 split).
|
#
ac19560a |
|
05-Nov-2015 |
Steven Hartland <smh@FreeBSD.org> |
Add MTU support to carp interfaces MFC after: 2 weeks Sponsored by: Multiplay
|
#
3e7a2321 |
|
14-Sep-2015 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
* Do more fine-grained locking: call eventhandlers/free_entry without holding afdata wlock * convert per-af delete_address callback to global lltable_delete_entry() and more low-level "delete this lle" per-af callback * fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D3573
|
#
9c2cd1aa |
|
21-Apr-2015 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Improve carp(4) locking: - Use the carp_sx to serialize not only CARP ioctls, but also carp_attach() and carp_detach(). - Use cif_mtx to lock only access to those the linked list. - These locking changes allow us to do some memory allocations with M_WAITOK and also properly call callout_drain() in carp_destroy(). - In carp_attach() assert that ifaddr isn't attached. We always come here with a pristine address from in[6]_control(). Reviewed by: oleg Sponsored by: Nginx, Inc.
|
#
93d4534c |
|
06-Apr-2015 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Add sleepable lock to protect at least against two parallel SIOCSVHs. Sponsored by: Nginx, Inc.
|
#
6d947416 |
|
01-Apr-2015 |
Gleb Smirnoff <glebius@FreeBSD.org> |
o Use new function ip_fillid() in all places throughout the kernel, where we want to create a new IP datagram. o Add support for RFC6864, which allows to set IP ID for atomic IP datagrams to any value, to improve performance. The behaviour is controlled by net.inet.ip.rfc6864 sysctl knob, which is enabled by default. o In case if we generate IP ID, use counter(9) to improve performance. o Gather all code related to IP ID into ip_id.c. Differential Revision: https://reviews.freebsd.org/D2177 Reviewed by: adrian, cy, rpaulo Tested by: Emeric POUPON <emeric.poupon stormshield.eu> Sponsored by: Netflix Sponsored by: Nginx, Inc. Relnotes: yes
|
#
bb269f3a |
|
23-Jan-2015 |
Will Andrews <will@FreeBSD.org> |
Log hardware interface up/down as "hardware" rather than just "hw". Suggested by: glebius MFC after: 1 week MFC with: 277530
|
#
369a6708 |
|
23-Jan-2015 |
Will Andrews <will@FreeBSD.org> |
When a CARP state change is caused by an ifconfig request, log it accordingly. Suggested by: glebius MFC after: 1 week MFC with: 277530
|
#
d01641e2 |
|
22-Jan-2015 |
Will Andrews <will@FreeBSD.org> |
Improve CARP logging so that all state transitions are logged. sys/netinet/ip_carp.c: Add a "reason" string parameter to carp_set_state() and carp_master_down_locked() allowing more specific logging information to be passed into these apis. Refactor existing state transition logging into a single log call in carp_set_state(). Update all calls to carp_set_state() and carp_master_down_locked() to pass an appropriate reason string. For state transitions that were previously logged, the output should be unchanged. Submitted by: gibbs (original), asomers (updated) MFC after: 1 week Sponsored by: Spectra Logic MFSpectraBSD: 1039697 on 2014/02/11 (original) 1049992 on 2014/03/21 (updated)
|
#
57c5139c |
|
06-Jan-2015 |
Luiz Otavio O Souza <loos@FreeBSD.org> |
Remove the check that prevent carp(4) advskew to be set to '0'. CARP devices are created with advskew set to '0' and once you set it to any other value in the valid range (0..254) you can't set it back to zero. The code in question is also used to prevent that zeroed values overwrite the CARP defaults when a new CARP device is created. Since advskew already defaults to '0' for newly created devices and the new value is guaranteed to be within the valid range, it is safe to overwrite it here. PR: 194672 Reported by: cmb@pfsense.org In collaboration with: garga Tested by: garga MFC after: 2 weeks
|
#
ed6a66ca |
|
05-Jan-2015 |
Robert Watson <rwatson@FreeBSD.org> |
To ease changes to underlying mbuf structure and the mbuf allocator, reduce the knowledge of mbuf layout, and in particular constants such as M_EXT, MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single M_ALIGN() macro, implemented by a now-inlined m_align() function: - Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline. - Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align(). - Update consumers around the tree to simply use M_ALIGN(). This change eliminates a number of cases where mbuf consumers must be aware of whether or not mbufs returned by the allocator use external storage, but also assumptions about the size of the returned mbuf. This will make it easier to introduce changes in how we use external storage, as well as features such as variable-size mbufs. Differential Revision: https://reviews.freebsd.org/D1436 Reviewed by: glebius, trasz, gnn, bz Sponsored by: EMC / Isilon Storage Division
|
#
6df8a710 |
|
07-Nov-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed. Sponsored by: Nginx, Inc.
|
#
257480b8 |
|
04-Nov-2014 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Convert netinet6/ to use new routing API. * Remove &ifpp from ip6_output() in favor of ri->ri_nh_info * Provide different wrappers to in6_selectsrc: Currently it is used by 2 differenct type of customers: - socket-based one, which all are unsure about provided address scope and - in-kernel ones (ND code mostly), which don't have any sockets, options, crededentials, etc. So, we provide two different wrappers to in6_selectsrc() returning select source. * Make different versions of selectroute(): Currenly selectroute() is used in two scenarios: - SAS, via in6_selecsrc() -> in6_selectif() -> selectroute() - output, via in6_output -> wrapper -> selectroute() Provide different versions for each customer: - fib6_lookup_nh_basic()-based in6_selectif() which is capable of returning interface only, without MTU/NHOP/L2 calculations - full-blown fib6_selectroute() with cached route/multipath/ MTU/L2 * Stop using routing table for link-local address lookups * Add in6_ifawithifp_lla() to make for-us check faster for link-local * Add in6_splitscope / in6_setllascope for faster embed/deembed scopes
|
#
73d76e77 |
|
14-Aug-2014 |
Kevin Lo <kevlo@FreeBSD.org> |
Change pr_output's prototype to avoid the need for explicit casts. This is a follow up to r269699. Phabric: D564 Reviewed by: jhb
|
#
8f5a8818 |
|
07-Aug-2014 |
Kevin Lo <kevlo@FreeBSD.org> |
Merge 'struct ip6protosw' and 'struct protosw' into one. Now we have only one protocol switch structure that is shared between ipv4 and ipv6. Phabric: D476 Reviewed by: jhb
|
#
4a2dd8d4 |
|
22-Feb-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Improve logging of send errors, reporting error code and interface. Reduce code duplication between INET and INET6. Tested by: Lytochkin Boris <lytboris gmail.com>
|
#
f6b84910 |
|
19-Jan-2014 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Further rework netinet6 address handling code: * Set ia address/mask values BEFORE attaching to address lists. Inet6 address assignment is not atomic, so the simplest way to do this atomically is to fill in ia before attach. * Validate irfa->ia_addr field before use (we permit ANY sockaddr in old code). * Do some renamings: in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here) in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code) in6_ifaddloop -> nd6_add_ifa_lle in6_ifremloop -> nd6_rem_ifa_lle * Split working with LLE and route announce code for last two. Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour. * Call device SIOCSIFADDR handler IFF we're adding first address. In IPv4 we have to call it on every address change since ARP record is installed by arp_ifinit() which is called by given handler. IPv6 stack, on the opposite is responsible to call nd6_add_ifa_lle() so there is no reason to call SIOCSIFADDR often.
|
#
0cc726f2 |
|
03-Jan-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Make failure of ifpromisc() a non-fatal error. This makes it possible to run carp(4) on vtnet(4). Sponsored by: Nginx, Inc.
|
#
76039bc8 |
|
26-Oct-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
1f6addd9 |
|
08-Sep-2013 |
Mikolaj Golub <trociny@FreeBSD.org> |
Relese the interface in the last. Reviewed by: glebius Approved by: re (kib)
|
#
c5c392e7 |
|
13-Aug-2013 |
Mikolaj Golub <trociny@FreeBSD.org> |
Virtualize carp(4) variables to have per vnet control. Reviewed by: ae, glebius
|
#
69edf037 |
|
09-Jul-2013 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Migrate struct carpstats to PCPU counters.
|
#
47e8d432 |
|
25-Apr-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Add const qualifier to the dst parameter of the ifnet if_output method.
|
#
dc4ad05e |
|
14-Mar-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Use m_get/m_gethdr instead of compat macros. Sponsored by: Nginx, Inc.
|
#
24421c1c |
|
11-Feb-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Resolve source address selection in presense of CARP. Add a couple of helper functions: - carp_master() - boolean function which is true if an address is in the MASTER state. - ifa_preferred() - boolean function that compares two addresses, and is aware of CARP. Utilize ifa_preferred() in ifa_ifwithnet(). The previous version of patch also changed source address selection logic in jails using carp_master(), but we failed to negotiate this part with Bjoern. May be we will approach this problem again later. Reported & tested by: Anton Yuzhaninov <citrin citrin.ru> Sponsored by: Nginx, Inc
|
#
c4d06976 |
|
25-Dec-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Garbage collect carp_cksum().
|
#
7951008b |
|
25-Dec-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Change net.inet.carp.demotion sysctl to add the supplied value to the current demotion factor instead of assigning it. This allows external scripts to control demotion factor together with kernel in a raceless manner.
|
#
8f134647 |
|
22-Oct-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Switch the entire IPv4 stack to keep the IP packet header in network byte order. Any host byte order processing is done in local variables and host byte order values are never[1] written to a packet. After this change a packet processed by the stack isn't modified at all[2] except for TTL. After this change a network stack hacker doesn't need to scratch his head trying to figure out what is the byte order at the given place in the stack. [1] One exception still remains. The raw sockets convert host byte order before pass a packet to an application. Probably this would remain for ages for compatibility. [2] The ip_input() still subtructs header len from ip->ip_len, but this is planned to be fixed soon. Reviewed by: luigi, Maxim Dounin <mdounin mdounin.ru> Tested by: ray, Olivier Cochard-Labbe <olivier cochard.me>
|
#
9823d527 |
|
10-Oct-2012 |
Kevin Lo <kevlo@FreeBSD.org> |
Revert previous commit... Pointyhat to: kevlo (myself)
|
#
a10cee30 |
|
09-Oct-2012 |
Kevin Lo <kevlo@FreeBSD.org> |
Prefer NULL over 0 for pointers
|
#
891122d1 |
|
28-Sep-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
carp_send_ad() should never return without rescheduling next run.
|
#
8253dcab |
|
24-Jul-2012 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Fix a problem when CARP is enabled on the interface for IPv4 but not for IPv6. The current checks in nd6_nbr.c along with the old version will result in ifa being NULL and subsequently the packet will be dropped. This prevented NS/NA, from working and with that IPv6. Now return the ifa from the carp lookup function in two cases: 1) if the address matches, is a carp address, and we are MASTER (as before), 2) if the address matches but it is not a carp address at all (new). Reported by: Peter Wemm (new Y! FreeBSD cluster, eating our own dogfood) Tested on: New Y! FreeBSD cluster machines Reviewed by: glebius
|
#
eaf151c4 |
|
30-May-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Improve style(9) of bcopy() to and from mbuf tag. Submitted by: bde
|
#
a856ddc6 |
|
30-May-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
After r228571 carp_output() expects carp_softc * pointer in the mtag. Noticed by: thompsa
|
#
a9a2c40c |
|
10-Apr-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
It is a logical error that in carp_multicast_cleanup() we look at count of addresses on a particular vhid, we should account number of addresses on cif. To achieve this we need to run carp_attach() and carp_detach() under appropriate cif lock.
|
#
9ca1fe0d |
|
09-Apr-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
CARP should be capable to run on if_bridge(4). Unfortunately, this commit is not enough to enable CARP operation on if_bridge(4), because the latter doesn't handle or even initialize its ifp->if_link_state. Reported by: Alexander Lunev <sol289 gmail.com>
|
#
afdbac98 |
|
08-Feb-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Set vnet context in callouts and taskqueues. PR: 164696
|
#
2512b096 |
|
01-Feb-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
o Provide functions carp_ifa_addroute()/carp_ifa_delroute() to cleanup routes from a single ifa. o Implement carp_addroute()/carp_delroute() via above functions. o Call carp_ifa_delroute() in the carp_detach() to avoid junk routes left in routing table, in case if user removes an address in a MASTER state. [1] Reported by: az [1]
|
#
137f91e8 |
|
05-Jan-2012 |
John Baldwin <jhb@FreeBSD.org> |
Convert all users of IF_ADDR_LOCK to use new locking macros that specify either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks
|
#
1c435c73 |
|
22-Dec-2011 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Use a better log message for master down event.
|
#
f08535f8 |
|
20-Dec-2011 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Restore a feature that was present in 5.x and 6.x, and was cleared in 7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP preemption, while it is running its bulk update. However, reimplement the feature in more elegant manner, that is partially inspired by newer OpenBSD: - Rename term "suppression" to "demotion", to match with OpenBSD. - Keep a global demotion factor, that can be raised by several conditions, for now these are: - interface goes down - carp(4) has problems with ip_output() or ip6_output() - pfsync performs bulk update - Unlike in OpenBSD the demotion factor isn't a counter, but is actual value added to advskew. The adjustment values for particular error conditions are also configurable, and their defaults are maximum advskew value, so a single failure bumps demotion to maximum. This is for POLA compatibility, and should satisfy most users. - Demotion factor is a writable sysctl, so user can do foot shooting, if he desires to.
|
#
08b68b0e |
|
15-Dec-2011 |
Gleb Smirnoff <glebius@FreeBSD.org> |
A major overhaul of the CARP implementation. The ip_carp.c was started from scratch, copying needed functionality from the old implemenation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces, they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics has been changed too: now one doesn't need to clone carpXX interface, he/she should directly configure a vhid on a Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(8) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows to run a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]
|
#
4b22573a |
|
11-Nov-2011 |
Brooks Davis <brooks@FreeBSD.org> |
In r191367 the need for if_free_type() was removed and a new member if_alloctype was used to store the origional interface type. Take advantage of this change by removing all existing uses of if_free_type() in favor of if_free(). MFC after: 1 Month
|
#
2a2e6f0a |
|
14-Oct-2011 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Never switch directly from INIT to MASTER, since this produces nasty status flaps. PR: kern/161123 Submitted by: Damien Fleuriot <dam my.gd> OpenBSD: ip_carp.c, rev. 1.115
|
#
a0ae8f04 |
|
27-Apr-2011 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Make various (pseudo) interfaces compile without INET in the kernel adding appropriate #ifdefs. For module builds the framework needs adjustments for at least carp. Reviewed by: gnn Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems MFC after: 4 days
|
#
07155461 |
|
23-Nov-2010 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Redo r166423. It is important not only skip freeing multicast entires when underlying interface is detached, but also purge pointers to them, to avoid double-free in future.
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
6baf7a24 |
|
19-Sep-2010 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Do not convert some meaningful error value to EINVAL. Reviewed by: will
|
#
15249f73 |
|
06-Sep-2010 |
Will Andrews <will@FreeBSD.org> |
Fix CARP in backup mode by properly registering its hooks for INET and INET6 using ipproto_{un,}register() and the newly created ip6proto_{un,}register() so that it can again receive IPPROTO_CARP packets allowing its state machine to work. Reviewed by: bz Approved by: ken (mentor)
|
#
e24fa11d |
|
06-Sep-2010 |
Will Andrews <will@FreeBSD.org> |
Fix static kernel builds with carp(4) by changing its SYSINIT order so that it is initialized after basic protocol initialization, which allows it to register via pf_proto_register(). Reviewed by: bz Approved by: ken (mentor)
|
#
9963e8a5 |
|
11-Aug-2010 |
Will Andrews <will@FreeBSD.org> |
Unbreak LINT by moving all carp hooks to net/if.c / netinet/ip_carp.h, with the appropriate ifdefs. Reviewed by: bz Approved by: ken (mentor)
|
#
54bfbd51 |
|
10-Aug-2010 |
Will Andrews <will@FreeBSD.org> |
Allow carp(4) to be loaded as a kernel module. Follow precedent set by bridge(4), lagg(4) etc. and make use of function pointers and pf_proto_register() to hook carp into the network stack. Currently, because of the uncertainty about whether the unload path is free of race condition panics, unloads are disallowed by default. Compiling with CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure. This commit requires IP6PROTOSPACER, introduced in r211115. Reviewed by: bz, simon Approved by: ken (mentor) MFC after: 2 weeks
|
#
9fe5092d |
|
08-Aug-2010 |
Xin LI <delphij@FreeBSD.org> |
Address an edge condition that we found at work, where the carp(4) interface goes to issue LINK_UP, then LINK_DOWN, then LINK_UP at cold boot. This behavior is not observed when carp(4) interface is created slightly later, when the underlying interface is fully up. Before this change what happen at boot is roughly: - ifconfig creates em0 interface; - ifconfig clones a carp device using em0; (em0's link state is DOWN at this point) - carp state: INIT -> BACKUP [*] - carp state: BACKUP -> MASTER - [Some negotiate between em0 and switch] - em0 kicks up link state change event (em0's link state is now up DOWN at this point) - do_link_state_change() -> carp_carpdev_state() - carp state: MASTER -> INIT (via carp_set_state(sc, INIT)) [+] - carp state: INIT -> BACKUP - carp state: BACKUP -> MASTER At the [*] stage, em0 did not received any broadcast message from other node, and assume our node is the master, thus carp(4) sets the link state to "UP" after becoming a master. At [+], the master status is forcely set to "INIT", then an election is casted, after which our node would actually become a master. We believe that at the [*] stage, the master status should remain as "INIT" since the underlying parent interface's link state is not up. Obtained from: iXsystems, Inc. Reported by: jpaetzel MFC after: 2 months
|
#
60ee8f1a |
|
10-Jan-2010 |
Ruslan Ermilov <ru@FreeBSD.org> |
MFC: r200026,201801: Swap carp(4) log levels.
|
#
acc0fee0 |
|
08-Jan-2010 |
Ruslan Ermilov <ru@FreeBSD.org> |
Complete the swap of carp(4) log levels and document the change. MFC after: 3 days
|
#
e81ab876 |
|
02-Dec-2009 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Until this moment carp(4) used a strange logging priority. It used debug priority for such important information as MASTER/BACKUP state change, and used a normal logging priority for such innocent messages as receiving short packet (which is a normal VRRP packet between some other routers) or receving a CARP packet on non-carp interface (someone else running CARP). This commit shifts message logging priorities to a more sane default.
|
#
566abe95 |
|
19-Aug-2009 |
Will Andrews <will@FreeBSD.org> |
MFC r196397 from head: Fix CARP memory leaks on carp_if's malloc'd using M_CARP. This occurs when CARP tries to free them using M_IFADDR after the last address for a virtual host is removed and when detaching from the parent interface. Approved by: re (kib), ken (mentor)
|
#
52e12426 |
|
19-Aug-2009 |
Will Andrews <will@FreeBSD.org> |
Fix CARP memory leaks on carp_if's malloc'd using M_CARP. This occurs when CARP tries to free them using M_IFADDR after the last address for a virtual host is removed and when detaching from the parent interface. Reviewed by: mlaier Approved by: re (kib), ken (mentor)
|
#
530c0060 |
|
01-Aug-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
|
#
16e324fc |
|
30-Jul-2009 |
Xin LI <delphij@FreeBSD.org> |
Show interface name which received short CARP packet (e.g. a VRRP packet), in order to match other codepaths nearby. This makes troubleshooting easier. Approved by: re (kib) MFC after: 1 month
|
#
eddfbb76 |
|
14-Jul-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
|
#
d1da0a06 |
|
25-Jun-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Add address list locking for in6_ifaddrhead/ia_link: as with locking for in_ifaddrhead, we stick with an rwlock for the time being, which we will revisit in the future with a possible move to rmlocks. Some pieces of code require significant further reworking to be safe from all classes of writer-writer races. Reviewed by: bz MFC after: 6 weeks
|
#
2d9cfaba |
|
25-Jun-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists. Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently). For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists. MFC after: 6 weeks Reviewed by: bz
|
#
32187eb6 |
|
24-Jun-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Fix CARP build. Reported by: bz
|
#
80af0152 |
|
24-Jun-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Convert netinet6 to using queue(9) rather than hand-crafted linked lists for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt the code styles and conventions present in netinet where possible. Reviewed by: gnn, bz MFC after: 6 weeks (possibly not MFCable?)
|
#
8c0fec80 |
|
23-Jun-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Modify most routines returning 'struct ifaddr *' to return references rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)
|
#
33cde130 |
|
29-Apr-2009 |
Bruce M Simpson <bms@FreeBSD.org> |
Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit: import from p4 bms_netdev. Summary of changes: * Connect netinet6/in6_mcast.c to build. The legacy KAME KPIs are mostly preserved. * Eliminate now dead code from ip6_output.c. Don't do mbuf bingo, we are not going to do RFC 2292 style CMSG tricks for multicast options as they are not required by any current IPv6 normative reference. * Refactor transports (UDP, raw_ip6) to do own mcast filtering. SCTP, TCP unaffected by this change. * Add ip6_msource, in6_msource structs to in6_var.h. * Hookup mld_ifinfo state to in6_ifextra, allocate from domifattach path. * Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced. Kernel consumers which need this should use in6m_lookup(). * Refactor IPv6 socket group memberships to use a vector (like IPv4). * Update ifmcstat(8) for IPv6 SSM. * Add witness lock order for IN6_MULTI_LOCK. * Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths. * Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup. * Update carp(4) for new IPv6 SSM KPIs. * Virtualize ip6_mrouter socket. Changes mostly localized to IPv6 MROUTING. * Don't do a local group lookup in MROUTING. * Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge(). * Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode. * Bump __FreeBSD_version to 800084. * Update UPDATING. NOTE WELL: * This code hasn't been tested against real MLDv2 queriers (yet), although the on-wire protocol has been verified in Wireshark. * There are a few unresolved issues in the socket layer APIs to do with scope ID propagation. * There is a LOR present in ip6_output()'s use of in6_setscope() which needs to be resolved. See comments in mld6.c. This is believed to be benign and can't be avoided for the moment without re-introducing an indirect netisr. This work was mostly derived from the IGMPv3 implementation, and has been sponsored by a third party.
|
#
db091502 |
|
26-Apr-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Acquire IF_ADDR_LOCK() around most iterations over ifp->if_addrhead (colloquially known as if_addrlist). Currently not acquired around interface address loops that call out to the routing code due to potential lock order issues. MFC after: 3 weeks
|
#
279aa3d4 |
|
16-Apr-2009 |
Kip Macy <kmacy@FreeBSD.org> |
Change if_output to take a struct route as its fourth argument in order to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson
|
#
6bf65bcf |
|
12-Apr-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Update stats in struct carpstats using two new macros: CARPSTATS_ADD() and CARPSTATS_INC(), rather than directly manipulating the fields of the structure. This will make it easier to change the implementation of these statistics, such as using per-CPU versions of the data structure. MFC after: 3 days
|
#
6e6b3f7c |
|
14-Dec-2008 |
Qing Li <qingli@FreeBSD.org> |
This main goals of this project are: 1. separating L2 tables (ARP, NDP) from the L3 routing tables 2. removing as much locking dependencies among these layers as possible to allow for some parallelism in the search operations 3. simplify the logic in the routing code, The most notable end result is the obsolescent of the route cloning (RTF_CLONING) concept, which translated into code reduction in both IPv4 ARP and IPv6 NDP related modules, and size reduction in struct rtentry{}. The change in design obsoletes the semantics of RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland applications such as "arp" and "ndp" have been modified to reflect those changes. The output from "netstat -r" shows only the routing entries. Quite a few developers have contributed to this project in the past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and Andre Oppermann. And most recently: - Kip Macy revised the locking code completely, thus completing the last piece of the puzzle, Kip has also been conducting active functional testing - Sam Leffler has helped me improving/refactoring the code, and provided valuable reviews - Julian Elischer setup the perforce tree for me and has helped me maintaining that branch before the svn conversion
|
#
0b476f1c |
|
05-Dec-2008 |
Gleb Smirnoff <glebius@FreeBSD.org> |
In a case of CARP status change run through the if_link_state_change() routine, so that devd(8) and others are notified about link state change.
|
#
4b79449e |
|
02-Dec-2008 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Rather than using hidden includes (with cicular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files. For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h. Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
|
#
1ede983c |
|
23-Oct-2008 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
8b615593 |
|
02-Oct-2008 |
Marko Zec <zec@FreeBSD.org> |
Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(*). (*) netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
#
603724d3 |
|
17-Aug-2008 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
|
#
7972c979 |
|
14-Jul-2008 |
Ermal Luçi <eri@FreeBSD.org> |
Fix carp(4) panics that can occur during carp interface configuration. Approved by: mlaier (mentor) Reported by: Scott Ullrich MFC after: 1 week
|
#
1ead26d4 |
|
02-Jun-2008 |
Max Laier <mlaier@FreeBSD.org> |
Sort IP addresses before hashing them for the signature. Otherwise carp is sensitive to address configuration order. PR: kern/121574 Reported by: Douglas K. Rand, Wouter de Jong Obtained from: OpenBSD (rev 1.114 + fixes) MFC after: 2 weeks
|
#
e60a0104 |
|
07-Feb-2008 |
Gleb Smirnoff <glebius@FreeBSD.org> |
If the vhid already present, return EEXIST instead of non-informative EINVAL.
|
#
4b421e2d |
|
07-Oct-2007 |
Mike Silbersack <silby@FreeBSD.org> |
Add FBSDID to all files in netinet so that people can more easily include file version information in bug reports. Approved by: re (kensmith)
|
#
c6b28997 |
|
28-Jul-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove definition of NET_CALLOUT_MPSAFE, which is no longer required now that debug.mpsafenet has been removed. The once over: bz Approved by: re (kensmith)
|
#
71498f30 |
|
12-Jun-2007 |
Bruce M Simpson <bms@FreeBSD.org> |
Import rewrite of IPv4 socket multicast layer to support source-specific and protocol-independent host mode multicast. The code is written to accomodate IPv6, IGMPv3 and MLDv2 with only a little additional work. This change only pertains to FreeBSD's use as a multicast end-station and does not concern multicast routing; for an IGMPv3/MLDv2 router implementation, consider the XORP project. The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6, which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html Summary * IPv4 multicast socket processing is now moved out of ip_output.c into a new module, in_mcast.c. * The in_mcast.c module implements the IPv4 legacy any-source API in terms of the protocol-independent source-specific API. * Source filters are lazy allocated as the common case does not use them. They are part of per inpcb state and are covered by the inpcb lock. * struct ip_mreqn is now supported to allow applications to specify multicast joins by interface index in the legacy IPv4 any-source API. * In UDP, an incoming multicast datagram only requires that the source port matches the 4-tuple if the socket was already bound by source port. An unbound socket SHOULD be able to receive multicasts sent from an ephemeral source port. * The UDP socket multicast filter mode defaults to exclusive, that is, sources present in the per-socket list will be blocked from delivery. * The RFC 3678 userland functions have been added to libc: setsourcefilter, getsourcefilter, setipv4sourcefilter, getipv4sourcefilter. * Definitions for IGMPv3 are merged but not yet used. * struct sockaddr_storage is now referenced from <netinet/in.h>. It is therefore defined there if not already declared in the same way as for the C99 types. * The RFC 1724 hack (specify 0.0.0.0/8 addresses to IP_MULTICAST_IF which are then interpreted as interface indexes) is now deprecated. * A patch for the Rhyolite.com routed in the FreeBSD base system is available in the -net archives. This only affects individuals running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces. * Make IPv6 detach path similar to IPv4's in code flow; functionally same. * Bump __FreeBSD_version to 700048; see UPDATING. This work was financially supported by another FreeBSD committer. Obtained from: p4://bms_netdev Submitted by: Wilbert de Graaf (original work) Reviewed by: rwatson (locking), silence from fenner, net@ (but with encouragement)
|
#
e9bf9fb6 |
|
06-Jun-2007 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Do not leak lock in the case of EEXIST error. PR: kern/92776 Submitted by: Ed Schouten <Ed.Schouten tunix.nl>
|
#
fbfdcf87 |
|
02-Feb-2007 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Since rev. 1.94 of netinet/in.c, the netinet layer frees all its multicast memberships, when interface is detached. Thus, when an underlying interface is detached, we do not need to free our multicast memberships. Reviewed by: bms
|
#
3cf0d024 |
|
25-Jan-2007 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Make it possible that carpdetach() unlocks on return. Then, in carp_clone_destroy() we are on a safe side, we don't need to unlock the cif, that can me already non-existent at this point. Reported by: Anton Yuzhaninov <citrin rambler-co.ru>
|
#
62dae1e9 |
|
25-Jan-2007 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Spacing.
|
#
acd3428b |
|
06-Nov-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
#
7002145d |
|
07-Oct-2006 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Set scope on MC address so IPv6 carp advertisement will not get dropped in ip6_output. In case this fails handle the error directly and log it[1]. In addition permit CARP over v6 in ip_fw2. PR: kern/98622 Similar patch by: suz Discussed with: glebius [1] Tested by: Paul.Dekkers surfnet.nl, Philippe.Pegon crc.u-strasbg.fr MFC after: 3 days
|
#
13c83844 |
|
25-Sep-2006 |
Bruce M Simpson <bms@FreeBSD.org> |
Fix an incompatibility between CARP and IPv4 multicast routing, whereby the VRRPv2 advertisements will originate from the wrong source address. This only affects kernels compiled with MROUTING and after the MRT_INIT ioctl() has been issued. Set imo_multicast_vif in carp's softc to the invalid value -1 after it is zeroed by softc allocation, to stop the ip_output() path looking up the incorrect source address thinking a vif is set. PR: kern/100532 Submitted by: Bohus Plucinsky MFC after: 1 week
|
#
6b7330e2 |
|
09-Jul-2006 |
Sam Leffler <sam@FreeBSD.org> |
Revise network interface cloning to take an optional opaque parameter that can specify configuration parameters: o rev cloner api's to add optional parameter block o add SIOCCREATE2 that accepts parameter data o rev vlan support to use new api (maintain old code) Reviewed by: arch@
|
#
05206588 |
|
07-Jul-2006 |
Max Laier <mlaier@FreeBSD.org> |
Make in-kernel multicast protocols for pfsync and carp work after enabling dynamic resizing of multicast membership array. Reported and testing by: Maxim Konovalov, Scott Ullrich Reminded by: thompsa MFC after: 2 weeks
|
#
16d878cc |
|
02-Jun-2006 |
Christian S.J. Peron <csjp@FreeBSD.org> |
Fix the following bpf(4) race condition which can result in a panic: (1) bpf peer attaches to interface netif0 (2) Packet is received by netif0 (3) ifp->if_bpf pointer is checked and handed off to bpf (4) bpf peer detaches from netif0 resulting in ifp->if_bpf being initialized to NULL. (5) ifp->if_bpf is dereferenced by bpf machinery (6) Kaboom This race condition likely explains the various different kernel panics reported around sending SIGINT to tcpdump or dhclient processes. But really this race can result in kernel panics anywhere you have frequent bpf attach and detach operations with high packet per second load. Summary of changes: - Remove the bpf interface's "driverp" member - When we attach bpf interfaces, we now set the ifp->if_bpf member to the bpf interface structure. Once this is done, ifp->if_bpf should never be NULL. [1] - Introduce bpf_peers_present function, an inline operation which will do a lockless read bpf peer list associated with the interface. It should be noted that the bpf code will pickup the bpf_interface lock before adding or removing bpf peers. This should serialize the access to the bpf descriptor list, removing the race. - Expose the bpf_if structure in bpf.h so that the bpf_peers_present function can use it. This also removes the struct bpf_if; hack that was there. - Adjust all consumers of the raw if_bpf structure to use bpf_peers_present Now what happens is: (1) Packet is received by netif0 (2) Check to see if bpf descriptor list is empty (3) Pickup the bpf interface lock (4) Hand packet off to process From the attach/detach side: (1) Pickup the bpf interface lock (2) Add/remove from bpf descriptor list Now that we are storing the bpf interface structure with the ifnet, there is is no need to walk the bpf interface list to locate the correct bpf interface. We now simply look up the interface, and initialize the pointer. This has a nice side effect of changing a bpf interface attach operation from O(N) (where N is the number of bpf interfaces), to O(1). [1] From now on, we can no longer check ifp->if_bpf to tell us whether or not we have any bpf peers that might be interested in receiving packets. In collaboration with: sam@ MFC after: 1 month
|
#
0fa08018 |
|
21-Mar-2006 |
Gleb Smirnoff <glebius@FreeBSD.org> |
o Introduce carp_multicast_cleanup(), which removes and frees multicast addresses from carp interface. [1] o Rewrite carpdetach(), so that it does the following things: [1] - Stops callouts. - Decrements carp_suppress_preempt, if needed. - Downs interface and sets CARP state to INIT. - Calls carp_multicast_cleanup(). - Detaches softc from carp_if and if we are the last frees the carp_if. o Use new carpdetach() in carp_clone_destroy(). o In carp_ifdetach() acquire the carp_if lock and cleanup all interfaces hanging on carp_if. [1] o Make carp_ifdetach() static and use EVENT(9) to call it from if_detach(). [2] o In carp_setrun() exit if the softc doesn't have a valid pointer to parent. [1] Obtained from: OpenBSD [1] Submitted by: Dan Lukes <dan obluda.cz> [2] PR: kern/82908 [2]
|
#
21883761 |
|
16-Nov-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
MFOpenBSD 1.62: Prevent backup CARP hosts from replying to arp requests, fixes strangeness with some layer-3 switches. From Bill Marquette. Tested by: Kazuaki Oda <kaakun highway.ne.jp>
|
#
433aaf04 |
|
13-Nov-2005 |
Ruslan Ermilov <ru@FreeBSD.org> |
Unbreak for !INET6 case.
|
#
4a0d6638 |
|
11-Nov-2005 |
Ruslan Ermilov <ru@FreeBSD.org> |
- Store pointer to the link-level address right in "struct ifnet" rather than in ifindex_table[]; all (except one) accesses are through ifp anyway. IF_LLADDR() works faster, and all (except one) ifaddr_byindex() users were converted to use ifp->if_addr. - Stop storing a (pointer to) Ethernet address in "struct arpcom", and drop the IFP2ENADDR() macro; all users have been converted to use IF_LLADDR() instead.
|
#
4e7e0183 |
|
08-Nov-2005 |
Andrew Thompson <thompsa@FreeBSD.org> |
Move the cloned interface list management in to if_clone. For some drivers the softc lists and associated mutex are now unused so these have been removed. Calling if_clone_detach() will now destroy all the cloned interfaces for the driver and in most cases is all thats needed to unload. Idea by: brooks Reviewed by: brooks
|
#
9f4abef9 |
|
25-Oct-2005 |
Yaroslav Tykhiy <ytykhiy@gmail.com> |
Since carp(4) interfaces presently are kinda fake yet possess IP addresses, mark them with LOOPBACK so that routing daemons take them easy for link-state routing protocols. Reviewed by: glebius
|
#
1e4b3606 |
|
22-Oct-2005 |
Max Laier <mlaier@FreeBSD.org> |
Fix build after in6_joingroup change. It remains unclear if DAD breaks CARP or not.
|
#
febd0759 |
|
12-Oct-2005 |
Andrew Thompson <thompsa@FreeBSD.org> |
Change the reference counting to count the number of cloned interfaces for each cloner. This ensures that ifc->ifc_units is not prematurely freed in if_clone_detach() before the clones are destroyed, resulting in memory modified after free. This could be triggered with if_vlan. Assert that all cloners have been destroyed when freeing the memory. Change all simple cloners to destroy their clones with ifc_simple_destroy() on module unload so the reference count is properly updated. This also cleans up the interface destroy routines and allows future optimisation. Discussed with: brooks, pjd, -current Reviewed by: brooks
|
#
5d40d65b |
|
09-Sep-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
When a carp(4) interface is being destroyed and is in a promiscous mode, first interface is detached from parent and then bpfdetach() is called. If the interface was the last carp(4) interface attached to parent, then the mutex on parent is destroyed. When bpfdetach() calls if_setflags() we panic on destroyed mutex. To prevent the above scenario, clear pointer to parent, when we detach ourselves from parent.
|
#
13f4c340 |
|
09-Aug-2005 |
Robert Watson <rwatson@FreeBSD.org> |
Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to ifnet.if_drv_flags. Device drivers are now responsible for synchronizing access to these flags, as they are in if_drv_flags. This helps prevent races between the network stack and device driver in maintaining the interface flags field. Many __FreeBSD__ and __FreeBSD_version checks maintained and continued; some less so. Reviewed by: pjd, bz MFC after: 7 days
|
#
29da8af6 |
|
24-Jul-2005 |
Hajimu UMEMOTO <ume@FreeBSD.org> |
include netinet6/scope6_var.h.
|
#
a1f7e5f8 |
|
24-Jul-2005 |
Hajimu UMEMOTO <ume@FreeBSD.org> |
scope cleanup. with this change - most of the kernel code will not care about the actual encoding of scope zone IDs and won't touch "s6_addr16[1]" directly. - similarly, most of the kernel code will not care about link-local scoped addresses as a special case. - scope boundary check will be stricter. For example, the current *BSD code allows a packet with src=::1 and dst=(some global IPv6 address) to be sent outside of the node, if the application do: s = socket(AF_INET6); bind(s, "::1"); sendto(s, some_global_IPv6_addr); This is clearly wrong, since ::1 is only meaningful within a single node, but the current implementation of the *BSD kernel cannot reject this attempt. Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp> Obtained from: KAME
|
#
a196a3c8 |
|
01-Jul-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
When doing ARP load balancing source IP is taken in network byte order, so residue of division for all hosts on net is the same, and thus only one VHID answers. Change source IP in host byte order. Reviewed by: mlaier Approved by: re (scottl)
|
#
01399f34 |
|
26-Jun-2005 |
David Malone <dwmalone@FreeBSD.org> |
Fix some long standing bugs in writing to the BPF device attached to a DLT_NULL interface. In particular: 1) Consistently use type u_int32_t for the header of a DLT_NULL device - it continues to represent the address family as always. 2) In the DLT_NULL case get bpf_movein to store the u_int32_t in a sockaddr rather than in the mbuf, to be consistent with all the DLT types. 3) Consequently fix a bug in bpf_movein/bpfwrite which only permitted packets up to 4 bytes less than the MTU to be written. 4) Fix all DLT_NULL devices to have the code required to allow writing to their bpf devices. 5) Move the code to allow writing to if_lo from if_simloop to looutput, because it only applies to DLT_NULL devices but was being applied to other devices that use if_simloop possibly incorrectly. PR: 82157 Submitted by: Matthew Luckie <mjl@luckie.org.nz> Approved by: re (scottl)
|
#
fc74a9f9 |
|
10-Jun-2005 |
Brooks Davis <brooks@FreeBSD.org> |
Stop embedding struct ifnet at the top of driver softcs. Instead the struct ifnet or the layer 2 common structure it was embedded in have been replaced with a struct ifnet pointer to be filled by a call to the new function, if_alloc(). The layer 2 common structure is also allocated via if_alloc() based on the interface type. It is hung off the new struct ifnet member, if_l2com. This change removes the size of these structures from the kernel ABI and will allow us to better manage them as interfaces come and go. Other changes of note: - Struct arpcom is no longer referenced in normal interface code. Instead the Ethernet address is accessed via the IFP2ENADDR() macro. To enforce this ac_enaddr has been renamed to _ac_enaddr. - The second argument to ether_ifattach is now always the mac address from driver private storage rather than sometimes being ac_enaddr. Reviewed by: sobomax, sam
|
#
32247f86 |
|
14-May-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
- When carp interface is destroyed, and it affects global preemption suppresion counter, decrease the latter. [1] - Add sysctl to monitor preemption suppression. PR: kern/80972 [1] Submitted by: Frank Volf [1] MFC after: 1 week
|
#
9dc1f8e4 |
|
20-Apr-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove anti-LOR bandaid, it is not needed now. Sponsored by: Rambler
|
#
4cb39345 |
|
30-Mar-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
When several carp interfaces are attached to Ethernet interface, carp_carpdev_state_locked() is called every time carp interface is attached. The first call backs up flags of the first interface, and the second call backs up them again, erasing correct values. To solve this, a carp_sc_state_locked() function is introduced. It is called when interface is attached to parent, instead of calling carp_carpdev_state_locked. carp_carpdev_state_locked() calls carp_sc_state_locked() for each sc in chain. Reported by: Yuriy N. Shkandybin, sem
|
#
ee6f2270 |
|
18-Mar-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
If vhid exists return more informative EEXIST instead of EINVAL. While here remove redundant brackets.
|
#
9860bab3 |
|
18-Mar-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Fix a potential crash that could occur when CARP_LOG is being used. Obtained from: OpenBSD (pat)
|
#
b82936c5 |
|
02-Mar-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Fix typo. Unbreak build. Take pointy hat.
|
#
d220759b |
|
01-Mar-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Add more locking when reading/writing to carp softc. When carp softc is attached to a parent interface we use its mutex to lock the softc. This means that in several places like carp_ioctl() we lock softc conditionaly. This should be redesigned. To avoid LORs when MII announces us a link state change, we schedule a quick callout and call carp_carpdev_state_locked() from it. Initialize callouts using NET_CALLOUT_MPSAFE. Sponsored by: Rambler Reviewed by: mlaier
|
#
d92d54d5 |
|
28-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
- Add carp_mtx. Use it to protect list of all carp interfaces. - In carp_send_ad_all() walk through list of all carp interfaces instead of walking through list of all interfaces. Sponsored by: Rambler Reviewed by: mlaier
|
#
3a84d72a |
|
01-Mar-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Revert change to struct ifnet. Use ifnet pointer in softc. Embedding ifnet into smth will soon be removed. Requested by: brooks
|
#
4358dfc3 |
|
01-Mar-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove debugging printf. Reviewed by: mlaier
|
#
630481bb |
|
28-Feb-2005 |
Yaroslav Tykhiy <ytykhiy@gmail.com> |
Support running carp(4) over a vlan(4) parent interface. Encouraged by: glebius
|
#
1d0a2376 |
|
28-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove unused field from carp softc. OK'ed by: mcbride@OpenBSD
|
#
3e07def4 |
|
28-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Fix tcpdump(8) on carp(4) interface: - Use our loop DLT type, not OpenBSD. [1] - The fields that are converted to network byte order are not 32-bit fields but 16-bit fields, so htons should be used in htonl. [1] - Secondly, ip_input changes ip->ip_len into its value without the ip-header length. So, restore the length to make bpf happy. [1] - Use bpf_mtap2(), use temporary af1, since bpf_mtap2 doesn't understand uint8_t af identifier. Submitted by: Frank Volf [1]
|
#
a4e53905 |
|
27-Feb-2005 |
Max Laier <mlaier@FreeBSD.org> |
Unbreak the build. carp_iamatch6 and carp_macmatch6 are not supposed to be static as they are used elsewhere.
|
#
e8c34a71 |
|
26-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove carp_softc.sc_ifp member in favor of union pointers in struct ifnet. Obtained from: OpenBSD
|
#
5c1f0f6d |
|
26-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Staticize local functions.
|
#
88bf82a6 |
|
25-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
New lines when logging.
|
#
947b7cf3 |
|
25-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Embrace macros with do {} while (0) Submitted by: maxim
|
#
39aeaa0e |
|
25-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Call carp_carpdev_state() from carp_set_addr6(). See log for rev 1.4. Sponsored by: Rambler
|
#
1e9e6572 |
|
25-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Improve logging: - Simplify CARP_LOG() and making it working (we don't have addlog in FreeBSD). - Introduce CARP_DEBUG() which logs with LOG_DEBUG severity when net.inet.carp.log > 1 - Use CARP_DEBUG to log state changes of carp interfaces. After CARP_LOG() cleanup it appeared that carp_input_c() does not need sc argument. Remove it. Sponsored by: Rambler
|
#
6fba4c0b |
|
24-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Fix problem when master comes up with one interface down, and preempts mastering on all other interfaces: - call carp_carpdev_state() on initialize instead of just setting to INIT - in carp_carpdev_state() check that interface is UP, instead of checking that it is not DOWN, because a rebooted machine may have interface in UNKNOWN state. Sponsored by: Rambler Obtained from: OpenBSD (partially)
|
#
23687377 |
|
22-Feb-2005 |
Maxime Henrion <mux@FreeBSD.org> |
Unbreak CARP build on 64-bit architectures. Tested on: sparc64
|
#
67df4214 |
|
22-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove promisc counter from parent interface in carp_clone_destroy(), so that parent interface is not left in promiscous mode after carp interface is destroyed. This is not perfect, since promisc counter is added when carp interface is assigned an IP address. However, when address is removed parent interface is still in promiscuous mode. Only removal of carp interface removes promisc from parent. Same way in OpenBSD. Sponsored by: Rambler
|
#
a9771948 |
|
22-Feb-2005 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Add CARP (Common Address Redundancy Protocol), which allows multiple hosts to share an IP address, providing high availability and load balancing. Original work on CARP done by Michael Shalayeff, with many additions by Marco Pfatschbacher and Ryan McBride. FreeBSD port done solely by Max Laier. Patch by: mlaier Obtained from: OpenBSD (mickey, mcbride)
|