$NetBSD: TODO.smpnet,v 1.48 2024/04/24 06:44:18 nia Exp $

MP-safe components
==================

They work without the big kernel lock (KERNEL_LOCK), i.e., with the NET_MPSAFE
kernel option. Some components scale up and some don't.

 - Device drivers
   - aq(4)
   - bcmgenet(4)
   - bge(4)
   - ena(4)
   - iavf(4)
   - ixg(4)
   - ixl(4)
   - ixv(4)
   - mcx(4)
   - rge(4)
   - se(4)
   - sunxi_emac(4)
   - vioif(4)
   - vmx(4)
   - wm(4)
   - xennet(4)
   - usbnet(4) based adapters:
     - axe(4)
     - axen(4)
     - cdce(4)
     - cue(4)
     - kue(4)
     - mos(4)
     - mue(4)
     - smsc(4)
     - udav(4)
     - upl(4)
     - ure(4)
     - url(4)
     - urndis(4)
 - Layer 2
   - Ethernet (if_ethersubr.c)
   - bridge(4)
   - STP
   - Fast forward (ipflow)
 - Layer 3
   - All except for the items listed in the section below
 - Interfaces
   - canloop(4)
   - gif(4)
   - ipsecif(4)
   - l2tp(4)
   - lagg(4)
   - pppoe(4)
   - if_spppsubr.c
   - tap(4)
   - tun(4)
   - vether(4)
   - vlan(4)
 - Packet filters
   - npf(7)
 - Others
   - bpf(4)
   - ipsec(4)
   - opencrypto(9)
   - pfil(9)

Non MP-safe components and kernel options
=========================================

The following components and options aren't MP-safe yet, i.e., they require
the big kernel lock. Some of them can be used safely even if NET_MPSAFE is
enabled because they're still protected by the big kernel lock. The others
aren't protected and so are unsafe, e.g., they may crash the kernel.
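
NET_MPSAFE is set in the kernel configuration file. As a sketch, assuming a
config derived from your port's GENERIC (which may already carry the option,
commented out), the line looks like this:

```
options 	NET_MPSAFE	# MP-safe network stack
```

Everything in the "Unprotected ones" list below stays unsafe with this option
set; the option does not make those components MP-safe.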

Protected ones
--------------

 - Device drivers
   - Most drivers other than the ones listed in the section above
 - Layer 4
   - DCCP
   - SCTP
   - TCP
   - UDP

Unprotected ones
----------------

 - Layer 2
   - ARCNET (if_arcsubr.c)
   - IEEE 1394 (if_ieee1394subr.c)
   - IEEE 802.11 (ieee80211(4))
 - Layer 3
   - IPSELSRC
   - MROUTING
   - PIM
   - MPLS (mpls(4))
   - IPv6 address selection policy
 - Interfaces
   - agr(4)
   - carp(4)
   - faith(4)
   - gre(4)
   - ppp(4)
   - sl(4)
   - stf(4)
   - if_srt
 - Packet filters
   - ipf(4)
   - pf(4)
 - Others
   - AppleTalk (sys/netatalk/)
   - Bluetooth (sys/netbt/)
   - altq(4)
   - kttcp(4)
   - NFS

Known issues
============

NOMPSAFE
--------

We use "NOMPSAFE" as a mark that indicates that the code around it isn't
MP-safe yet. We use it in comments and also as part of function names, for
example m_get_rcvif_NOMPSAFE. Let's use "NOMPSAFE" to make it easy to find
non-MP-safe code with grep.

bpf
---

MP-ification of bpf requires that all bpf_mtap* calls are made in normal LWP
context or softint context, i.e., not in hardware interrupt context. For Tx,
all bpf_mtap calls satisfy the requirement. For Rx, most bpf_mtap calls are
made in softint. Unfortunately, some bpf_mtap calls on Rx are still made in
hardware interrupt context.

This is the list of the functions that have such bpf_mtap calls:

 - sca_frame_process() @ sys/dev/ic/hd64570.c

Ideally we should make the functions run in softint somehow, but we have
neither the actual devices nor the time (or interest/love) to work on the
task, so instead we provide a deferred bpf_mtap mechanism that forcibly runs
bpf_mtap in softint context. It's a workaround; once the functions run in
softint, we should use the original bpf_mtap again.
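
The deferral idea can be modeled in userland as a bounded FIFO that the
interrupt side fills and the softint side drains. This is a single-threaded
sketch only: struct defer_queue, deferred_tap_enqueue(), deferred_tap_softint()
and count_bytes_tap() are hypothetical names, not the kernel's bpf interface,
and the model copies payload bytes instead of handling mbufs. A real
implementation would also have to schedule the softint (for example with
softint_schedule(9)) and synchronize the queue between the two contexts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEFER_QLEN	8	/* hypothetical queue depth */
#define PKT_MAX		64	/* hypothetical maximum captured length */

/* Stand-in for an mbuf; the model copies the payload bytes. */
struct pkt {
	unsigned char data[PKT_MAX];
	size_t len;
};

/* Fixed-size ring of packets waiting to be tapped. */
struct defer_queue {
	struct pkt q[DEFER_QLEN];
	size_t head;		/* index of the oldest queued packet */
	size_t count;		/* number of queued packets */
	unsigned long dropped;	/* taps lost because the ring was full */
};

/*
 * "Hardware interrupt" side: instead of calling the tap directly,
 * copy the packet into the ring.  Returns 0 on success, -1 on drop.
 */
static int
deferred_tap_enqueue(struct defer_queue *dq, const void *data, size_t len)
{
	struct pkt *p;

	if (dq->count == DEFER_QLEN || len > PKT_MAX) {
		dq->dropped++;
		return -1;
	}
	p = &dq->q[(dq->head + dq->count) % DEFER_QLEN];
	memcpy(p->data, data, len);
	p->len = len;
	dq->count++;
	return 0;
}

/*
 * "Softint" side: drain the ring in FIFO order and run the real tap
 * on each packet.  Returns the number of packets tapped.
 */
static size_t
deferred_tap_softint(struct defer_queue *dq,
    void (*tap)(const struct pkt *, void *), void *arg)
{
	size_t n = 0;

	while (dq->count > 0) {
		tap(&dq->q[dq->head], arg);
		dq->head = (dq->head + 1) % DEFER_QLEN;
		dq->count--;
		n++;
	}
	return n;
}

/* Example tap standing in for bpf_mtap(): sums the captured bytes. */
static void
count_bytes_tap(const struct pkt *p, void *arg)
{
	*(size_t *)arg += p->len;
}
```

Draining in FIFO order keeps capture order equal to arrival order, and a full
ring drops the tap instead of waiting, since the interrupt side must not sleep.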

if_mcast_op() - SIOCADDMULTI/SIOCDELMULTI
-----------------------------------------

This helper function is called to add or remove multicast addresses for an
interface. When called via ioctl it takes IFNET_LOCK(); when called via
sosetopt() it doesn't.

Because of this, various network drivers can't assert IFNET_LOCKED() in their
if_ioctl. Generally drivers still take care to call splnet() before calling
ether_ioctl(), even with NET_MPSAFE, but they do not take KERNEL_LOCK(), so
this is actually unsafe.

Lingering obsolete variables
----------------------------

Some obsolete global variables and structure member variables remain to avoid
breaking old userland programs which directly access such variables via
kvm(3).

The following programs still use kvm(3) to get some information related to
the network stack.

 - netstat(1)
 - vmstat(1)
 - fstat(1)

netstat(1) accesses ifnet_list, the head of a list of interface objects
(struct ifnet), and traverses each object through the ifnet#if_list member
variable. ifnet_list and ifnet#if_list are obsoleted by ifnet_pslist and
ifnet#if_pslist_entry respectively. netstat also accesses the IP address list
of an interface through ifnet#if_addrlist. struct ifaddr, struct in_ifaddr and
struct in6_ifaddr are accessed as well, so the following obsolete member
variables must stay: ifaddr#ifa_list, in_ifaddr#ia_hash, in_ifaddr#ia_list,
in6_ifaddr#ia_next and in6_ifaddr#_ia6_multiaddrs. Note that netstat already
implements alternative methods to fetch the above information via sysctl(3).

vmstat(1) shows statistics of hash tables created by hashinit(9) in the
kernel. The statistics are retrieved via kvm(3). The global variables
in_ifaddrhash and in_ifaddrhashtbl, which implement the hash table of IPv4
addresses and are obsoleted by in_ifaddrhash_pslist and
in_ifaddrhashtbl_pslist, are kept for this purpose.
We should provide a means to fetch statistics of hash tables via sysctl(3).

fstat(1) shows information about bpf instances. Each bpf instance (struct
bpf_d) is obtained via kvm(3). The bpf_d#_bd_next, bpf_d#_bd_filter and
bpf_d#_bd_list member variables are obsolete but remain. ifnet#if_xname is
also accessed via struct bpf_if, so the obsolete ifnet#if_list must remain to
keep the offset of ifnet#if_xname unchanged. The statistic counters
(bpf_d#bd_rcount, bpf_d#bd_dcount and bpf_d#bd_ccount) are also victims of
this restriction: for scalability they should be per-CPU and we should stop
using atomic operations for them, but we have to keep the counters and the
atomic operations as they are.

Scalability
-----------

 - Per-CPU rtcaches (used in, e.g., IP forwarding) aren't scalable with
   multiple flows per CPU
 - ipsec(4) isn't scalable with the number of SAs/SPs; the cost of a lookup
   is O(n)
 - opencrypto(9)'s crypto_newsession()/crypto_freesession() aren't scalable
   as they are serialized by one mutex

ALTQ
----

If ALTQ is enabled in the kernel, it forces the use of just one Tx queue
(if_snd) for packet transmission, which serializes all Tx packet processing on
that queue. We should probably design and implement an alternative queuing
mechanism that deals with multi-core systems in the first place, rather than
make the existing ALTQ MP-safe, because that's just annoying.

Using kernel modules
--------------------

Please note that if you enable NET_MPSAFE in your kernel and you use any
loadable kernel modules (including compat_xx modules or individual network
interface if_xxx device driver modules), you will need to build custom
modules. For each module you will need to add the following line to its
Makefile:

	CPPFLAGS+=	-DNET_MPSAFE

Failure to do this may result in unpredictable behavior.
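
Why the define matters can be sketched in userland: code shared between the
kernel and modules may choose its locking at compile time with #ifdef
NET_MPSAFE, and that macro exists only if the compiler is given -DNET_MPSAFE.
The macros GIANT_LOCK_UNLESS_MPSAFE() and GIANT_UNLOCK_UNLESS_MPSAFE(), the
counter giant_depth and module_entry() are hypothetical; the counter merely
stands in for KERNEL_LOCK.

```c
#include <assert.h>

/* Hypothetical stand-in for the giant lock (KERNEL_LOCK) nesting depth. */
static int giant_depth;

#ifdef NET_MPSAFE
/* MP-safe build: rely on fine-grained locks, skip the giant lock. */
#define GIANT_LOCK_UNLESS_MPSAFE()	do { } while (0)
#define GIANT_UNLOCK_UNLESS_MPSAFE()	do { } while (0)
#else
/* Non-MP-safe build: serialize on the giant lock. */
#define GIANT_LOCK_UNLESS_MPSAFE()	do { giant_depth++; } while (0)
#define GIANT_UNLOCK_UNLESS_MPSAFE()	do { giant_depth--; } while (0)
#endif

/*
 * Hypothetical module entry point.  Which locking it does is fixed when
 * the module is compiled, not when it is loaded.  Returns the giant lock
 * depth observed inside the critical section.
 */
static int
module_entry(void)
{
	int held;

	GIANT_LOCK_UNLESS_MPSAFE();
	held = giant_depth;
	GIANT_UNLOCK_UNLESS_MPSAFE();
	assert(giant_depth == 0);
	return held;
}
```

Compiling this file with and without -DNET_MPSAFE yields two objects that
disagree about whether the giant lock is held inside module_entry(). A module
and a kernel built with different settings can presumably disagree in the same
way, which is the mismatch the CPPFLAGS line above is meant to prevent.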

IPv4 address initialization atomicity
-------------------------------------

An IPv4 address is referenced by several data structures: an associated
interface, its local route, a connected route (if necessary), the global list,
the global hash table, etc. These data structures are not updated atomically,
i.e., the kernel's view of an IPv4 address can be inconsistent while the
address is being initialized.

One known failure caused by this is that incoming packets destined for an
initializing address can loop in the network stack for a short period of
time. The address initialization first creates a local route and then
registers the initializing address in the global hash table, which ip_input
uses to decide whether an incoming packet is destined for the host. So, if
the host allows forwarding, an incoming packet can match the local route of an
initializing address at ip_output while it fails the to-self check at
ip_input. Because a matched local route points to a loopback interface as its
destination interface, the packet is sent to the network stack (ip_input)
again, which results in a loop. The loop stops once the initializing address
is registered in the hash table.

One solution to the issue is to reorder the address initialization steps:
first register an address in the hash table, then create its routes. Another
solution is to use the routing table for the to-self check instead of the
global hash table, as IPv6 does.

if_flags
--------

To avoid data races on if_flags, it should be protected by a lock (currently
IFNET_LOCK). Thus, if_flags should not be accessed during packet processing,
to avoid performance degradation from lock contention. Traditionally, the
IFF_RUNNING, IFF_UP and IFF_OACTIVE flags of if_flags are checked during
packet processing.
If you make a driver MP-safe, you must remove such checks.

Drivers should not touch IFF_ALLMULTI. They are tempted to do so when updating
hardware multicast filters on SIOCADDMULTI/SIOCDELMULTI. Instead, they should
use the ETHER_F_ALLMULTI bit in struct ethercom::ec_flags, under ETHER_LOCK.
ether_ioctl takes care of presenting IFF_ALLMULTI according to the current
state of ETHER_F_ALLMULTI when queried with SIOCGIFFLAGS.

Also, IFF_PROMISC is checked in ether_input and we should get rid of that
check somehow.
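
The ETHER_F_ALLMULTI rule can be illustrated with a single-threaded userland
model. Everything prefixed with model_, the flag values and the filter size
are assumptions for the sketch; the lock_held boolean only stands in for
ETHER_LOCK, and falling back to all-multicast when the groups overflow the
filter is a hypothetical example of why a driver would want the bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary values for the model; the real flags live in NetBSD headers. */
#define MODEL_IFF_ALLMULTI	0x200
#define MODEL_ETHER_F_ALLMULTI	0x1
#define MODEL_FILTER_SLOTS	4	/* hypothetical hardware filter size */

struct model_ethercom {
	bool	lock_held;	/* stands in for ETHER_LOCK() */
	int	ec_flags;	/* ETHER_F_* bits, protected by the lock */
	int	if_flags;	/* IFF_* bits; the driver never sets IFF_ALLMULTI */
	size_t	nmulti;		/* number of joined multicast groups */
};

static void
model_ether_lock(struct model_ethercom *ec)
{
	assert(!ec->lock_held);
	ec->lock_held = true;
}

static void
model_ether_unlock(struct model_ethercom *ec)
{
	assert(ec->lock_held);
	ec->lock_held = false;
}

/*
 * Driver side, run on SIOCADDMULTI/SIOCDELMULTI: if the groups don't fit
 * in the hardware filter, fall back to receiving all multicast and record
 * that in ec_flags, not in if_flags.
 */
static void
model_driver_set_filter(struct model_ethercom *ec)
{
	model_ether_lock(ec);
	if (ec->nmulti > MODEL_FILTER_SLOTS)
		ec->ec_flags |= MODEL_ETHER_F_ALLMULTI;
	else
		ec->ec_flags &= ~MODEL_ETHER_F_ALLMULTI;
	/* ...program the hardware filter here... */
	model_ether_unlock(ec);
}

/* SIOCGIFFLAGS side: derive IFF_ALLMULTI from ETHER_F_ALLMULTI. */
static int
model_query_ifflags(struct model_ethercom *ec)
{
	int flags;

	model_ether_lock(ec);
	flags = ec->if_flags & ~MODEL_IFF_ALLMULTI;
	if (ec->ec_flags & MODEL_ETHER_F_ALLMULTI)
		flags |= MODEL_IFF_ALLMULTI;
	model_ether_unlock(ec);
	return flags;
}
```

In the model the driver never writes if_flags, so its multicast path needs
only the ETHER_LOCK stand-in, while a SIOCGIFFLAGS query still reports a
consistent IFF_ALLMULTI.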