#
490a79fa |
|
06-Mar-2024 |
Eric Dumazet <edumazet@google.com> |
net: introduce include/net/rps.h Move RPS related structures and helpers from include/linux/netdevice.h and include/net/sock.h to a new include file. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-18-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
a562c0a2 |
|
19-Dec-2023 |
Eric Dumazet <edumazet@google.com> |
sctp: fix busy polling Busy polling while holding the socket lock makes litle sense, because incoming packets wont reach our receive queue. Fixes: 8465a5fcd1ce ("sctp: add support for busy polling to sctp protocol") Reported-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4746b36b |
|
12-Dec-2023 |
Eric Dumazet <edumazet@google.com> |
sctp: support MSG_ERRQUEUE flag in recvmsg() For some reason sctp_poll() generates EPOLLERR if sk->sk_error_queue is not empty but recvmsg() can not drain the error queue yet. This is needed to better support timestamping. I had to export inet_recv_error(), since sctp can be compiled as a module. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20231212145550.3872051-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
1f4e803c |
|
01-Oct-2023 |
Xin Long <lucien.xin@gmail.com> |
sctp: update hb timer immediately after users change hb_interval Currently, when hb_interval is changed by users, it won't take effect until the next expiry of hb timer. As the default value is 30s, users have to wait up to 30s to wait its hb_interval update to work. This becomes pretty bad in containers where a much smaller value is usually set on hb_interval. This patch improves it by resetting the hb timer immediately once the value of hb_interval is updated by users. Note that we don't address the already existing 'problem' when sending a heartbeat 'on demand' if one hb has just been sent(from the timer) mentioned in: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg590224.html Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/r/75465785f8ee5df2fb3acdca9b8fafdc18984098.1696172660.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
dc9511dd |
|
30-Aug-2023 |
Eric Dumazet <edumazet@google.com> |
sctp: annotate data-races around sk->sk_wmem_queued sk->sk_wmem_queued can be read locklessly from sctp_poll() Use sk_wmem_queued_add() when the field is changed, and add READ_ONCE() annotations in sctp_writeable() and sctp_assocs_seq_show() syzbot reported: BUG: KCSAN: data-race in sctp_poll / sctp_wfree read-write to 0xffff888149d77810 of 4 bytes by interrupt on cpu 0: sctp_wfree+0x170/0x4a0 net/sctp/socket.c:9147 skb_release_head_state+0xb7/0x1a0 net/core/skbuff.c:988 skb_release_all net/core/skbuff.c:1000 [inline] __kfree_skb+0x16/0x140 net/core/skbuff.c:1016 consume_skb+0x57/0x180 net/core/skbuff.c:1232 sctp_chunk_destroy net/sctp/sm_make_chunk.c:1503 [inline] sctp_chunk_put+0xcd/0x130 net/sctp/sm_make_chunk.c:1530 sctp_datamsg_put+0x29a/0x300 net/sctp/chunk.c:128 sctp_chunk_free+0x34/0x50 net/sctp/sm_make_chunk.c:1515 sctp_outq_sack+0xafa/0xd70 net/sctp/outqueue.c:1381 sctp_cmd_process_sack net/sctp/sm_sideeffect.c:834 [inline] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1366 [inline] sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline] sctp_do_sm+0x12c7/0x31b0 net/sctp/sm_sideeffect.c:1169 sctp_assoc_bh_rcv+0x2b2/0x430 net/sctp/associola.c:1051 sctp_inq_push+0x108/0x120 net/sctp/inqueue.c:80 sctp_rcv+0x116e/0x1340 net/sctp/input.c:243 sctp6_rcv+0x25/0x40 net/sctp/ipv6.c:1120 ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437 ip6_input_finish net/ipv6/ip6_input.c:482 [inline] NF_HOOK include/linux/netfilter.h:303 [inline] ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491 dst_input include/net/dst.h:468 [inline] ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79 NF_HOOK include/linux/netfilter.h:303 [inline] ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309 __netif_receive_skb_one_core net/core/dev.c:5452 [inline] __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566 process_backlog+0x21f/0x380 net/core/dev.c:5894 __napi_poll+0x60/0x3b0 net/core/dev.c:6460 napi_poll net/core/dev.c:6527 [inline] net_rx_action+0x32b/0x750 net/core/dev.c:6660 __do_softirq+0xc1/0x265 kernel/softirq.c:553 run_ksoftirqd+0x17/0x20 kernel/softirq.c:921 smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164 kthread+0x1d7/0x210 kernel/kthread.c:389 ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 read to 0xffff888149d77810 of 4 bytes by task 17828 on cpu 1: sctp_writeable net/sctp/socket.c:9304 [inline] sctp_poll+0x265/0x410 net/sctp/socket.c:8671 sock_poll+0x253/0x270 net/socket.c:1374 vfs_poll include/linux/poll.h:88 [inline] do_pollfd fs/select.c:873 [inline] do_poll fs/select.c:921 [inline] do_sys_poll+0x636/0xc00 fs/select.c:1015 __do_sys_ppoll fs/select.c:1121 [inline] __se_sys_ppoll+0x1af/0x1f0 fs/select.c:1101 __x64_sys_ppoll+0x67/0x80 fs/select.c:1101 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x00019e80 -> 0x0000cc80 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 17828 Comm: syz-executor.1 Not tainted 6.5.0-rc7-syzkaller-00185-g28f20a19294d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230830094519.950007-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
b09bde5c |
|
16-Aug-2023 |
Eric Dumazet <edumazet@google.com> |
inet: move inet->mc_loop to inet->inet_frags IP_MULTICAST_LOOP socket option can now be set/read without locking the socket. v3: fix build bot error reported in ipvs set_mcast_loop() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f5f80e32 |
|
20-Jul-2023 |
Eric Dumazet <edumazet@google.com> |
ipv6: remove hard coded limitation on ipv6_pinfo IPv6 inet sockets are supposed to have a "struct ipv6_pinfo" field at the end of their definition, so that inet6_sk_generic() can derive from socket size the offset of the "struct ipv6_pinfo". This is very fragile, and prevents adding bigger alignment in sockets, because inet6_sk_generic() does not work if the compiler adds padding after the ipv6_pinfo component. We are currently working on a patch series to reorganize TCP structures for better data locality and found issues similar to the one fixed in commit f5d547676ca0 ("tcp: fix tcp_inet6_sk() for 32bit kernels") Alternative would be to force an alignment on "struct ipv6_pinfo", greater or equal to __alignof__(any ipv6 sock) to ensure there is no padding. This does not look great. v2: fix typo in mptcp_proto_v6_init() (Paolo) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Chao Wu <wwchao@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f866fbc8 |
|
18-Aug-2023 |
Eric Dumazet <edumazet@google.com> |
ipv4: fix data-races around inet->inet_id UDP sendmsg() is lockless, so ip_select_ident_segs() can very well be run from multiple cpus [1] Convert inet->inet_id to an atomic_t, but implement a dedicated path for TCP, avoiding cost of a locked instruction (atomic_add_return()) Note that this patch will cause a trivial merge conflict because we added inet->flags in net-next tree. v2: added missing change in drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c (David Ahern) [1] BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1: ip_select_ident_segs include/net/ip.h:542 [inline] ip_select_ident include/net/ip.h:556 [inline] __ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446 ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560 udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260 inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2634 __do_sys_sendmmsg net/socket.c:2663 [inline] __se_sys_sendmmsg net/socket.c:2660 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0: ip_select_ident_segs include/net/ip.h:541 [inline] ip_select_ident include/net/ip.h:556 [inline] __ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446 ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560 udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260 inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg net/socket.c:748 [inline] ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494 ___sys_sendmsg net/socket.c:2548 [inline] __sys_sendmmsg+0x269/0x500 net/socket.c:2634 __do_sys_sendmmsg net/socket.c:2663 [inline] __se_sys_sendmmsg net/socket.c:2660 [inline] __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x184d -> 0x184e Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 ================================================================== Fixes: 23f57406b82d ("ipv4: avoid using shared IP generator for connected sockets") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
76f33296 |
|
17-Aug-2023 |
Eric Dumazet <edumazet@google.com> |
sock: annotate data-races around prot->memory_pressure *prot->memory_pressure is read/writen locklessly, we need to add proper annotations. A recent commit added a new race, it is time to audit all accesses. Fixes: 2d0c88e84e48 ("sock: Fix misuse of sk_under_memory_pressure()") Fixes: 4d93df0abd50 ("[SCTP]: Rewrite of sctp buffer management code") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Abel Wu <wuyun.abel@bytedance.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20230818015132.2699348-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
6feb37b3 |
|
26-Jun-2023 |
Chengfeng Ye <dg573847474@gmail.com> |
sctp: fix potential deadlock on &net->sctp.addr_wq_lock As &net->sctp.addr_wq_lock is also acquired by the timer sctp_addr_wq_timeout_handler() in protocal.c, the same lock acquisition at sctp_auto_asconf_init() seems should disable irq since it is called from sctp_accept() under process context. Possible deadlock scenario: sctp_accept() -> sctp_sock_migrate() -> sctp_auto_asconf_init() -> spin_lock(&net->sctp.addr_wq_lock) <timer interrupt> -> sctp_addr_wq_timeout_handler() -> spin_lock_bh(&net->sctp.addr_wq_lock); (deadlock here) This flaw was found using an experimental static analysis tool we are developing for irq-related deadlock. The tentative patch fix the potential deadlock by spin_lock_bh(). Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Fixes: 34e5b0118685 ("sctp: delay auto_asconf init until binding the first addr") Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/20230627120340.19432-1-dg573847474@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
e1d001fa |
|
09-Jun-2023 |
Breno Leitao <leitao@debian.org> |
net: ioctl: Use kernel memory on protocol ioctl callbacks Most of the ioctls to net protocols operates directly on userspace argument (arg). Usually doing get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers. Change the "struct proto" ioctls to avoid touching userspace memory and operate on kernel buffers, i.e., all protocol's ioctl callbacks is adapted to operate on a kernel memory other than on userspace (so, no more {put,get}_user() and friends being called in the ioctl callback). This changes the "struct proto" ioctl format in the following way: int (*ioctl)(struct sock *sk, int cmd, - unsigned long arg); + int *karg); (Important to say that this patch does not touch the "struct proto_ops" protocols) So, the "karg" argument, which is passed to the ioctl callback, is a pointer allocated to kernel space memory (inside a function wrapper). This buffer (karg) may contain input argument (copied from userspace in a prep function) and it might return a value/buffer, which is copied back to userspace if necessary. There is not one-size-fits-all format (that is I am using 'may' above), but basically, there are three type of ioctls: 1) Do not read from userspace, returns a result to userspace 2) Read an input parameter from userspace, and does not return anything to userspace 3) Read an input from userspace, and return a buffer to userspace. The default case (1) (where no input parameter is given, and an "int" is returned to userspace) encompasses more than 90% of the cases, but there are two other exceptions. Here is a list of exceptions: * Protocol RAW: * cmd = SIOCGETVIFCNT: * input and output = struct sioc_vif_req * cmd = SIOCGETSGCNT * input and output = struct sioc_sg_req * Explanation: for the SIOCGETVIFCNT case, userspace passes the input argument, which is struct sioc_vif_req. Then the callback populates the struct, which is copied back to userspace. * Protocol RAW6: * cmd = SIOCGETMIFCNT_IN6 * input and output = struct sioc_mif_req6 * cmd = SIOCGETSGCNT_IN6 * input and output = struct sioc_sg_req6 * Protocol PHONET: * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE * input int (4 bytes) * Nothing is copied back to userspace. For the exception cases, functions sock_sk_ioctl_inout() will copy the userspace input, and copy it back to kernel space. The wrapper that prepare the buffer and put the buffer back to user is sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the callee now calls sk_ioctl(), which will handle all cases. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
2598619e |
|
11-May-2023 |
Alexander Mikhalitsyn <alexander@mihalicyn.com> |
sctp: add bpf_bypass_getsockopt proto callback Implement ->bpf_bypass_getsockopt proto callback and filter out SCTP_SOCKOPT_PEELOFF, SCTP_SOCKOPT_PEELOFF_FLAGS and SCTP_SOCKOPT_CONNECTX3 socket options from running eBPF hook on them. SCTP_SOCKOPT_PEELOFF and SCTP_SOCKOPT_PEELOFF_FLAGS options do fd_install(), and if BPF_CGROUP_RUN_PROG_GETSOCKOPT hook returns an error after success of the original handler sctp_getsockopt(...), userspace will receive an error from getsockopt syscall and will be not aware that fd was successfully installed into a fdtable. As pointed by Marcelo Ricardo Leitner it seems reasonable to skip bpf getsockopt hook for SCTP_SOCKOPT_CONNECTX3 sockopt too. Because internaly, it triggers connect() and if error is masked then userspace will be confused. This patch was born as a result of discussion around a new SCM_PIDFD interface: https://lore.kernel.org/all/20230413133355.350571-3-aleksandr.mikhalitsyn@canonical.com/ Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks") Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Christian Brauner <brauner@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: linux-sctp@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Suggested-by: Stanislav Fomichev <sdf@google.com> Acked-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ab4f1e28 |
|
14-Apr-2023 |
Xin Long <lucien.xin@gmail.com> |
sctp: add intl_capable and reconf_capable in ss peer_capable There are two new peer capables have been added since sctp_diag was introduced into SCTP. When dumping the peer capables, these two new peer capables should also be included. To not break the old capables, reconf_capable takes the old hostname_address bit, and intl_capable uses the higher available bit in sctpi_peer_capable. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd4b2818 |
|
14-Apr-2023 |
Xin Long <lucien.xin@gmail.com> |
sctp: delete the obsolete code for the host name address param In the latest RFC9260, the Host Name Address param has been deprecated. For INIT chunk: Note 3: An INIT chunk MUST NOT contain the Host Name Address parameter. The receiver of an INIT chunk containing a Host Name Address parameter MUST send an ABORT chunk and MAY include an "Unresolvable Address" error cause. For Supported Address Types: The value indicating the Host Name Address parameter MUST NOT be used when sending this parameter and MUST be ignored when receiving this parameter. Currently Linux SCTP doesn't really support Host Name Address param, but only saves some flag and print debug info, which actually won't even be triggered due to the verification in sctp_verify_param(). This patch is to delete those dead code. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2584024b |
|
01-Apr-2023 |
Xin Long <lucien.xin@gmail.com> |
sctp: check send stream number after wait_for_sndbuf This patch fixes a corner case where the asoc out stream count may change after wait_for_sndbuf. When the main thread in the client starts a connection, if its out stream count is set to N while the in stream count in the server is set to N - 2, another thread in the client keeps sending the msgs with stream number N - 1, and waits for sndbuf before processing INIT_ACK. However, after processing INIT_ACK, the out stream count in the client is shrunk to N - 2, the same to the in stream count in the server. The crash occurs when the thread waiting for sndbuf is awake and sends the msg in a non-existing stream(N - 1), the call trace is as below: KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f] Call Trace: <TASK> sctp_cmd_send_msg net/sctp/sm_sideeffect.c:1114 [inline] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1777 [inline] sctp_side_effects net/sctp/sm_sideeffect.c:1199 [inline] sctp_do_sm+0x197d/0x5310 net/sctp/sm_sideeffect.c:1170 sctp_primitive_SEND+0x9f/0xc0 net/sctp/primitive.c:163 sctp_sendmsg_to_asoc+0x10eb/0x1a30 net/sctp/socket.c:1868 sctp_sendmsg+0x8d4/0x1d90 net/sctp/socket.c:2026 inet_sendmsg+0x9d/0xe0 net/ipv4/af_inet.c:825 sock_sendmsg_nosec net/socket.c:722 [inline] sock_sendmsg+0xde/0x190 net/socket.c:745 The fix is to add an unlikely check for the send stream number after the thread wakes up from the wait_for_sndbuf. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: syzbot+47c24ca20a2fa01f082e@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
91d0b78c |
|
24-Jan-2023 |
Jakub Sitnicki <jakub@cloudflare.com> |
inet: Add IP_LOCAL_PORT_RANGE socket option Users who want to share a single public IP address for outgoing connections between several hosts traditionally reach for SNAT. However, SNAT requires state keeping on the node(s) performing the NAT. A stateless alternative exists, where a single IP address used for egress can be shared between several hosts by partitioning the available ephemeral port range. In such a setup: 1. Each host gets assigned a disjoint range of ephemeral ports. 2. Applications open connections from the host-assigned port range. 3. Return traffic gets routed to the host based on both, the destination IP and the destination port. An application which wants to open an outgoing connection (connect) from a given port range today can choose between two solutions: 1. Manually pick the source port by bind()'ing to it before connect()'ing the socket. This approach has a couple of downsides: a) Search for a free port has to be implemented in the user-space. If the chosen 4-tuple happens to be busy, the application needs to retry from a different local port number. Detecting if 4-tuple is busy can be either easy (TCP) or hard (UDP). In TCP case, the application simply has to check if connect() returned an error (EADDRNOTAVAIL). That is assuming that the local port sharing was enabled (REUSEADDR) by all the sockets. # Assume desired local port range is 60_000-60_511 s = socket(AF_INET, SOCK_STREAM) s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s.bind(("192.0.2.1", 60_000)) s.connect(("1.1.1.1", 53)) # Fails only if 192.0.2.1:60000 -> 1.1.1.1:53 is busy # Application must retry with another local port In case of UDP, the network stack allows binding more than one socket to the same 4-tuple, when local port sharing is enabled (REUSEADDR). Hence detecting the conflict is much harder and involves querying sock_diag and toggling the REUSEADDR flag [1]. b) For TCP, bind()-ing to a port within the ephemeral port range means that no connecting sockets, that is those which leave it to the network stack to find a free local port at connect() time, can use the this port. IOW, the bind hash bucket tb->fastreuse will be 0 or 1, and the port will be skipped during the free port search at connect() time. 2. Isolate the app in a dedicated netns and use the use the per-netns ip_local_port_range sysctl to adjust the ephemeral port range bounds. The per-netns setting affects all sockets, so this approach can be used only if: - there is just one egress IP address, or - the desired egress port range is the same for all egress IP addresses used by the application. For TCP, this approach avoids the downsides of (1). Free port search and 4-tuple conflict detection is done by the network stack: system("sysctl -w net.ipv4.ip_local_port_range='60000 60511'") s = socket(AF_INET, SOCK_STREAM) s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1) s.bind(("192.0.2.1", 0)) s.connect(("1.1.1.1", 53)) # Fails if all 4-tuples 192.0.2.1:60000-60511 -> 1.1.1.1:53 are busy For UDP this approach has limited applicability. Setting the IP_BIND_ADDRESS_NO_PORT socket option does not result in local source port being shared with other connected UDP sockets. Hence relying on the network stack to find a free source port, limits the number of outgoing UDP flows from a single IP address down to the number of available ephemeral ports. To put it another way, partitioning the ephemeral port range between hosts using the existing Linux networking API is cumbersome. To address this use case, add a new socket option at the SOL_IP level, named IP_LOCAL_PORT_RANGE. The new option can be used to clamp down the ephemeral port range for each socket individually. The option can be used only to narrow down the per-netns local port range. If the per-socket range lies outside of the per-netns range, the latter takes precedence. UAPI-wise, the low and high range bounds are passed to the kernel as a pair of u16 values in host byte order packed into a u32. This avoids pointer passing. PORT_LO = 40_000 PORT_HI = 40_511 s = socket(AF_INET, SOCK_STREAM) v = struct.pack("I", PORT_HI << 16 | PORT_LO) s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, v) s.bind(("127.0.0.1", 0)) s.getsockname() # Local address between ("127.0.0.1", 40_000) and ("127.0.0.1", 40_511), # if there is a free port. EADDRINUSE otherwise. [1] https://github.com/cloudflare/cloudflare-blog/blob/232b432c1d57/2022-02-connectx/connectx.py#L116 Reviewed-by: Marek Majkowski <marek@cloudflare.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
40e0b090 |
|
19-Jan-2023 |
Peilin Ye <peilin.ye@bytedance.com> |
net/sock: Introduce trace_sk_data_ready() As suggested by Cong, introduce a tracepoint for all ->sk_data_ready() callback implementations. For example: <...> iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable <...> Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0af03170 |
|
16-Nov-2022 |
Xin Long <lucien.xin@gmail.com> |
sctp: add dif and sdif check in asoc and ep lookup This patch at first adds a pernet global l3mdev_accept to decide if it accepts the packets from a l3mdev when a SCTP socket doesn't bind to any interface. It's set to 1 to avoid any possible incompatible issue, and in next patch, a sysctl will be introduced to allow to change it. Then similar to inet/udp_sk_bound_dev_eq(), sctp_sk_bound_dev_eq() is added to check either dif or sdif is equal to sk_bound_dev_if, and to check sid is 0 or l3mdev_accept is 1 if sk_bound_dev_if is not set. This function is used to match a association or a endpoint, namely called by sctp_addrs_lookup_transport() and sctp_endpoint_is_match(). All functions that needs updating are: sctp_rcv(): asoc: __sctp_rcv_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() __sctp_rcv_lookup_harder() __sctp_rcv_init_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() __sctp_rcv_walk_lookup() __sctp_rcv_asconf_lookup() __sctp_lookup_association() -> sctp_addrs_lookup_transport() ep: __sctp_rcv_lookup_endpoint() -> sctp_endpoint_is_match() sctp_connect(): sctp_endpoint_is_peeled_off() __sctp_lookup_association() sctp_has_association() sctp_lookup_association() __sctp_lookup_association() -> sctp_addrs_lookup_transport() sctp_diag_dump_one(): sctp_transport_lookup_process() -> sctp_addrs_lookup_transport() Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f87b1ac0 |
|
16-Nov-2022 |
Xin Long <lucien.xin@gmail.com> |
sctp: check sk_bound_dev_if when matching ep in get_port In sctp_get_port_local(), when binding to IP and PORT, it should also check sk_bound_dev_if to match listening sk if it's set by SO_BINDTOIFINDEX, so that multiple sockets with the same IP and PORT, but different sk_bound_dev_if can be listened at the same time. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8032bf12 |
|
09-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use get_random_u32_below() instead of deprecated function This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
6431b0f6 |
|
19-Oct-2022 |
Kuniyuki Iwashima <kuniyu@amazon.com> |
sctp: Call inet6_destroy_sock() via sk->sk_destruct(). After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources. SCTP sets its own sk->sk_destruct() in the sctp_init_sock(), and SCTPv6 socket reuses it as the init function. To call inet6_sock_destruct() from SCTPv6 sk->sk_destruct(), we set sctp_v6_destruct_sock() in a new init function. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7e3cf084 |
|
05-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use get_random_{u8,u16}() when possible, part 1 Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value, simply use the get_random_{u8,u16}() functions, which are faster than wasting the additional bytes from a 32-bit value. This was done mechanically with this coccinelle script: @@ expression E; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; typedef __be16; typedef __le16; typedef u8; @@ ( - (get_random_u32() & 0xffff) + get_random_u16() | - (get_random_u32() & 0xff) + get_random_u8() | - (get_random_u32() % 65536) + get_random_u16() | - (get_random_u32() % 256) + get_random_u8() | - (get_random_u32() >> 16) + get_random_u16() | - (get_random_u32() >> 24) + get_random_u8() | - (u16)get_random_u32() + get_random_u16() | - (u8)get_random_u32() + get_random_u8() | - (__be16)get_random_u32() + (__be16)get_random_u16() | - (__le16)get_random_u32() + (__le16)get_random_u16() | - prandom_u32_max(65536) + get_random_u16() | - prandom_u32_max(256) + get_random_u8() | - E->inet_id = get_random_u32() + E->inet_id = get_random_u16() ) @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; identifier v; @@ - u16 v = get_random_u32(); + u16 v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u8; identifier v; @@ - u8 v = get_random_u32(); + u8 v = get_random_u8(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u16; u16 v; @@ - v = get_random_u32(); + v = get_random_u16(); @@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u8; u8 v; @@ - v = get_random_u32(); + v = get_random_u8(); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Examine limits @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value < 256: coccinelle.RESULT = cocci.make_ident("get_random_u8") elif value < 65536: coccinelle.RESULT = cocci.make_ident("get_random_u16") else: print("Skipping large mask of %s" % (literal)) cocci.include_match(False) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; identifier add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + (RESULT() & LITERAL) Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
81895a65 |
|
05-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use prandom_u32_max() when possible, part 1 Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
4890b686 |
|
09-Jun-2022 |
Eric Dumazet <edumazet@google.com> |
net: keep sk->sk_forward_alloc as small as possible Currently, tcp_memory_allocated can hit tcp_mem[] limits quite fast. Each TCP socket can forward allocate up to 2 MB of memory, even after flow became less active. 10,000 sockets can have reserved 20 GB of memory, and we have no shrinker in place to reclaim that. Instead of trying to reclaim the extra allocations in some places, just keep sk->sk_forward_alloc values as small as possible. This should not impact performance too much now we have per-cpu reserves: Changes to tcp_memory_allocated should not be too frequent. For sockets not using SO_RESERVE_MEM: - idle sockets (no packets in tx/rx queues) have zero forward alloc. - non idle sockets have a forward alloc smaller than one page. Note: - Removal of SK_RECLAIM_CHUNK and SK_RECLAIM_THRESHOLD is left to MPTCP maintainers as a follow up. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
0defbb0a |
|
09-Jun-2022 |
Eric Dumazet <edumazet@google.com> |
net: add per_cpu_fw_alloc field to struct proto Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use. Instead of having reserved bytes per socket, we want to have per-cpu reserves. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
6fd1d51c |
|
27-Apr-2022 |
Erin MacNeil <lnx.erin@gmail.com> |
net: SO_RCVMARK socket option for SO_MARK with recvmsg() Adding a new socket option, SO_RCVMARK, to indicate that SO_MARK should be included in the ancillary data returned by recvmsg(). Renamed the sock_recv_ts_and_drops() function to sock_recv_cmsgs(). Signed-off-by: Erin MacNeil <lnx.erin@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/r/20220427200259.2564-1-lnx.erin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
ec095263 |
|
11-Apr-2022 |
Oliver Hartkopp <socketcan@hartkopp.net> |
net: remove noblock parameter from recvmsg() entities The internal recvmsg() functions have two parameters 'flags' and 'noblock' that were merged inside skb_recv_datagram(). As a follow up patch to commit f4b41f062c42 ("net: remove noblock parameter from skb_recv_datagram()") this patch removes the separate 'noblock' parameter for recvmsg(). Analogue to the referenced patch for skb_recv_datagram() the 'flags' and 'noblock' parameters are unnecessarily split up with e.g. err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); or in err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); instead of simply using only flags all the time and check for MSG_DONTWAIT where needed (to preserve for the formerly separated no(n)block condition). Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
#
8467dda0 |
|
09-Apr-2022 |
Petr Malat <oss@malat.biz> |
sctp: Initialize daddr on peeled off socket Function sctp_do_peeloff() wrongly initializes daddr of the original socket instead of the peeled off socket, which makes getpeername() return zeroes instead of the primary address. Initialize the new socket instead. Fixes: d570ee490fb1 ("[SCTP]: Correctly set daddr for IPv6 sockets during peeloff") Signed-off-by: Petr Malat <oss@malat.biz> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20220409063611.673193-1-oss@malat.biz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
3d3b2f57 |
|
21-Dec-2021 |
Xin Long <lucien.xin@gmail.com> |
sctp: move hlist_node and hashent out of sctp_ep_common Struct sctp_ep_common is included in both asoc and ep, but hlist_node and hashent are only needed by ep after asoc_hashtable was dropped by Commit b5eff7128366 ("sctp: drop the old assoc hashtable of sctp"). So it is better to move hlist_node and hashent from sctp_ep_common to sctp_endpoint, and it saves some space for each asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3cb764a |
|
15-Nov-2021 |
Eric Dumazet <edumazet@google.com> |
net: drop nopreempt requirement on sock_prot_inuse_add() This is distracting really, let's make this simpler, because many callers had to take care of this by themselves, even if on x86 this adds more code than really needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f9d31c4c |
|
31-Dec-2021 |
Xin Long <lucien.xin@gmail.com> |
sctp: hold endpoint before calling cb in sctp_transport_lookup_process The same fix in commit 5ec7d18d1813 ("sctp: use call_rcu to free endpoint") is also needed for dumping one asoc and sock after the lookup. Fixes: 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5ec7d18d |
|
23-Dec-2021 |
Xin Long <lucien.xin@gmail.com> |
sctp: use call_rcu to free endpoint This patch is to delay the endpoint free by calling call_rcu() to fix another use-after-free issue in sctp_sock_dump(): BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20 Call Trace: __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218 lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:334 [inline] __lock_sock+0x203/0x350 net/core/sock.c:2253 lock_sock_nested+0xfe/0x120 net/core/sock.c:2774 lock_sock include/net/sock.h:1492 [inline] sctp_sock_dump+0x122/0xb20 net/sctp/diag.c:324 sctp_for_each_transport+0x2b5/0x370 net/sctp/socket.c:5091 sctp_diag_dump+0x3ac/0x660 net/sctp/diag.c:527 __inet_diag_dump+0xa8/0x140 net/ipv4/inet_diag.c:1049 inet_diag_dump+0x9b/0x110 net/ipv4/inet_diag.c:1065 netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244 __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352 netlink_dump_start include/linux/netlink.h:216 [inline] inet_diag_handler_cmd+0x2ce/0x3f0 net/ipv4/inet_diag.c:1170 __sock_diag_cmd net/core/sock_diag.c:232 [inline] sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274 This issue occurs when asoc is peeled off and the old sk is freed after getting it by asoc->base.sk and before calling lock_sock(sk). To prevent the sk free, as a holder of the sk, ep should be alive when calling lock_sock(). This patch uses call_rcu() and moves sock_put and ep free into sctp_endpoint_destroy_rcu(), so that it's safe to try to hold the ep under rcu_read_lock in sctp_transport_traverse_process(). If sctp_endpoint_hold() returns true, it means this ep is still alive and we have held it and can continue to dump it; If it returns false, it means this ep is dead and can be freed after rcu_read_unlock, and we should skip it. In sctp_sock_dump(), after locking the sk, if this ep is different from tsp->asoc->ep, it means during this dumping, this asoc was peeled off before calling lock_sock(), and the sk should be skipped; If this ep is the same with tsp->asoc->ep, it means no peeloff happens on this asoc, and due to lock_sock, no peeloff will happen either until release_sock. Note that delaying endpoint free won't delay the port release, as the port release happens in sctp_endpoint_destroy() before calling call_rcu(). Also, freeing endpoint by call_rcu() makes it safe to access the sk by asoc->base.sk in sctp_assocs_seq_show() and sctp_rcv(). Thanks Jones to bring this issue up. v1->v2: - improve the changelog. - add kfree(ep) into sctp_endpoint_destroy_rcu(), as Jakub noticed. Reported-by: syzbot+9276d76e83e3bcde6c99@syzkaller.appspotmail.com Reported-by: Lee Jones <lee.jones@linaro.org> Fixes: d25adbeb0cdb ("sctp: fix an use-after-free issue in sctp_sock_dump") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c081d53f |
|
02-Nov-2021 |
Xin Long <lucien.xin@gmail.com> |
security: pass asoc to sctp_assoc_request and sctp_sk_clone This patch is to move secid and peer_secid from endpoint to association, and pass asoc to sctp_assoc_request and sctp_sk_clone instead of ep. As ep is the local endpoint and asoc represents a connection, and in SCTP one sk/ep could have multiple asoc/connection, saving secid/peer_secid for new asoc will overwrite the old asoc's. Note that since asoc can be passed as NULL, security_sctp_assoc_request() is moved to the place right after the new_asoc is created in sctp_sf_do_5_1B_init() and sctp_sf_do_unexpected_init(). v1->v2: - fix the description of selinux_netlbl_skbuff_setsid(), as Jakub noticed. - fix the annotation in selinux_sctp_assoc_request(), as Richard Noticed. Fixes: 72e89f50084c ("security: Add support for SCTP security hooks") Reported-by: Prashanth Prahlad <pprahlad@redhat.com> Reviewed-by: Richard Haines <richard_c_haines@btinternet.com> Tested-by: Richard Haines <richard_c_haines@btinternet.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f3fdd8d |
|
17-Jul-2021 |
Xin Long <lucien.xin@gmail.com> |
sctp: trim optlen when it's a huge value in sctp_setsockopt After commit ca84bd058dae ("sctp: copy the optval from user space in sctp_setsockopt"), it does memory allocation in sctp_setsockopt with the optlen, and it would fail the allocation and return error if the optlen from user space is a huge value. This breaks some sockopts, like SCTP_HMAC_IDENT, SCTP_RESET_STREAMS and SCTP_AUTH_KEY, as when processing these sockopts before, optlen would be trimmed to a biggest value it needs when optlen is a huge value, instead of failing the allocation and returning error. This patch is to fix the allocation failure when it's a huge optlen from user space by trimming it to the biggest size sctp sockopt may need when necessary, and this biggest size is from sctp_setsockopt_reset_streams() for SCTP_RESET_STREAMS, which is bigger than those for SCTP_HMAC_IDENT and SCTP_AUTH_KEY. Fixes: ca84bd058dae ("sctp: copy the optval from user space in sctp_setsockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7307e4fa |
|
22-Jun-2021 |
Xin Long <lucien.xin@gmail.com> |
sctp: enable PLPMTUD when the transport is ready sctp_transport_pl_reset() is called whenever any of these 3 members in transport is changed: - probe_interval - param_flags & SPP_PMTUD_ENABLE - state == ACTIVE If all are true, start the PLPMTUD when it's not yet started. If any of these is false, stop the PLPMTUD when it's already running. sctp_transport_pl_update() is called when the transport dst has changed. It will restart the PLPMTUD probe. Again, the pathmtu won't change but use the dst's mtu until the Search phase is done. Note that after using PLPMTUD, the pathmtu is only initialized with the dst mtu when the transport dst changes. At other time it is updated by pl.pmtu. So sctp_transport_pmtu_check() will be called only when PLPMTUD is disabled in sctp_packet_config(). After this patch, the PLPMTUD feature from RFC8899 will be activated and can be used by users. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3190b649 |
|
22-Jun-2021 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_PLPMTUD_PROBE_INTERVAL sockopt for sock/asoc/transport With this socket option, users can change probe_interval for a transport, asoc or sock after it's created. Note that if the change is for an asoc, also apply the change to each transport in this asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d1e462a7 |
|
22-Jun-2021 |
Xin Long <lucien.xin@gmail.com> |
sctp: add probe_interval in sysctl and sock/asoc/transport PLPMTUD can be enabled by doing 'sysctl -w net.sctp.probe_interval=n'. 'n' is the interval for PLPMTUD probe timer in milliseconds, and it can't be less than 5000 if it's not 0. All asoc/transport's PLPMTUD in a new socket will be enabled by default. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
297739bd |
|
24-May-2021 |
Xin Long <lucien.xin@gmail.com> |
sctp: add the missing setting for asoc encap_port This patch is to add the missing setting back for asoc encap_port. Fixes: 8dba29603b5c ("sctp: add SCTP_REMOTE_UDP_ENCAPS_PORT sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
34e5b011 |
|
02-May-2021 |
Xin Long <lucien.xin@gmail.com> |
sctp: delay auto_asconf init until binding the first addr As Or Cohen described: If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock held and sp->do_auto_asconf is true, then an element is removed from the auto_asconf_splist without any proper locking. This can happen in the following functions: 1. In sctp_accept, if sctp_sock_migrate fails. 2. In inet_create or inet6_create, if there is a bpf program attached to BPF_CGROUP_INET_SOCK_CREATE which denies creation of the sctp socket. This patch is to fix it by moving the auto_asconf init out of sctp_init_sock(), by which inet_create()/inet6_create() won't need to operate it in sctp_destroy_sock() when calling sk_common_release(). It also makes more sense to do auto_asconf init while binding the first addr, as auto_asconf actually requires an ANY addr bind, see it in sctp_addr_wq_timeout_handler(). This addresses CVE-2021-23133. Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications") Reported-by: Or Cohen <orcohen@paloaltonetworks.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
01bfe5e8 |
|
02-May-2021 |
Xin Long <lucien.xin@gmail.com> |
Revert "net/sctp: fix race condition in sctp_destroy_sock" This reverts commit b166a20b07382b8bc1dcee2a448715c9c2c81b5b. This one has to be reverted as it introduced a dead lock, as syzbot reported: CPU0 CPU1 ---- ---- lock(&net->sctp.addr_wq_lock); lock(slock-AF_INET6); lock(&net->sctp.addr_wq_lock); lock(slock-AF_INET6); CPU0 is the thread of sctp_addr_wq_timeout_handler(), and CPU1 is that of sctp_close(). The original issue this commit fixed will be fixed in the next patch. Reported-by: syzbot+959223586843e69a2674@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
21c00a18 |
|
26-Mar-2021 |
Lu Wei <luwei32@huawei.com> |
net: sctp: Fix some typos Modify "unkown" to "unknown" in net/sctp/sm_make_chunk.c and Modify "orginal" to "original" in net/sctp/socket.c. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b166a20b |
|
13-Apr-2021 |
Or Cohen <orcohen@paloaltonetworks.com> |
net/sctp: fix race condition in sctp_destroy_sock If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock held and sp->do_auto_asconf is true, then an element is removed from the auto_asconf_splist without any proper locking. This can happen in the following functions: 1. In sctp_accept, if sctp_sock_migrate fails. 2. In inet_create or inet6_create, if there is a bpf program attached to BPF_CGROUP_INET_SOCK_CREATE which denies creation of the sctp socket. The bug is fixed by acquiring addr_wq_lock in sctp_destroy_sock instead of sctp_close. This addresses CVE-2021-23133. Reported-by: Or Cohen <orcohen@paloaltonetworks.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Fixes: 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications") Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1bfe8b5 |
|
29-Oct-2020 |
Xin Long <lucien.xin@gmail.com> |
sctp: add udphdr to overhead when udp_port is set sctp_mtu_payload() is for calculating the frag size before making chunks from a msg. So we should only add udphdr size to overhead when udp socks are listening, as only then sctp can handle the incoming sctp over udp packets and outgoing sctp over udp packets will be possible. Note that we can't do this according to transport->encap_port, as different transports may be set to different values, while the chunks were made before choosing the transport, we could not be able to meet all rfc6951#section-5.6 recommends. v1->v2: - Add udp_port for sctp_sock to avoid a potential race issue, it will be used in xmit path in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8dba2960 |
|
29-Oct-2020 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_REMOTE_UDP_ENCAPS_PORT sockopt This patch is to implement: rfc6951#section-6.1: Get or Set the Remote UDP Encapsulation Port Number with the param of the struct: struct sctp_udpencaps { sctp_assoc_t sue_assoc_id; struct sockaddr_storage sue_address; uint16_t sue_port; }; the encap_port of sock, assoc or transport can be changed by users, which also means it allows the different transports of the same asoc to have different encap_port value. v1->v2: - no change. v2->v3: - fix the endian warning when setting values between encap_port and sue_port. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e8a3001c |
|
29-Oct-2020 |
Xin Long <lucien.xin@gmail.com> |
sctp: add encap_port for netns sock asoc and transport encap_port is added as per netns/sock/assoc/transport, and the latter one's encap_port inherits the former one's by default. The transport's encap_port value would mostly decide if one packet should go out with udp encapsulated or not. This patch also allows users to set netns' encap_port by sysctl. v1->v2: - Change to define encap_port as __be16 for sctp_sock, asoc and transport. v2->v3: - No change. v3->v4: - Add 'encap_port' entry in ip-sysctl.rst. v4->v5: - Improve the description of encap_port in ip-sysctl.rst. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
fe81d9f6 |
|
18-Sep-2020 |
Henry Ptasinski <hptasinski@google.com> |
net: sctp: Fix IPv6 ancestor_size calc in sctp_copy_descendant When calculating ancestor_size with IPv6 enabled, simply using sizeof(struct ipv6_pinfo) doesn't account for extra bytes needed for alignment in the struct sctp6_sock. On x86, there aren't any extra bytes, but on ARM the ipv6_pinfo structure is aligned on an 8-byte boundary so there were 4 pad bytes that were omitted from the ancestor_size calculation. This would lead to corruption of the pd_lobby pointers, causing an oops when trying to free the sctp structure on socket close. Fixes: 636d25d557d1 ("sctp: not copy sctp_sock pd_lobby in sctp_copy_descendant") Signed-off-by: Henry Ptasinski <hptasinski@google.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3106ecb4 |
|
21-Aug-2020 |
Xin Long <lucien.xin@gmail.com> |
sctp: not disable bh in the whole sctp_get_port_local() With disabling bh in the whole sctp_get_port_local(), when snum == 0 and too many ports have been used, the do-while loop will take the cpu for a long time and cause cpu stuck: [ ] watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [ ] RIP: 0010:native_queued_spin_lock_slowpath+0x4de/0x940 [ ] Call Trace: [ ] _raw_spin_lock+0xc1/0xd0 [ ] sctp_get_port_local+0x527/0x650 [sctp] [ ] sctp_do_bind+0x208/0x5e0 [sctp] [ ] sctp_autobind+0x165/0x1e0 [sctp] [ ] sctp_connect_new_asoc+0x355/0x480 [sctp] [ ] __sctp_connect+0x360/0xb10 [sctp] There's no need to disable bh in the whole function of sctp_get_port_local. So fix this cpu stuck by removing local_bh_disable() called at the beginning, and using spin_lock_bh() instead. The same thing was actually done for inet_csk_get_port() in Commit ea8add2b1903 ("tcp/dccp: better use of ephemeral ports in bind()"). Thanks to Marcelo for pointing the buggy code out. v1->v2: - use cond_resched() to yield cpu to other tasks if needed, as Eric noticed. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dfd3d526 |
|
24-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: fix slab-out-of-bounds in SCTP_DELAYED_SACK processing This sockopt accepts two kinds of parameters, using struct sctp_sack_info and struct sctp_assoc_value. The mentioned commit didn't notice an implicit cast from the smaller (latter) struct to the bigger one (former) when copying the data from the user space, which now leads to an attempt to write beyond the buffer (because it assumes the storing buffer is bigger than the parameter itself). Fix it by allocating a sctp_sack_info on stack and filling it out based on the small struct for the compat case. Changelog stole from an earlier patch from Marcelo Ricardo Leitner. Fixes: ebb25defdc17 ("sctp: pass a kernel pointer to sctp_setsockopt_delayed_ack") Reported-by: syzbot+0e4699d000d8b874d8dc@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a7b75c5a |
|
23-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
net: pass a sockptr_t into ->setsockopt Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c8983a6 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: remove the out_nounlock label in sctp_setsockopt This is just used once, and a direct return for the redirect to the AF case is much easier to follow than jumping to the end of a very long function. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
26feba80 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_pf_expose Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92c4f172 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_ecn_supported Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
963855a9 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_auth_supported Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9263ac97 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_event Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
565059cb |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_event Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a4262466 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_reuse_port Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b8d3b24 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_interleaving_supported Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d636e7f3 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_scheduler_value Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d2fba3a |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_scheduler Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d6fb260 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_add_streams Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b97d20ce |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_reset_assoc Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d4922434 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_reset_streams Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
356dc6f1 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_enable_strreset Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3f49f720 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_reconfig_supported Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ac37435b |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_default_prinfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4a97fa4f |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_pr_supported Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cfa6fde2 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_recvnxtinfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a98af7c8 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_recvrcvinfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0ac3bb8 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_paddr_thresholds Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c9abc2c1 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_auto_asconf Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
76b3d0c4 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_deactivate_key Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
97dc9f2e |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_del_key Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dcab0a7a |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_active_key Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
534d13d0 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_auth_key Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Adapt sctp_setsockopt to use a kzfree for this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
89fae01e |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: switch sctp_setsockopt_auth_key to use memzero_explicit Switch from kzfree to sctp_setsockopt_auth_key + kfree to prepare for moving the kfree to common code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3564ef44 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_hmac_ident Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88266d31 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_auth_chunk Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f5bee0ad |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_maxburst Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1031cea0 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_fragment_interleave Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
722eca9e |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_context Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
07e5035c |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_adaptation_layer Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dcd03575 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_maxseg Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ffc08f08 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_mappedv4 Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b864c8d |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_associnfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
af5ae60e |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_rtoinfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f87ddbc0 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_nodelay Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
46a0ae9d |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_peer_primary_addr Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1eec6958 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_primary_addr Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a2409d3 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_default_sndinfo Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c23ad6d2 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_default_send_param Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9dfa6f04 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_initmsg Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bb13d647 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_partial_delivery_point Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ebb25def |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_delayed_ack Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b7b0d1a |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_peer_addr_params Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0b49a65c |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_autoclose Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a98d21a1 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_events Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10835825 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_disable_fragments Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ce5b2f89 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to __sctp_setsockopt_connectx Use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8c7517f5 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: pass a kernel pointer to sctp_setsockopt_bindx Rename sctp_setsockopt_bindx_kernel back to sctp_setsockopt_bindx, and use the kernel pointer that sctp_setsockopt has available instead of directly handling the user pointer in the old sctp_setsockopt_bindx. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ca84bd05 |
|
19-Jul-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: copy the optval from user space in sctp_setsockopt Prepare for for moving the copy_from_user from the individual sockopts to the main setsockopt helper. As of this commit the kopt variable is not used yet, but the following commits will start using it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0425a42 |
|
29-May-2020 |
Christoph Hellwig <hch@lst.de> |
net: add a new bind_add method The SCTP protocol allows to bind multiple address to a socket. That feature is currently only exposed as a socket option. Add a bind_add method struct proto that allows to bind additional addresses, and switch the dlm code to use the method instead of going through the socket option from kernel space. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
05bfd366 |
|
29-May-2020 |
Christoph Hellwig <hch@lst.de> |
sctp: refactor sctp_setsockopt_bindx Split out a sctp_setsockopt_bindx_kernel that takes a kernel pointer to the sockaddr and make sctp_setsockopt_bindx a small wrapper around it. This prepares for adding a new bind_add proto op. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c3e82fe |
|
26-Mar-2020 |
Qiujun Huang <hqjagain@gmail.com> |
sctp: fix refcount bug in sctp_wfree We should iterate over the datamsgs to move all chunks(skbs) to newsk. The following case cause the bug: for the trouble SKB, it was in outq->transmitted list sctp_outq_sack sctp_check_transmitted SKB was moved to outq->sacked list then throw away the sack queue SKB was deleted from outq->sacked (but it was held by datamsg at sctp_datamsg_to_asoc So, sctp_wfree was not called here) then migrate happened sctp_for_each_tx_datachunk( sctp_clear_owner_w); sctp_assoc_migrate(); sctp_for_each_tx_datachunk( sctp_set_owner_w); SKB was not in the outq, and was not changed to newsk finally __sctp_outq_teardown sctp_chunk_put (for another skb) sctp_datamsg_put __kfree_skb(msg->frag_list) sctp_wfree (for SKB) SKB->sk was still oldsk (skb->sk != asoc->base.sk). Reported-and-tested-by: syzbot+cea71eec5d6de256d54d@syzkaller.appspotmail.com Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Acked-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b77b4f63 |
|
23-Feb-2020 |
Jules Irenge <jbi.octave@gmail.com> |
sctp: Add missing annotation for sctp_transport_walk_stop() Sparse reports a warning at sctp_transport_walk_stop() warning: context imbalance in sctp_transport_walk_stop - wrong count at exit The root cause is the missing annotation at sctp_transport_walk_stop() Add the missing __releases(RCU) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c72b774 |
|
23-Feb-2020 |
Jules Irenge <jbi.octave@gmail.com> |
sctp: Add missing annotation for sctp_transport_walk_start() Sparse reports a warning at sctp_transport_walk_start() warning: context imbalance in sctp_transport_walk_start - wrong count at exit The root cause is the missing annotation at sctp_transport_walk_start() Add the missing __acquires(RCU) annotation Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e7696d9 |
|
08-Dec-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: get netns from asoc and ep base Commit 312434617cb1 ("sctp: cache netns in sctp_ep_common") set netns in asoc and ep base since they're created, and it will never change. It's a better way to get netns from asoc and ep base, comparing to calling sock_net(). This patch is to replace them. v1->v2: - no change. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
82f31ebf |
|
25-Nov-2019 |
Maciej Żenczykowski <maze@google.com> |
net: port < inet_prot_sock(net) --> inet_port_requires_bind_service(net, port) Note that the sysctl write accessor functions guarantee that: net->ipv4.sysctl_ip_prot_sock <= net->ipv4.ip_local_ports.range[0] invariant is maintained, and as such the max() in selinux hooks is actually spurious. ie. even though if (snum < max(inet_prot_sock(sock_net(sk)), low) || snum > high) { per logic is the same as if ((snum < inet_prot_sock(sock_net(sk)) && snum < low) || snum > high) { it is actually functionally equivalent to: if (snum < low || snum > high) { which is equivalent to: if (snum < inet_prot_sock(sock_net(sk)) || snum < low || snum > high) { even though the first clause is spurious. But we want to hold on to it in case we ever want to change what what inet_port_requires_bind_service() means (for example by changing it from a, by default, [0..1024) range to some sort of set). Test: builds, git 'grep inet_prot_sock' finds no other references Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fb822388 |
|
25-Nov-2019 |
Maciej Żenczykowski <maze@google.com> |
net-sctp: replace some sock_net(sk) with just 'net' It already existed in part of the function, but move it to a higher level and use it consistently throughout. Safe since sk is never written to. Signed-off-by: Maciej Żenczykowski <maze@google.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d467ac0a |
|
07-Nov-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_PEER_ADDR_THLDS_V2 sockopt Section 7.2 of rfc7829: "Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket Option" extends 'struct sctp_paddrthlds' with 'spt_pathcpthld' added to allow a user to change ps_retrans per sock/asoc/transport, as other 2 paddrthlds: pf_retrans, pathmaxrxt. Note: to not break the user's program, here to support pf_retrans dump and setting by adding a new sockopt SCTP_PEER_ADDR_THLDS_V2, and a new structure sctp_paddrthlds_v2 instead of extending sctp_paddrthlds. Also, when setting ps_retrans, the value is not allowed to be greater than pf_retrans. v1->v2: - use SCTP_PEER_ADDR_THLDS_V2 to set/get pf_retrans instead, as Marcelo and David Laight suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
34515e94 |
|
07-Nov-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: add support for Primary Path Switchover This is a new feature defined in section 5 of rfc7829: "Primary Path Switchover". By introducing a new tunable parameter: Primary.Switchover.Max.Retrans (PSMR) The primary path will be changed to another active path when the path error counter on the old primary path exceeds PSMR, so that "the SCTP sender is allowed to continue data transmission on a new working path even when the old primary destination address becomes active again". This patch is to add this tunable parameter, 'ps_retrans' per netns, sock, asoc and transport. It also allows a user to change ps_retrans per netns by sysctl, and ps_retrans per sock/asoc/transport will be initialized with it. The check will be done in sctp_do_8_2_transport_strike() when this feature is enabled. Note this feature is disabled by initializing 'ps_retrans' per netns as 0xffff by default, and its value can't be less than 'pf_retrans' when changing by sysctl. v3->v4: - add define SCTP_PS_RETRANS_MAX 0xffff, and use it on extra2 of sysctl 'ps_retrans'. - add a new entry for ps_retrans on ip-sysctl.txt. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d2a6935 |
|
07-Nov-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_EXPOSE_POTENTIALLY_FAILED_STATE sockopt This is a sockopt defined in section 7.3 of rfc7829: "Exposing the Potentially Failed Path State", by which users can change pf_expose per sock and asoc. The new sockopt SCTP_EXPOSE_POTENTIALLY_FAILED_STATE is also known as SCTP_EXPOSE_PF_STATE for short. v2->v3: - return -EINVAL if params.assoc_value > SCTP_PF_EXPOSE_MAX. - define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE. v3->v4: - improve changelog. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aef587be |
|
07-Nov-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: add pf_expose per netns and sock and asoc As said in rfc7829, section 3, point 12: The SCTP stack SHOULD expose the PF state of its destination addresses to the ULP as well as provide the means to notify the ULP of state transitions of its destination addresses from active to PF, and vice versa. However, it is recommended that an SCTP stack implementing SCTP-PF also allows for the ULP to be kept ignorant of the PF state of its destinations and the associated state transitions, thus allowing for retention of the simpler state transition model of [RFC4960] in the ULP. Not only does it allow to expose the PF state to ULP, but also allow to ignore sctp-pf to ULP. So this patch is to add pf_expose per netns, sock and asoc. And in sctp_assoc_control_transport(), ulp_notify will be set to false if asoc->expose is not 'enabled' in next patch. It also allows a user to change pf_expose per netns by sysctl, and pf_expose per sock and asoc will be initialized with it. Note that pf_expose also works for SCTP_GET_PEER_ADDR_INFO sockopt, to not allow a user to query the state of a sctp-pf peer address when pf_expose is 'disabled', as said in section 7.3. v1->v2: - Fix a build warning noticed by Nathan Chancellor. v2->v3: - set pf_expose to UNUSED by default to keep compatible with old applications. v3->v4: - add a new entry for pf_expose on ip-sysctl.txt, as Marcelo suggested. - change this patch to 1/5, and move sctp_assoc_control_transport change into 2/5, as Marcelo suggested. - use SCTP_PF_EXPOSE_UNSET instead of SCTP_PF_EXPOSE_UNUSED, and set SCTP_PF_EXPOSE_UNSET to 0 in enum, as Marcelo suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
099ecf59 |
|
05-Nov-2019 |
Eric Dumazet <edumazet@google.com> |
net: annotate lockless accesses to sk->sk_max_ack_backlog sk->sk_max_ack_backlog can be read without any lock being held at least in TCP/DCCP cases. We need to use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing and/or potential KCSAN warnings. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a904a069 |
|
01-Nov-2019 |
Eric Dumazet <edumazet@google.com> |
inet: stop leaking jiffies on the wire Historically linux tried to stick to RFC 791, 1122, 2003 for IPv4 ID field generation. RFC 6864 made clear that no matter how hard we try, we can not ensure unicity of IP ID within maximum lifetime for all datagrams with a given source address/destination address/protocol tuple. Linux uses a per socket inet generator (inet_id), initialized at connection startup with a XOR of 'jiffies' and other fields that appear clear on the wire. Thiemo Nagel pointed that this strategy is a privacy concern as this provides 16 bits of entropy to fingerprint devices. Let's switch to a random starting point, this is just as good as far as RFC 6864 is concerned and does not leak anything critical. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Thiemo Nagel <tnagel@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3f926af3 |
|
23-Oct-2019 |
Eric Dumazet <edumazet@google.com> |
net: use skb_queue_empty_lockless() in busy poll contexts Busy polling usually runs without locks. Let's use skb_queue_empty_lockless() instead of skb_queue_empty() Also uses READ_ONCE() in __skb_try_recv_datagram() to address a similar potential problem. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ef7cf57 |
|
23-Oct-2019 |
Eric Dumazet <edumazet@google.com> |
net: use skb_queue_empty_lockless() in poll() handlers Many poll() handlers are lockless. Using skb_queue_empty_lockless() instead of skb_queue_empty() is more appropriate. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63dfb793 |
|
15-Oct-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: change sctp_prot .no_autobind with true syzbot reported a memory leak: BUG: memory leak, unreferenced object 0xffff888120b3d380 (size 64): backtrace: [...] slab_alloc mm/slab.c:3319 [inline] [...] kmem_cache_alloc+0x13f/0x2c0 mm/slab.c:3483 [...] sctp_bucket_create net/sctp/socket.c:8523 [inline] [...] sctp_get_port_local+0x189/0x5a0 net/sctp/socket.c:8270 [...] sctp_do_bind+0xcc/0x200 net/sctp/socket.c:402 [...] sctp_bindx_add+0x4b/0xd0 net/sctp/socket.c:497 [...] sctp_setsockopt_bindx+0x156/0x1b0 net/sctp/socket.c:1022 [...] sctp_setsockopt net/sctp/socket.c:4641 [inline] [...] sctp_setsockopt+0xaea/0x2dc0 net/sctp/socket.c:4611 [...] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3147 [...] __sys_setsockopt+0x10f/0x220 net/socket.c:2084 [...] __do_sys_setsockopt net/socket.c:2100 [inline] It was caused by when sending msgs without binding a port, in the path: inet_sendmsg() -> inet_send_prepare() -> inet_autobind() -> .get_port/sctp_get_port(), sp->bind_hash will be set while bp->port is not. Later when binding another port by sctp_setsockopt_bindx(), a new bucket will be created as bp->port is not set. sctp's autobind is supposed to call sctp_autobind() where it does all things including setting bp->port. Since sctp_autobind() is called in sctp_sendmsg() if the sk is not yet bound, it should have skipped the auto bind. THis patch is to avoid calling inet_autobind() in inet_send_prepare() by changing sctp_prot .no_autobind with true, also remove the unused .get_port. Reported-by: syzbot+d44f7bbebdea49dbc84a@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29b99f54 |
|
11-Sep-2019 |
Mao Wenan <maowenan@huawei.com> |
sctp: destroy bucket if failed to bind addr There is one memory leak bug report: BUG: memory leak unreferenced object 0xffff8881dc4c5ec0 (size 40): comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s) hex dump (first 32 bytes): 02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00 ................ f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00 .c=............. backtrace: [<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp] [<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp] [<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp] [<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6] [<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647 [<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline] [<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline] [<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656 [<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296 [<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe This is because in sctp_do_bind, if sctp_get_port_local is to create hash bucket successfully, and sctp_add_bind_addr failed to bind address, e.g return -ENOMEM, so memory leak found, it needs to destroy allocated bucket. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0e4b8de |
|
11-Sep-2019 |
Mao Wenan <maowenan@huawei.com> |
sctp: remove redundant assignment when call sctp_get_port_local There are more parentheses in if clause when call sctp_get_port_local in sctp_do_bind, and redundant assignment to 'ret'. This patch is to do cleanup. Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e2ef6ab |
|
11-Sep-2019 |
Mao Wenan <maowenan@huawei.com> |
sctp: change return type of sctp_get_port_local Currently sctp_get_port_local() returns a long which is either 0,1 or a pointer casted to long. It's neither of the callers use the return value since commit 62208f12451f ("net: sctp: simplify sctp_get_port"). Now two callers are sctp_get_port and sctp_do_bind, they actually assumend a casted to an int was the same as a pointer casted to a long, and they don't save the return value just check whether it is zero or non-zero, so it would better change return type from long to int for sctp_get_port_local. Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f794dc23 |
|
09-Sep-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix the missing put_user when dumping transport thresholds This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump a transport thresholds info. Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds. Fixes: 8add543e369d ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d5886b91 |
|
26-Aug-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: allow users to set ep ecn flag by sockopt SCTP_ECN_SUPPORTED sockopt will be added to allow users to change ep ecn flag, and it's similar with other feature flags. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
56dd525a |
|
19-Aug-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_AUTH_SUPPORTED sockopt SCTP_AUTH_SUPPORTED sockopt is used to set enpoint's auth flag. With this feature, each endpoint will have its own flag for its future asoc's auth_capable, instead of netns auth flag. Note that when both ep's auth_enable is enabled, endpoint auth related data should be initialized. If asconf_enable is also set, SCTP_CID_ASCONF/SCTP_CID_ASCONF_ACK should be added into auth_chunk_list. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
219f9ea4 |
|
19-Aug-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use ep and asoc auth_enable properly sctp has per endpoint auth flag and per asoc auth flag, and the asoc one should be checked when coming to asoc and the endpoint one should be checked when coming to endpoint. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df2c71ff |
|
19-Aug-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_ASCONF_SUPPORTED sockopt SCTP_ASCONF_SUPPORTED sockopt is used to set enpoint's asconf flag. With this feature, each endpoint will have its own flag for its future asoc's asconf_capable, instead of netns asconf flag. Note that when both ep's asconf_enable and auth_enable are enabled, SCTP_CID_ASCONF and SCTP_CID_ASCONF_ACK should be added into auth_chunk_list. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e27428f |
|
19-Aug-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: add asconf_enable in struct sctp_endpoint This patch is to make addip/asconf flag per endpoint, and its value is initialized by the per netns flag, net->sctp.addip_enable. It also replaces the checks of net->sctp.addip_enable with ep->asconf_enable in some places. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a64e59c7 |
|
30-Jul-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: factor out sctp_connect_add_peer In this function factored out from sctp_sendmsg_new_asoc() and __sctp_connect(), it adds a peer with the other addr into the asoc after this asoc is created with the 1st addr. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f26f9951 |
|
30-Jul-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: factor out sctp_connect_new_asoc In this function factored out from sctp_sendmsg_new_asoc() and __sctp_connect(), it creates the asoc and adds a peer with the 1st addr. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dd8378b3 |
|
30-Jul-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: clean up __sctp_connect __sctp_connect is doing quit similar things as sctp_sendmsg_new_asoc. To factor out common functions, this patch is to clean up their code to make them look more similar: 1. create the asoc and add a peer with the 1st addr. 2. add peers with the other addrs into this asoc one by one. while at it, also remove the unused 'addrcnt'. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f40f1177 |
|
30-Jul-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: check addr_size with sa_family_t size in __sctp_setsockopt_connectx Now __sctp_connect() is called by __sctp_setsockopt_connectx() and sctp_inet_connect(), the latter has done addr_size check with size of sa_family_t. In the next patch to clean up __sctp_connect(), we will remove addr_size check with size of sa_family_t from __sctp_connect() for the 1st address. So before doing that, __sctp_setsockopt_connectx() should do this check first, as sctp_inet_connect() does. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d4e575ba |
|
29-Jul-2019 |
Enrico Weigelt <info@metux.net> |
net: sctp: drop unneeded likely() call around IS_ERR() IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e55f4b8b |
|
08-Jul-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: rename sp strm_interleave to ep intl_enable Like other endpoint features, strm_interleave should be moved to sctp_endpoint and renamed to intl_enable. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
da1f6d4d |
|
08-Jul-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: rename asoc intl_enable to asoc peer.intl_capable To keep consistent with other asoc features, we move intl_enable to peer.intl_capable in asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1c134753 |
|
08-Jul-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: remove prsctp_enable from asoc Like reconf_enable, prsctp_enable should also be removed from asoc, as asoc->peer.prsctp_capable has taken its job. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a96701fb |
|
08-Jul-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: remove reconf_enable from asoc asoc's reconf support is actually decided by the 4-shakehand negotiation, not something that users can set by sockopt. asoc->peer.reconf_capable is working for this. So remove it from asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9b6c0887 |
|
26-Jun-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: not bind the socket in sctp_connect Now when sctp_connect() is called with a wrong sa_family, it binds to a port but doesn't set bp->port, then sctp_get_af_specific will return NULL and sctp_connect() returns -EINVAL. Then if sctp_bind() is called to bind to another port, the last port it has bound will leak due to bp->port is NULL by then. sctp_connect() doesn't need to bind ports, as later __sctp_connect will do it if bp->port is NULL. So remove it from sctp_connect(). While at it, remove the unnecessary sockaddr.sa_family len check as it's already done in sctp_inet_connect. Fixes: 644fbdeacf1d ("sctp: fix the issue that flags are ignored when using kernel_connect") Reported-by: syzbot+079bf326b38072f849d9@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
47505b8b |
|
23-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 104 Based on 1 normalized pattern(s): this sctp implementation is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version this sctp implementation is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with gnu cc see the file copying if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 42 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190523091649.683323110@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
1033990a |
|
15-Apr-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: implement memory accounting on tx path Now when sending packets, sk_mem_charge() and sk_mem_uncharge() have been used to set sk_forward_alloc. We just need to call sk_wmem_schedule() to check if the allocated should be raised, and call sk_mem_reclaim() to check if the allocated should be reduced when it's under memory pressure. If sk_wmem_schedule() returns false, which means no memory is allowed to allocate, it will block and wait for memory to become available. Note different from tcp, sctp wait_for_buf happens before allocating any skb, so memory accounting check is done with the whole msg_len before it too. Reported-by: Matteo Croce <mcroce@redhat.com> Tested-by: Matteo Croce <mcroce@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
175f7c1f |
|
12-Apr-2019 |
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> |
sctp: Check address length before reading address family KMSAN will complain if valid address length passed to connect() is shorter than sizeof("struct sockaddr"->sa_family) bytes. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ef82bcfa |
|
20-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use memdup_user instead of vmemdup_user In sctp_setsockopt_bindx()/__sctp_setsockopt_connectx(), it allocates memory with addrs_size which is passed from userspace. We used flag GFP_USER to put some more restrictions on it in Commit cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc"). However, since Commit c981f254cc82 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()"), vmemdup_user() has been used, which doesn't check GFP_USER flag when goes to vmalloc_*(). So when addrs_size is a huge value, it could exhaust memory and even trigger oom killer. This patch is to use memdup_user() instead, in which GFP_USER would work to limit the memory allocation with a huge addrs_size. Note we can't fix it by limiting 'addrs_size', as there's no demand for it from RFC. Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com Fixes: c981f254cc82 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b59c19d9 |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_STREAM_SCHEDULER sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_STREAM_SCHEDULER sockopt. Fixes: 7efba10d6bd2 ("sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
99518619 |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_EVENT sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_EVENT sockopt. Fixes: d251f05e3ba2 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_EVENT sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9430ff99 |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_ENABLE_STREAM_RESET sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_ENABLE_STREAM_RESET sockopt. Fixes: 99a62135e127 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_ENABLE_STREAM_RESET sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cbb45c6c |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_PRINFO sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DEFAULT_PRINFO sockopt. Fixes: 3a583059d187 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_PRINFO sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
200f3a3b |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_AUTH_DEACTIVATE_KEY sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_DEACTIVATE_KEY sockopt. Fixes: 2af66ff3edc7 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DEACTIVATE_KEY sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
220675eb |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_AUTH_DELETE_KEY sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_DELETE_KEY sockopt. Fixes: 3adcc300603e ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DELETE_KEY sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06b39e85 |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_AUTH_ACTIVE_KEY sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_ACTIVE_KEY sockopt. Fixes: bf9fb6ad4f29 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_ACTIVE_KEY sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0685d6b7 |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_AUTH_KEY sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_AUTH_KEY sockopt. Fixes: 7fb3be13a236 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_KEY sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
746bc215 |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_MAX_BURST sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_MAX_BURST sockopt. Fixes: e0651a0dc877 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_MAX_BURST sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cface2cb |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_CONTEXT sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_CONTEXT sockopt. Fixes: 49b037acca8c ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_CONTEXT sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a842e65b |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SNDINFO sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DEFAULT_SNDINFO sockopt. Fixes: 92fc3bd928c9 ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SNDINFO sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e2614fc |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DELAYED_SACK sockopt A similar fix as Patch "sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt" on SCTP_DELAYED_SACK sockopt. Fixes: 9c5829e1c49e ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DELAYED_SACK sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1354e72f |
|
18-Mar-2019 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: fix ignoring asoc_id for tcp-style sockets on SCTP_DEFAULT_SEND_PARAM sockopt Currently if the user pass an invalid asoc_id to SCTP_DEFAULT_SEND_PARAM on a TCP-style socket, it will silently ignore the new parameters. That's because after not finding an asoc, it is checking asoc_id against the known values of CURRENT/FUTURE/ALL values and that fails to match. IOW, if the user supplies an invalid asoc id or not, it should either match the current asoc or the socket itself so that it will inherit these later. Fixes it by forcing asoc_id to SCTP_FUTURE_ASSOC in case it is a TCP-style socket without an asoc, so that the values get set on the socket. Fixes: 707e45b3dc5a ("sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SEND_PARAM sockopt") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
636d25d5 |
|
18-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: not copy sctp_sock pd_lobby in sctp_copy_descendant Now sctp_copy_descendant() copies pd_lobby from old sctp scok to new sctp sock. If sctp_sock_migrate() returns error, it will panic when releasing new sock and trying to purge pd_lobby due to the incorrect pointers in pd_lobby. [ 120.485116] kasan: CONFIG_KASAN_INLINE enabled [ 120.486270] kasan: GPF could be caused by NULL-ptr deref or user [ 120.509901] Call Trace: [ 120.510443] sctp_ulpevent_free+0x1e8/0x490 [sctp] [ 120.511438] sctp_queue_purge_ulpevents+0x97/0xe0 [sctp] [ 120.512535] sctp_close+0x13a/0x700 [sctp] [ 120.517483] inet_release+0xdc/0x1c0 [ 120.518215] __sock_release+0x1d2/0x2a0 [ 120.519025] sctp_do_peeloff+0x30f/0x3c0 [sctp] We fix it by not copying sctp_sock pd_lobby in sctp_copy_descendan(), and skb_queue_head_init() can also be removed in sctp_sock_migrate(). Reported-by: syzbot+85e0b422ff140b03672a@syzkaller.appspotmail.com Fixes: 89664c623617 ("sctp: sctp_sock_migrate() returns error if sctp_bind_addr_dup() fails") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6f33e05 |
|
03-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: call sctp_auth_init_hmacs() in sctp_sock_migrate() New ep's auth_hmacs should be set if old ep's is set, in case that net->sctp.auth_enable has been changed to 0 by users and new ep's auth_hmacs couldn't be set in sctp_endpoint_init(). It can even crash kernel by doing: 1. on server: sysctl -w net.sctp.auth_enable=1, sysctl -w net.sctp.addip_enable=1, sysctl -w net.sctp.addip_noauth_enable=0, listen() on server, sysctl -w net.sctp.auth_enable=0. 2. on client: connect() to server. 3. on server: accept() the asoc, sysctl -w net.sctp.auth_enable=1. 4. on client: send() asconf packet to server. The call trace: [ 245.280251] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 245.286872] RIP: 0010:sctp_auth_calculate_hmac+0xa3/0x140 [sctp] [ 245.304572] Call Trace: [ 245.305091] <IRQ> [ 245.311287] sctp_sf_authenticate+0x110/0x160 [sctp] [ 245.312311] sctp_sf_eat_auth+0xf2/0x230 [sctp] [ 245.313249] sctp_do_sm+0x9a/0x2d0 [sctp] [ 245.321483] sctp_assoc_bh_rcv+0xed/0x1a0 [sctp] [ 245.322495] sctp_rcv+0xa66/0xc70 [sctp] It's because the old ep->auth_hmacs wasn't copied to the new ep while ep->auth_hmacs is used in sctp_auth_calculate_hmac() when processing the incoming auth chunks, and it should have been done when migrating sock. Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
89664c62 |
|
03-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: sctp_sock_migrate() returns error if sctp_bind_addr_dup() fails It should fail to create the new sk if sctp_bind_addr_dup() fails when accepting or peeloff an association. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
901efe12 |
|
03-Mar-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: call iov_iter_revert() after sending ABORT The user msg is also copied to the abort packet when doing SCTP_ABORT in sctp_sendmsg_check_sflags(). When SCTP_SENDALL is set, iov_iter_revert() should have been called for sending abort on the next asoc with copying this msg. Otherwise, memcpy_from_msg() in sctp_make_abort_user() will fail and return error. Fixes: 4910280503f3 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba59fb02 |
|
01-Feb-2019 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
sctp: walk the list of asoc safely In sctp_sendmesg(), when walking the list of endpoint associations, the association can be dropped from the list, making the list corrupt. Properly handle this by using list_for_each_entry_safe() Fixes: 4910280503f3 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg") Reported-by: Secunia Research <vuln@secunia.com> Tested-by: Secunia Research <vuln@secunia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7efba10d |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_scheduler and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_scheduler, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_STREAM_SCHEDULER in this patch. It also adds default_ss in sctp_sock to support SCTP_FUTURE_ASSOC. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d251f05e |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_EVENT sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_event and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_event, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_EVENT in this patch. It also adds sctp_assoc_ulpevent_type_set() to make code more readable. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
99a62135 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_ENABLE_STREAM_RESET sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_enable_strreset and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_enable_strreset, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_ENABLE_STREAM_RESET in this patch. It also adjusts some code to keep a same check form as other functions. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3a583059 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_PRINFO sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_default_prinfo and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_default_prinfo, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_DEFAULT_PRINFO in this patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2af66ff3 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DEACTIVATE_KEY sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_deactivate_key. SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_DEACTIVATE_KEY in this patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3adcc300 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DELETE_KEY sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_del_key. SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_DELETE_KEY in this patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bf9fb6ad |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_ACTIVE_KEY sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_auth_key. SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_ACTIVE_KEY in this patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7fb3be13 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_KEY sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_auth_key. SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_KEY in this patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0651a0d |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_MAX_BURST sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_maxburst and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_maxburst, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_CONTEXT in this patch. It also adjusts some code to keep a same check form as other functions. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
49b037ac |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_CONTEXT sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_context and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_context, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_CONTEXT in this patch. It also adjusts some code to keep a same check form as other functions. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
92fc3bd9 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SNDINFO sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_default_sndinfo and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_default_sndinfo, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_DEFAULT_SNDINFO in this patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
707e45b3 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SEND_PARAM sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_default_send_param and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_default_send_param, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_DEFAULT_SEND_PARAM in this patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9c5829e1 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DELAYED_SACK sockopt Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_delayed_ack and check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_delayed_ack, it's compatible with 0. SCTP_CURRENT_ASSOC is supported for SCTP_DELAYED_SACK in this patch. It also adds sctp_apply_asoc_delayed_ack() to make code more readable. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e7f28248 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER_VALUE sockopt SCTP_STREAM_SCHEDULER_VALUE is a special one, as its value is not save in sctp_sock, but only in asoc. So only SCTP_CURRENT_ASSOC reserved assoc_id can be used in sctp_setsockopt_scheduler_value. This patch adds SCTP_CURRENT_ASOC support for SCTP_STREAM_SCHEDULER_VALUE. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e7709d1 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC for SCTP_INTERLEAVING_SUPPORTED sockopt Check with SCTP_FUTURE_ASSOC instead in sctp_set/getsockopt_reconfig_supported, it's compatible with 0. It also adjusts some code to keep a same check form as other functions. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
acce7f3b |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC for SCTP_RECONFIG_SUPPORTED sockopt Check with SCTP_FUTURE_ASSOC instead in sctp_set/getsockopt_reconfig_supported, it's compatible with 0. It also adjusts some code to keep a same check form as other functions. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fb195605 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC for SCTP_PR_SUPPORTED sockopt Check with SCTP_FUTURE_ASSOC instead in sctp_set/getsockopt_pr_supported, it's compatible with 0. It also adjusts some code to keep a same check form as other functions. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8add543e |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt Check with SCTP_FUTURE_ASSOC instead in sctp_set/getsockopt_paddr_thresholds, it's compatible with 0. It also adds pf_retrans in sctp_sock to support SCTP_FUTURE_ASSOC. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
48c07217 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC for SCTP_LOCAL_AUTH_CHUNKS sockopt Check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_local_auth_chunks, it's compatible with 0. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6fd769be |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC for SCTP_MAXSEG sockopt Check with SCTP_FUTURE_ASSOC instead in sctp_set/getsockopt_maxseg, it's compatible with 0. Also check asoc_id early as other sctp setsockopts does. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8889394d |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC for SCTP_ASSOCINFO sockopt Check with SCTP_FUTURE_ASSOC instead in sctp_set/getsockopt_associnfo, it's compatible with 0. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7adb5ed5 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC for SCTP_RTOINFO sockopt Check with SCTP_FUTURE_ASSOC instead in sctp_set/getsockopt_rtoinfo, it's compatible with 0. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b99e5e02 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: use SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_PARAMS sockopt Check with SCTP_FUTURE_ASSOC instead in sctp_/setgetsockopt_peer_addr_params, it's compatible with 0. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
80df2704 |
|
28-Jan-2019 |
Xin Long <lucien.xin@gmail.com> |
sctp: introduce SCTP_FUTURE/CURRENT/ALL_ASSOC This patch is to add 3 constants SCTP_FUTURE_ASSOC, SCTP_CURRENT_ASSOC and SCTP_ALL_ASSOC for reserved assoc_ids, as defined in rfc6458#section-7.2. And add the process for them when doing lookup and inserting in sctp_id2assoc and sctp_assoc_set_id. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
afd0a800 |
|
04-Dec-2018 |
Jakub Audykowicz <jakub.audykowicz@gmail.com> |
sctp: frag_point sanity check If for some reason an association's fragmentation point is zero, sctp_datamsg_from_user will try to endlessly try to divide a message into zero-sized chunks. This eventually causes kernel panic due to running out of memory. Although this situation is quite unlikely, it has occurred before as reported. I propose to add this simple last-ditch sanity check due to the severity of the potential consequences. Signed-off-by: Jakub Audykowicz <jakub.audykowicz@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc3ccf26 |
|
18-Nov-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: not allow to set asoc prsctp_enable by sockopt As rfc7496#section4.5 says about SCTP_PR_SUPPORTED: This socket option allows the enabling or disabling of the negotiation of PR-SCTP support for future associations. For existing associations, it allows one to query whether or not PR-SCTP support was negotiated on a particular association. It means only sctp sock's prsctp_enable can be set. Note that for the limitation of SCTP_{CURRENT|ALL}_ASSOC, we will add it when introducing SCTP_{FUTURE|CURRENT|ALL}_ASSOC for linux sctp in another patchset. v1->v2: - drop the params.assoc_id check as Neil suggested. Fixes: 28aa4c26fce2 ("sctp: add SCTP_PR_SUPPORTED on sctp sockopt") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
480ba9c1 |
|
18-Nov-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add sockopt SCTP_EVENT This patch adds sockopt SCTP_EVENT described in rfc6525#section-6.2. With this sockopt users can subscribe to an event from a specified asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a1e3a059 |
|
18-Nov-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add subscribe per asoc The member subscribe should be per asoc, so that sockopt SCTP_EVENT in the next patch can subscribe a event from one asoc only. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2cc0eeb6 |
|
18-Nov-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: define subscribe in sctp_sock as __u16 The member subscribe in sctp_sock is used to indicate to which of the events it is subscribed, more like a group of flags. So it's better to be defined as __u16 (2 bytpes), instead of struct sctp_event_subscribe (13 bytes). Note that sctp_event_subscribe is an UAPI struct, used on sockopt calls, and thus it will not be removed. This patch only changes the internal storage of the flags. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ba84574 |
|
12-Nov-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: process sk_reuseport in sctp_get_port_local When socks' sk_reuseport is set, the same port and address are allowed to be bound into these socks who have the same uid. Note that the difference from sk_reuse is that it allows multiple socks to listen on the same port and address. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
76c6d988 |
|
12-Nov-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add sock_reuseport for the sock in __sctp_hash_endpoint This is a part of sk_reuseport support for sctp. It defines a helper sctp_bind_addrs_check() to check if the bind_addrs in two socks are matched. It will add sock_reuseport if they are completely matched, and return err if they are partly matched, and alloc sock_reuseport if all socks are not matched at all. It will work until sk_reuseport support is added in sctp_get_port_local() in the next patch. v1->v2: - use 'laddr->valid && laddr2->valid' check instead as Marcelo pointed in sctp_bind_addrs_check(). Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
71335836 |
|
29-Oct-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: check policy more carefully when getting pr status When getting pr_assocstatus and pr_streamstatus by sctp_getsockopt, it doesn't correctly process the case when policy is set with SCTP_PR_SCTP_ALL | SCTP_PR_SCTP_MASK. It even causes a slab-out-of-bounds in sctp_getsockopt_pr_streamstatus(). This patch fixes it by return -EINVAL for this case. Fixes: 0ac1077e3a54 ("sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL") Reported-by: syzbot+5da0d0a72a9e7d791748@syzkaller.appspotmail.com Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd305c74 |
|
16-Oct-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: use sk_wmem_queued to check for writable space sk->sk_wmem_queued is used to count the size of chunks in out queue while sk->sk_wmem_alloc is for counting the size of chunks has been sent. sctp is increasing both of them before enqueuing the chunks, and using sk->sk_wmem_alloc to check for writable space. However, sk_wmem_alloc is also increased by 1 for the skb allocked for sending in sctp_packet_transmit() but it will not wake up the waiters when sk_wmem_alloc is decreased in this skb's destructor. If msg size is equal to sk_sndbuf and sendmsg is waiting for sndbuf, the check 'msg_len <= sctp_wspace(asoc)' in sctp_wait_for_sndbuf() will keep waiting if there's a skb allocked in sctp_packet_transmit, and later even if this skb got freed, the waiting thread will never get waked up. This issue has been there since very beginning, so we change to use sk->sk_wmem_queued to check for writable space as sk_wmem_queued is not increased for the skb allocked for sending, also as TCP does. SOCK_SNDBUF_LOCK check is also removed here as it's for tx buf auto tuning which I will add in another patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
605c0ac1 |
|
16-Oct-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: count both sk and asoc sndbuf with skb truesize and sctp_chunk size Now it's confusing that asoc sndbuf_used is doing memory accounting with SCTP_DATA_SNDSIZE(chunk) + sizeof(sk_buff) + sizeof(sctp_chunk) while sk sk_wmem_alloc is doing that with skb->truesize + sizeof(sctp_chunk). It also causes sctp_prsctp_prune to count with a wrong freed memory when sndbuf_policy is not set. To make this right and also keep consistent between asoc sndbuf_used, sk sk_wmem_alloc and sk_wmem_queued, use skb->truesize + sizeof(sctp_chunk) for them. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c863850c |
|
16-Oct-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: not free the new asoc when sctp_wait_for_connect returns err When sctp_wait_for_connect is called to wait for connect ready for sp->strm_interleave in sctp_sendmsg_to_asoc, a panic could be triggered if cpu is scheduled out and the new asoc is freed elsewhere, as it will return err and later the asoc gets freed again in sctp_sendmsg. [ 285.840764] list_del corruption, ffff9f0f7b284078->next is LIST_POISON1 (dead000000000100) [ 285.843590] WARNING: CPU: 1 PID: 8861 at lib/list_debug.c:47 __list_del_entry_valid+0x50/0xa0 [ 285.846193] Kernel panic - not syncing: panic_on_warn set ... [ 285.846193] [ 285.848206] CPU: 1 PID: 8861 Comm: sctp_ndata Kdump: loaded Not tainted 4.19.0-rc7.label #584 [ 285.850559] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 285.852164] Call Trace: ... [ 285.872210] ? __list_del_entry_valid+0x50/0xa0 [ 285.872894] sctp_association_free+0x42/0x2d0 [sctp] [ 285.873612] sctp_sendmsg+0x5a4/0x6b0 [sctp] [ 285.874236] sock_sendmsg+0x30/0x40 [ 285.874741] ___sys_sendmsg+0x27a/0x290 [ 285.875304] ? __switch_to_asm+0x34/0x70 [ 285.875872] ? __switch_to_asm+0x40/0x70 [ 285.876438] ? ptep_set_access_flags+0x2a/0x30 [ 285.877083] ? do_wp_page+0x151/0x540 [ 285.877614] __sys_sendmsg+0x58/0xa0 [ 285.878138] do_syscall_64+0x55/0x180 [ 285.878669] entry_SYSCALL_64_after_hwframe+0x44/0xa9 This is a similar issue with the one fixed in Commit ca3af4dd28cf ("sctp: do not free asoc when it is already dead in sctp_sendmsg"). But this one can't be fixed by returning -ESRCH for the dead asoc in sctp_wait_for_connect, as it will break sctp_connect's return value to users. This patch is to simply set err to -ESRCH before it returns to sctp_sendmsg when any err is returned by sctp_wait_for_connect for sp->strm_interleave, so that no asoc would be freed due to this. When users see this error, they will know the packet hasn't been sent. And it also makes sense to not free asoc because waiting connect fails, like the second call for sctp_wait_for_connect in sctp_sendmsg_to_asoc. Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b336deca |
|
16-Oct-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: fix race on sctp_id2asoc syzbot reported an use-after-free involving sctp_id2asoc. Dmitry Vyukov helped to root cause it and it is because of reading the asoc after it was freed: CPU 1 CPU 2 (working on socket 1) (working on socket 2) sctp_association_destroy sctp_id2asoc spin lock grab the asoc from idr spin unlock spin lock remove asoc from idr spin unlock free(asoc) if asoc->base.sk != sk ... [*] This can only be hit if trying to fetch asocs from different sockets. As we have a single IDR for all asocs, in all SCTP sockets, their id is unique on the system. An application can try to send stuff on an id that matches on another socket, and the if in [*] will protect from such usage. But it didn't consider that as that asoc may belong to another socket, it may be freed in parallel (read: under another socket lock). We fix it by moving the checks in [*] into the protected region. This fixes it because the asoc cannot be freed while the lock is held. Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0ac1077e |
|
16-Oct-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead According to rfc7496 section 4.3 or 4.4: sprstat_policy: This parameter indicates for which PR-SCTP policy the user wants the information. It is an error to use SCTP_PR_SCTP_NONE in sprstat_policy. If SCTP_PR_SCTP_ALL is used, the counters provided are aggregated over all supported policies. We change to dump pr_assoc and pr_stream all status by SCTP_PR_SCTP_ALL instead, and return error for SCTP_PR_SCTP_NONE, as it also said "It is an error to use SCTP_PR_SCTP_NONE in sprstat_policy. " Fixes: 826d253d57b1 ("sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt") Fixes: d229d48d183f ("sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctp") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
741880e1 |
|
03-Sep-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel When users set params.spp_address and get a trans, ipv6_flowlabel flag should be applied into this trans. But even if this one is not an ipv6 trans, it should not go to apply it into all other transes of the asoc but simply ignore it. Fixes: 0b0dce7a36fb ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
af8a2b8b |
|
03-Sep-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix invalid reference to the index variable of the iterator Now in sctp_apply_peer_addr_params(), if SPP_IPV6_FLOWLABEL flag is set and trans is NULL, it would use trans as the index variable to traverse transport_addr_list, then trans is set as the last transport of it. Later, if SPP_DSCP flag is set, it would enter into the wrong branch as trans is actually an invalid reference. So fix it by using a new index variable to traverse transport_addr_list for both SPP_DSCP and SPP_IPV6_FLOWLABEL flags process. Fixes: 0b0dce7a36fb ("sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams") Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bab1be79 |
|
27-Aug-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: hold transport before accessing its asoc in sctp_transport_get_next As Marcelo noticed, in sctp_transport_get_next, it is iterating over transports but then also accessing the association directly, without checking any refcnts before that, which can cause an use-after-free Read. So fix it by holding transport before accessing the association. With that, sctp_transport_hold calls can be removed in the later places. Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Reported-by: syzbot+fe62a0c9aa6a85c6de16@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
05364ca0 |
|
10-Aug-2018 |
Konstantin Khorenko <khorenko@virtuozzo.com> |
net/sctp: Make wrappers for accessing in/out streams This patch introduces wrappers for accessing in/out streams indirectly. This will enable to replace physically contiguous memory arrays of streams with flexible arrays (or maybe any other appropriate mechanism) which do memory allocation on a per-page basis. Signed-off-by: Oleg Babin <obabin@virtuozzo.com> Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4be4139f |
|
02-Jul-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add support for setting flowlabel when adding a transport Struct sockaddr_in6 has the member sin6_flowinfo that includes the ipv6 flowlabel, it should also support for setting flowlabel when adding a transport whose ipaddr is from userspace. Note that addrinfo in sctp_sendmsg is using struct in6_addr for the secondary addrs, which doesn't contain sin6_flowinfo, and it needs to copy sin6_flowinfo from the primary addr. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0b0dce7a |
|
02-Jul-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add spp_ipv6_flowlabel and spp_dscp for sctp_paddrparams spp_ipv6_flowlabel and spp_dscp are added in sctp_paddrparams in this patch so that users could set sctp_sock/asoc/transport dscp and flowlabel with spp_flags SPP_IPV6_FLOWLABEL or SPP_DSCP by SCTP_PEER_ADDR_PARAMS , as described section 8.1.12 in RFC6458. As said in last patch, it uses '| 0x100000' or '|0x1' to mark flowlabel or dscp is set, so that their values could be set to 0. Note that to guarantee that an old app built with old kernel headers could work on the newer kernel, the param's check in sctp_g/setsockopt_peer_addr_params() is also improved, which follows the way that sctp_g/setsockopt_delayed_ack() or some other sockopts' process that accept two types of params does. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b0e9a2fe |
|
28-Jun-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add support for SCTP_REUSE_PORT sockopt This feature is actually already supported by sk->sk_reuse which can be set by socket level opt SO_REUSEADDR. But it's not working exactly as RFC6458 demands in section 8.1.27, like: - This option only supports one-to-one style SCTP sockets - This socket option must not be used after calling bind() or sctp_bindx(). Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs. Otherwise, the programs with SCTP_REUSE_PORT from other systems will not work in linux. To separate it from the socket level version, this patch adds 'reuse' in sctp_sock and it works pretty much as sk->sk_reuse, but with some extra setup limitations that are needed when it is being enabled. "It should be noted that the behavior of the socket-level socket option to reuse ports and/or addresses for SCTP sockets is unspecified", so it leaves SO_REUSEADDR as is for the compatibility. Note that the name SCTP_REUSE_PORT is somewhat confusing, as its functionality is nearly identical to SO_REUSEADDR, but with some extra restrictions. Here it uses 'reuse' in sctp_sock instead of 'reuseport'. As for sk->sk_reuseport support for SCTP, it will be added in another patch. Thanks to Neil to make this clear. v1->v2: - add sctp_sk->reuse to separate it from the socket level version. v2->v3: - improve changelog according to Marcelo's suggestion. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a11e1d43 |
|
28-Jun-2018 |
Linus Torvalds <torvalds@linux-foundation.org> |
Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0eb71a9d |
|
17-Jun-2018 |
NeilBrown <neilb@suse.com> |
rhashtable: split rhashtable.h Due to the use of rhashtables in net namespaces, rhashtable.h is included in lots of the kernel, so a small changes can required a large recompilation. This makes development painful. This patch splits out rhashtable-types.h which just includes the major type declarations, and does not include (non-trivial) inline code. rhashtable.h is no longer included by anything in the include/ directory. Common include files only include rhashtable-types.h so a large recompilation is only triggered when that changes. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
568ea88e |
|
31-Dec-2017 |
Christoph Hellwig <hch@lst.de> |
net/sctp: convert to ->poll_mask Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
644fbdea |
|
20-May-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix the issue that flags are ignored when using kernel_connect Now sctp uses inet_dgram_connect as its proto_ops .connect, and the flags param can't be passed into its proto .connect where this flags is really needed. sctp works around it by getting flags from socket file in __sctp_connect. It works for connecting from userspace, as inherently the user sock has socket file and it passes f_flags as the flags param into the proto_ops .connect. However, the sock created by sock_create_kern doesn't have a socket file, and it passes the flags (like O_NONBLOCK) by using the flags param in kernel_connect, which calls proto_ops .connect later. So to fix it, this patch defines a new proto_ops .connect for sctp, sctp_inet_connect, which calls __sctp_connect() directly with this flags param. After this, the sctp's proto .connect can be removed. Note that sctp_inet_connect doesn't need to do some checks that are not needed for sctp, which makes thing better than with inet_dgram_connect. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38687b56 |
|
26-Apr-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: allow unsetting sockopt MAXSEG RFC 6458 Section 8.1.16 says that setting MAXSEG as 0 means that the user is not limiting it, and not that it should set to the *current* maximum, as we are doing. This patch thus allow setting it as 0, effectively removing the user limit. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
439ef030 |
|
26-Apr-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: consider idata chunks when setting SCTP_MAXSEG When setting SCTP_MAXSEG sock option, it should consider which kind of data chunk is being used if the asoc is already available, so that the limit better reflect reality. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63d01330 |
|
26-Apr-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: honor PMTU_DISABLED when handling icmp sctp_sendmsg() could trigger PMTU updates even when PMTU_DISABLED was set, as pmtu_pending could be set unconditionally during icmp handling if the socket was in use by the application. This patch fixes it by checking for PMTU_DISABLED when handling such deferred updates. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6e91b578 |
|
26-Apr-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: re-use sctp_transport_pmtu in sctp_transport_route sctp_transport_route currently is very similar to sctp_transport_pmtu plus a few other bits. This patch reuses sctp_transport_pmtu in sctp_transport_route and removes the duplicated code. Also, as all calls to sctp_transport_route were forcing the dst release before calling it, let's just include such release too. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2521680e |
|
26-Apr-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: remove sctp_assoc_pending_pmtu No need for this helper. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f5e3c9d |
|
26-Apr-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: introduce sctp_assoc_update_frag_point and avoid the open-coded versions of it. Now sctp_datamsg_from_user can just re-use asoc->frag_point as it will always be updated. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
feddd6c1 |
|
26-Apr-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: introduce sctp_mtu_payload When given a MTU, this function calculates how much payload we can carry on it. Without a MTU, it calculates the amount of header overhead we have. So that when we have extra overhead, like the one added for IP options on SELinux patches, it is easier to handle it. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4b2893d |
|
26-Apr-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: introduce sctp_assoc_set_pmtu All changes to asoc PMTU should now go through this wrapper, making it easier to track them and to do other actions upon it. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81e98370 |
|
08-Apr-2018 |
Eric Dumazet <edumazet@google.com> |
sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6 Check must happen before call to ipv6_addr_v4mapped() syzbot report was : BUG: KMSAN: uninit-value in sctp_sockaddr_af net/sctp/socket.c:359 [inline] BUG: KMSAN: uninit-value in sctp_do_bind+0x60f/0xdc0 net/sctp/socket.c:384 CPU: 0 PID: 3576 Comm: syzkaller968804 Not tainted 4.16.0+ #82 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 sctp_sockaddr_af net/sctp/socket.c:359 [inline] sctp_do_bind+0x60f/0xdc0 net/sctp/socket.c:384 sctp_bind+0x149/0x190 net/sctp/socket.c:332 inet6_bind+0x1fd/0x1820 net/ipv6/af_inet6.c:293 SYSC_bind+0x3f2/0x4b0 net/socket.c:1474 SyS_bind+0x54/0x80 net/socket.c:1460 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x43fd49 RSP: 002b:00007ffe99df3d28 EFLAGS: 00000213 ORIG_RAX: 0000000000000031 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd49 RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401670 R13: 0000000000401700 R14: 0000000000000000 R15: 0000000000000000 Local variable description: ----address@SYSC_bind Variable was created at: SYSC_bind+0x6f/0x4b0 net/socket.c:1461 SyS_bind+0x54/0x80 net/socket.c:1460 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0aee4c25 |
|
12-Mar-2018 |
Neil Horman <nhorman@tuxdriver.com> |
sctp: Fix double free in sctp_sendmsg_to_asoc syzbot/kasan detected a double free in sctp_sendmsg_to_asoc: BUG: KASAN: use-after-free in sctp_association_free+0x7b7/0x930 net/sctp/associola.c:332 Read of size 8 at addr ffff8801d8006ae0 by task syzkaller914861/4202 CPU: 1 PID: 4202 Comm: syzkaller914861 Not tainted 4.16.0-rc4+ #258 Hardware name: Google Google Compute Engine/Google Compute Engine 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x23c/0x360 mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 sctp_association_free+0x7b7/0x930 net/sctp/associola.c:332 sctp_sendmsg+0xc67/0x1a80 net/sctp/socket.c:2075 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg+0xca/0x110 net/socket.c:639 SYSC_sendto+0x361/0x5c0 net/socket.c:1748 SyS_sendto+0x40/0x50 net/socket.c:1716 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 This was introduced by commit: f84af33 sctp: factor out sctp_sendmsg_to_asoc from sctp_sendmsg As the newly refactored function moved the wait_for_sndbuf call to a point after the association was connected, allowing for peeloff events to occur, which in turn caused wait_for_sndbuf to return -EPIPE which was not caught by the logic that determines if an association should be freed or not. Fix it the easy way by returning the ordering of sctp_primitive_ASSOCIATE and sctp_wait_for_sndbuf to the old order, to ensure that EPIPE will not happen. Tested by myself using the syzbot reproducers with positive results Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: davem@davemloft.net CC: Xin Long <lucien.xin@gmail.com> Reported-by: syzbot+a4e4112c3aff00c8cfd8@syzkaller.appspotmail.com Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ec2e506c |
|
14-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_AUTH_FREE_KEY type for AUTHENTICATION_EVENT This patch is to add SCTP_AUTH_FREE_KEY type for AUTHENTICATION_EVENT, as described in section 6.1.8 of RFC6458. SCTP_AUTH_FREE_KEY: This report indicates that the SCTP implementation will no longer use the key identifier specified in auth_keynumber. After deactivating a key, it would never be used again, which means it's refcnt can't be held/increased by new chunks. But there may be some chunks in out queue still using it. So only when refcnt is 1, which means no chunk in outqueue is using/holding this key either, this EVENT would be sent. When users receive this notification, they could do DEL_KEY sockopt to remove this shkey, and also tell the peer that this key won't be used in any chunk thoroughly from now on, then the peer can remove it as well safely. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
601590ec |
|
14-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add sockopt SCTP_AUTH_DEACTIVATE_KEY This patch is to add sockopt SCTP_AUTH_DEACTIVATE_KEY, as described in section 8.3.4 of RFC6458. This set option indicates that the application will no longer send user messages using the indicated key identifier. Note that RFC requires that only deactivated keys that are no longer used by an association can be deleted, but for the backward compatibility, it is not to check deactivated when deleting or replacing one sh_key. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ff547c0 |
|
14-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add support for SCTP AUTH Information for sendmsg This patch is to add support for SCTP AUTH Information for sendmsg, as described in section 5.3.8 of RFC6458. With this option, you can provide shared key identifier used for sending the user message. It's also a necessary send info for sctp_sendv. Note that it reuses sinfo->sinfo_tsn to indicate if this option is set and sinfo->sinfo_ssn to save the shkey ID which can be 0. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1b1e0bc9 |
|
14-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add refcnt support for sh_key With refcnt support for sh_key, chunks auth sh_keys can be decided before enqueuing it. Changing the active key later will not affect the chunks already enqueued. Furthermore, this is necessary when adding the support for authinfo for sendmsg in next patch. Note that struct sctp_chunk can't be grown due to that performance drop issue on slow cpu, so it just reuses head_skb memory for shkey in sctp_chunk. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d98985dd |
|
12-Mar-2018 |
Wei Yongjun <weiyongjun1@huawei.com> |
sctp: fix error return code in sctp_sendmsg_new_asoc() Return error code -EINVAL in the address len check error handling case since 'err' can be overwrite to 0 by 'err = sctp_verify_addr()' in the for loop. Fixes: 2c0dbaa0c43d ("sctp: add support for SCTP_DSTADDRV4/6 Information for sendmsg") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
49102805 |
|
05-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add support for snd flag SCTP_SENDALL process in sendmsg This patch is to add support for snd flag SCTP_SENDALL process in sendmsg, as described in section 5.3.4 of RFC6458. With this flag, you can send the same data to all the asocs of this sk once. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2c0dbaa0 |
|
05-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add support for SCTP_DSTADDRV4/6 Information for sendmsg This patch is to add support for Destination IPv4/6 Address options for sendmsg, as described in section 5.3.9/10 of RFC6458. With this option, you can provide more than one destination addrs to sendmsg when creating asoc, like sctp_connectx. It's also a necessary send info for sctp_sendv. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ed63afb8 |
|
05-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: add support for PR-SCTP Information for sendmsg This patch is to add support for PR-SCTP Information for sendmsg, as described in section 5.3.7 of RFC6458. With this option, you can specify pr_policy and pr_value for user data in sendmsg. It's also a necessary send info for sctp_sendv. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0a3920d2 |
|
01-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: adjust some codes in a better order in sctp_sendmsg sctp_sendmsg_new_asoc and SCTP_ADDR_OVER check is only necessary when daddr is set, so move them up to if (daddr) statement. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
007b7e18 |
|
01-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: improve some variables in sctp_sendmsg This patch mostly is to: - rename sinfo_flags as sflags, to make the indents look better, and also keep consistent with other sctp_sendmsg_xx functions. - replace new_asoc with bool new, no need to define a pointer here, as if new_asoc is set, it must be asoc. - rename the 'out_nounlock:' as 'out', shorter and nicer. - remove associd, only one place is using it now, just use sinfo->sinfo_assoc_id directly. - remove 'cmsgs' initialization in sctp_sendmsg, as it will be done in sctp_sendmsg_parse. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e87c6eb |
|
01-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: remove the unnecessary transport looking up from sctp_sendmsg Now sctp_assoc_lookup_paddr can only be called only if daddr has been set. But if daddr has been set, sctp_endpoint_lookup_assoc would be done, where it could already have the transport. So this unnecessary transport looking up should be removed, but only reset transport as NULL when SCTP_ADDR_OVER is not set for UDP type socket. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d42cb06e |
|
01-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: factor out sctp_sendmsg_update_sinfo from sctp_sendmsg This patch is to move the codes for trying to get sinfo from asoc into sctp_sendmsg_update_sinfo. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
204f817f |
|
01-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: factor out sctp_sendmsg_parse from sctp_sendmsg This patch is to move the codes for parsing msghdr and checking sk into sctp_sendmsg_parse. Note that different from before, 'sinfo' in sctp_sendmsg won't be NULL any more. It gets the value either from cmsgs->srinfo, cmsgs->sinfo or asoc. With it, the 'sinfo' and 'fill_sinfo_ttl' check can be removed from sctp_sendmsg. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
becef9b1 |
|
01-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: factor out sctp_sendmsg_get_daddr from sctp_sendmsg This patch is to move the codes for trying to get daddr from msg->msg_name into sctp_sendmsg_get_daddr. Note that after adding 'daddr', 'to' and 'msg_name' can be deleted. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2666de1 |
|
01-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: factor out sctp_sendmsg_check_sflags from sctp_sendmsg This patch is to move the codes for checking sinfo_flags on one asoc after this asoc has been found into sctp_sendmsg_check_sflags. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2bfd80f9 |
|
01-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: factor out sctp_sendmsg_new_asoc from sctp_sendmsg This patch is to move the codes for creating a new asoc if no asoc was found into sctp_sendmsg_new_asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f84af331 |
|
01-Mar-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: factor out sctp_sendmsg_to_asoc from sctp_sendmsg This patch is to move the codes for checking and sending on one asoc after this asoc has been found or created into sctp_sendmsg_to_asoc. Note that 'err != -ESRCH' check is for the case that asoc is freed when waiting for tx buffer in sctp_sendmsg_to_asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2277c7cd |
|
13-Feb-2018 |
Richard Haines <richard_c_haines@btinternet.com> |
sctp: Add LSM hooks Add security hooks allowing security modules to exercise access control over SCTP. Signed-off-by: Richard Haines <richard_c_haines@btinternet.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
b7e10c25 |
|
24-Feb-2018 |
Richard Haines <richard_c_haines@btinternet.com> |
sctp: Add ip option support Add ip option support to allow LSM security modules to utilise CIPSO/IPv4 and CALIPSO/IPv6 services. Signed-off-by: Richard Haines <richard_c_haines@btinternet.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
|
#
a9a08845 |
|
11-Feb-2018 |
Linus Torvalds <torvalds@linux-foundation.org> |
vfs: do bulk POLL* -> EPOLL* replacement This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f53d77e1 |
|
23-Jan-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: reset ret in again path in sctp_for_each_transport Commit 97a6ec4ac021 ("rhashtable: Change rhashtable_walk_start to return void") only initialized ret for the first time, when going to again path, the next tsp could be NULL. Without resetting ret, cb_done would be called with tsp as NULL. A kernel crash was caused by this when running sctpdiag testcase in sctp-tests. Note that this issue doesn't affect net.git yet. Fixes: 97a6ec4ac021 ("rhashtable: Change rhashtable_walk_start to return void") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c981f254 |
|
07-Jan-2018 |
Al Viro <viro@zeniv.linux.org.uk> |
sctp: use vmemdup_user() rather than badly open-coding memdup_user() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c5006b8a |
|
15-Jan-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: do not allow the v4 socket to bind a v4mapped v6 address The check in sctp_sockaddr_af is not robust enough to forbid binding a v4mapped v6 addr on a v4 socket. The worse thing is that v4 socket's bind_verify would not convert this v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4 socket bound a v6 addr. This patch is to fix it by doing the common sa.sa_family check first, then AF_INET check for v4mapped v6 addrs. Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.") Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a0ff6600 |
|
15-Jan-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf After commit cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep"), it may change to lock another sk if the asoc has been peeled off in sctp_wait_for_sndbuf. However, the asoc's new sk could be already closed elsewhere, as it's in the sendmsg context of the old sk that can't avoid the new sk's closing. If the sk's last one refcnt is held by this asoc, later on after putting this asoc, the new sk will be freed, while under it's own lock. This patch is to revert that commit, but fix the old issue by returning error under the old sk's lock. Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep") Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
625637bf |
|
15-Jan-2018 |
Xin Long <lucien.xin@gmail.com> |
sctp: reinit stream if stream outcnt has been change by sinit in sendmsg After introducing sctp_stream structure, sctp uses stream->outcnt as the out stream nums instead of c.sinit_num_ostreams. However when users use sinit in cmsg, it only updates c.sinit_num_ostreams in sctp_sendmsg. At that moment, stream->outcnt is still using previous value. If it's value is not updated, the sinit_num_ostreams of sinit could not really work. This patch is to fix it by updating stream->outcnt and reiniting stream if stream outcnt has been change by sinit in sendmsg. Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b2ce04c2 |
|
10-Jun-2017 |
David Windsor <dave@nullcore.net> |
sctp: Copy struct sctp_sock.autoclose to userspace using put_user() The autoclose field can be copied with put_user(), so there is no need to use copy_to_user(). In both cases, hardened usercopy is being bypassed since the size is constant, and not open to runtime manipulation. This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Signed-off-by: David Windsor <dave@nullcore.net> [kees: adjust commit log] Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
ab9ee8e3 |
|
24-Aug-2017 |
David Windsor <dave@nullcore.net> |
sctp: Define usercopy region in SCTP proto slab cache The SCTP socket event notification subscription information need to be copied to/from userspace. In support of usercopy hardening, this patch defines a region in the struct proto slab cache in which userspace copy operations are allowed. Additionally moves the usercopy fields to be adjacent for the region to cover both. example usage trace: net/sctp/socket.c: sctp_getsockopt_events(...): ... copy_to_user(..., &sctp_sk(sk)->subscribe, len) sctp_setsockopt_events(...): ... copy_from_user(&sctp_sk(sk)->subscribe, ..., optlen) sctp_getsockopt_initmsg(...): ... copy_to_user(..., &sctp_sk(sk)->initmsg, len) This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Signed-off-by: David Windsor <dave@nullcore.net> [kees: split from network patch, move struct members adjacent] [kees: add SCTPv6 struct whitelist, provide usage trace] Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
c76f97c9 |
|
08-Jan-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: make use of pre-calculated len Some sockopt handling functions were calculating the length of the buffer to be written to userspace and then calculating it again when actually writing the buffer, which could lead to some write not using an up-to-date length. This patch updates such places to just make use of the len variable. Also, replace some sizeof(type) to sizeof(var). Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5960cefa |
|
08-Jan-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: add a ceiling to optlen in some sockopts Hangbin Liu reported that some sockopt calls could cause the kernel to log a warning on memory allocation failure if the user supplied a large optlen value. That is because some of them called memdup_user() without a ceiling on optlen, allowing it to try to allocate really large buffers. This patch adds a ceiling by limiting optlen to the maximum allowed that would still make sense for these sockopt. Reported-by: Hangbin Liu <haliu@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2e83acb9 |
|
08-Jan-2018 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: GFP_ATOMIC is not needed in sctp_setsockopt_events So replace it with GFP_USER and also add __GFP_NOWARN. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8cb38a60 |
|
22-Dec-2017 |
Tonghao Zhang <xiangxia.m.yue@gmail.com> |
sctp: Replace use of sockets_allocated with specified macro. The patch(180d8cd942ce) replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to accessor macros. But the sockets_allocated field of sctp sock is not replaced at all. Then replace it now for unifying the code. Fixes: 180d8cd942ce ("foundations of per-cgroup memory pressure controlling.") Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cbabf463 |
|
19-Dec-2017 |
Yafang Shao <laoar.shao@gmail.com> |
net: tracepoint: using sock_set_state tracepoint to trace SCTP state transition With changes in inet_ files, SCTP state transitions are traced with inet_sock_set_state tracepoint. As SCTP state names, i.e. SCTP_SS_CLOSED, SCTP_SS_ESTABLISHED, have the same value with TCP state names. So the output info still print the TCP state names, that makes the code easy. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2342b8d9 |
|
10-Dec-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: make sure stream nums can match optlen in sctp_setsockopt_reset_streams Now in sctp_setsockopt_reset_streams, it only does the check optlen < sizeof(*params) for optlen. But it's not enough, as params->srs_number_streams should also match optlen. If the streams in params->srs_stream_list are less than stream nums in params->srs_number_streams, later when dereferencing the stream list, it could cause a slab-out-of-bounds crash, as reported by syzbot. This patch is to fix it by also checking the stream numbers in sctp_setsockopt_reset_streams to make sure at least it's not greater than the streams in the list. Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
13228238 |
|
08-Dec-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: add support for the process of unordered idata Unordered idata process is more complicated than unordered data: - It has to add mid into sctp_stream_out to save the next mid value, which is separated from ordered idata's. - To support pd for unordered idata, another mid and pd_mode need to be added to save the message id and pd state in sctp_stream_in. - To make unordered idata reasm easier, it adds a new event queue to save frags for idata. The patch mostly adds the samilar reasm functions for unordered idata as ordered idata's, and also adjusts some other codes on assign_mid, abort_pd and ulpevent_data for idata. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9162e0ed |
|
08-Dec-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: implement enqueue_event for sctp_stream_interleave enqueue_event is added as a member of sctp_stream_interleave, used to enqueue either data, idata or notification events into user socket rx queue. It replaces sctp_ulpq_tail_event used in the other places with enqueue_event. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
668c9beb |
|
08-Dec-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: implement assign_number for sctp_stream_interleave assign_number is added as a member of sctp_stream_interleave, used to assign ssn for data or mid (message id) for idata, called in sctp_packet_append_data. sctp_chunk_assign_ssn is left as it is, and sctp_chunk_assign_mid is added for sctp_stream_interleave_1. This procedure is described in section 2.2.2 of RFC8260. All sizeof(struct sctp_data_chunk) in tx path is replaced with sctp_datachk_len, to make it right for idata as well. And also adjust sctp_chunk_is_data for SCTP_CID_I_DATA. After this patch, idata can be built and sent in tx path. Note that if sp strm_interleave is set, it has to wait_connect in sctp_sendmsg, as asoc intl_enable need to be known after 4 shake- hands, to decide if it should use data or idata later. data and idata can't be mixed to send in one asoc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
772a5869 |
|
08-Dec-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: add stream interleave enable members and sockopt This patch adds intl_enable in asoc and netns, and strm_interleave in sctp_sock to indicate if stream interleave is enabled and supported. netns intl_enable would be set via procfs, but that is not added yet until all stream interleave codes are completely implemented; asoc intl_enable will be set when doing 4-shakehands. sp strm_interleave can be set by sockopt SCTP_INTERLEAVING_SUPPORTED which is also added in this patch. This socket option is defined in section 4.3.1 of RFC8260. Note that strm_interleave can only be set by sockopt when both netns intl_enable and sp frag_interleave are set. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
97a6ec4a |
|
04-Dec-2017 |
Tom Herbert <tom@quantonium.net> |
rhashtable: Change rhashtable_walk_start to return void Most callers of rhashtable_walk_start don't care about a resize event which is indicated by a return value of -EAGAIN. So calls to rhashtable_walk_start are wrapped wih code to ignore -EAGAIN. Something like this is common: ret = rhashtable_walk_start(rhiter); if (ret && ret != -EAGAIN) goto out; Since zero and -EAGAIN are the only possible return values from the function this check is pointless. The condition never evaluates to true. This patch changes rhashtable_walk_start to return void. This simplifies code for the callers that ignore -EAGAIN. For the few cases where the caller cares about the resize event, particularly where the table can be walked in mulitple parts for netlink or seq file dump, the function rhashtable_walk_start_check has been added that returns -EAGAIN on a resize event. Signed-off-by: Tom Herbert <tom@quantonium.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e1611e2 |
|
05-Dec-2017 |
Al Viro <viro@ZenIV.linux.org.uk> |
make sock_alloc_file() do sock_release() on failures This changes calling conventions (and simplifies the hell out the callers). New rules: once struct socket had been passed to sock_alloc_file(), it's been consumed either by struct file or by sock_release() done by sock_alloc_file(). Either way the caller should not do sock_release() after that point. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8dd3979 |
|
26-Nov-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: use right member as the param of list_for_each_entry Commit d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues when migrating a sock") made a mistake that using 'list' as the param of list_for_each_entry to traverse the retransmit, sacked and abandoned queues, while chunks are using 'transmitted_list' to link into these queues. It could cause NULL dereference panic if there are chunks in any of these queues when peeling off one asoc. So use the chunk member 'transmitted_list' instead in this patch. Fixes: d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues when migrating a sock") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ade994f4 |
|
02-Jul-2017 |
Al Viro <viro@zeniv.linux.org.uk> |
net: annotate ->poll() instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ecca8f88 |
|
16-Nov-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: set frag_point in sctp_setsockopt_maxseg correctly Now in sctp_setsockopt_maxseg user_frag or frag_point can be set with val >= 8 and val <= SCTP_MAX_CHUNK_LEN. But both checks are incorrect. val >= 8 means frag_point can even be less than SCTP_DEFAULT_MINSEGMENT. Then in sctp_datamsg_from_user(), when it's value is greater than cookie echo len and trying to bundle with cookie echo chunk, the first_len will overflow. The worse case is when it's value is equal as cookie echo len, first_len becomes 0, it will go into a dead loop for fragment later on. In Hangbin syzkaller testing env, oom was even triggered due to consecutive memory allocation in that loop. Besides, SCTP_MAX_CHUNK_LEN is the max size of the whole chunk, it should deduct the data header for frag_point or user_frag check. This patch does a proper check with SCTP_DEFAULT_MINSEGMENT subtracting the sctphdr and datahdr, SCTP_MAX_CHUNK_LEN subtracting datahdr when setting frag_point via sockopt. It also improves sctp_setsockopt_maxseg codes. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cea0cc80 |
|
15-Nov-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: use the right sk after waking up from wait_buf sleep Commit dfcb9f4f99f1 ("sctp: deny peeloff operation on asocs with threads sleeping on it") fixed the race between peeloff and wait sndbuf by checking waitqueue_active(&asoc->wait) in sctp_do_peeloff(). But it actually doesn't work, as even if waitqueue_active returns false the waiting sndbuf thread may still not yet hold sk lock. After asoc is peeled off, sk is not asoc->base.sk any more, then to hold the old sk lock couldn't make assoc safe to access. This patch is to fix this by changing to hold the new sk lock if sk is not asoc->base.sk, meanwhile, also set the sk in sctp_sendmsg with the new sk. With this fix, there is no more race between peeloff and waitbuf, the check 'waitqueue_active' in sctp_do_peeloff can be removed. Thanks Marcelo and Neil for making this clear. v1->v2: fix it by changing to lock the new sock instead of adding a flag in asoc. Suggested-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ca3af4dd |
|
15-Nov-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: do not free asoc when it is already dead in sctp_sendmsg Now in sctp_sendmsg sctp_wait_for_sndbuf could schedule out without holding sock sk. It means the current asoc can be freed elsewhere, like when receiving an abort packet. If the asoc is just created in sctp_sendmsg and sctp_wait_for_sndbuf returns err, the asoc will be freed again due to new_asoc is not nil. An use-after-free issue would be triggered by this. This patch is to fix it by setting new_asoc with nil if the asoc is already dead when cpu schedules back, so that it will not be freed again in sctp_sendmsg. v1->v2: set new_asoc as nil in sctp_sendmsg instead of sctp_wait_for_sndbuf. Suggested-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d04adf1b |
|
27-Oct-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: reset owner sk for data chunks on out queues when migrating a sock Now when migrating sock to another one in sctp_sock_migrate(), it only resets owner sk for the data in receive queues, not the chunks on out queues. It would cause that data chunks length on the sock is not consistent with sk sk_wmem_alloc. When closing the sock or freeing these chunks, the old sk would never be freed, and the new sock may crash due to the overflow sk_wmem_alloc. syzbot found this issue with this series: r0 = socket$inet_sctp() sendto$inet(r0) listen(r0) accept4(r0) close(r0) Although listen() should have returned error when one TCP-style socket is in connecting (I may fix this one in another patch), it could also be reproduced by peeling off an assoc. This issue is there since very beginning. This patch is to reset owner sk for the chunks on out queues so that sk sk_wmem_alloc has correct value after accept one sock or peeloff an assoc to one sock. Note that when resetting owner sk for chunks on outqueue, it has to sctp_clear_owner_w/skb_orphan chunks before changing assoc->base.sk first and then sctp_set_owner_w them after changing assoc->base.sk, due to that sctp_wfree and it's callees are using assoc->base.sk. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
df80cd9b |
|
17-Oct-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: do not peel off an assoc from one netns to another one Now when peeling off an association to the sock in another netns, all transports in this assoc are not to be rehashed and keep use the old key in hashtable. As a transport uses sk->net as the hash key to insert into hashtable, it would miss removing these transports from hashtable due to the new netns when closing the sock and all transports are being freeed, then later an use-after-free issue could be caused when looking up an asoc and dereferencing those transports. This is a very old issue since very beginning, ChunYu found it with syzkaller fuzz testing with this series: socket$inet6_sctp() bind$inet6() sendto$inet6() unshare(0x40000000) getsockopt$inet_sctp6_SCTP_GET_ASSOC_ID_LIST() getsockopt$inet_sctp6_SCTP_SOCKOPT_PEELOFF() This patch is to block this call when peeling one assoc off from one netns to another one, so that the netns of all transport would not go out-sync with the key in hashtable. Note that this patch didn't fix it by rehashing transports, as it's difficult to handle the situation when the tuple is already in use in the new netns. Besides, no one would like to peel off one assoc to another netns, considering ipaddrs, ifaces, etc. are usually different. Reported-by: ChunYu Wang <chunwang@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0ccdf3c7 |
|
03-Oct-2017 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: add sockopt to get/set stream scheduler parameters As defined per RFC Draft ndata Section 4.3.3, named as SCTP_STREAM_SCHEDULER_VALUE. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13 Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
13aa8770 |
|
03-Oct-2017 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: add sockopt to get/set stream scheduler As defined per RFC Draft ndata Section 4.3.2, named as SCTP_STREAM_SCHEDULER. See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13 Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f952be79 |
|
03-Oct-2017 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: introduce struct sctp_stream_out_ext With the stream schedulers, sctp_stream_out will become too big to be allocated by kmalloc and as we need to allocate with BH disabled, we cannot use __vmalloc in sctp_stream_init(). This patch moves out the stats from sctp_stream_out to sctp_stream_out_ext, which will be allocated only when the application tries to sendmsg something on it. Just the introduction of sctp_stream_out_ext would already fix the issue described above by splitting the allocation in two. Moving the stats to it also reduces the pressure on the allocator as we will ask for less memory atomically when creating the socket and we will use GFP_KERNEL later. Then, for stream schedulers, we will just use sctp_stream_out_ext. Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d25adbeb |
|
14-Sep-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix an use-after-free issue in sctp_sock_dump Commit 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump") tried to fix an use-after-free issue by checking !sctp_sk(sk)->ep with holding sock and sock lock. But Paolo noticed that endpoint could be destroyed in sctp_rcv without sock lock protection. It means the use-after-free issue still could be triggered when sctp_rcv put and destroy ep after sctp_sock_dump checks !ep, although it's pretty hard to reproduce. I could reproduce it by mdelay in sctp_rcv while msleep in sctp_close and sctp_sock_dump long time. This patch is to add another param cb_done to sctp_for_each_transport and dump ep->assocs with holding tsp after jumping out of transport's traversal in it to avoid this issue. It can also improve sctp diag dump to make it run faster, as no need to save sk into cb->args[5] and keep calling sctp_for_each_transport any more. This patch is also to use int * instead of int for the pos argument in sctp_for_each_transport, which could make postion increment only in sctp_for_each_transport and no need to keep changing cb->args[2] in sctp_sock_filter and sctp_sock_dump any more. Fixes: 86fdb3448cc1 ("sctp: ensure ep is not destroyed before doing the dump") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee6c88bb |
|
23-Aug-2017 |
Stefano Brivio <sbrivio@redhat.com> |
sctp: Avoid out-of-bounds reads from address storage inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy sizeof(sockaddr_storage) bytes to fill in sockaddr structs used to export diagnostic information to userspace. However, the memory allocated to store sockaddr information is smaller than that and depends on the address family, so we leak up to 100 uninitialized bytes to userspace. Just use the size of the source structs instead, in all the three cases this is what userspace expects. Zero out the remaining memory. Unused bytes (i.e. when IPv4 addresses are used) in source structs sctp_sockaddr_entry and sctp_transport are already cleared by sctp_add_bind_addr() and sctp_transport_new(), respectively. Noticed while testing KASAN-enabled kernel with 'ss': [ 2326.885243] BUG: KASAN: slab-out-of-bounds in inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] at addr ffff881be8779800 [ 2326.896800] Read of size 128 by task ss/9527 [ 2326.901564] CPU: 0 PID: 9527 Comm: ss Not tainted 4.11.0-22.el7a.x86_64 #1 [ 2326.909236] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 [ 2326.917585] Call Trace: [ 2326.920312] dump_stack+0x63/0x8d [ 2326.924014] kasan_object_err+0x21/0x70 [ 2326.928295] kasan_report+0x288/0x540 [ 2326.932380] ? inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] [ 2326.938500] ? skb_put+0x8b/0xd0 [ 2326.942098] ? memset+0x31/0x40 [ 2326.945599] check_memory_region+0x13c/0x1a0 [ 2326.950362] memcpy+0x23/0x50 [ 2326.953669] inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] [ 2326.959596] ? inet_diag_msg_sctpasoc_fill+0x460/0x460 [sctp_diag] [ 2326.966495] ? __lock_sock+0x102/0x150 [ 2326.970671] ? sock_def_wakeup+0x60/0x60 [ 2326.975048] ? remove_wait_queue+0xc0/0xc0 [ 2326.979619] sctp_diag_dump+0x44a/0x760 [sctp_diag] [ 2326.985063] ? sctp_ep_dump+0x280/0x280 [sctp_diag] [ 2326.990504] ? memset+0x31/0x40 [ 2326.994007] ? mutex_lock+0x12/0x40 [ 2326.997900] __inet_diag_dump+0x57/0xb0 [inet_diag] [ 2327.003340] ? __sys_sendmsg+0x150/0x150 [ 2327.007715] inet_diag_dump+0x4d/0x80 [inet_diag] [ 2327.012979] netlink_dump+0x1e6/0x490 [ 2327.017064] __netlink_dump_start+0x28e/0x2c0 [ 2327.021924] inet_diag_handler_cmd+0x189/0x1a0 [inet_diag] [ 2327.028045] ? inet_diag_rcv_msg_compat+0x1b0/0x1b0 [inet_diag] [ 2327.034651] ? inet_diag_dump_compat+0x190/0x190 [inet_diag] [ 2327.040965] ? __netlink_lookup+0x1b9/0x260 [ 2327.045631] sock_diag_rcv_msg+0x18b/0x1e0 [ 2327.050199] netlink_rcv_skb+0x14b/0x180 [ 2327.054574] ? sock_diag_bind+0x60/0x60 [ 2327.058850] sock_diag_rcv+0x28/0x40 [ 2327.062837] netlink_unicast+0x2e7/0x3b0 [ 2327.067212] ? netlink_attachskb+0x330/0x330 [ 2327.071975] ? kasan_check_write+0x14/0x20 [ 2327.076544] netlink_sendmsg+0x5be/0x730 [ 2327.080918] ? netlink_unicast+0x3b0/0x3b0 [ 2327.085486] ? kasan_check_write+0x14/0x20 [ 2327.090057] ? selinux_socket_sendmsg+0x24/0x30 [ 2327.095109] ? netlink_unicast+0x3b0/0x3b0 [ 2327.099678] sock_sendmsg+0x74/0x80 [ 2327.103567] ___sys_sendmsg+0x520/0x530 [ 2327.107844] ? __get_locked_pte+0x178/0x200 [ 2327.112510] ? copy_msghdr_from_user+0x270/0x270 [ 2327.117660] ? vm_insert_page+0x360/0x360 [ 2327.122133] ? vm_insert_pfn_prot+0xb4/0x150 [ 2327.126895] ? vm_insert_pfn+0x32/0x40 [ 2327.131077] ? vvar_fault+0x71/0xd0 [ 2327.134968] ? special_mapping_fault+0x69/0x110 [ 2327.140022] ? __do_fault+0x42/0x120 [ 2327.144008] ? __handle_mm_fault+0x1062/0x17a0 [ 2327.148965] ? __fget_light+0xa7/0xc0 [ 2327.153049] __sys_sendmsg+0xcb/0x150 [ 2327.157133] ? __sys_sendmsg+0xcb/0x150 [ 2327.161409] ? SyS_shutdown+0x140/0x140 [ 2327.165688] ? exit_to_usermode_loop+0xd0/0xd0 [ 2327.170646] ? __do_page_fault+0x55d/0x620 [ 2327.175216] ? __sys_sendmsg+0x150/0x150 [ 2327.179591] SyS_sendmsg+0x12/0x20 [ 2327.183384] do_syscall_64+0xe3/0x230 [ 2327.187471] entry_SYSCALL64_slow_path+0x25/0x25 [ 2327.192622] RIP: 0033:0x7f41d18fa3b0 [ 2327.196608] RSP: 002b:00007ffc3b731218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 2327.205055] RAX: ffffffffffffffda RBX: 00007ffc3b731380 RCX: 00007f41d18fa3b0 [ 2327.213017] RDX: 0000000000000000 RSI: 00007ffc3b731340 RDI: 0000000000000003 [ 2327.220978] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000040 [ 2327.228939] R10: 00007ffc3b730f30 R11: 0000000000000246 R12: 0000000000000003 [ 2327.236901] R13: 00007ffc3b731340 R14: 00007ffc3b7313d0 R15: 0000000000000084 [ 2327.244865] Object at ffff881be87797e0, in cache kmalloc-64 size: 64 [ 2327.251953] Allocated: [ 2327.254581] PID = 9484 [ 2327.257215] save_stack_trace+0x1b/0x20 [ 2327.261485] save_stack+0x46/0xd0 [ 2327.265179] kasan_kmalloc+0xad/0xe0 [ 2327.269165] kmem_cache_alloc_trace+0xe6/0x1d0 [ 2327.274138] sctp_add_bind_addr+0x58/0x180 [sctp] [ 2327.279400] sctp_do_bind+0x208/0x310 [sctp] [ 2327.284176] sctp_bind+0x61/0xa0 [sctp] [ 2327.288455] inet_bind+0x5f/0x3a0 [ 2327.292151] SYSC_bind+0x1a4/0x1e0 [ 2327.295944] SyS_bind+0xe/0x10 [ 2327.299349] do_syscall_64+0xe3/0x230 [ 2327.303433] return_from_SYSCALL_64+0x0/0x6a [ 2327.308194] Freed: [ 2327.310434] PID = 4131 [ 2327.313065] save_stack_trace+0x1b/0x20 [ 2327.317344] save_stack+0x46/0xd0 [ 2327.321040] kasan_slab_free+0x73/0xc0 [ 2327.325220] kfree+0x96/0x1a0 [ 2327.328530] dynamic_kobj_release+0x15/0x40 [ 2327.333195] kobject_release+0x99/0x1e0 [ 2327.337472] kobject_put+0x38/0x70 [ 2327.341266] free_notes_attrs+0x66/0x80 [ 2327.345545] mod_sysfs_teardown+0x1a5/0x270 [ 2327.350211] free_module+0x20/0x2a0 [ 2327.354099] SyS_delete_module+0x2cb/0x2f0 [ 2327.358667] do_syscall_64+0xe3/0x230 [ 2327.362750] return_from_SYSCALL_64+0x0/0x6a [ 2327.367510] Memory state around the buggy address: [ 2327.372855] ffff881be8779700: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc [ 2327.380914] ffff881be8779780: fb fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00 [ 2327.388972] >ffff881be8779800: 00 00 00 00 fc fc fc fc fb fb fb fb fb fb fb fb [ 2327.397031] ^ [ 2327.401792] ffff881be8779880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc [ 2327.409850] ffff881be8779900: 00 00 00 00 00 04 fc fc fc fc fc fc 00 00 00 00 [ 2327.417907] ================================================================== This fixes CVE-2017-7558. References: https://bugzilla.redhat.com/show_bug.cgi?id=1480266 Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Cc: Xin Long <lucien.xin@gmail.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b7ef2618 |
|
10-Aug-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: remove the typedef sctp_socket_type_t This patch is to remove the typedef sctp_socket_type_t, and replace with enum sctp_socket_type in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a05437ac |
|
10-Aug-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: remove the typedef sctp_cmsgs_t This patch is to remove the typedef sctp_cmsgs_t, and replace with struct sctp_cmsgs in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1c662018 |
|
05-Aug-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: remove the typedef sctp_scope_t This patch is to remove the typedef sctp_scope_t, and replace with enum sctp_scope in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2cb5c8e3 |
|
30-Jun-2017 |
Neil Horman <nhorman@tuxdriver.com> |
sctp: Add peeloff-flags socket option Based on a request raised on the sctp devel list, there is a need to augment the sctp_peeloff operation while specifying the O_CLOEXEC and O_NONBLOCK flags (simmilar to the socket syscall). Since modifying the SCTP_SOCKOPT_PEELOFF socket option would break user space ABI for existing programs, this patch creates a new socket option SCTP_SOCKOPT_PEELOFF_FLAGS, which accepts a third flags parameter to allow atomic assignment of the socket descriptor flags. Tested successfully by myself and the requestor Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: Andreas Steinmetz <ast@domdv.de> CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c918704 |
|
29-Jun-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: remove the typedef sctp_paramhdr_t This patch is to remove the typedef sctp_paramhdr_t, and replace with struct sctp_paramhdr in the places where it's using this typedef. It is also to fix some indents and use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14afee4b |
|
30-Jun-2017 |
Reshetova, Elena <elena.reshetova@intel.com> |
net: convert sock.sk_wmem_alloc from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63354797 |
|
30-Jun-2017 |
Reshetova, Elena <elena.reshetova@intel.com> |
net: convert sk_buff.users from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
988c7322 |
|
15-Jun-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: return next obj by passing pos + 1 into sctp_transport_get_idx In sctp_for_each_transport, pos is used to save how many objs it has dumped. Now it gets the last obj by sctp_transport_get_idx, then gets the next obj by sctp_transport_get_next. The issue is that in the meanwhile if some objs in transport hashtable are removed and the objs nums are less than pos, sctp_transport_get_idx would return NULL and hti.walker.tbl is NULL as well. At this moment it should stop hti, instead of continue getting the next obj. Or it would cause a NULL pointer dereference in sctp_transport_get_next. This patch is to pass pos + 1 into sctp_transport_get_idx to get the next obj directly, even if pos > objs nums, it would return NULL and stop hti. Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6dfe4b97 |
|
10-Jun-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix recursive locking warning in sctp_do_peeloff Dmitry got the following recursive locking report while running syzkaller fuzzer, the Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:52 print_deadlock_bug kernel/locking/lockdep.c:1729 [inline] check_deadlock kernel/locking/lockdep.c:1773 [inline] validate_chain kernel/locking/lockdep.c:2251 [inline] __lock_acquire+0xef2/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 lock_sock_nested+0xcb/0x120 net/core/sock.c:2536 lock_sock include/net/sock.h:1460 [inline] sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432 sock_release+0x8d/0x1e0 net/socket.c:597 __sock_create+0x38b/0x870 net/socket.c:1226 sock_create+0x7f/0xa0 net/socket.c:1237 sctp_do_peeloff+0x1a2/0x440 net/sctp/socket.c:4879 sctp_getsockopt_peeloff net/sctp/socket.c:4914 [inline] sctp_getsockopt+0x111a/0x67e0 net/sctp/socket.c:6628 sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2690 SYSC_getsockopt net/socket.c:1817 [inline] SyS_getsockopt+0x240/0x380 net/socket.c:1799 entry_SYSCALL_64_fastpath+0x1f/0xc2 This warning is caused by the lock held by sctp_getsockopt() is on one socket, while the other lock that sctp_close() is getting later is on the newly created (which failed) socket during peeloff operation. This patch is to avoid this warning by use lock_sock with subclass SINGLE_DEPTH_NESTING as Wang Cong and Marcelo's suggestion. Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
581409da |
|
10-Jun-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: disable BH in sctp_for_each_endpoint Now sctp holds read_lock when foreach sctp_ep_hashtable without disabling BH. If CPU schedules to another thread A at this moment, the thread A may be trying to hold the write_lock with disabling BH. As BH is disabled and CPU cannot schedule back to the thread holding the read_lock, while the thread A keeps waiting for the read_lock. A dead lock would be triggered by this. This patch is to fix this dead lock by calling read_lock_bh instead to disable BH when holding the read_lock in sctp_for_each_endpoint. Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06044751 |
|
07-Jun-2017 |
Eric Dumazet <edumazet@google.com> |
tcp: add TCPMemoryPressuresChrono counter DRAM supply shortage and poor memory pressure tracking in TCP stack makes any change in SO_SNDBUF/SO_RCVBUF (or equivalent autotuning limits) and tcp_mem[] quite hazardous. TCPMemoryPressures SNMP counter is an indication of tcp_mem sysctl limits being hit, but only tracking number of transitions. If TCP stack behavior under stress was perfect : 1) It would maintain memory usage close to the limit. 2) Memory pressure state would be entered for short times. We certainly prefer 100 events lasting 10ms compared to one event lasting 200 seconds. This patch adds a new SNMP counter tracking cumulative duration of memory pressure events, given in ms units. $ cat /proc/sys/net/ipv4/tcp_mem 3088 4117 6176 $ grep TCP /proc/net/sockstat TCP: inuse 180 orphan 0 tw 2 alloc 234 mem 4140 $ nstat -n ; sleep 10 ; nstat |grep Pressure TcpExtTCPMemoryPressures 1700 TcpExtTCPMemoryPressuresChrono 5209 v2: Used EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() as David instructed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cee360ab |
|
31-May-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: define the member stream as an object instead of pointer in asoc As Marcelo's suggestion, stream is a fixed size member of asoc and would not grow with more streams. To avoid an allocation for it, this patch is to define it as an object instead of pointer and update the places using it, also create sctp_stream_update() called in sctp_assoc_update() to migrate the stream info from one stream to another. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
34b2789f |
|
05-Apr-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: listen on the sock only when it's state is listening or closed Now sctp doesn't check sock's state before listening on it. It could even cause changing a sock with any state to become a listening sock when doing sctp_listen. This patch is to fix it by checking sock's state in sctp_listen, so that it will listen on the sock with right state. Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ebfdf08 |
|
03-Apr-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: get sock from transport in sctp_transport_update_pmtu This patch is almost to revert commit 02f3d4ce9e81 ("sctp: Adjust PMTU updates to accomodate route invalidation."). As t->asoc can't be NULL in sctp_transport_update_pmtu, it could get sk from asoc, and no need to pass sk into that function. It is also to remove some duplicated codes from that function. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d229d48d |
|
01-Apr-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctp Before when implementing sctp prsctp, SCTP_PR_STREAM_STATUS wasn't added, as it needs to save abandoned_(un)sent for every stream. After sctp stream reconf is added in sctp, assoc has structure sctp_stream_out to save per stream info. This patch is to add SCTP_PR_STREAM_STATUS by putting the prsctp per stream statistics into sctp_stream_out. v1->v2: fix an indent issue. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
afe89962 |
|
31-Mar-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: use right in and out stream cnt Since sctp reconf was added in sctp, the real cnt of in/out stream have not been c.sinit_max_instreams and c.sinit_num_ostreams any more. This patch is to replace them with stream->in/outcnt. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f9ba3501 |
|
26-Mar-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: change to save MSG_MORE flag into assoc David Laight noticed the support for MSG_MORE with datamsg->force_delay didn't really work as we expected, as the first msg with MSG_MORE set would always block the following chunks' dequeuing. This Patch is to rewrite it by saving the MSG_MORE flag into assoc as David Laight suggested. asoc->force_delay is used to save MSG_MORE flag before a msg is sent. All chunks in queue would not be sent out if asoc->force_delay is set by the msg with MSG_MORE flag, until a new msg without MSG_MORE flag clears asoc->force_delay. Note that this change would not affect the flush is generated by other triggers, like asoc->state != ESTABLISHED, queue size > pmtu etc. v1->v2: Not clear asoc->force_delay after sending the msg with MSG_MORE flag. Fixes: 4ea0c32f5f42 ("sctp: add support for MSG_MORE") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: David Laight <david.laight@aculab.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2b5cd0df |
|
24-Mar-2017 |
Alexander Duyck <alexander.h.duyck@intel.com> |
net: Change return type of sk_busy_loop from bool to void checking the return value of sk_busy_loop. As there are only a few consumers of that data, and the data being checked for can be replaced with a check for !skb_queue_empty() we might as well just pull the code out of sk_busy_loop and place it in the spots that actually need it. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0d8bab6 |
|
09-Mar-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: add get and set sockopt for reconf_enable This patchset is to add SCTP_RECONFIG_SUPPORTED sockopt, it would set and get asoc reconf_enable value when asoc_id is set, or it would set and get ep reconf_enalbe value if asoc_id is 0. It is also to add sysctl interface for users to set the default value for reconf_enable. After this patch, stream reconf will work. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cdfbabfb |
|
09-Mar-2017 |
David Howells <dhowells@redhat.com> |
net: Work around lockdep limitation in sockets that use sockets Lockdep issues a circular dependency warning when AFS issues an operation through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem. The theory lockdep comes up with is as follows: (1) If the pagefault handler decides it needs to read pages from AFS, it calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but creating a call requires the socket lock: mmap_sem must be taken before sk_lock-AF_RXRPC (2) afs_open_socket() opens an AF_RXRPC socket and binds it. rxrpc_bind() binds the underlying UDP socket whilst holding its socket lock. inet_bind() takes its own socket lock: sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET (3) Reading from a TCP socket into a userspace buffer might cause a fault and thus cause the kernel to take the mmap_sem, but the TCP socket is locked whilst doing this: sk_lock-AF_INET must be taken before mmap_sem However, lockdep's theory is wrong in this instance because it deals only with lock classes and not individual locks. The AF_INET lock in (2) isn't really equivalent to the AF_INET lock in (3) as the former deals with a socket entirely internal to the kernel that never sees userspace. This is a limitation in the design of lockdep. Fix the general case by: (1) Double up all the locking keys used in sockets so that one set are used if the socket is created by userspace and the other set is used if the socket is created by the kernel. (2) Store the kern parameter passed to sk_alloc() in a variable in the sock struct (sk_kern_sock). This informs sock_lock_init(), sock_init_data() and sk_clone_lock() as to the lock keys to be used. Note that the child created by sk_clone_lock() inherits the parent's kern setting. (3) Add a 'kern' parameter to ->accept() that is analogous to the one passed in to ->create() that distinguishes whether kernel_accept() or sys_accept4() was the caller and can be passed to sk_alloc(). Note that a lot of accept functions merely dequeue an already allocated socket. I haven't touched these as the new socket already exists before we get the parameter. Note also that there are a couple of places where I've made the accepted socket unconditionally kernel-based: irda_accept() rds_rcp_accept_one() tcp_accept_from_sock() because they follow a sock_create_kern() and accept off of that. Whilst creating this, I noticed that lustre and ocfs don't create sockets through sock_create_kern() and thus they aren't marked as for-kernel, though they appear to be internal. I wonder if these should do that so that they use the new set of lock keys. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3f07c014 |
|
08-Feb-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
dfcb9f4f |
|
23-Feb-2017 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: deny peeloff operation on asocs with threads sleeping on it commit 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf") attempted to avoid a BUG_ON call when the association being used for a sendmsg() is blocked waiting for more sndbuf and another thread did a peeloff operation on such asoc, moving it to another socket. As Ben Hutchings noticed, then in such case it would return without locking back the socket and would cause two unlocks in a row. Further analysis also revealed that it could allow a double free if the application managed to peeloff the asoc that is created during the sendmsg call, because then sctp_sendmsg() would try to free the asoc that was created only for that call. This patch takes another approach. It will deny the peeloff operation if there is a thread sleeping on the asoc, so this situation doesn't exist anymore. This avoids the issues described above and also honors the syscalls that are already being handled (it can be multiple sendmsg calls). Joint work with Xin Long. Fixes: 2dcab5984841 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf") Cc: Alexander Popov <alex.popov@linux.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ea0c32f |
|
18-Feb-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: add support for MSG_MORE This patch is to add support for MSG_MORE on sctp. It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after creating datamsg according to the send flag. sctp_packet_can_append_data then uses it to decide if the chunks of this msg will be sent at once or delay it. Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of in assoc. As sctp enqueues the chunks first, then dequeue them one by one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may affect other chunks' bundling. Since last patch, sctp flush out queue once assoc state falls into SHUTDOWN_PENDING, the close block problem mentioned in [1] has been solved as well. [1] https://patchwork.ozlabs.org/patch/372404/ Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
242bd2d5 |
|
08-Feb-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: implement sender-side procedures for Add Incoming/Outgoing Streams Request Parameter This patch is to implement Sender-Side Procedures for the Add Outgoing and Incoming Streams Request Parameter described in rfc6525 section 5.1.5-5.1.6. It is also to add sockopt SCTP_ADD_STREAMS in rfc6525 section 6.3.4 for users. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a92ce1a4 |
|
08-Feb-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: implement sender-side procedures for SSN/TSN Reset Request Parameter This patch is to implement Sender-Side Procedures for the SSN/TSN Reset Request Parameter descibed in rfc6525 section 5.1.4. It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3 for users. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
912964ea |
|
07-Feb-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: check af before verify address in sctp_addr_id2transport Commit 6f29a1306131 ("sctp: sctp_addr_id2transport should verify the addr before looking up assoc") invoked sctp_verify_addr to verify the addr. But it didn't check af variable beforehand, once users pass an address with family = 0 through sockopt, sctp_get_af_specific will return NULL and NULL pointer dereference will be caused by af->sockaddr_len. This patch is to fix it by returning NULL if af variable is NULL. Fixes: 6f29a1306131 ("sctp: sctp_addr_id2transport should verify the addr before looking up assoc") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c86a773c |
|
06-Feb-2017 |
Julian Anastasov <ja@ssi.bg> |
sctp: add dst_pending_confirm flag Add new transport flag to allow sockets to confirm neighbour. When same struct dst_entry can be used for many different neighbours we can not use it for pending confirmations. The flag is propagated from transport to every packet. It is reset when cached dst is reset. Reported-by: YueHaibing <yuehaibing@huawei.com> Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.") Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2dcab598 |
|
06-Feb-2017 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: avoid BUG_ON on sctp_wait_for_sndbuf Alexander Popov reported that an application may trigger a BUG_ON in sctp_wait_for_sndbuf if the socket tx buffer is full, a thread is waiting on it to queue more data and meanwhile another thread peels off the association being used by the first thread. This patch replaces the BUG_ON call with a proper error handling. It will return -EPIPE to the original sendmsg call, similarly to what would have been done if the association wasn't found in the first place. Acked-by: Alexander Popov <alex.popov@linux.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6f29a130 |
|
23-Jan-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: sctp_addr_id2transport should verify the addr before looking up assoc sctp_addr_id2transport is a function for sockopt to look up assoc by address. As the address is from userspace, it can be a v4-mapped v6 address. But in sctp protocol stack, it always handles a v4-mapped v6 address as a v4 address. So it's necessary to convert it to a v4 address before looking up assoc by address. This patch is to fix it by calling sctp_verify_addr in which it can do this conversion before calling sctp_endpoint_lookup_assoc, just like what sctp_sendmsg and __sctp_connect do for the address from users. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4548b683 |
|
20-Jan-2017 |
Krister Johansen <kjlx@templeofstupid.com> |
Introduce a sysctl that modifies the value of PROT_SOCK. Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl that denotes the first unprivileged inet port in the namespace. To disable all privileged ports set this to zero. It also checks for overlap with the local port range. The privileged and local range may not overlap. The use case for this change is to allow containerized processes to bind to priviliged ports, but prevent them from ever being allowed to modify their container's network configuration. The latter is accomplished by ensuring that the network namespace is not a child of the user namespace. This modification was needed to allow the container manager to disable a namespace's priviliged port restrictions without exposing control of the network namespace to processes in the user namespace. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7f9d68ac |
|
17-Jan-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: implement sender-side procedures for SSN Reset Request Parameter This patch is to implement sender-side procedures for the Outgoing and Incoming SSN Reset Request Parameter described in rfc6525 section 5.1.2 and 5.1.3. It is also add sockopt SCTP_RESET_STREAMS in rfc6525 section 6.3.2 for users. Note that the new asoc member strreset_outstanding is to make sure only one reconf request chunk on the fly as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9fb657ae |
|
17-Jan-2017 |
Xin Long <lucien.xin@gmail.com> |
sctp: add sockopt SCTP_ENABLE_STREAM_RESET This patch is to add sockopt SCTP_ENABLE_STREAM_RESET to get/set strreset_enable to indicate which reconf request type it supports, which is described in rfc6525 section 6.3.1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cdfb1a9f |
|
13-Jan-2017 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: remove useless code from sctp_apply_peer_addr_params sctp_frag_point() doesn't store anything, and thus just calling it cannot do anything useful. sctp_apply_peer_addr_params is only called by sctp_setsockopt_peer_addr_params. When operating on an asoc, sctp_setsockopt_peer_addr_params will call sctp_apply_peer_addr_params once for the asoc, and then once for each transport this asoc has, meaning that the frag_point will be recomputed when updating the transports and calling it when updating the asoc is not necessary. IOW, no action is needed here and we can remove this call. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
08abb795 |
|
15-Dec-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: sctp_transport_lookup_process should rcu_read_unlock when transport is null Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock when it failed to find a transport by sctp_addrs_lookup_transport. This patch is to fix it by moving up rcu_read_unlock right before checking transport and also to remove the out path. Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7fda702f |
|
15-Nov-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: use new rhlist interface on sctp transport rhashtable Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key to hash a node to one chain. If in one host thousands of assocs connect to one server with the same lport and different laddrs (although it's not a normal case), all the transports would be hashed into the same chain. It may cause to keep returning -EBUSY when inserting a new node, as the chain is too long and sctp inserts a transport node in a loop, which could even lead to system hangs there. The new rhlist interface works for this case that there are many nodes with the same key in one chain. It puts them into a list then makes this list be as a node of the chain. This patch is to replace rhashtable_ interface with rhltable_ interface. Since a chain would not be too long and it would not return -EBUSY with this fix when inserting a node, the reinsert loop is also removed here. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5bf35ddf |
|
13-Nov-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: change sk state only when it has assocs in sctp_shutdown Now when users shutdown a sock with SEND_SHUTDOWN in sctp, even if this sock has no connection (assoc), sk state would be changed to SCTP_SS_CLOSING, which is not as we expect. Besides, after that if users try to listen on this sock, kernel could even panic when it dereference sctp_sk(sk)->bind_hash in sctp_inet_listen, as bind_hash is null when sock has no assoc. This patch is to move sk state change after checking sk assocs is not empty, and also merge these two if() conditions and reduce indent level. Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7233bc84 |
|
03-Nov-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: assign assoc_id earlier in __sctp_connect sctp_wait_for_connect() currently already holds the asoc to keep it alive during the sleep, in case another thread release it. But Andrey Konovalov and Dmitry Vyukov reported an use-after-free in such situation. Problem is that __sctp_connect() doesn't get a ref on the asoc and will do a read on the asoc after calling sctp_wait_for_connect(), but by then another thread may have closed it and the _put on sctp_wait_for_connect will actually release it, causing the use-after-free. Fix is, instead of doing the read after waiting for the connect, do it before so, and avoid this issue as the socket is still locked by then. There should be no issue on returning the asoc id in case of failure as the application shouldn't trust on that number in such situations anyway. This issue doesn't exist in sctp_sendmsg() path. Reported-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd26da4f |
|
31-Oct-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: hold transport instead of assoc in sctp_diag In sctp_transport_lookup_process(), Commit 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") moved cb() out of rcu lock, but it put transport and hold assoc instead, and ignore that cb() still uses transport. It may cause a use-after-free issue. This patch is to hold transport instead of assoc there. Fixes: 1cceda784980 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a4b8e71b |
|
21-Oct-2016 |
Jiri Slaby <jirislaby@kernel.org> |
net: sctp, forbid negative length Most of getsockopt handlers in net/sctp/socket.c check len against sizeof some structure like: if (len < sizeof(int)) return -EINVAL; On the first look, the check seems to be correct. But since len is int and sizeof returns size_t, int gets promoted to unsigned size_t too. So the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is false. Fix this in sctp by explicitly checking len < 0 before any getsockopt handler is called. Note that sctp_getsockopt_events already handled the negative case. Since we added the < 0 check elsewhere, this one can be removed. If not checked, this is the result: UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19 shift exponent 52 is too large for 32-bit type 'int' CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3 ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270 0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422 Call Trace: [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300 ... [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90 [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220 [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540 [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp] [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp] [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150 [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270 Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1cceda78 |
|
28-Sep-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock When sctp dumps all the ep->assocs, it needs to lock_sock first, but now it locks sock in rcu_read_lock, and lock_sock may sleep, which would break rcu_read_lock. This patch is to get and hold one sock when traversing the list. After that and get out of rcu_read_lock, lock and dump it. Then it will traverse the list again to get the next one until all sctp socks are dumped. For sctp_diag_dump_one, it fixes this issue by holding asoc and moving cb() out of rcu_read_lock in sctp_transport_lookup_process. Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b61c654f |
|
13-Sep-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: free msg->chunks when sctp_primitive_SEND return err Last patch "sctp: do not return the transmit err back to sctp_sendmsg" made sctp_primitive_SEND return err only when asoc state is unavailable. In this case, chunks are not enqueued, they have no chance to be freed if we don't take care of them later. This Patch is actually to revert commit 1cd4d5c4326a ("sctp: remove the unused sctp_datamsg_free()"), commit 69b5777f2e57 ("sctp: hold the chunks only after the chunk is enqueued in outq") and commit 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg"), to use sctp_datamsg_free to free the chunks of current msg. Fixes: 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e0878694 |
|
30-Jul-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: allow receiving msg when TCP-style sk is in CLOSED state Commit 141ddefce7c8 ("sctp: change sk state to CLOSED instead of CLOSING in sctp_sock_migrate") changed sk state to CLOSED if the assoc is closed when sctp_accept clones a new sk. If there is still data in sk receive queue, users will not be able to read it any more, as sctp_recvmsg returns directly if sk state is CLOSED. This patch is to add CLOSED state check in sctp_recvmsg to allow reading data from TCP-style sk with CLOSED state as what TCP does. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5fc382d8 |
|
23-Jul-2016 |
Vegard Nossum <vegard.nossum@oracle.com> |
net/sctp: terminate rhashtable walk correctly I was seeing a lot of these: BUG: sleeping function called from invalid context at mm/slab.h:388 in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2 Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150 [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280 [<ffffffff83295722>] _raw_spin_lock+0x12/0x40 [<ffffffff811aac87>] console_unlock+0x2f7/0x930 [<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520 [<ffffffff811aba6a>] vprintk_default+0x1a/0x20 [<ffffffff812c171a>] printk+0x94/0xb0 [<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170 [<ffffffff8115835e>] ___might_sleep+0x3be/0x460 [<ffffffff81158490>] __might_sleep+0x90/0x1a0 [<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0 [<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0 [<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60 [<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150 [<ffffffff8143a82b>] seq_read+0x27b/0x1180 [<ffffffff814f97fc>] proc_reg_read+0xbc/0x180 [<ffffffff813d471b>] __vfs_read+0xdb/0x610 [<ffffffff813d4d3a>] vfs_read+0xea/0x2d0 [<ffffffff813d615b>] SyS_pread64+0x11b/0x150 [<ffffffff8100334c>] do_syscall_64+0x19c/0x410 [<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a [<ffffffffffffffff>] 0xffffffffffffffff Apparently we always need to call rhashtable_walk_stop(), even when rhashtable_walk_start() fails: * rhashtable_walk_start - Start a hash table walk * @iter: Hash table iterator * * Start a hash table walk. Note that we take the RCU lock in all * cases including when we return an error. So you must always call * rhashtable_walk_stop to clean up. otherwise we never call rcu_read_unlock() and we get the splat above. Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag") See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag") See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*") Cc: Xin Long <lucien.xin@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e5b13f34 |
|
15-Jul-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: recvmsg should be able to run even if sock is in closing state Commit d46e416c11c8 missed to update some other places which checked for the socket being TCP-style AND Established state, as Closing state has some overlapping with the previous understanding of Established. Without this fix, one of the effects is that some already queued rx messages may not be readable anymore depending on how the association teared down, and sending may also not be possible if peer initiated the shutdown. Also merge two if() blocks into one condition on sctp_sendmsg(). Cc: Xin Long <lucien.xin@gmail.com> Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f45f78f |
|
13-Jul-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: allow GSO frags to access the chunk too SCTP will try to access original IP headers on sctp_recvmsg in order to copy the addresses used. There are also other places that do similar access to IP or even SCTP headers. But after 90017accff61 ("sctp: Add GSO support") they aren't always there because they are only present in the header skb. SCTP handles the queueing of incoming data by cloning the incoming skb and limiting to only the relevant payload. This clone has its cb updated to something different and it's then queued on socket rx queue. Thus we need to fix this in two moments. For rx path, not related to socket queue yet, this patch uses a partially copied sctp_input_cb to such GSO frags. This restores the ability to access the headers for this part of the code. Regarding the socket rx queue, it removes iif member from sctp_event and also add a chunk pointer on it. With these changes we're always able to reach the headers again. The biggest change here is that now the sctp_chunk struct and the original skb are only freed after the application consumed the buffer. Note however that the original payload was already like this due to the skb cloning. For iif, SCTP's IPv4 code doesn't use it, so no change is necessary. IPv6 now can fetch it directly from original's IPv6 CB as the original skb is still accessible. In the future we probably can simplify sctp_v*_skb_iif() stuff, as sctp_v4_skb_iif() was called but it's return value not used, and now it's not even called, but such cleanup is out of scope for this change. Fixes: 90017accff61 ("sctp: Add GSO support") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8dbdf1f5 |
|
09-Jul-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: implement prsctp PRIO policy prsctp PRIO policy is a policy to abandon lower priority chunks when asoc doesn't have enough snd buffer, so that the current chunk with higher priority can be queued successfully. Similar to TTL/RTX policy, we will set the priority of the chunk to prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy(). So if PRIO policy is enabled, msg->expire_at won't work. asoc->sent_cnt_removable will record how many chunks can be checked to remove. If priority policy is enabled, when the chunk is queued into the out_queue, we will increase sent_cnt_removable. When the chunk is moved to abandon_queue or dequeue and free, we will decrease sent_cnt_removable. In sctp_sendmsg, we will check if there is enough snd buffer for current msg and if sent_cnt_removable is not 0. Then try to abandon chunks in sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and free chunks from out_queue in right order until the abandon+free size > msg_len - sctp_wfree. For the abandon size, we have to wait until it sends FORWARD TSN, receives the sack and the chunks are really freed. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a6c2f792 |
|
09-Jul-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: implement prsctp TTL policy prsctp TTL policy is a policy to abandon chunks when they expire at the specific time in local stack. It's similar with expires_at in struct sctp_datamsg. This patch uses sinfo->sinfo_timetolive to set the specific time for TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at. So if prsctp_enable or TTL policy is not enabled, msg->expires_at still works as before. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
826d253d |
|
09-Jul-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used to dump the prsctp statistics info from the asoc. The prsctp statistics includes abandoned_sent/unsent from the asoc. abandoned_sent is the count of the packets we drop packets from retransmit/transmited queue, and abandoned_unsent is the count of the packets we drop from out_queue according to the policy. Note: another option for prsctp statistics dump described in rfc is SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics info from each stream. But by now, linux doesn't yet have per stream statistics info, it needs rfc6525 to be implemented. As the prsctp statistics for each stream has to be based on per stream statistics, we will delay it until rfc6525 is done in linux. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f959fb44 |
|
09-Jul-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_DEFAULT_PRINFO into sctp sockopt This patch adds SCTP_DEFAULT_PRINFO to sctp sockopt. It is used to set/get sctp Partially Reliable Policies' default params, which includes 3 policies (ttl, rtx, prio) and their values. Still, if we set policy params in sndinfo, we will use the params of sndinfo against chunks, instead of the default params. In this patch, we will use 5-8bit of sp/asoc->default_flags to store prsctp policies, and reuse asoc->default_timetolive to store their values. It means if we enable and set prsctp policy, prior ttl timeout in sctp will not work any more. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
28aa4c26 |
|
09-Jul-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: add SCTP_PR_SUPPORTED on sctp sockopt According to section 4.5 of rfc7496, prsctp_enable should be per asoc. We will add prsctp_enable to both asoc and ep, and replace the places where it used net.sctp->prsctp_enable with asoc->prsctp_enable. ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can also modify it's value through sockopt SCTP_PR_SUPPORTED. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
141ddefc |
|
15-Jun-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: change sk state to CLOSED instead of CLOSING in sctp_sock_migrate Commit d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") may set sk_state CLOSING in sctp_sock_migrate, but inet_accept doesn't allow the sk_state other than ESTABLISHED/ CLOSED for sctp. So we will change sk_state to CLOSED, instead of CLOSING, as actually sk is closed already there. Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") Reported-by: Ye Xiaolong <xiaolong.ye@intel.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d46e416c |
|
09-Jun-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: sctp should change socket state when shutdown is received Now sctp doesn't change socket state upon shutdown reception. It changes just the assoc state, even though it's a TCP-style socket. For some cases, if we really need to check sk->sk_state, it's necessary to fix this issue, at least when we use ss or netstat to dump, we can get a more exact information. As an improvement, we will change sk->sk_state when we change asoc->state to SHUTDOWN_RECEIVED, and also do it in sctp_shutdown to keep consistent with sctp_close. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
90017acc |
|
02-Jun-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: Add GSO support SCTP has this pecualiarity that its packets cannot be just segmented to (P)MTU. Its chunks must be contained in IP segments, padding respected. So we can't just generate a big skb, set gso_size to the fragmentation point and deliver it to IP layer. This patch takes a different approach. SCTP will now build a skb as it would be if it was received using GRO. That is, there will be a cover skb with protocol headers and children ones containing the actual segments, already segmented to a way that respects SCTP RFCs. With that, we can tell skb_segment() to just split based on frag_list, trusting its sizes are already in accordance. This way SCTP can benefit from GSO and instead of passing several packets through the stack, it can pass a single large packet. v2: - Added support for receiving GSO frames, as requested by Dave Miller. - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP) - Added heuristics similar to what we have in TCP for not generating single GSO packets that fills cwnd. v3: - consider sctphdr size in skb_gso_transport_seglen() - rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for unsupported GSO") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40eb90e9 |
|
29-May-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: sctp_diag should dump sctp socket type Now we cannot distinguish that one sk is a udp or sctp style when we use ss to dump sctp_info. it's necessary to dump it as well. For sctp_diag, ss support is not officially available, thus there are no official users of this yet, so we can add this field in the middle of sctp_info without breaking user API. v1->v2: - move 'sctpi_s_type' field to the end of struct sctp_info, so that it won't cause incompatibility with applications already built. - add __reserved3 in sctp_info to make sure sctp_info is 8-byte alignment. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
53fa1036 |
|
14-Apr-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: fix some rhashtable functions using in sctp proc/diag When rhashtable_walk_init return err, no release function should be called, and when rhashtable_walk_start return err, we should only invoke rhashtable_walk_exit to release the source. But now when sctp_transport_walk_start return err, we just call rhashtable_walk_stop/exit, and never care about if rhashtable_walk_init or start return err, which is so bad. We will fix it by calling rhashtable_walk_exit if rhashtable_walk_start return err in sctp_transport_walk_start, and if sctp_transport_walk_start return err, we do not need to call sctp_transport_walk_stop any more. For sctp proc, we will use 'iter->start_fail' to decide if we will call rhashtable_walk_stop/exit. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
626d16f5 |
|
14-Apr-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: export some apis or variables for sctp_diag and reuse some for proc For some main variables in sctp.ko, we couldn't export it to other modules, so we have to define some api to access them. It will include sctp transport and endpoint's traversal. There are some transport traversal functions for sctp_diag, we can also use it for sctp_proc. cause they have the similar situation to traversal transport. v2->v3: - rhashtable_walk_init need the parameter gfp, because of recent upstrem update Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52c52a61 |
|
14-Apr-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: add sctp_info dump api for sctp_diag sctp_diag will dump some important details of sctp's assoc or ep, we use sctp_info to describe them, sctp_get_sctp_info to get them, and export it to sctp_diag.ko. v2->v3: - we will not use list_for_each_safe in sctp_get_sctp_info, cause all the callers of it will use lock_sock. - fix the holes in struct sctp_info with __reserved* field. because sctp_diag is a new feature, and sctp_info is just for now, it may be changed in the future. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
311b2177 |
|
13-Apr-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: simplify sk_receive_queue locking SCTP already serializes access to rcvbuf through its sock lock: sctp_recvmsg takes it right in the start and release at the end, while rx path will also take the lock before doing any socket processing. On sctp_rcv() it will check if there is an user using the socket and, if there is, it will queue incoming packets to the backlog. The backlog processing will do the same. Even timers will do such check and re-schedule if an user is using the socket. Simplifying this will allow us to remove sctp_skb_list_tail and get ride of some expensive lockings. The lists that it is used on are also mangled with functions like __skb_queue_tail and __skb_unlink in the same context, like on sctp_ulpq_tail_event() and sctp_clear_pd(). sctp_close() will also purge those while using only the sock lock. Therefore the lockings performed by sctp_skb_list_tail() are not necessary. This patch removes this function and replaces its calls with just skb_queue_splice_tail_init() instead. The biggest gain is at sctp_ulpq_tail_event(), because the events always contain a list, even if it's queueing a single skb and this was triggering expensive calls to spin_lock_irqsave/_irqrestore for every data chunk received. As SCTP will deliver each data chunk on a corresponding recvmsg, the more effective the change will be. Before this patch, with chunks with 30 bytes: netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000 400000 -s 400000 400000 on a 10Gbit link with 1500 MTU: SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 425984 425984 30 60.00 137.45 7.34 7.36 52.504 52.608 With it: SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 425984 425984 30 60.00 179.10 7.97 6.70 43.740 36.788 Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
486bdee0 |
|
12-Apr-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: add support for RPS and RFS This patch adds what's missing to properly support RPS and RFS on SCTP, as some of it is already implemented in common calls. Having support for RPS and RFS allows better scaling specially because not all NICs support hashing SCTP headers. Save the hash right when we dequeue a skb from inqueue so we do it only once per skb instead of per chunk. New sockets will then inherit the hash through sctp_copy_sock(). Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
96c0e0a9 |
|
22-Mar-2016 |
Andy Lutomirski <luto@kernel.org> |
net/sctp: use in_compat_syscall for sctp_getsockopt_connectx3 SCTP unfortunately has a different ABI for SCTP_SOCKOPT_CONNECTX3 for 32-bit and 64-bit callers. Use in_compat_syscall to correctly distinguish them on all architectures. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
489ce5f4 |
|
13-Mar-2016 |
Nicholas Mc Guire <hofrat@osadl.org> |
sctp: consolidate local_bh_disable/enable + spin_lock/unlock to _bh variant local_bh_disable() + spin_lock() is equivalent to spin_lock_bh(), same for the unlock/enable case, so replace the calls by the appropriate wrappers. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
133800d1 |
|
08-Mar-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: fix copying more bytes than expected in sctp_add_bind_addr Dmitry reported that sctp_add_bind_addr may read more bytes than expected in case the parameter is a IPv4 addr supplied by the user through calls such as sctp_bindx_add(), because it always copies sizeof(union sctp_addr) while the buffer may be just a struct sockaddr_in, which is smaller. This patch then fixes it by limiting the memcpy to the min between the union size and a (new parameter) provided addr size. Where possible this parameter still is the size of that union, except for reading from user-provided buffers, which then it accounts for protocol type. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
086c653f |
|
10-Feb-2016 |
Craig Gallek <kraig@google.com> |
sock: struct proto hash function may error In order to support fast reuseport lookups in TCP, the hash function defined in struct proto must be capable of returning an error code. This patch changes the function signature of all related hash functions to return an integer and handles or propagates this return value at all call sites. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7a84bd46 |
|
03-Feb-2016 |
Xin Long <lucien.xin@gmail.com> |
sctp: translate network order to host order when users get a hmacid Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid") corrected the hmacid byte-order when setting a hmacid. but the same issue also exists on getting a hmacid. We fix it by changing hmacids to host order when users get them with getsockopt. Fixes: Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5821c769 |
|
24-Jan-2016 |
Herbert Xu <herbert@gondor.apana.org.au> |
sctp: Use shash This patch replaces uses of the long obsolete hash interface with shash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: David S. Miller <davem@davemloft.net>
|
#
27f7ed2b |
|
22-Jan-2016 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: allow setting SCTP_SACK_IMMEDIATELY by the application This patch extends commit b93d6471748d ("sctp: implement the sender side for SACK-IMMEDIATELY extension") as it didn't white list SCTP_SACK_IMMEDIATELY on sctp_msghdr_parse(), causing it to be understood as an invalid flag and returning -EINVAL to the application. Note that the actual handling of the flag is already there in sctp_datamsg_from_user(). https://tools.ietf.org/html/rfc7053#section-7 Fixes: b93d6471748d ("sctp: implement the sender side for SACK-IMMEDIATELY extension") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5eff712 |
|
30-Dec-2015 |
Xin Long <lucien.xin@gmail.com> |
sctp: drop the old assoc hashtable of sctp transport hashtable will replace the association hashtable, so association hashtable is not used in sctp any more, so drop the codes about that. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
068d8bd3 |
|
29-Dec-2015 |
Xin Long <lucien.xin@gmail.com> |
sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close In sctp_close, sctp_make_abort_user may return NULL because of memory allocation failure. If this happens, it will bypass any state change and never free the assoc. The assoc has no chance to be freed and it will be kept in memory with the state it had even after the socket is closed by sctp_close(). So if sctp_make_abort_user fails to allocate memory, we should abort the asoc via sctp_primitive_ABORT as well. Just like the annotation in sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said, "Even if we can't send the ABORT due to low memory delete the TCB. This is a departure from our typical NOMEM handling". But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would dereference the chunk pointer, and system crash. So we should add SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other places where it adds SCTP_CMD_REPLY cmd. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3538a5c8 |
|
23-Dec-2015 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: label accepted/peeled off sockets Accepted or peeled off sockets were missing a security label (e.g. SELinux) which means that socket was in "unlabeled" state. This patch clones the sock's label from the parent sock and resolves the issue (similar to AF_BLUETOOTH protocol family). Cc: Paul Moore <pmoore@redhat.com> Cc: David Teigland <teigland@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9ba0b963 |
|
23-Dec-2015 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: use GFP_USER for user-controlled kmalloc Commit cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc") missed two other spots. For connectx, as it's more likely to be used by kernel users of the API, it detects if GFP_USER should be used or not. Fixes: cacc06215271 ("sctp: use GFP_USER for user-controlled kmalloc") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8b570dc9 |
|
05-Dec-2015 |
lucien <lucien.xin@gmail.com> |
sctp: only drop the reference on the datamsg after sending a msg If the chunks are enqueued successfully but sctp_cmd_interpreter() return err to sctp_sendmsg() (mainly because of no mem), the chunks will get re-queued, but we are dropping the reference and freeing them. The fix is to just drop the reference on the datamsg just as it had succeeded, as: - if the chunks weren't queued, this is enough to get them freed. - if they were queued, they will get freed when they finally get out or discarded. Signed-off-by: Xin Long <lucien.xin@gmail.com> Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69b5777f |
|
05-Dec-2015 |
lucien <lucien.xin@gmail.com> |
sctp: hold the chunks only after the chunk is enqueued in outq When a msg is sent, sctp will hold the chunks of this msg and then try to enqueue them. But if the chunks are not enqueued in sctp_outq_tail() because of the invalid state, sctp_cmd_interpreter() may still return success to sctp_sendmsg() after calling sctp_outq_flush(), these chunks will become orphans and will leak. So we fix them by moving sctp_chunk_hold() to sctp_outq_tail(), where we are sure that the chunk is going to get queued. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
50a5ffb1 |
|
04-Dec-2015 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: also copy sk_tsflags when copying the socket As we are keeping timestamps on when copying the socket, we also have to copy sk_tsflags. This is needed since b9f40e21ef42 ("net-timestamp: move timestamp flags out of sk_flags"). Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
01ce63c9 |
|
04-Dec-2015 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: update the netstamp_needed counter when copying sockets Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy related to disabling sock timestamp. When SCTP accepts an association or peel one off, it copies sock flags but forgot to call net_enable_timestamp() if a packet timestamping flag was copied, leading to extra calls to net_disable_timestamp() whenever such clones were closed. The fix is to call net_enable_timestamp() whenever we copy a sock with that flag on, like tcp does. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
602dd62d |
|
01-Dec-2015 |
Eric Dumazet <edumazet@google.com> |
ipv6: sctp: implement sctp_v6_destroy_sock() Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets. We need to call inet6_destroy_sock() to properly release inet6 specific fields. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cacc0621 |
|
30-Nov-2015 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: use GFP_USER for user-controlled kmalloc Dmitry Vyukov reported that the user could trigger a kernel warning by using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that value directly affects the value used as a kmalloc() parameter. This patch thus switches the allocation flags from all user-controllable kmalloc size to GFP_USER to put some more restrictions on it and also disables the warn, as they are not necessary. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ceb5d58b |
|
29-Nov-2015 |
Eric Dumazet <edumazet@google.com> |
net: fix sock_wake_async() rcu protection Dmitry provided a syzkaller (http://github.com/google/syzkaller) triggering a fault in sock_wake_async() when async IO is requested. Said program stressed af_unix sockets, but the issue is generic and should be addressed in core networking stack. The problem is that by the time sock_wake_async() is called, we should not access the @flags field of 'struct socket', as the inode containing this socket might be freed without further notice, and without RCU grace period. We already maintain an RCU protected structure, "struct socket_wq" so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it is the safe route. It also reduces number of cache lines needing dirtying, so might provide a performance improvement anyway. In followup patches, we might move remaining flags (SOCK_NOSPACE, SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket' being mostly read and let it being shared between cpus. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9cd3e072 |
|
29-Nov-2015 |
Eric Dumazet <edumazet@google.com> |
net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA This patch is a cleanup to make following patch easier to review. Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA from (struct socket)->flags to a (struct socket_wq)->flags to benefit from RCU protection in sock_wake_async() To ease backports, we rename both constants. Two new helpers, sk_set_bit(int nr, struct sock *sk) and sk_clear_bit(int net, struct sock *sk) are added so that following patch can change their implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ce0bf50 |
|
25-Nov-2015 |
Herbert Xu <herbert@gondor.apana.org.au> |
net: Generalise wq_has_sleeper helper The memory barrier in the helper wq_has_sleeper is needed by just about every user of waitqueue_active. This patch generalises it by making it take a wait_queue_head_t directly. The existing helper is renamed to skwq_has_sleeper. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b5ffe634 |
|
12-Aug-2015 |
Viresh Kumar <viresh.kumar@linaro.org> |
net: Drop unlikely before IS_ERR(_OR_NULL) IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there is no need to do that again from its callers. Drop it. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
81296fc6 |
|
22-Jul-2015 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: stop spamming klog with rfc6458, 5.3.2. deprecation warnings Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the (as per RFC deprecated) SCTP_SNDRCV via commit bbbea41d5e53 ("net: sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support"), see [1]. Imho, it was not a good idea, and we should just revert that message for a couple of reasons: 1) It's uapi and therefore set in stone forever. 2) To be able to run on older and newer kernels, an SCTP application would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/ SCTP_RCVINFO support, so that on older kernels, it can make use of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO. In my (limited) experience, a lot of SCTP appliances are migrating to newer kernels only ve(ee)ry slowly. 3) Some people don't have the chance to change their applications, f.e. due to proprietary legacy stuff. So, they'll hit this warning in fast path and are stuck with older kernels. But i.e. due to point 1) I really fail to see the benefit of a warning. So just revert that for now, the issue was reported up Jamal. [1] http://thread.gmane.org/gmane.linux.network/321960/ Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Michael Tuexen <tuexen@fh-muenster.de> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1830fcea |
|
25-Jun-2015 |
David Miller <davem@davemloft.net> |
net: Kill sock->sk_protinfo No more users, so it can now be removed. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2d45a02d |
|
12-Jun-2015 |
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> |
sctp: fix ASCONF list handling ->auto_asconf_splist is per namespace and mangled by functions like sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization. Also, the call to inet_sk_copy_descendant() was backuping ->auto_asconf_list through the copy but was not honoring ->do_auto_asconf, which could lead to list corruption if it was different between both sockets. This commit thus fixes the list handling by using ->addr_wq_lock spinlock to protect the list. A special handling is done upon socket creation and destruction for that. Error handlig on sctp_init_sock() will never return an error after having initialized asconf, so sctp_destroy_sock() can be called without addrq_wq_lock. The lock now will be take on sctp_close_sock(), before locking the socket, so we don't do it in inverse order compared to sctp_addr_wq_timeout_handler(). Instead of taking the lock on sctp_sock_migrate() for copying and restoring the list values, it's preferred to avoid rewritting it by implementing sctp_copy_descendant(). Issue was found with a test application that kept flipping sysctl default_auto_asconf on and off, but one could trigger it by issuing simultaneous setsockopt() calls on multiple sockets or by creating/destroying sockets fast enough. This is only triggerable locally. Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).") Reported-by: Ji Jianwen <jiji@redhat.com> Suggested-by: Neil Horman <nhorman@tuxdriver.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7e3ea6d5 |
|
25-Mar-2015 |
Ying Xue <ying.xue@windriver.com> |
sctp: avoid to repeatedly declare external variables Move the declaration for external variables to sctp.h file avoiding to repeatedly declare them with extern keyword. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1b784140 |
|
02-Mar-2015 |
Ying Xue <ying.xue@windriver.com> |
net: Remove iocb argument from sendmsg and recvmsg After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2061dcd6 |
|
15-Jan-2015 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: fix race for one-to-many sockets in sendmsg's auto associate I.e. one-to-many sockets in SCTP are not required to explicitly call into connect(2) or sctp_connectx(2) prior to data exchange. Instead, they can directly invoke sendmsg(2) and the SCTP stack will automatically trigger connection establishment through 4WHS via sctp_primitive_ASSOCIATE(). However, this in its current implementation is racy: INIT is being sent out immediately (as it cannot be bundled anyway) and the rest of the DATA chunks are queued up for later xmit when connection is established, meaning sendmsg(2) will return successfully. This behaviour can result in an undesired side-effect that the kernel made the application think the data has already been transmitted, although none of it has actually left the machine, worst case even after close(2)'ing the socket. Instead, when the association from client side has been shut down e.g. first gracefully through SCTP_EOF and then close(2), the client could afterwards still receive the server's INIT_ACK due to a connection with higher latency. This INIT_ACK is then considered out of the blue and hence responded with ABORT as there was no alive assoc found anymore. This can be easily reproduced f.e. with sctp_test application from lksctp. One way to fix this race is to wait for the handshake to actually complete. The fix defers waiting after sctp_primitive_ASSOCIATE() and sctp_primitive_SEND() succeeded, so that DATA chunks cooked up from sctp_sendmsg() have already been placed into the output queue through the side-effect interpreter, and therefore can then be bundeled together with COOKIE_ECHO control chunks. strace from example application (shortened): socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3 sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")}, msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5 sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")}, msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5 sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")}, msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5 sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")}, msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5 sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")}, msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...}, msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF close(3) = 0 tcpdump before patch (fooling the application): 22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684] 22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591] 22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT] tcpdump after patch: 14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729] 14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492] 14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...] 14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0] 14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...] 14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0] 14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...] 14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0] 14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN] 14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK] 14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE] Looks like this bug is from the pre-git history museum. ;) Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f95b414e |
|
10-Dec-2014 |
Gu Zheng <guz.fnst@cn.fujitsu.com> |
net: introduce helper macro for_each_cmsghdr Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating cmsghdr from msghdr, just cleanup. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0371da6 |
|
24-Nov-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
put iov_iter into msghdr Note that the code _using_ ->msg_iter at that point will be very unhappy with anything other than unshifted iovec-backed iov_iter. We still need to convert users to proper primitives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e0eb093e |
|
14-Nov-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
switch sctp_user_addto_chunk() and sctp_datamsg_from_user() to passing iov_iter Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
f869c912 |
|
19-Nov-2014 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: keep owned chunk in destructor_arg instead of skb->cb It's just silly to hold the skb destructor argument around inside skb->cb[] as we currently do in SCTP. Nowadays, we're sort of cheating on data accounting in the sense that due to commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf again when transmitting packet"), we orphan the skb already in the SCTP output path, i.e. giving back charged data memory, and use a different destructor only to make sure the sk doesn't vanish on skb destruction time. Thus, cb[] is still valid here as we operate within the SCTP layer. (It's generally actually a big candidate for future rework, imho.) However, storing the destructor in the cb[] can easily cause issues should an non sctp_packet_set_owner_w()'ed skb ever escape the SCTP layer, since cb[] may get overwritten by lower layers and thus can corrupt the chunk pointer. There are no such issues at present, but lets keep the chunk in destructor_arg, as this is the actual purpose for it. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51f3d02b |
|
05-Nov-2014 |
David S. Miller <davem@davemloft.net> |
net: Add and use skb_copy_datagram_msg() helper. This encapsulates all of the skb_copy_datagram_iovec() callers with call argument signature "skb, offset, msghdr->msg_iov, length". When we move to iov_iters in the networking, the iov_iter object will sit in the msghdr. Having a helper like this means there will be less places to touch during that transformation. Based upon descriptions and patch from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38ab1fa9 |
|
28-Aug-2014 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: fix ABI mismatch through sctp_assoc_to_state helper Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp tree, the official <netinet/sctp.h> header carries a copy of enum sctp_sstat_state that looks like (compared to the current in-kernel enumeration): User definition: Kernel definition: enum sctp_sstat_state { typedef enum { SCTP_EMPTY = 0, <removed> SCTP_CLOSED = 1, SCTP_STATE_CLOSED = 0, SCTP_COOKIE_WAIT = 2, SCTP_STATE_COOKIE_WAIT = 1, SCTP_COOKIE_ECHOED = 3, SCTP_STATE_COOKIE_ECHOED = 2, SCTP_ESTABLISHED = 4, SCTP_STATE_ESTABLISHED = 3, SCTP_SHUTDOWN_PENDING = 5, SCTP_STATE_SHUTDOWN_PENDING = 4, SCTP_SHUTDOWN_SENT = 6, SCTP_STATE_SHUTDOWN_SENT = 5, SCTP_SHUTDOWN_RECEIVED = 7, SCTP_STATE_SHUTDOWN_RECEIVED = 6, SCTP_SHUTDOWN_ACK_SENT = 8, SCTP_STATE_SHUTDOWN_ACK_SENT = 7, }; } sctp_state_t; This header was later on also placed into the uapi, so that user space programs can compile without having <netinet/sctp.h>, but the shipped with <linux/sctp.h> instead. While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we nevertheless have a what it appears to be dummy SCTP_EMPTY state from the very early days. While it seems to do just nothing, commit 0b8f9e25b0aa ("sctp: remove completely unsed EMPTY state") did the right thing and removed this dead code. That however, causes an off-by-one when the user asks the SCTP stack via SCTP_STATUS API and checks for the current socket state thus yielding possibly undefined behaviour in applications as they expect the kernel to tell the right thing. The enumeration had to be changed however as based on the current socket state, we access a function pointer lookup-table through this. Therefore, I think the best way to deal with this is just to add a helper function sctp_assoc_to_state() to encapsulate the off-by-one quirk. Reported-by: Tristan Su <sooqing@gmail.com> Fixes: 0b8f9e25b0aa ("sctp: remove completely unsed EMPTY state") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
299ee123 |
|
30-Jul-2014 |
Jason Gunthorpe <jgg@ziepe.ca> |
sctp: Fixup v4mapped behaviour to comply with Sock API The SCTP socket extensions API document describes the v4mapping option as follows: 8.1.15. Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR) This socket option is a Boolean flag which turns on or off the mapping of IPv4 addresses. If this option is turned on, then IPv4 addresses will be mapped to V6 representation. If this option is turned off, then no mapping will be done of V4 addresses and a user will receive both PF_INET6 and PF_INET type addresses on the socket. See [RFC3542] for more details on mapped V6 addresses. This description isn't really in line with what the code does though. Introduce addr_to_user (renamed addr_v4map), which should be called before any sockaddr is passed back to user space. The new function places the sockaddr into the correct format depending on the SCTP_I_WANT_MAPPED_V4_ADDR option. Audit all places that touched v4mapped and either sanely construct a v4 or v6 address then call addr_to_user, or drop the unnecessary v4mapped check entirely. Audit all places that call addr_to_user and verify they are on a sycall return path. Add a custom getname that formats the address properly. Several bugs are addressed: - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for addresses to user space - The addr_len returned from recvmsg was not correct when returning AF_INET on a v6 socket - flowlabel and scope_id were not zerod when promoting a v4 to v6 - Some syscalls like bind and connect behaved differently depending on v4mapped Tested bind, getpeername, getsockname, connect, and recvmsg for proper behaviour in v4mapped = 1 and 0 cases. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bbbea41d |
|
12-Jul-2014 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support With support of SCTP_SNDINFO/SCTP_RCVINFO as described in RFC6458, 5.3.4/5.3.5, we can now deprecate SCTP_SNDRCV. The RFC already declares it as deprecated: This structure mixes the send and receive path. SCTP_SNDINFO (described in Section 5.3.4) and SCTP_RCVINFO (described in Section 5.3.5) split this information. These structures should be used, when possible, since SCTP_SNDRCV is deprecated. So whenever a user tries to subscribe to sctp_data_io_event via setsockopt(2) which triggers inclusion of SCTP_SNDRCV cmsg_type, issue a warning in the log. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6b3fd5f3 |
|
12-Jul-2014 |
Geir Ola Vaagland <geirola@gmail.com> |
net: sctp: implement rfc6458, 8.1.31. SCTP_DEFAULT_SNDINFO support This patch implements section 8.1.31. of RFC6458, which adds support for setting/retrieving SCTP_DEFAULT_SNDINFO: Applications that wish to use the sendto() system call may wish to specify a default set of parameters that would normally be supplied through the inclusion of ancillary data. This socket option allows such an application to set the default sctp_sndinfo structure. The application that wishes to use this socket option simply passes the sctp_sndinfo structure (defined in Section 5.3.4) to this call. The input parameters accepted by this call include snd_sid, snd_flags, snd_ppid, and snd_context. The snd_flags parameter is composed of a bitwise OR of SCTP_UNORDERED, SCTP_EOF, and SCTP_SENDALL. The snd_assoc_id field specifies the association to which to apply the parameters. For a one-to-many style socket, any of the predefined constants are also allowed in this field. The field is ignored for one-to-one style sockets. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2347c80f |
|
12-Jul-2014 |
Geir Ola Vaagland <geirola@gmail.com> |
net: sctp: implement rfc6458, 5.3.6. SCTP_NXTINFO cmsg support This patch implements section 5.3.6. of RFC6458, that is, support for 'SCTP Next Receive Information Structure' (SCTP_NXTINFO) which is placed into ancillary data cmsghdr structure for each recvmsg() call, if this information is already available when delivering the current message. This option can be enabled/disabled via setsockopt(2) on SOL_SCTP level by setting an int value with 1/0 for SCTP_RECVNXTINFO in user space applications as per RFC6458, section 8.1.30. The sctp_nxtinfo structure is defined as per RFC as below ... struct sctp_nxtinfo { uint16_t nxt_sid; uint16_t nxt_flags; uint32_t nxt_ppid; uint32_t nxt_length; sctp_assoc_t nxt_assoc_id; }; ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type SCTP_NXTINFO, while cmsg_data[] contains struct sctp_nxtinfo. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0d3a421d |
|
12-Jul-2014 |
Geir Ola Vaagland <geirola@gmail.com> |
net: sctp: implement rfc6458, 5.3.5. SCTP_RCVINFO cmsg support This patch implements section 5.3.5. of RFC6458, that is, support for 'SCTP Receive Information Structure' (SCTP_RCVINFO) which is placed into ancillary data cmsghdr structure for each recvmsg() call. This option can be enabled/disabled via setsockopt(2) on SOL_SCTP level by setting an int value with 1/0 for SCTP_RECVRCVINFO in user space applications as per RFC6458, section 8.1.29. The sctp_rcvinfo structure is defined as per RFC as below ... struct sctp_rcvinfo { uint16_t rcv_sid; uint16_t rcv_ssn; uint16_t rcv_flags; <-- 2 bytes hole --> uint32_t rcv_ppid; uint32_t rcv_tsn; uint32_t rcv_cumtsn; uint32_t rcv_context; sctp_assoc_t rcv_assoc_id; }; ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type SCTP_RCVINFO, while cmsg_data[] contains struct sctp_rcvinfo. An sctp_rcvinfo item always corresponds to the data in msg_iov. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63b94938 |
|
12-Jul-2014 |
Geir Ola Vaagland <geirola@gmail.com> |
net: sctp: implement rfc6458, 5.3.4. SCTP_SNDINFO cmsg support This patch implements section 5.3.4. of RFC6458, that is, support for 'SCTP Send Information Structure' (SCTP_SNDINFO) which can be placed into ancillary data cmsghdr structure for sendmsg() calls. The sctp_sndinfo structure is defined as per RFC as below ... struct sctp_sndinfo { uint16_t snd_sid; uint16_t snd_flags; uint32_t snd_ppid; uint32_t snd_context; sctp_assoc_t snd_assoc_id; }; ... and supplied under cmsg_level IPPROTO_SCTP, cmsg_type SCTP_SNDINFO, while cmsg_data[] contains struct sctp_sndinfo. An sctp_sndinfo item always corresponds to the data in msg_iov. Joint work with Daniel Borkmann. Signed-off-by: Geir Ola Vaagland <geirola@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
28448b80 |
|
23-May-2014 |
Tom Herbert <therbert@google.com> |
net: Split sk_no_check into sk_no_check_{rx,tx} Define separate fields in the sock structure for configuring disabling checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx. The SO_NO_CHECK socket option only affects sk_no_check_tx. Also, removed UDP_CSUM_* defines since they are no longer necessary. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
122ff243 |
|
12-May-2014 |
WANG Cong <xiyou.wangcong@gmail.com> |
ipv4: make ip_local_reserved_ports per netns ip_local_port_range is already per netns, so should ip_local_reserved_ports be. And since it is none by default we don't actually need it when we don't enable CONFIG_SYSCTL. By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port() Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8465a5fc |
|
17-Apr-2014 |
Neil Horman <nhorman@tuxdriver.com> |
sctp: add support for busy polling to sctp protocol The busy polling socket option adds support for sockets to busy wait on data arriving on the napi queue from which they have most recently received a frame. Currently only tcp and udp support this feature, but theres no reason sctp can't do so as well. Add it in so appliations can take advantage of it Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: Daniel Borkmann <dborkman@redhat.com> CC: netdev@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b14878cc |
|
17-Apr-2014 |
Vlad Yasevich <vyasevic@redhat.com> |
net: sctp: cache auth_enable per endpoint Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
362d5204 |
|
14-Apr-2014 |
Daniel Borkmann <daniel@iogearbox.net> |
Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" This reverts commit ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") as it introduced a serious performance regression on SCTP over IPv4 and IPv6, though a not as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs. Current state: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 17:56:21 GMT Connecting to host 192.168.241.3, port 5201 Cookie: Lab200slot2.1397238981.812898.548918 [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec (etc) [root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 19:08:41 GMT Connecting to host 2001:db8:0:f101::1, port 5201 Cookie: Lab200slot2.1397243321.714295.2b3f7c [ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201 Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec [ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec [ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec [ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec [ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec [ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec [ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec [ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec [ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec [ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec [ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec [ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec (etc) After patch: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64 Time: Mon, 14 Apr 2014 16:40:48 GMT Connecting to host 192.168.240.3, port 5201 Cookie: Lab200slot2.1397493648.413274.65e131 [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec With the reverted patch applied, the SCTP/IPv4 performance is back to normal on latest upstream for IPv4 and IPv6 and has same throughput as 3.4.2 test kernel, steady and interval reports are smooth again. Fixes: ef2820a735f7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") Reported-by: Peter Butler <pbutler@sonusnet.com> Reported-by: Dongsheng Song <dongsheng.song@gmail.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: Peter Butler <pbutler@sonusnet.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
676d2369 |
|
11-Apr-2014 |
David S. Miller <davem@davemloft.net> |
net: Fix use after free by removing length arg from sk_data_ready callbacks. Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1e1cdf8a |
|
09-Apr-2014 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: test if association is dead in sctp_wake_up_waiters In function sctp_wake_up_waiters(), we need to involve a test if the association is declared dead. If so, we don't have any reference to a possible sibling association anymore and need to invoke sctp_write_space() instead, and normally walk the socket's associations and notify them of new wmem space. The reason for special casing is that otherwise, we could run into the following issue when a sctp_primitive_SEND() call from sctp_sendmsg() fails, and tries to flush an association's outq, i.e. in the following way: sctp_association_free() `-> list_del(&asoc->asocs) <-- poisons list pointer asoc->base.dead = true sctp_outq_free(&asoc->outqueue) `-> __sctp_outq_teardown() `-> sctp_chunk_free() `-> consume_skb() `-> sctp_wfree() `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers if asoc->ep->sndbuf_policy=0 Therefore, only walk the list in an 'optimized' way if we find that the current association is still active. We could also use list_del_init() in addition when we call sctp_association_free(), but as Vlad suggests, we want to trap such bugs and thus leave it poisoned as is. Why is it safe to resolve the issue by testing for asoc->base.dead? Parallel calls to sctp_sendmsg() are protected under socket lock, that is lock_sock()/release_sock(). Only within that path under lock held, we're setting skb/chunk owner via sctp_set_owner_w(). Eventually, chunks are freed directly by an association still under that lock. So when traversing association list on destruction time from sctp_wake_up_waiters() via sctp_wfree(), a different CPU can't be running sctp_wfree() while another one calls sctp_association_free() as both happens under the same lock. Therefore, this can also not race with setting/testing against asoc->base.dead as we are guaranteed for this to happen in order, under lock. Further, Vlad says: the times we check asoc->base.dead is when we've cached an association pointer for later processing. In between cache and processing, the association may have been freed and is simply still around due to reference counts. We check asoc->base.dead under a lock, so it should always be safe to check and not race against sctp_association_free(). Stress-testing seems fine now, too. Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52c35bef |
|
08-Apr-2014 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: wake up all assocs if sndbuf policy is per socket SCTP charges chunks for wmem accounting via skb->truesize in sctp_set_owner_w(), and sctp_wfree() respectively as the reverse operation. If a sender runs out of wmem, it needs to wait via sctp_wait_for_sndbuf(), and gets woken up by a call to __sctp_write_space() mostly via sctp_wfree(). __sctp_write_space() is being called per association. Although we assign sk->sk_write_space() to sctp_write_space(), which is then being done per socket, it is only used if send space is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE is set and therefore not invoked in sock_wfree(). Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf again when transmitting packet") fixed an issue where in case sctp_packet_transmit() manages to queue up more than sndbuf bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is interrupted by a signal. However, a still remaining issue is that if net.sctp.sndbuf_policy=0, that is accounting per socket, and one-to-many sockets are in use, the reclaimed write space from sctp_wfree() is 'unfairly' handed back on the server to the association that is the lucky one to be woken up again via __sctp_write_space(), while the remaining associations are never be woken up again (unless by a signal). The effect disappears with net.sctp.sndbuf_policy=1, that is wmem accounting per association, as it guarantees a fair share of wmem among associations. Therefore, if we have reclaimed memory in case of per socket accounting, wake all related associations to a socket in a fair manner, that is, traverse the socket association list starting from the current neighbour of the association and issue a __sctp_write_space() to everyone until we end up waking ourselves. This guarantees that no association is preferred over another and even if more associations are taken into the one-to-many session, all receivers will get messages from the server and are not stalled forever on high load. This setting still leaves the advantage of per socket accounting in touch as an association can still use up global limits if unused by others. Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ffd59393 |
|
16-Feb-2014 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode SCTP's sctp_connectx() abi breaks for 64bit kernels compiled with 32bit emulation (e.g. ia32 emulation or x86_x32). Due to internal usage of 'struct sctp_getaddrs_old' which includes a struct sockaddr pointer, sizeof(param) check will always fail in kernel as the structure in 64bit kernel space is 4bytes larger than for user binaries compiled in 32bit mode. Thus, applications making use of sctp_connectx() won't be able to run under such circumstances. Introduce a compat interface in the kernel to deal with such situations by using a 'struct compat_sctp_getaddrs_old' structure where user data is copied into it, and then sucessively transformed into a 'struct sctp_getaddrs_old' structure with the help of compat_ptr(). That fixes sctp_connectx() abi without any changes needed in user space, and lets the SCTP test suite pass when compiled in 32bit and run on 64bit kernels. Fixes: f9c67811ebc0 ("sctp: Fix regression introduced by new sctp_connectx api") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ef2820a7 |
|
14-Feb-2014 |
Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> |
net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer Implementation of (a)rwnd calculation might lead to severe performance issues and associations completely stalling. These problems are described and solution is proposed which improves lksctp's robustness in congestion state. 1) Sudden drop of a_rwnd and incomplete window recovery afterwards Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data), but size of sk_buff, which is blamed against receiver buffer, is not accounted in rwnd. Theoretically, this should not be the problem as actual size of buffer is double the amount requested on the socket (SO_RECVBUF). Problem here is that this will have bad scaling for data which is less then sizeof sk_buff. E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion of traffic of this size (less then 100B). An example of sudden drop and incomplete window recovery is given below. Node B exhibits problematic behavior. Node A initiates association and B is configured to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp message in 4G (LTE) network). On B data is left in buffer by not reading socket in userspace. Lets examine when we will hit pressure state and declare rwnd to be 0 for scenario with above stated parameters (rwnd == 10000, chunk size == 43, each chunk is sent in separate sctp packet) Logic is implemented in sctp_assoc_rwnd_decrease: socket_buffer (see below) is maximum size which can be held in socket buffer (sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count) A simple expression is given for which it will be examined after how many packets for above stated parameters we enter pressure state: We start by condition which has to be met in order to enter pressure state: socket_buffer < currently_alloced; currently_alloced is represented as size of sctp packets received so far and not yet delivered to userspace. x is the number of chunks/packets (since there is no bundling, and each chunk is delivered in separate packet, we can observe each chunk also as sctp packet, and what is important here, having its own sk_buff): socket_buffer < x*each_sctp_packet; each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is twice the amount of initially requested size of socket buffer, which is in case of sctp, twice the a_rwnd requested: 2*rwnd < x*(payload+sizeof(struc sk_buff)); sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000 and each payload size is 43 20000 < x(43+190); x > 20000/233; x ~> 84; After ~84 messages, pressure state is entered and 0 rwnd is advertised while received 84*43B ~= 3612B sctp data. This is why external observer notices sudden drop from 6474 to 0, as it will be now shown in example: IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017] IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839] IP A.34340 > B.12345: sctp (1) [COOKIE ECHO] IP B.12345 > A.34340: sctp (1) [COOKIE ACK] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0] <...> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18] --> Sudden drop IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] At this point, rwnd_press stores current rwnd value so it can be later restored in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This condition is not met since rwnd, after it hit 0, must first reach rwnd_press by adding amount which is read from userspace. Let us observe values in above example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the amount of actual sctp data currently waiting to be delivered to userspace is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed only for sctp data, which is ~3500. Condition is never met, and when userspace reads all data, rwnd stays on 3569. IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0] --> At this point userspace read everything, rwnd recovered only to 3569 IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0] Reproduction is straight forward, it is enough for sender to send packets of size less then sizeof(struct sk_buff) and receiver keeping them in its buffers. 2) Minute size window for associations sharing the same socket buffer In case multiple associations share the same socket, and same socket buffer (sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one of the associations can permanently drop rwnd of other association(s). Situation will be typically observed as one association suddenly having rwnd dropped to size of last packet received and never recovering beyond that point. Different scenarios will lead to it, but all have in common that one of the associations (let it be association from 1)) nearly depleted socket buffer, and the other association blames socket buffer just for the amount enough to start the pressure. This association will enter pressure state, set rwnd_press and announce 0 rwnd. When data is read by userspace, similar situation as in 1) will occur, rwnd will increase just for the size read by userspace but rwnd_press will be high enough so that association doesn't have enough credit to reach rwnd_press and restore to previous state. This case is special case of 1), being worse as there is, in the worst case, only one packet in buffer for which size rwnd will be increased. Consequence is association which has very low maximum rwnd ('minute size', in our case down to 43B - size of packet which caused pressure) and as such unusable. Scenario happened in the field and labs frequently after congestion state (link breaks, different probabilities of packet drop, packet reordering) and with scenario 1) preceding. Here is given a deterministic scenario for reproduction: >From node A establish two associations on the same socket, with rcvbuf_policy being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1 repeat scenario from 1), that is, bring it down to 0 and restore up. Observe scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered', bring it down close to 0, as in just one more packet would close it. This has as a consequence that association number 2 is able to receive (at least) one more packet which will bring it in pressure state. E.g. if association 2 had rwnd of 10000, packet received was 43, and we enter at this point into pressure, rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will increase for 43, but conditions to restore rwnd to original state, just as in 1), will never be satisfied. --> Association 1, between A.y and B.12345 IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569] IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613] IP A.55915 > B.12345: sctp (1) [COOKIE ECHO] IP B.12345 > A.55915: sctp (1) [COOKIE ACK] --> Association 2, between A.z and B.12346 IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173] IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240] IP A.55915 > B.12346: sctp (1) [COOKIE ECHO] IP B.12346 > A.55915: sctp (1) [COOKIE ACK] --> Deplete socket buffer by sending messages of size 43B over association 1 IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] <...> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0] --> Sudden drop on 1 IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Here userspace read, rwnd 'recovered' to 3698, now deplete again using association 1 so there is place in buffer for only one more packet IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0] --> Socket buffer is almost depleted, but there is space for one more packet, send them over association 2, size 43B IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18] IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Immediate drop IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Read everything from the socket, both association recover up to maximum rwnd they are capable of reaching, note that association 1 recovered up to 3698, and association 2 recovered only to 43 IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0] IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18] IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0] A careful reader might wonder why it is necessary to reproduce 1) prior reproduction of 2). It is simply easier to observe when to send packet over association 2 which will push association into the pressure state. Proposed solution: Both problems share the same root cause, and that is improper scaling of socket buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while calculating rwnd is not possible due to fact that there is no linear relationship between amount of data blamed in increase/decrease with IP packet in which payload arrived. Even in case such solution would be followed, complexity of the code would increase. Due to nature of current rwnd handling, slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is entered is rationale, but it gives false representation to the sender of current buffer space. Furthermore, it implements additional congestion control mechanism which is defined on implementation, and not on standard basis. Proposed solution simplifies whole algorithm having on mind definition from rfc: o Receiver Window (rwnd): This gives the sender an indication of the space available in the receiver's inbound buffer. Core of the proposed solution is given with these lines: sctp_assoc_rwnd_update: if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0) asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1; else asoc->rwnd = 0; We advertise to sender (half of) actual space we have. Half is in the braces depending whether you would like to observe size of socket buffer as SO_RECVBUF or twice the amount, i.e. size is the one visible from userspace, that is, from kernelspace. In this way sender is given with good approximation of our buffer space, regardless of the buffer policy - we always advertise what we have. Proposed solution fixes described problems and removes necessity for rwnd restoration algorithm. Finally, as proposed solution is simplification, some lines of code, along with some bytes in struct sctp_association are saved. Version 2 of the patch addressed comments from Vlad. Name of the function is set to be more descriptive, and two parts of code are changed, in one removing the superfluous call to sctp_assoc_rwnd_update since call would not result in update of rwnd, and the other being reordering of the code in a way that call to sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2 in a way that existing function is not reordered/copied in line, but it is correctly called. Thanks Vlad for suggesting. Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5bc1d1b4 |
|
21-Jan-2014 |
wangweidong <wangweidong1@huawei.com> |
sctp: remove macros sctp_bh_[un]lock_sock Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
048ed4b6 |
|
21-Jan-2014 |
wangweidong <wangweidong1@huawei.com> |
sctp: remove macros sctp_{lock|release}_sock Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c8e43ba |
|
21-Jan-2014 |
wangweidong <wangweidong1@huawei.com> |
sctp: remove macros sctp_spin_[un]lock Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
79b91130 |
|
21-Jan-2014 |
wangweidong <wangweidong1@huawei.com> |
sctp: remove macros sctp_local_bh_{disable|enable} Redefined local_bh_{disable|enable} to sctp_local_bh_{disable|enable} for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0ea5e4df |
|
15-Jan-2014 |
wangweidong <wangweidong1@huawei.com> |
sctp: create helper function to enable|disable sackdelay add sctp_spp_sackdelay_{enable|disable} helper function for avoiding code duplication. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0e864b21 |
|
13-Jan-2014 |
Dan Carpenter <dan.carpenter@oracle.com> |
sctp: remove a redundant NULL check It confuses Smatch when we check "sinit" for NULL and then non-NULL and that causes a false positive warning later. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
63862b5b |
|
11-Jan-2014 |
Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> |
net: replace macros net_random and net_srandom with direct calls to prandom This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f916ec96 |
|
01-Jan-2014 |
Neil Horman <nhorman@tuxdriver.com> |
sctp: Add process name and pid to deprecation warnings Recently I updated the sctp socket option deprecation warnings to be both a bit more clear and ratelimited to prevent user processes from spamming the log file. Ben Hutchings suggested that I add the process name and pid to these warnings so that users can tell who is responsible for using the deprecated apis. This patch accomplishes that. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
94f65193 |
|
23-Dec-2013 |
Neil Horman <nhorman@tuxdriver.com> |
SCTP: Reduce log spamming for sctp setsockopt During a recent discussion regarding some sctp socket options, it was noted that we have several points at which we issue log warnings that can be flooded at an unbounded rate by any user. Fix this by converting all the pr_warns in the sctp_setsockopt path to be pr_warn_ratelimited. Note there are several debug level messages as well. I'm leaving those alone, as, if you turn on pr_debug, you likely want lots of verbosity. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: David Miller <davem@davemloft.net> CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e2d52bf |
|
22-Dec-2013 |
wangweidong <wangweidong1@huawei.com> |
sctp: fix checkpatch errors with //commen Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d72651d |
|
22-Dec-2013 |
wangweidong <wangweidong1@huawei.com> |
sctp: fix checkpatch errors with open brace '{' and trailing statements fix checkpatch errors below: ERROR: that open brace { should be on the previous line ERROR: open brace '{' following function declarations go on the next line ERROR: trailing statements should be on next line Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
26ac8e5f |
|
22-Dec-2013 |
wangweidong <wangweidong1@huawei.com> |
sctp: fix checkpatch errors with (foo*)|foo * bar|foo* bar fix checkpatch errors below: ERROR: "(foo*)" should be "(foo *)" ERROR: "foo * bar" should be "foo *bar" ERROR: "foo* bar" should be "foo *bar" Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb3f837b |
|
22-Dec-2013 |
wangweidong <wangweidong1@huawei.com> |
sctp: fix checkpatch errors with space required or prohibited fix checkpatch errors while the space is required or prohibited to the "=,()++..." Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
85f935d4 |
|
10-Dec-2013 |
wangweidong <wangweidong1@huawei.com> |
sctp: check the rto_min and rto_max in setsockopt When we set 0 to rto_min or rto_max, just not change the value. Also we should check the rto_min > rto_max. Suggested-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f70f46b |
|
10-Dec-2013 |
Neil Horman <nhorman@tuxdriver.com> |
sctp: properly latch and use autoclose value from sock to association Currently, sctp associations latch a sockets autoclose value to an association at association init time, subject to capping constraints from the max_autoclose sysctl value. This leads to an odd situation where an application may set a socket level autoclose timeout, but sliently sctp will limit the autoclose timeout to something less than that. Fix this by modifying the autoclose setsockopt function to check the limit, cap it and warn the user via syslog that the timeout is capped. This will allow getsockopt to return valid autoclose timeout values that reflect what subsequent associations actually use. While were at it, also elimintate the assoc->autoclose variable, it duplicates whats in the timeout array, which leads to multiple sources for the same information, that may differ (as the former isn't subject to any capping). This gives us the timeout information in a canonical place and saves some space in the association structure as well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> CC: Wang Weidong <wangweidong1@huawei.com> CC: David Miller <davem@davemloft.net> CC: Vlad Yasevich <vyasevich@gmail.com> CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b2f13a2 |
|
06-Dec-2013 |
Jeff Kirsher <jeffrey.t.kirsher@intel.com> |
sctp: Fix FSF address in file headers Several files refer to an old address for the Free Software Foundation in the file header comment. Resolve by replacing the address with the URL <http://www.gnu.org/licenses/> so that we do not have to keep updating the header comments anytime the address changes. CC: Vlad Yasevich <vyasevich@gmail.com> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0bbf87d8 |
|
28-Sep-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
net ipv4: Convert ipv4.ip_local_port_range to be per netns v3 - Move sysctl_local_ports from a global variable into struct netns_ipv4. - Modify inet_get_local_port_range to take a struct net, and update all of the callers. - Move the initialization of sysctl_local_ports into sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c v2: - Ensure indentation used tabs - Fixed ip.h so it applies cleanly to todays net-next v3: - Compile fixes of strange callers of inet_get_local_port_range. This patch now successfully passes an allmodconfig build. Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range Originally-by: Samya <samya@twitter.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88362ad8 |
|
07-Sep-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: fix smatch warning in sctp_send_asconf_del_ip This was originally reported in [1] and posted by Neil Horman [2], he said: Fix up a missed null pointer check in the asconf code. If we don't find a local address, but we pass in an address length of more than 1, we may dereference a NULL laddr pointer. Currently this can't happen, as the only users of the function pass in the value 1 as the addrcnt parameter, but its not hot path, and it doesn't hurt to check for NULL should that ever be the case. The callpath from sctp_asconf_mgmt() looks okay. But this could be triggered from sctp_setsockopt_bindx() call with SCTP_BINDX_REM_ADDR and addrcnt > 1 while passing all possible addresses from the bind list to SCTP_BINDX_REM_ADDR so that we do *not* find a single address in the association's bind address list that is not in the packed array of addresses. If this happens when we have an established association with ASCONF-capable peers, then we could get a NULL pointer dereference as we only check for laddr == NULL && addrcnt == 1 and call later sctp_make_asconf_update_ip() with NULL laddr. BUT: this actually won't happen as sctp_bindx_rem() will catch such a case and return with an error earlier. As this is incredably unintuitive and error prone, add a check to catch at least future bugs here. As Neil says, its not hot path. Introduced by 8a07eb0a5 ("sctp: Add ASCONF operation on the single-homed host"). [1] http://www.spinics.net/lists/linux-sctp/msg02132.html [2] http://www.spinics.net/lists/linux-sctp/msg02133.html Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Michio Honda <micchie@sfc.wide.ad.jp> Acked-By: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a0fb05d1 |
|
07-Sep-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: fix bug in sctp_poll for SOCK_SELECT_ERR_QUEUE If we do not add braces around ... mask |= POLLERR | sock_flag(sk, SOCK_SELECT_ERR_QUEUE) ? POLLPRI : 0; ... then this condition always evaluates to true as POLLERR is defined as 8 and binary or'd with whatever result comes out of sock_flag(). Hence instead of (X | Y) ? A : B, transform it into X | (Y ? A : B). Unfortunatelty, commit 8facd5fb73 ("net: fix smatch warnings inside datagram_poll") forgot about SCTP. :-( Introduced by 7d4c04fc170 ("net: add option to enable error queue packets waking select"). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
477143e3 |
|
06-Aug-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: trivial: update bug report in header comment With the restructuring of the lksctp.org site, we only allow bug reports through the SCTP mailing list linux-sctp@vger.kernel.org, not via SF, as SF is only used for web hosting and nothing more. While at it, also remove the obvious statement that bugs will be fixed and incooperated into the kernel. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
91705c61 |
|
23-Jul-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: trivial: update mailing list address The SCTP mailing list address to send patches or questions to is linux-sctp@vger.kernel.org and not lksctp-developers@lists.sourceforge.net anymore. Therefore, update all occurences. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a59bd3e |
|
02-Jul-2013 |
Yann Droneaud <ydroneaud@opteya.com> |
sctp: use get_unused_fd_flags(0) instead of get_unused_fd() Macro get_unused_fd() is used to allocate a file descriptor with default flags. Those default flags (0) can be "unsafe": O_CLOEXEC must be used by default to not leak file descriptor across exec(). Instead of macro get_unused_fd(), functions anon_inode_getfd() or get_unused_fd_flags() should be used with flags given by userspace. If not possible, flags should be set to O_CLOEXEC to provide userspace with a default safe behavor. In a further patch, get_unused_fd() will be removed so that new code start using anon_inode_getfd() or get_unused_fd_flags() with correct flags. This patch replaces calls to get_unused_fd() with equivalent call to get_unused_fd_flags(0) to preserve current behavor for existing code. The hard coded flag value (0) should be reviewed on a per-subsystem basis, and, if possible, set to O_CLOEXEC. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bb33381d |
|
28-Jun-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: rework debugging framework to use pr_debug and friends We should get rid of all own SCTP debug printk macros and use the ones that the kernel offers anyway instead. This makes the code more readable and conform to the kernel code, and offers all the features of dynamic debbuging that pr_debug() et al has, such as only turning on/off portions of debug messages at runtime through debugfs. The runtime cost of having CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing, is negligible [1]. If kernel debugging is completly turned off, then these statements will also compile into "empty" functions. While we're at it, we also need to change the Kconfig option as it /now/ only refers to the ifdef'ed code portions in outqueue.c that enable further debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code was enabled with this Kconfig option and has now been removed, we transform those code parts into WARNs resp. where appropriate BUG_ONs so that those bugs can be more easily detected as probably not many people have SCTP debugging permanently turned on. To turn on all SCTP debugging, the following steps are needed: # mount -t debugfs none /sys/kernel/debug # echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control This can be done more fine-grained on a per file, per line basis and others as described in [2]. [1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf [2] Documentation/dynamic-debug-howto.txt Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62208f12 |
|
25-Jun-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: simplify sctp_get_port No need to have an extra ret variable when we directly can return the value of sctp_get_port_local(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0a2fbac1 |
|
25-Jun-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: decouple cleaning some socket data from endpoint Rather instead of having the endpoint clean the garbage from the socket, use a sk_destruct handler sctp_destruct_sock(), that does the job for that when there are no more references on the socket. At least do this for our crypto transform through crypto_free_hash() that is allocated when in listening state. Also, perform sctp_put_port() only when sk is valid. At a later point in time we can still determine if there's an option of placing this into sk_prot->unhash() or sctp_endpoint_free() without any races. For now, leave it in sctp_endpoint_destroy() though. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52db882f |
|
25-Jun-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: migrate cookie life from timeval to ktime Currently, SCTP code defines its own timeval functions (since timeval is rarely used inside the kernel by others), namely tv_lt() and TIMEVAL_ADD() macros, that operate on SCTP cookie expiration. We might as well remove all those, and operate directly on ktime structures for a couple of reasons: ktime is available on all archs; complexity of ktime calculations depending on the arch is less than (reduces to a simple arithmetic operations on archs with BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to timeval functions (other archs); code becomes more readable; macros can be thrown out. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dda91928 |
|
17-Jun-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: remove SCTP_STATIC macro SCTP_STATIC is just another define for the static keyword. It's use is inconsistent in the SCTP code anyway and it was introduced in the initial implementation of SCTP in 2.5. We have a regression suite in lksctp-tools, but this is for user space only, so noone makes use of this macro anymore. The kernel test suite for 2.5 is incompatible with the current SCTP code anyway. So simply Remove it, to be more consistent with the rest of the kernel code. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c164b838 |
|
14-Jun-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: minor: remove variable in sctp_init_sock It's only used at this one time, so we could remove it as well. This is valid and also makes it more explicit/obvious that in case of error the sp->ep is NULL here, i.e. for the sctp_destroy_sock() check that was recently added. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1abd165e |
|
06-Jun-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: fix NULL pointer dereference in socket destruction While stress testing sctp sockets, I hit the following panic: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp] PGD 7cead067 PUD 7ce76067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: sctp(F) libcrc32c(F) [...] CPU: 7 PID: 2950 Comm: acc Tainted: GF 3.10.0-rc2+ #1 Hardware name: Dell Inc. PowerEdge T410/0H19HD, BIOS 1.6.3 02/01/2011 task: ffff88007ce0e0c0 ti: ffff88007b568000 task.ti: ffff88007b568000 RIP: 0010:[<ffffffffa0490c4e>] [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp] RSP: 0018:ffff88007b569e08 EFLAGS: 00010292 RAX: 0000000000000000 RBX: ffff88007db78a00 RCX: dead000000200200 RDX: ffffffffa049fdb0 RSI: ffff8800379baf38 RDI: 0000000000000000 RBP: ffff88007b569e18 R08: ffff88007c230da0 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff880077990d00 R14: 0000000000000084 R15: ffff88007db78a00 FS: 00007fc18ab61700(0000) GS:ffff88007fc60000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000020 CR3: 000000007cf9d000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88007b569e38 ffff88007db78a00 ffff88007b569e38 ffffffffa049fded ffffffff81abf0c0 ffff88007db78a00 ffff88007b569e58 ffffffff8145b60e 0000000000000000 0000000000000000 ffff88007b569eb8 ffffffff814df36e Call Trace: [<ffffffffa049fded>] sctp_destroy_sock+0x3d/0x80 [sctp] [<ffffffff8145b60e>] sk_common_release+0x1e/0xf0 [<ffffffff814df36e>] inet_create+0x2ae/0x350 [<ffffffff81455a6f>] __sock_create+0x11f/0x240 [<ffffffff81455bf0>] sock_create+0x30/0x40 [<ffffffff8145696c>] SyS_socket+0x4c/0xc0 [<ffffffff815403be>] ? do_page_fault+0xe/0x10 [<ffffffff8153cb32>] ? page_fault+0x22/0x30 [<ffffffff81544e02>] system_call_fastpath+0x16/0x1b Code: 0c c9 c3 66 2e 0f 1f 84 00 00 00 00 00 e8 fb fe ff ff c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 <48> 8b 47 20 48 89 fb c6 47 1c 01 c6 40 12 07 e8 9e 68 01 00 48 RIP [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp] RSP <ffff88007b569e08> CR2: 0000000000000020 ---[ end trace e0d71ec1108c1dd9 ]--- I did not hit this with the lksctp-tools functional tests, but with a small, multi-threaded test program, that heavily allocates, binds, listens and waits in accept on sctp sockets, and then randomly kills some of them (no need for an actual client in this case to hit this). Then, again, allocating, binding, etc, and then killing child processes. This panic then only occurs when ``echo 1 > /proc/sys/net/sctp/auth_enable'' is set. The cause for that is actually very simple: in sctp_endpoint_init() we enter the path of sctp_auth_init_hmacs(). There, we try to allocate our crypto transforms through crypto_alloc_hash(). In our scenario, it then can happen that crypto_alloc_hash() fails with -EINTR from crypto_larval_wait(), thus we bail out and release the socket via sk_common_release(), sctp_destroy_sock() and hit the NULL pointer dereference as soon as we try to access members in the endpoint during sctp_endpoint_free(), since endpoint at that time is still NULL. Now, if we have that case, we do not need to do any cleanup work and just leave the destruction handler. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
524fba6c |
|
02-Apr-2013 |
Wei Yongjun <yongjun_wei@trendmicro.com.cn> |
sctp: fix error return code in __sctp_connect() Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d4c04fc |
|
28-Mar-2013 |
Keller, Jacob E <jacob.e.keller@intel.com> |
net: add option to enable error queue packets waking select Currently, when a socket receives something on the error queue it only wakes up the socket on select if it is in the "read" list, that is the socket has something to read. It is useful also to wake the socket if it is in the error list, which would enable software to wait on error queue packets without waking up for regular data on the socket. The main use case is for receiving timestamped transmit packets which return the timestamp to the socket via the error queue. This enables an application to select on the socket for the error queue only instead of for the regular traffic. -v2- * Added the SO_SELECT_ERR_QUEUE socket option to every architechture specific file * Modified every socket poll function that checks error queue Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Matthew Vick <matthew.vick@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b67bfe0d |
|
27-Feb-2013 |
Sasha Levin <sasha.levin@oracle.com> |
hlist: drop the node parameter from iterators I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
726bc6b0 |
|
27-Feb-2013 |
Guenter Roeck <linux@roeck-us.net> |
net/sctp: Validate parameter size for SCTP_GET_ASSOC_STATS Building sctp may fail with: In function ‘copy_from_user’, inlined from ‘sctp_getsockopt_assoc_stats’ at net/sctp/socket.c:5656:20: arch/x86/include/asm/uaccess_32.h:211:26: error: call to ‘copy_from_user_overflow’ declared with attribute error: copy_from_user() buffer size is not provably correct if built with W=1 due to a missing parameter size validation before the call to copy_from_user. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ba542a2 |
|
07-Feb-2013 |
Daniel Borkmann <daniel@iogearbox.net> |
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree In sctp_setsockopt_auth_key, we create a temporary copy of the user passed shared auth key for the endpoint or association and after internal setup, we free it right away. Since it's sensitive data, we should zero out the key before returning the memory back to the allocator. Thus, use kzfree instead of kfree, just as we do in sctp_auth_key_put(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
196d6759 |
|
30-Nov-2012 |
Michele Baldessari <michele@acksyn.org> |
sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call The current SCTP stack is lacking a mechanism to have per association statistics. This is an implementation modeled after OpenSolaris' SCTP_GET_ASSOC_STATS. Userspace part will follow on lksctp if/when there is a general ACK on this. V4: - Move ipackets++ before q->immediate.func() for consistency reasons - Move sctp_max_rto() at the end of sctp_transport_update_rto() to avoid returning bogus RTO values - return asoc->rto_min when max_obs_rto value has not changed V3: - Increase ictrlchunks in sctp_assoc_bh_rcv() as well - Move ipackets++ to sctp_inq_push() - return 0 when no rto updates took place since the last call V2: - Implement partial retrieval of stat struct to cope for future expansion - Kill the rtxpackets counter as it cannot be precise anyway - Rename outseqtsns to outofseqtsns to make it clearer that these are out of sequence unexpected TSNs - Move asoc->ipackets++ under a lock to avoid potential miscounts - Fold asoc->opackets++ into the already existing asoc check - Kill unneeded (q->asoc) test when increasing rtxchunks - Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0) - Don't count SHUTDOWNs as SACKs - Move SCTP_GET_ASSOC_STATS to the private space API - Adjust the len check in sctp_getsockopt_assoc_stats() to allow for future struct growth - Move association statistics in their own struct - Update idupchunks when we send a SACK with dup TSNs - return min_rto in max_rto when RTO has not changed. Also return the transport when max_rto last changed. Signed-off: Michele Baldessari <michele@acksyn.org> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6e51fe75 |
|
21-Nov-2012 |
Tommi Rantala <tt.rantala@gmail.com> |
sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscall Consider the following program, that sets the second argument to the sendto() syscall incorrectly: #include <string.h> #include <arpa/inet.h> #include <sys/socket.h> int main(void) { int fd; struct sockaddr_in sa; fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/); if (fd < 0) return 1; memset(&sa, 0, sizeof(sa)); sa.sin_family = AF_INET; sa.sin_addr.s_addr = inet_addr("127.0.0.1"); sa.sin_port = htons(11111); sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa)); return 0; } We get -ENOMEM: $ strace -e sendto ./demo sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory) Propagate the error code from sctp_user_addto_chunk(), so that we will tell user space what actually went wrong: $ strace -e sendto ./demo sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address) Noticed while running Trinity (the syscall fuzzer). Signed-off-by: Tommi Rantala <tt.rantala@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3594698a |
|
15-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Make CAP_NET_BIND_SERVICE per user namespace Allow privileged users in any user namespace to bind to privileged sockets in network namespaces they control. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3e9a1dc |
|
30-Oct-2012 |
Masanari Iida <standby24x7@gmail.com> |
net: sctp: Fix typo in net/sctp Correct spelling typo in net/sctp/socket.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3c68198e |
|
24-Oct-2012 |
Neil Horman <nhorman@tuxdriver.com> |
sctp: Make hmac algorithm selection for cookie generation dynamic Currently sctp allows for the optional use of md5 of sha1 hmac algorithms to generate cookie values when establishing new connections via two build time config options. Theres no real reason to make this a static selection. We can add a sysctl that allows for the dynamic selection of these algorithms at run time, with the default value determined by the corresponding crypto library availability. This comes in handy when, for example running a system in FIPS mode, where use of md5 is disallowed, but SHA1 is permitted. Note: This new sysctl has no corresponding socket option to select the cookie hmac algorithm. I chose not to implement that intentionally, as RFC 6458 contains no option for this value, and I opted not to pollute the socket option namespace. Change notes: v2) * Updated subject to have the proper sctp prefix as per Dave M. * Replaced deafult selection options with new options that allow developers to explicitly select available hmac algs at build time as per suggestion by Vlad Y. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: netdev@vger.kernel.org Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
56b31d1c |
|
17-Aug-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
unexport sock_map_fd(), switch to sock_alloc_file() Both modular callers of sock_map_fd() had been buggy; sctp one leaks descriptor and file if copy_to_user() fails, 9p one shouldn't be exposing file in the descriptor table at all. Switch both to sock_alloc_file(), export it, unexport sock_map_fd() and make it static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e1fc3b14 |
|
07-Aug-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
sctp: Make sysctl tunables per net Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
55e26eb9 |
|
07-Aug-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
sctp: Push struct net down to sctp_chunk_event_lookup This trickles up through sctp_sm_lookup_event up to sctp_do_sm and up further into sctp_primitiv_NAME before the code reaches places where struct net can be reliably found. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4db67e80 |
|
06-Aug-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
sctp: Make the address lists per network namespace - Move the address lists into struct net - Add per network namespace initialization and cleanup - Pass around struct net so it is everywhere I need it. - Rename all of the global variable references into references to the variables moved into struct net Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1f43763 |
|
06-Aug-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
sctp: Make the port hash table use struct net in it's key. - Add struct net into the port hash table hash calculation - Add struct net inot the struct sctp_bind_bucket so there is a memory of which network namespace a port is allocated in. No need for a ref count because sctp_bind_bucket only exists when there are sockets in the hash table and sockets can not change their network namspace, and sockets already ref count their network namespace. - Add struct net into the key comparison when we are testing to see if we have found the port hash table entry we are looking for. With these changes lookups in the port hash table becomes safe to use in multiple network namespaces. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5aa93bcf |
|
21-Jul-2012 |
Neil Horman <nhorman@tuxdriver.com> |
sctp: Implement quick failover draft from tsvwg I've seen several attempts recently made to do quick failover of sctp transports by reducing various retransmit timers and counters. While its possible to implement a faster failover on multihomed sctp associations, its not particularly robust, in that it can lead to unneeded retransmits, as well as false connection failures due to intermittent latency on a network. Instead, lets implement the new ietf quick failover draft found here: http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 This will let the sctp stack identify transports that have had a small number of errors, and avoid using them quickly until their reliability can be re-established. I've tested this out on two virt guests connected via multiple isolated virt networks and believe its in compliance with the above draft and works well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: "David S. Miller" <davem@davemloft.net> CC: linux-sctp@vger.kernel.org CC: joe@perches.com Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2eebc1e1 |
|
16-Jul-2012 |
Neil Horman <nhorman@tuxdriver.com> |
sctp: Fix list corruption resulting from freeing an association on a list A few days ago Dave Jones reported this oops: [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP [22766.295376] CPU 0 [22766.295384] Modules linked in: [22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74 [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000 [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000, [22766.387137] Stack: [22766.387140] ffff880147c03a10 [22766.387140] ffffffffa169f2b6 [22766.387140] ffff88013ed95728 [22766.387143] 0000000000000002 [22766.387143] 0000000000000000 [22766.387143] ffff880003fad062 [22766.387144] ffff88013c120000 [22766.387144] [22766.387145] Call Trace: [22766.387145] <IRQ> [22766.387150] [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0 [sctp] [22766.387154] [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp] [22766.387157] [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp] [22766.387161] [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0 [22766.387163] [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210 [22766.387166] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387168] [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0 [22766.387169] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387171] [<ffffffff81590a07>] ip_local_deliver+0x47/0x80 [22766.387172] [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680 [22766.387174] [<ffffffff81590c54>] ip_rcv+0x214/0x320 [22766.387176] [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910 [22766.387178] [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910 [22766.387180] [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40 [22766.387182] [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0 [22766.387183] [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440 [22766.387185] [<ffffffff81559280>] napi_skb_finish+0x70/0xa0 [22766.387187] [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130 [22766.387218] [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e] [22766.387242] [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e] [22766.387266] [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e] [22766.387268] [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0 [22766.387270] [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130 [22766.387273] [<ffffffff810734d0>] __do_softirq+0xe0/0x420 [22766.387275] [<ffffffff8169826c>] call_softirq+0x1c/0x30 [22766.387278] [<ffffffff8101db15>] do_softirq+0xd5/0x110 [22766.387279] [<ffffffff81073bc5>] irq_exit+0xd5/0xe0 [22766.387281] [<ffffffff81698b03>] do_IRQ+0x63/0xd0 [22766.387283] [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f [22766.387283] <EOI> [22766.387284] [22766.387285] [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 49 89 f5 66 c1 c0 08 66 39 46 02 [22766.387307] [22766.387307] RIP [22766.387311] [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp] [22766.387311] RSP <ffff880147c039b0> [22766.387142] ffffffffa16ab120 [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]--- [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt It appears from his analysis and some staring at the code that this is likely occuring because an association is getting freed while still on the sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable while a freed node corrupts part of the list. Nominally I would think that an mibalanced refcount was responsible for this, but I can't seem to find any obvious imbalance. What I did note however was that the two places where we create an association using sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths which free a newly created association after calling sctp_primitive_ASSOCIATE. sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to the aforementioned hash table. the sctp command interpreter that process side effects has not way to unwind previously processed commands, so freeing the association from the __sctp_connect or sctp_sendmsg error path would lead to a freed association remaining on this hash table. I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init, which allows us to proerly use hlist_unhashed to check if the node is on a hashlist safely during a delete. That in turn alows us to safely call sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths before freeing them, regardles of what the associations state is on the hash list. I noted, while I was doing this, that the __sctp_unhash_endpoint was using hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes pointers to make that function work properly, so I fixed that up in a simmilar fashion. I attempted to test this using a virtual guest running the SCTP_RR test from netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't able to recreate the problem prior to this fix, nor was I able to trigger the failure after (neither of which I suppose is suprising). Given the trace above however, I think its likely that this is what we hit. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: davej@redhat.com CC: davej@redhat.com CC: "David S. Miller" <davem@davemloft.net> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02f3d4ce |
|
16-Jul-2012 |
David S. Miller <davem@davemloft.net> |
sctp: Adjust PMTU updates to accomodate route invalidation. This adjusts the call to dst_ops->update_pmtu() so that we can transparently handle the fact that, in the future, the dst itself can be invalidated by the PMTU update (when we have non-host routes cached in sockets). Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e87cc472 |
|
13-May-2012 |
Joe Perches <joe@perches.com> |
net: Convert net_ratelimit uses to net_<level>_ratelimited Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
acdd5985 |
|
03-Apr-2012 |
Thomas Graf <tgraf@infradead.org> |
sctp: Allow struct sctp_event_subscribe to grow without breaking binaries getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns an error if the user provides less bytes than the size of struct sctp_event_subscribe. Struct sctp_event_subscribe needs to be extended by an u8 for every new event or notification type that is added. This obviously makes getsockopt fail for binaries that are compiled against an older versions of <net/sctp/user.h> which do not contain all event types. This patch changes getsockopt behaviour to no longer return an error if not enough bytes are being provided by the user. Instead, it returns as much of sctp_event_subscribe as fits into the provided buffer. This leads to the new behavior that users see what they have been aware of at compile time. The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0343c554 |
|
07-Mar-2012 |
Benjamin Poirier <bpoirier@suse.de> |
sctp: Export sctp_do_peeloff lookup sctp_association within sctp_do_peeloff() to enable its use outside of the sctp code with minimal knowledge of the former. Signed-off-by: Benjamin Poirier <bpoirier@suse.de> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2692ba61 |
|
15-Dec-2011 |
Xi Wang <xi.wang@gmail.com> |
sctp: fix incorrect overflow check on autoclose Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for limiting the autoclose value. If userspace passes in -1 on 32-bit platform, the overflow check didn't work and autoclose would be set to 0xffffffff. This patch defines a max_autoclose (in seconds) for limiting the value and exposes it through sysctl, with the following intentions. 1) Avoid overflowing autoclose * HZ. 2) Keep the default autoclose bound consistent across 32- and 64-bit platforms (INT_MAX / HZ in this patch). 3) Keep the autoclose value consistent between setsockopt() and getsockopt() calls. Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dfd56b8b |
|
10-Dec-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: use IS_ENABLED(CONFIG_IPV6) Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4e3fd7a0 |
|
20-Nov-2011 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: remove ipv6_addr_copy() C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc3b2d7f |
|
15-Jul-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
cd4fcc70 |
|
07-Jul-2011 |
Thomas Graf <tgraf@infradead.org> |
sctp: ABORT if receive, reassmbly, or reodering queue is not empty while closing socket Trigger user ABORT if application closes a socket which has data queued on the socket receive queue or chunks waiting on the reassembly or ordering queue as this would imply data being lost which defeats the point of a graceful shutdown. This behavior is already practiced in TCP. We do not check the input queue because that would mean to parse all chunks on it to look for unacknowledged data which seems too much of an effort. Control chunks or duplicated chunks may also be in the input queue and should not be stopping a graceful shutdown. Signed-off-by: Thomas Graf <tgraf@infradead.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
94912301 |
|
02-Jul-2011 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: fix missing send up SCTP_SENDER_DRY_EVENT when subscribe it We forgot to send up SCTP_SENDER_DRY_EVENT notification when user app subscribes to this event, and there is no data to be sent or retransmit. This is required by the Socket API and used by the DTLS/SCTP implementation. Reported-by: Michael Tüxen <Michael.Tuexen@lurchi.franken.de> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Tested-by: Robin Seggelmann <seggelmann@fh-muenster.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7fd71b1e |
|
01-Jul-2011 |
Joe Perches <joe@perches.com> |
sctp: Reduce switch/case indent Make the case labels the same indent as the switch. git diff -w shows useless break;s removed after returns and a comment added to an unnecessary default: break; because of a dubious gcc warning. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea110733 |
|
13-Jun-2011 |
Joe Perches <joe@perches.com> |
net: Remove casts of void * Unnecessary casts of void * clutter the code. These are the remainder casts after several specific patches to remove netdev_priv and dev_priv. Done via coccinelle script: $ cat cast_void_pointer.cocci @@ type T; T *pt; void *pv; @@ - pt = (T *)pv; + pt = pv; Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
|
#
6d65e5ee |
|
10-Jun-2011 |
Michio Honda <micchie@sfc.wide.ad.jp> |
sctp: kzalloc() error handling on deleting last address Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a07eb0a |
|
26-Apr-2011 |
Michio Honda <micchie@sfc.wide.ad.jp> |
sctp: Add ASCONF operation on the single-homed host In this case, the SCTP association transmits an ASCONF packet including addition of the new IP address and deletion of the old address. This patch implements this functionality. In this case, the ASCONF chunk is added to the beginning of the queue, because the other chunks cannot be transmitted in this state. Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7dc04d71 |
|
26-Apr-2011 |
Michio Honda <micchie@sfc.wide.ad.jp> |
sctp: Add socket option operation for Auto-ASCONF. This patch allows the application to operate Auto-ASCONF on/off behavior via setsockopt() and getsockopt(). Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9f7d653b |
|
26-Apr-2011 |
Michio Honda <micchie@sfc.wide.ad.jp> |
sctp: Add Auto-ASCONF support (core). SCTP reconfigure the IP addresses in the association by using ASCONF chunks as mentioned in RFC5061. For example, we can start to use the newly configured IP address in the existing association. This patch implements automatic ASCONF operation in the SCTP stack with address events in the host computer, which is called auto_asconf. Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
afd7614c |
|
12-May-2011 |
Joe Perches <joe@perches.com> |
sctp: sctp_sendmsg: Don't test known non-null sinfo It's already known non-null above. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
517aa0bc |
|
12-May-2011 |
Joe Perches <joe@perches.com> |
sctp: sctp_sendmsg: Don't initialize default_sinfo This variable only needs initialization when cmsgs.info is NULL. Use memset to ensure padding is also zeroed so kernel doesn't leak any data. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9914ae3c |
|
26-Apr-2011 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: cache the ipv6 source after route lookup The ipv6 routing lookup does give us a source address, but instead of filling it into the dst, it's stored in the flowi. We can use that instead of going through the entire source address selection again. Also the useless ->dst_saddr member of sctp_pf is removed. And sctp_v6_dst_saddr() is removed, instead by introduce sctp_v6_to_addr(), which can be reused to cleanup some dup code. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
209ba424 |
|
17-Apr-2011 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: implement socket option SCTP_GET_ASSOC_ID_LIST This patch Implement socket option SCTP_GET_ASSOC_ID_LIST. SCTP Socket API Extension: 8.2.6. Get the Current Identifiers of Associations (SCTP_GET_ASSOC_ID_LIST) This option gets the current list of SCTP association identifiers of the SCTP associations handled by a one-to-many style socket. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ee9cbaca |
|
18-Apr-2011 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Allow bindx_del to accept 0 port We allow 0 port when adding new addresses. It only makes sence to allow 0 port when removing addresses. When removing the currently bound port will be used when the port in the address is set to 0. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
934253a7 |
|
18-Apr-2011 |
Shan Wei <shanwei@cn.fujitsu.com> |
sctp: use memdup_user to copy data from userspace Use common function to simply code. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25985edc |
|
30-Mar-2011 |
Lucas De Marchi <lucas.demarchi@profusion.mobi> |
Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
|
#
efea2c6b |
|
04-Mar-2011 |
Hagen Paul Pfeifer <hagen@jauu.net> |
sctp: several declared/set but unused fixes Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eaefd110 |
|
17-Feb-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: add __rcu annotations to sk_wq and wq Add proper RCU annotations/verbs to sk_wq and wq members Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access) Fix sunrpc sk_sleep() abuse too Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4580ccc0 |
|
18-Jan-2011 |
Shan Wei <shanwei@cn.fujitsu.com> |
sctp: user perfect name for Delayed SACK Timer option The option name of Delayed SACK Timer should be SCTP_DELAYED_SACK, not SCTP_DELAYED_ACK. Left SCTP_DELAYED_ACK be concomitant with SCTP_DELAYED_SACK, for making compatibility with existing applications. Reference: 8.1.19. Get or Set Delayed SACK Timer (SCTP_DELAYED_SACK) (http://tools.ietf.org/html/draft-ietf-tsvwg-sctpsocket-25) Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d743b7e |
|
14-Dec-2010 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: fix the return value of getting the sctp partial delivery point Get the sctp partial delivery point using SCTP_PARTIAL_DELIVERY_POINT socket option should return 0 if success, not -ENOTSUPP. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40a01039 |
|
07-Dec-2010 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
SCTP: Fix SCTP_SET_PEER_PRIMARY_ADDR to accpet v4mapped address SCTP_SET_PEER_PRIMARY_ADDR does not accpet v4mapped address, using v4mapped address in SCTP_SET_PEER_PRIMARY_ADDR socket option will get -EADDRNOTAVAIL error if v4map is enabled. This patch try to fix it by mapping v4mapped address to v4 address if allowed. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8917a3c0 |
|
02-Dec-2010 |
David Shwatrz <dshwatrz@gmail.com> |
Fix a typo in datagram.c and sctp/socket.c. Hi, This patch fixes a typo in net/core/datagram.c and in net/sctp/socket.c Regards, David Shwartz Signed-off-by: David Shwartz <dshwatrz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d987e5c |
|
09-Nov-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: avoid limits overflow Robin Holt tried to boot a 16TB machine and found some limits were reached : sysctl_tcp_mem[2], sysctl_udp_mem[2] We can switch infrastructure to use long "instead" of "int", now atomic_long_t primitives are available for free. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Robin Holt <holt@sgi.com> Reviewed-by: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9a7241c2 |
|
03-Oct-2010 |
David S. Miller <davem@davemloft.net> |
sctp: Fix break indentation in sctp_ioctl(). Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d7e0d19a |
|
01-Oct-2010 |
Dan Rosenberg <dan.j.rosenberg@gmail.com> |
sctp: prevent reading out-of-bounds memory Two user-controlled allocations in SCTP are subsequently dereferenced as sockaddr structs, without checking if the dereferenced struct members fall beyond the end of the allocated chunk. There doesn't appear to be any information leakage here based on how these members are used and additional checking, but it's still worth fixing. [akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion] Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a02cec21 |
|
22-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: return operator cleanup Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
65040c33 |
|
02-Sep-2010 |
Diego Elio 'Flameeyes' Pettenò <flameeyes@gmail.com> |
sctp: implement SIOCINQ ioctl() (take 3) This simple patch copies the current approach for SIOCINQ ioctl() from DCCP into SCTP so that the userland code working with SCTP can use a similar interface across different protocols to know how much space to allocate for a buffer. Signed-off-by: Diego Elio Pettenò <flameeyes@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db40980f |
|
06-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: poll() optimizations No need to test twice sk->sk_shutdown & RCV_SHUTDOWN Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
145ce502 |
|
24-Aug-2010 |
Joe Perches <joe@perches.com> |
net/sctp: Use pr_fmt and pr_<level> Change SCTP_DEBUG_PRINTK and SCTP_DEBUG_PRINTK_IPADDR to use do { print } while (0) guards. Add SCTP_DEBUG_PRINTK_CONT to fix errors in log when lines were continued. Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt Add a missing newline in "Failed bind hash alloc" Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3826f1e |
|
04-May-2010 |
Amerigo Wang <amwang@redhat.com> |
net: reserve ports for applications using fixed port numbers (Dropped the infiniband part, because Tetsuo modified the related code, I will send a separate patch for it once this is accepted.) This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which allows users to reserve ports for third-party applications. The reserved ports will not be used by automatic port assignments (e.g. when calling connect() or bind() with port number 0). Explicit port allocation behavior is unchanged. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
43815482 |
|
29-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sock_def_readable() and friends RCU conversion sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a5f4cea7 |
|
30-Apr-2010 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Use correct address family in sctp_getsockopt_peer_addrs() The function should use the address family of the address when trying to determine the length of the structure. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
81419d86 |
|
28-Apr-2010 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: per_cpu variables should be in bh_disabled section Since the change of the atomics to percpu variables, we now have to disable BH in process context when touching percpu variables. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
561b1733 |
|
28-Apr-2010 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: avoid irq lock inversion while call sk->sk_data_ready() sk->sk_data_ready() of sctp socket can be called from both BH and non-BH contexts, but the default sk->sk_data_ready(), sock_def_readable(), can not be used in this case. Therefore, we have to make a new function sctp_data_ready() to grab sk->sk_data_ready() with BH disabling. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 2.6.33-rc6 #129 --------------------------------------------------------- sctp_darn/1517 just changed the state of lock: (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80 but this lock took another, SOFTIRQ-unsafe lock in the past: (slock-AF_INET){+.-...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: 1 lock held by sctp_darn/1517: #0: (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp] Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c377411f |
|
27-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sk_add_backlog() take rmem_alloc into account Current socket backlog limit is not enough to really stop DDOS attacks, because user thread spend many time to process a full backlog each round, and user might crazy spin on socket lock. We should add backlog size and receive_queue size (aka rmem_alloc) to pace writers, and let user run without being slow down too much. Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in stress situations. Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp receiver can now process ~200.000 pps (instead of ~100 pps before the patch) on a 8 core machine. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aa395145 |
|
20-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sk_sleep() helper Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b68c9246 |
|
30-Mar-2010 |
Hagen Paul Pfeifer <hagen@jauu.net> |
sctp: eliminate useless code Remove duplicate declaration of symbol: struct hlist_node *node was already declared, the seconds declaration shadows the first one. CC: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a0e3ad6 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
#
50b1a782 |
|
04-Mar-2010 |
Zhu Yi <yi.zhu@intel.com> |
sctp: use limited socket backlog Make sctp adapt to the limited socket backlog change. Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
09cb47a2 |
|
21-Jan-2010 |
Julia Lawall <julia@diku.dk> |
net/sctp: Eliminate useless code The variable newinet is initialized twice to the same (side effect-free) expression. Drop one initialization. A simplified version of the semantic match that finds this problem is: (http://coccinelle.lip6.fr/) // <smpl> @forall@ idexpression *x; identifier f!=ERR_PTR; @@ x = f(...) ... when != x ( x = f(...,<+...x...+>,...) | * x = f(...) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8ffd3208 |
|
21-Dec-2009 |
Andrew Morton <akpm@linux-foundation.org> |
net/sctp/socket.c: squish warning net/sctp/socket.c: In function 'sctp_setsockopt_autoclose': net/sctp/socket.c:2090: warning: comparison is always false due to limited range of data type Cc: Andrei Pelinescu-Onciul <andrei@iptel.org> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
810c0719 |
|
02-Dec-2009 |
Andrei Pelinescu-Onciul <andrei@iptel.org> |
sctp: fix sctp_setsockopt_autoclose compile warning Fix the following warning, when building on 64 bits: net/sctp/socket.c:2091: warning: large integer implicitly truncated to unsigned type Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f64f9e71 |
|
29-Nov-2009 |
Joe Perches <joe@perches.com> |
net: Move && and || to end of previous line Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f6778aab |
|
23-Nov-2009 |
Andrei Pelinescu-Onciul <andrei@iptel.org> |
sctp: limit maximum autoclose setsockopt value To avoid overflowing the maximum timer interval when transforming the autoclose interval from seconds to jiffies, limit the maximum autoclose value to MAX_SCHEDULE_TIMEOUT/HZ. Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
a242b41d |
|
23-Nov-2009 |
Amerigo Wang <amwang@redhat.com> |
sctp: remove deprecated SCTP_GET_*_OLD stuffs SCTP_GET_*_OLD stuffs are schedlued to be removed. Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
37051f73 |
|
23-Nov-2009 |
Andrei Pelinescu-Onciul <andrei@iptel.org> |
sctp: allow setting path_maxrxt independent of SPP_PMTUD_ENABLE Since draft-ietf-tsvwg-sctpsocket-15.txt, setting the SPP_MTUD_ENABLE flag when changing pathmaxrxt via the SCTP_PEER_ADDR_PARAMS setsockopt is not required any longer. Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
f9c67811 |
|
11-Nov-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Fix regression introduced by new sctp_connectx api A new (unrealeased to the user) sctp_connectx api c6ba68a26645dbc5029a9faa5687ebe6fcfc53e4 sctp: support non-blocking version of the new sctp_connectx() API introduced a regression cought by the user regression test suite. In particular, the API requires the user library to re-allocate the buffer and could potentially trigger a SIGFAULT. This change corrects that regression by passing the original address buffer to the kernel unmodified, but still allows for a returned association id. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
409b95af |
|
10-Nov-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Set source addresses on the association before adding transports Recent commit 8da645e101a8c20c6073efda3c7cc74eec01b87f sctp: Get rid of an extra routing lookup when adding a transport introduced a regression in the connection setup. The behavior was different between IPv4 and IPv6. IPv4 case ended up working because the route lookup routing returned a NULL route, which triggered another route lookup later in the output patch that succeeded. In the IPv6 case, a valid route was returned for first call, but we could not find a valid source address at the time since the source addresses were not set on the association yet. Thus resulted in a hung connection. The solution is to set the source addresses on the association prior to adding peers. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c720c7e8 |
|
15-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: rename some inet_sock fields In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. Goal is to transfert fields used at lookup time in the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by rx path) This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b885787 |
|
12-Oct-2009 |
Neil Horman <nhorman@tuxdriver.com> |
net: Generalize socket rx gap / receive queue overflow cmsg Create a new socket level option to report number of queue overflows Recently I augmented the AF_PACKET protocol to report the number of frames lost on the socket receive queue between any two enqueued frames. This value was exported via a SOL_PACKET level cmsg. AFter I completed that work it was requested that this feature be generalized so that any datagram oriented socket could make use of this option. As such I've created this patch, It creates a new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue overflowed between any two given frames. It also augments the AF_PACKET protocol to take advantage of this new feature (as it previously did not touch sk->sk_drops, which this patch uses to record the overflow count). Tested successfully by me. Notes: 1) Unlike my previous patch, this patch simply records the sk_drops value, which is not a number of drops between packets, but rather a total number of drops. Deltas must be computed in user space. 2) While this patch currently works with datagram oriented protocols, it will also be accepted by non-datagram oriented protocols. I'm not sure if thats agreeable to everyone, but my argument in favor of doing so is that, for those protocols which aren't applicable to this option, sk_drops will always be zero, and reporting no drops on a receive queue that isn't used for those non-participating protocols seems reasonable to me. This also saves us having to code in a per-protocol opt in mechanism. 3) This applies cleanly to net-next assuming that commit 977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b7058842 |
|
30-Sep-2009 |
David S. Miller <davem@davemloft.net> |
net: Make setsockopt() optlen be unsigned. This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f68b2e05 |
|
04-Sep-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Fix SCTP_MAXSEG socket option to comply to spec. We had a bug that we never stored the user-defined value for MAXSEG when setting the value on an association. Thus future PMTU events ended up re-writing the frag point and increasing it past user limit. Additionally, when setting the option on the socket/endpoint, we effect all current associations, which is against spec. Now, we store the user 'maxseg' value along with the computed 'frag_point'. We inherit 'maxseg' from the socket at association creation and use it as an upper limit for 'frag_point' when its set. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
9c5c62be |
|
10-Aug-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Send user messages to the lower layer as one Currenlty, sctp breaks up user messages into fragments and sends each fragment to the lower layer by itself. This means that for each fragment we go all the way down the stack and back up. This also discourages bundling of multiple fragments when they can fit into a sigle packet (ex: due to user setting a low fragmentation threashold). We introduce a new command SCTP_CMD_SND_MSG and hand the whole message down state machine. The state machine and the side-effect parser will cork the queue, add all chunks from the message to the queue, and then un-cork the queue thus causing the chunks to get transmitted. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
bec9640b |
|
30-Jul-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Disallow new connection on a closing socket If a socket has a lot of association that are in the process of of being closed/aborted, it is possible for a remote to establish new associations during the time period that the old ones are shutting down. If this was a result of a close() call, there will be no socket and will cause a memory leak. We'll prevent this by setting the socket state to CLOSING and disallow new associations when in this state. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
1bc4ee40 |
|
05-Jul-2009 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: fix warning at inet_sock_destruct() while release sctp socket Commit 'net: Move rx skb_orphan call to where needed' broken sctp protocol with warning at inet_sock_destruct(). Actually, sctp can do this right with sctp_sock_rfree_frag() and sctp_skb_set_owner_r_frag() pair. sctp_sock_rfree_frag(skb); sctp_skb_set_owner_r_frag(skb, newsk); This patch not revert the commit d55d87fdff8252d0e2f7c28c2d443aee17e9d70f, instead remove the sctp_sock_rfree_frag() function. ------------[ cut here ]------------ WARNING: at net/ipv4/af_inet.c:151 inet_sock_destruct+0xe0/0x142() Modules linked in: sctp ipv6 dm_mirror dm_region_hash dm_log dm_multipath scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan] Pid: 1808, comm: sctp_test Not tainted 2.6.31-rc2 #40 Call Trace: [<c042dd06>] warn_slowpath_common+0x6a/0x81 [<c064a39a>] ? inet_sock_destruct+0xe0/0x142 [<c042dd2f>] warn_slowpath_null+0x12/0x15 [<c064a39a>] inet_sock_destruct+0xe0/0x142 [<c05fde44>] __sk_free+0x19/0xcc [<c05fdf50>] sk_free+0x18/0x1a [<ca0d14ad>] sctp_close+0x192/0x1a1 [sctp] [<c0649f7f>] inet_release+0x47/0x4d [<c05fba4d>] sock_release+0x19/0x5e [<c05fbab3>] sock_close+0x21/0x25 [<c049c31b>] __fput+0xde/0x189 [<c049c3de>] fput+0x18/0x1a [<c049988f>] filp_close+0x56/0x60 [<c042f422>] put_files_struct+0x5d/0xa1 [<c042f49f>] exit_files+0x39/0x3d [<c043086a>] do_exit+0x1a5/0x5dd [<c04a86c2>] ? d_kill+0x35/0x3b [<c0438fa4>] ? dequeue_signal+0xa6/0x115 [<c0430d05>] do_group_exit+0x63/0x8a [<c0439504>] get_signal_to_deliver+0x2e1/0x2f9 [<c0401d9e>] do_notify_resume+0x7c/0x6b5 [<c043f601>] ? autoremove_wake_function+0x0/0x34 [<c04a864e>] ? __d_free+0x3d/0x40 [<c04a867b>] ? d_free+0x2a/0x3c [<c049ba7e>] ? vfs_write+0x103/0x117 [<c05fc8fa>] ? sys_socketcall+0x178/0x182 [<c0402a56>] work_notifysig+0x13/0x19 ---[ end trace 9db92c463e789fba ]--- Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31e6d363 |
|
17-Jun-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: correct off-by-one write allocations reports commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. We need to take into account this offset when reporting sk_wmem_alloc to user, in PROC_FS files or various ioctls (SIOCOUTQ/TIOCOUTQ) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1b003be3 |
|
09-Jun-2009 |
David S. Miller <davem@davemloft.net> |
sctp: Use frag list abstraction interfaces. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6ba68a2 |
|
31-May-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: support non-blocking version of the new sctp_connectx() API Prior implementation of the new sctp_connectx() call that returns an association ID did not work correctly on non-blocking socket. This is because we could not return both a EINPROGRESS error and an association id. This is a new implementation that supports this. Originally from Ivan Skytte Jørgensen <isj-sctp@i1.dk Signed-off-by: Ivan Skytte Jørgensen <isj-sctp@i1.dk Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
5e8f3f70 |
|
12-Mar-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: simplify sctp listening code sctp_inet_listen() call is split between UDP and TCP style. Looking at the code, the two functions are almost the same and can be merged into a single helper. This also fixes a bug that was fixed in the UDP function, but missed in the TCP function. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c6db93a5 |
|
02-Mar-2009 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: fix the length check in sctp_getsockopt_maxburst() The code in sctp_getsockopt_maxburst() doesn't allow len to be larger then struct sctp_assoc_value, which is a common case where app writers just pass down the sizeof(buf) or something similar. This patch fix the problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d212318c |
|
02-Mar-2009 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: remove dup code in net/sctp/socket.c Remove dup check of "if (optlen < sizeof(int))". Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
914e1c8b |
|
13-Feb-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Inherit all socket options from parent correctly. During peeloff/accept() sctp needs to save the parent socket state into the new socket so that any options set on the parent are inherited by the child socket. This was found when the parent/listener socket issues SO_BINDTODEVICE, but the data was misrouted after a route cache flush. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
025dfdaf |
|
16-Oct-2008 |
Frederik Schwarzer <schwarzerf@gmail.com> |
trivial: fix then -> than typos in comments and documentation - (better, more, bigger ...) then -> (...) than Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
8510b937 |
|
25-Dec-2008 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: Add validity check for SCTP_PARTIAL_DELIVERY_POINT socket option The latest ietf socket extensions API draft said: 8.1.21. Set or Get the SCTP Partial Delivery Point Note also that the call will fail if the user attempts to set this value larger than the socket receive buffer size. This patch add this validity check for SCTP_PARTIAL_DELIVERY_POINT socket option. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aea3c5c0 |
|
25-Dec-2008 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: Implement socket option SCTP_GET_ASSOC_NUMBER Implement socket option SCTP_GET_ASSOC_NUMBER of the latest ietf socket extensions API draft. 8.2.5. Get the Current Number of Associations (SCTP_GET_ASSOC_NUMBER) This option gets the current number of associations that are attached to a one-to-many style socket. The option value is an uint32_t. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea686a26 |
|
25-Dec-2008 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: Fix a typo in socket.c Just fix a typo in socket.c. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e89c2095 |
|
25-Dec-2008 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: Bring SCTP_MAXSEG socket option into ietf API extension compliance Brings maxseg socket option set/get into line with the latest ietf socket extensions API draft, while maintaining backwards compatibility. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1748376b |
|
25-Nov-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
net: Use a percpu_counter for sockets_allocated Instead of using one atomic_t per protocol, use a percpu_counter for "sockets_allocated", to reduce cache line contention on heavy duty network servers. Note : We revert commit (248969ae31e1b3276fc4399d67ce29a5d81e6fd9 net: af_unix can make unix_nr_socks visbile in /proc), since it is not anymore used after sock_prot_inuse_add() addition Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5bc0b3bf |
|
25-Nov-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
net: Make sure BHs are disabled in sock_prot_inuse_add() prot->destroy is not called with BH disabled. So we must add explicit BH disable around call to sock_prot_inuse_add() in sctp_destroy_sock() Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6f756a8c |
|
23-Nov-2008 |
David S. Miller <davem@davemloft.net> |
net: Make sure BHs are disabled in sock_prot_inuse_add() The rule of calling sock_prot_inuse_add() is that BHs must be disabled. Some new calls were added where this was not true and this tiggers warnings as reported by Ilpo. Fix this by adding explicit BH disabling around those call sites. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9a57f7fa |
|
17-Nov-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
net: sctp should update its inuse counter This patch is a preparation to namespace conversion of /proc/net/protocols In order to have relevant information for SCTP protocols, we should use sock_prot_inuse_add() to update a (percpu and pernamespace) counter of inuse sockets. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52cae8f0 |
|
18-Aug-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: try harder to figure out address family when checking wildcards sctp_is_any() function that is used to check for wildcard addresses only looks at the address itself to determine the address family. This function is used in the API to check the address passed in from the user. If the user simply zerroes out the sockaddr_storage and pass that in, we'll end up failing. So, let's try harder to determine the address family by also checking the socket if it's possible. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
d9724055 |
|
27-Aug-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: fix random memory dereference with SCTP_HMAC_IDENT option. The number of identifiers needs to be checked against the option length. Also, the identifier index provided needs to be verified to make sure that it doesn't exceed the bounds of the array. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
328fc47e |
|
27-Aug-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: correct bounds check in sctp_setsockopt_auth_key The bonds check to prevent buffer overlflow was not exactly right. It still allowed overflow of up to 8 bytes which is sizeof(struct sctp_authkey). Since optlen is already checked against the size of that struct, we are guaranteed not to cause interger overflow either. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
30c2235c |
|
25-Aug-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: add verification checks to SCTP_AUTH_KEY option The structure used for SCTP_AUTH_KEY option contains a length that needs to be verfied to prevent buffer overflow conditions. Spoted by Eugene Teo <eteo@redhat.com>. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5e739d17 |
|
21-Aug-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: fix potential panics in the SCTP-AUTH API. All of the SCTP-AUTH socket options could cause a panic if the extension is disabled and the API is envoked. Additionally, there were some additional assumptions that certain pointers would always be valid which may not always be the case. This patch hardens the API and address all of the crash scenarios. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a677a039 |
|
23-Jul-2008 |
Ulrich Drepper <drepper@redhat.com> |
flag parameters: socket and socketpair This patch adds support for flag values which are ORed to the type passwd to socket and socketpair. The additional code is minimal. The flag values in this implementation can and must match the O_* flags. This avoids overhead in the conversion. The internal functions sock_alloc_fd and sock_map_fd get a new parameters and all callers are changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #define PORT 57392 /* For Linux these must be the same. */ #define SOCK_CLOEXEC O_CLOEXEC int main (void) { int fd; fd = socket (PF_INET, SOCK_STREAM, 0); if (fd == -1) { puts ("socket(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("socket(0) set close-on-exec flag"); return 1; } close (fd); fd = socket (PF_INET, SOCK_STREAM|SOCK_CLOEXEC, 0); if (fd == -1) { puts ("socket(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("socket(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); int fds[2]; if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1) { puts ("socketpair(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { coe = fcntl (fds[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { printf ("socketpair(0) set close-on-exec flag for fds[%d]\n", i); return 1; } close (fds[i]); } if (socketpair (PF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, fds) == -1) { puts ("socketpair(SOCK_CLOEXEC) failed"); return 1; } for (int i = 0; i < 2; ++i) { coe = fcntl (fds[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { printf ("socketpair(SOCK_CLOEXEC) does not set close-on-exec flag for fds[%d]\n", i); return 1; } close (fds[i]); } puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4e54064e |
|
19-Jul-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Allow only 1 listening socket with SO_REUSEADDR When multiple socket bind to the same port with SO_REUSEADDR, only 1 can be listining. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
23b29ed8 |
|
19-Jul-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Do not leak memory on multiple listen() calls SCTP permits multiple listen call and on subsequent calls we leak he memory allocated for the crypto transforms. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7dab83de |
|
19-Jul-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Support ipv6only AF_INET6 sockets. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5c52ba17 |
|
16-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
sock: add net to prot->enter_memory_pressure callback The tcp_enter_memory_pressure calls NET_INC_STATS, but doesn't have where to get the net from. I decided to add a sk argument, not the net itself, only to factor all the required sock_net(sk) calls inside the enter_memory_pressure callback itself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ecbed6a4 |
|
01-Jul-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Mark GET_PEER|LOCAL_ADDR_OLD deprecated. Socket options SCTP_GET_PEER_ADDR_OLD, SCTP_GET_PEER_ADDR_NUM_OLD, SCTP_GET_LOCAL_ADDR_OLD, and SCTP_GET_PEER_LOCAL_ADDR_NUM_OLD have been replaced by newer versions a since 2005. It's time to officially deprecate them and schedule them for removal. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
735ce972 |
|
20-Jun-2008 |
David S. Miller <davem@davemloft.net> |
sctp: Make sure N * sizeof(union sctp_addr) does not overflow. As noticed by Gabriel Campana, the kmalloc() length arg passed in by sctp_getsockopt_local_addrs_old() can overflow if ->addr_num is large enough. Therefore, enforce an appropriate limit. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d06b2e0 |
|
14-Jun-2008 |
Brian Haley <brian.haley@hp.com> |
net: change proto destroy method to return void Change struct proto destroy function pointer to return void. Noticed by Al Viro. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7bfe8bdb |
|
09-Jun-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Fix problems with the new SCTP_DELAYED_ACK code The default sack frequency should be 2. Also fix copy/paste error when updating all transports. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
88a0a948 |
|
09-May-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
sctp: Support the new specification of sctp_connectx() The specification of sctp_connectx() has been changed to return an association id. We've added a new socket option that will return the association id as the return value from the setsockopt() call. The library that implements sctp_connectx() interface will implement both socket options. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d364d927 |
|
09-May-2008 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
sctp: Bring SCTP_DELAYED_ACK socket option into API compliance Brings delayed_ack socket option set/get into line with the latest ietf socket extensions API draft, while maintaining backwards compatibility. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9dbc15f0 |
|
12-Apr-2008 |
Robert P. J. Day <rpjday@crashcourse.ca> |
[SCTP]: "list_for_each()" -> "list_for_each_entry()" where appropriate. Replacing (almost) all invocations of list_for_each() with list_for_each_entry() tightens up the code and allows for the deletion of numerous list iterator variables that are no longer necessary. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ab38fb04 |
|
12-Apr-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Fix compiler warning about const qualifiers Fix 3 warnings about discarding const qualifiers: net/sctp/ulpevent.c:862: warning: passing argument 1 of 'sctp_event2skb' discards qualifiers from pointer target type net/sctp/sm_statefuns.c:4393: warning: passing argument 1 of 'SCTP_ASOC' discards qualifiers from pointer target type net/sctp/socket.c:5874: warning: passing argument 1 of 'cmsg_nxthdr' discards qualifiers from pointer target type Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
935a7f6e |
|
10-Apr-2008 |
Li Zefan <lizf@cn.fujitsu.com> |
SCTP: fix wrong debug counting of bind_bucket Should not count it if the allocation of the object is failed. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bdcde3d7 |
|
28-Mar-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[SOCK]: Drop inuse pcounter from struct proto (v2). An uppercut - do not use the pcounter on struct proto. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
80445cfb |
|
23-Mar-2008 |
Florian Westphal <fw@strlen.de> |
[SCTP]: Remove redundant wrapper functions. sctp_datamsg_free and sctp_datamsg_track are just aliases for sctp_datamsg_put and sctp_chunk_hold, respectively. Saves 32 Bytes on x86. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0dc47877 |
|
05-Mar-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
219b99a9 |
|
05-Mar-2008 |
Neil Horman <nhorman@tuxdriver.com> |
[SCTP]: Bring MAX_BURST socket option into ietf API extension compliance Brings max_burst socket option set/get into line with the latest ietf socket extensions api draft, while maintaining backwards compatibility. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7e8616d8 |
|
27-Feb-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Update AUTH structures to match declarations in draft-16. The new SCTP socket api (draft 16) updates the AUTH API structures. We never exported these since we knew they would change. Update the rest to match the draft. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
b40db684 |
|
27-Feb-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Incorrect length was used in SCTP_*_AUTH_CHUNKS socket option The chunks are stored inside a parameter structure in the kernel and when we copy them to the user, we need to account for the parameter header. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
5f31886f |
|
20-Feb-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[SCTP]: Pick up an orphaned sctp_sockets_allocated counter. This counter is currently write-only. Drawing an analogy with the similar tcp counter, I think that this one should be pointed by the sockets_allocated members of sctp_prot and sctpv6_prot. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b46ae36d |
|
28-Jan-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Set ports in every address returned by sctp_getladdrs() Thomas Dreibholz has reported that port numbers are not filled in the results of sctp_getladdrs() when the socket was bound to an ephemeral port. This is only true, if the address was not specified either. So, fill in the port number correctly. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
0eca8fee |
|
11-Jan-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Do not increase rwnd when reading partial notification. When a user reads a partial notification message, do not update rwnd since notifications must not be counted towards receive window. Tested-by: Oliver Roll <mail@oliroll.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
60c778b2 |
|
11-Jan-2008 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Stop claiming that this is a "reference implementation" I was notified by Randy Stewart that lksctp claims to be "the reference implementation". First of all, "the refrence implementation" was the original implementation of SCTP in usersapce written ty Randy and a few others. Second, after looking at the definiton of 'reference implementation', we don't really meet the requirements. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
3ab224be |
|
31-Dec-2007 |
Hideo Aoki <haoki@redhat.com> |
[NET] CORE: Introducing new memory accounting interface. This patch introduces new memory accounting functions for each network protocol. Most of them are renamed from memory accounting functions for stream protocols. At the same time, some stream memory accounting functions are removed since other functions do same thing. Renaming: sk_stream_free_skb() -> sk_wmem_free_skb() __sk_stream_mem_reclaim() -> __sk_mem_reclaim() sk_stream_mem_reclaim() -> sk_mem_reclaim() sk_stream_mem_schedule -> __sk_mem_schedule() sk_stream_pages() -> sk_mem_pages() sk_stream_rmem_schedule() -> sk_rmem_schedule() sk_stream_wmem_schedule() -> sk_wmem_schedule() sk_charge_skb() -> sk_mem_charge() Removeing sk_stream_rfree(): consolidates into sock_rfree() sk_stream_set_owner_r(): consolidates into skb_set_owner_r() sk_stream_mem_schedule() The following functions are added. sk_has_account(): check if the protocol supports accounting sk_mem_uncharge(): do the opposite of sk_mem_charge() In addition, to achieve consolidation, updating sk_wmem_queued is removed from sk_mem_charge(). Next, to consolidate memory accounting functions, this patch adds memory accounting calls to network core functions. Moreover, present memory accounting call is renamed to new accounting call. Finally we replace present memory accounting calls with new interface in TCP and SCTP. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f57d96b2 |
|
20-Dec-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Change use_as_src into a full address state Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d8ad9d7 |
|
26-Nov-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[NET]: Name magic constants in sock_wake_async() The sock_wake_async() performs a bit different actions depending on "how" argument. Unfortunately this argument ony has numerical magic values. I propose to give names to their constants to help people reading this function callers understand what's going on without looking into this function all the time. I suppose this is 2.6.25 material, but if it's not (or the naming seems poor/bad/awful), I can rework it against the current net-2.6 tree. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8e71a11c |
|
06-Dec-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Fix the bind_addr info during migration. During accept/migrate the code attempts to copy the addresses from the parent endpoint to the new endpoint. However, if the parent was bound to a wildcard address, then we end up pointlessly copying all of the current addresses on the system. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f26f7c48 |
|
06-Dec-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Add bind hash locking to the migrate code SCTP accept code tries to add a newliy created socket to a bind bucket without holding a lock. On a really busy system, that can causes slab corruptions. Add a lock around this code. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d970dbf8 |
|
09-Nov-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
SCTP: Convert custom hash lists to use hlist. Convert the custom hash list traversals to use hlist functions. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
0ed90fb0 |
|
24-Oct-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
SCTP: Update RCU handling during the ADD-IP case After learning more about rcu, it looks like the ADD-IP hadling doesn't need to call call_rcu_bh. All the rcu critical sections use rcu_read_lock, so using call_rcu_bh is wrong here. Now, restore the local_bh_disable() code blocks and use normal call_rcu() calls. Also restore the missing return statement. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
8295b6d9 |
|
06-Nov-2007 |
Eric Dumazet <dada1@cosmosbay.com> |
[SCTP]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure Trivial patch to make "sctcp,sctpv6" protocols uses the fast "inuse sockets" infrastructure Each protocol use then a static percpu var, instead of a dynamic one. This saves some ram and some cpu cycles Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
411223c0 |
|
14-Oct-2007 |
Al Viro <viro@ftp.linux.org.uk> |
fix breakage in sctp getsockopt copy_to_user() into on-stack array Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
227b60f5 |
|
10-Oct-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[INET]: local port range robustness Expansion of original idea from Denis V. Lunev <den@openvz.org> Add robustness and locking to the local_port_range sysctl. 1. Enforce that low < high when setting. 2. Use seqlock to ensure atomic update. The locking might seem like overkill, but there are cases where sysadmin might want to change value in the middle of a DoS attack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06393009 |
|
10-Oct-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[SCTP]: port randomization Add port randomization rather than a simple fixed rover for use with SCTP. This makes it act similar to TCP, UDP, DCCP when allocating ports. No longer need port_alloc_lock as well (suggestion by Brian Haley). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
65b07e5d |
|
16-Sep-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: API updates to suport SCTP-AUTH extensions. Add SCTP-AUTH API. The API implemented here was agreed to between implementors at the 9th SCTP Interop. It will be documented in the next revision of the SCTP socket API spec. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b6fa1a4d |
|
12-Sep-2007 |
Adrian Bunk <bunk@kernel.org> |
[SCTP] net/sctp/socket.c: make 3 variables static This patch makes the following needlessly global variables static: - sctp_memory_pressure - sctp_memory_allocated - sctp_sockets_allocated Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4d93df0a |
|
15-Aug-2007 |
Neil Horman <nhorman@tuxdriver.com> |
[SCTP]: Rewrite of sctp buffer management code This patch introduces autotuning to the sctp buffer management code similar to the TCP. The buffer space can be grown if the advertised receive window still has room. This might happen if small message sizes are used, which is common in telecom environmens. New tunables are introduced that provide limits to buffer growth and memory pressure is entered if to much buffer spaces is used. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
559cf710 |
|
16-Sep-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Convert bind_addr_list locking to RCU Since the sctp_sockaddr_entry is now RCU enabled as part of the patch to synchronize sctp_localaddr_list, it makes sense to change all handling of these entries to RCU. This includes the sctp_bind_addrs structure and it's list of bound addresses. This list is currently protected by an external rw_lock and that looks like an overkill. There are only 2 writers to the list: bind()/bindx() calls, and BH processing of ASCONF-ACK chunks. These are already seriealized via the socket lock, so they will not step on each other. These are also relatively rare, so we should be good with RCU. The readers are varied and they are easily converted to RCU. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Sridhar Samdurala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29303547 |
|
16-Sep-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Add RCU synchronization around sctp_localaddr_list sctp_localaddr_list is modified dynamically via NETDEV_UP and NETDEV_DOWN events, but there is not synchronization between writer (even handler) and readers. As a result, the readers can access an entry that has been freed and crash the sytem. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Sridhar Samdurala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
498d6307 |
|
30-Aug-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
SCTP: Correctly disable listening when backlog is 0. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
2772b495 |
|
20-Aug-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
SCTP: Pick the correct port when binding to 0. sctp_bindx() allows the use of unspecified port. The problem is that every address we bind to ends up selecting a new port if the user specified port 0. This patch allows re-use of the already selected port when the port from bindx was 0. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
e4d1feab |
|
01-Aug-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
SCTP: IPv4 mapped addr not returned in SCTPv6 accept() When issuing a connect call on an AF_INET6 sctp socket with a IPv4-mapped destination, the peer address that is returned by getpeeraddr() should be v4-mapped as well. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
d6f9fdaf |
|
27-Jul-2007 |
Sebastian Siewior <sebastian@breakpoint.cc> |
sctp: try to fix readlock unlock the reader lock in error case. Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
c86dabcf |
|
26-Jul-2007 |
sebastian@breakpoint.cc <sebastian@breakpoint.cc> |
sctp: remove shadowed symbols Fixes the following sparse warnings: net/sctp/sm_make_chunk.c:1457:9: warning: symbol 'len' shadows an earlier one net/sctp/sm_make_chunk.c:1356:23: originally declared here net/sctp/socket.c:1534:22: warning: symbol 'chunk' shadows an earlier one net/sctp/socket.c:1387:20: originally declared here Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
0a5fcb9c |
|
26-Jul-2007 |
sebastian@breakpoint.cc <sebastian@breakpoint.cc> |
sctp: move global declaration to header file. sctp_chunk_cachep & sctp_bucket_cachep is used module global, so move it to a header file. Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
04675210 |
|
26-Jul-2007 |
sebastian@breakpoint.cc <sebastian@breakpoint.cc> |
sctp: make locally used function static Forward declarion is static, the function itself is not. Make it consistent. Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
9cbcbf4e |
|
18-Jul-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] SCTP: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
#
f50f95ca |
|
02-Jul-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
SCTP: Check to make sure file is valid before setting timeout In-kernel sockets created with sock_create_kern don't usually have a file and file descriptor allocated to them. As a result, when SCTP tries to check the non-blocking flag, we Oops when dereferencing a NULL file pointer. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3663c306 |
|
02-Jul-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
SCTP: Fix thinko in sctp_copy_laddrs() Correctly dereference bytes_copied in sctp_copy_laddrs(). I totally must have spaced when doing this. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5131a184 |
|
22-Jun-2007 |
Zach Brown <zach.brown@oracle.com> |
SCTP: lock_sock_nested in sctp_sock_migrate sctp_sock_migrate() grabs the socket lock on a newly allocated socket while holding the socket lock on an old socket. lockdep worries that this might be a recursive lock attempt. task/3026 is trying to acquire lock: (sk_lock-AF_INET){--..}, at: [<ffffffff88105b8c>] sctp_sock_migrate+0x2e3/0x327 [sctp] but task is already holding lock: (sk_lock-AF_INET){--..}, at: [<ffffffff8810891f>] sctp_accept+0xdf/0x1e3 [sctp] This patch tells lockdep that this locking is safe by using lock_sock_nested(). Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
186e2343 |
|
18-Jun-2007 |
Neil Horman <nhorman@tuxdriver.com> |
SCTP: Fix sctp_getsockopt_get_peer_addrs This is the split out of the patch that we agreed I should split out from my last patch. It changes space_left to be computed in the same way the to variable is. I know we talked about changing space_left to an int, but I think size_t is more appropriate, since we should never have negative space in our buffer, and computing using offsetof means space_left should now never drop below zero. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
408f22e8 |
|
16-Jun-2007 |
Neil Horman <nhorman@tuxdriver.com> |
SCTP: update sctp_getsockopt helpers to allow oversized buffers I noted the other day while looking at a bug that was ostensibly in some perl networking library, that we strictly avoid allowing getsockopt operations to complete if we pass in oversized buffers. This seems to make libraries like Perl::NET malfunction since it seems to allocate oversized buffers for use in several operations. It also seems to be out of line with the way udp, tcp and ip getsockopt routines handle buffer input (since the *optlen pointer in both an input and an output and gets set to the length of the data that we copy into the buffer). This patch brings our getsockopt helpers into line with other protocols, and allows us to accept oversized buffers for our getsockopt operations. Tested by me with good results. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
|
#
8a479491 |
|
07-Jun-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP] Flag a pmtu change request Currently, if the socket is owned by the user, we drop the ICMP message. As a result SCTP forgets that path MTU changed and never adjusting it's estimate. This causes all subsequent packets to be fragmented. With this patch, we'll flag the association that it needs to udpate it's estimate based on the already updated routing information. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
fe979ac1 |
|
23-May-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP] Fix leak in sctp_getsockopt_local_addrs when copy_to_user fails If the copy_to_user or copy_user calls fail in sctp_getsockopt_local_addrs(), the function should free locally allocated storage before returning error. Spotted by Coverity. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
8b358056 |
|
15-May-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Allow unspecified port in sctp_bindx() Allow sctp_bindx() to accept multiple address with unspecified port. In this case, all addresses inherit the first bound port. We still catch full mis-matches. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
d570ee49 |
|
15-May-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Correctly set daddr for IPv6 sockets during peeloff During peeloff of AF_INET6 socket, the inet6_sk(sk)->daddr wasn't set correctly since the code was assuming IPv4 only. Now we use a correct call to set the destination address. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
70b57b81 |
|
09-May-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Correctly copy addresses in sctp_copy_laddrs I broke the non-wildcard case recently. This is to fixes it. Now, explictitly bound addresses can ge retrieved using the API. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8dc4984a |
|
09-May-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Prevent OOPS if hmac modules didn't load SCTP was checking for NULL when trying to detect hmac allocation failure where it should have been using IS_ERR. Also, print a rate limited warning to the log telling the user what happend. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
59c51591 |
|
09-May-2007 |
Michael Opdenacker <michael@free-electrons.com> |
Fix occurrences of "the the " Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
ce5325c1 |
|
04-May-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP. Update the SO_REUSEADDR handling to also check for listen state. This was muliple listening server sockets can't be created and they will not steal packets from each other. Reported by Paolo Galtieri <pgaltieri@mvista.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
16d00fb7 |
|
04-May-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Verify all destination ports in sctp_connectx. We need to make sure that all destination ports are the same, since the association really must not connect to multiple different ports at once. This was reported on the sctp-impl list. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aad97f38 |
|
28-Apr-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage. sctp_getsockopt_local_addrs_old() in net/sctp/socket.c calls copy_to_user() while the spinlock addr_lock is held. this should not be done as copy_to_user() might sleep. the call to sctp_copy_laddrs_to_user() while holding the lock is also problematic as it calls copy_to_user() Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3ff50b79 |
|
20-Apr-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[NET]: cleanup extra semicolons Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
70331571 |
|
23-Mar-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Implement SCTP_MAX_BURST socket option. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bdf3092a |
|
23-Mar-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Honor flags when setting peer address parameters Parameters only take effect when a corresponding flag bit is set and a value is specified. This means we need to check the flags in addition to checking for non-zero value. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d49d91d7 |
|
23-Mar-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Implement SCTP_PARTIAL_DELIVERY_POINT option. This option induces partial delivery to run as soon as the specified amount of data has been accumulated on the association. However, we give preference to fully reassembled messages over PD messages. In any case, window and buffer is freed up. Signed-off-by: Vlad Yasevich <vladislav.yasevich@.hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b6e1331f |
|
20-Apr-2007 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Implement SCTP_FRAGMENT_INTERLEAVE socket option This option was introduced in draft-ietf-tsvwg-sctpsocket-13. It prevents head-of-line blocking in the case of one-to-many endpoint. Applications enabling this option really must enable SCTP_SNDRCV event so that they would know where the data belongs. Based on an earlier patch by Ivan Skytte Jørgensen. Additionally, this functionality now permits multiple associations on the same endpoint to enter Partial Delivery. Applications should be extra careful, when using this functionality, to track EOR indicators. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0304ff8a |
|
17-Apr-2007 |
Paolo Galtieri <pgaltieri@mvista.com> |
[SCTP]: Unmap v4mapped addresses during SCTP_BINDX_REM_ADDR operation. During the sctp_bindx() call to add additional addresses to the endpoint, any v4mapped addresses are converted and stored as regular v4 addresses. However, when trying to remove these addresses, the v4mapped addresses are not converted and the operation fails. This patch unmaps the addresses on during the remove operation as well. Signed-off-by: Paolo Galtieri <pgaltieri@mvista.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ea2bc483 |
|
17-Apr-2007 |
Tsutomu Fujii <t-fujii@nb.jp.nec.com> |
[SCTP]: Fix assertion (!atomic_read(&sk->sk_rmem_alloc)) failed message In current implementation, LKSCTP does receive buffer accounting for data in sctp_receive_queue and pd_lobby. However, LKSCTP don't do accounting for data in frag_list when data is fragmented. In addition, LKSCTP doesn't do accounting for data in reasm and lobby queue in structure sctp_ulpq. When there are date in these queue, assertion failed message is printed in inet_sock_destruct because sk_rmem_alloc of oldsk does not become 0 when socket is destroyed. Signed-off-by: Tsutomu Fujii <t-fujii@nb.jp.nec.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d808ad9a |
|
09-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] SCTP: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0f3fffd8 |
|
20-Dec-2006 |
Ivan Skytte Jorgensen <isj-sctp@i1.dk> |
[SCTP]: Fix typo adaption -> adaptation as per the latest API draft. Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ab792f5 |
|
13-Dec-2006 |
Ivan Skytte Jorgensen <isj-sctp@i1.dk> |
[SCTP]: Add support for SCTP_CONTEXT socket option. Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29c7cf96 |
|
13-Dec-2006 |
Sridhar Samudrala <sri@us.ibm.com> |
[SCTP]: Handle address add/delete events in a more efficient way. Currently in SCTP, we maintain a local address list by rebuilding the whole list from the device list whenever we get a address add/delete event. This patch fixes it by only adding/deleting the address for which we receive the event. Also removed the sctp_local_addr_lock() which is no longer needed as we now use list_for_each_safe() to traverse this list. This fixes the bugs in sctp_copy_laddrs_xxx() routines where we do copy_to_user() while holding this lock. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e18b890b |
|
06-Dec-2006 |
Christoph Lameter <clameter@sgi.com> |
[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
54e6ecb2 |
|
06-Dec-2006 |
Christoph Lameter <clameter@sgi.com> |
[PATCH] slab: remove SLAB_ATOMIC SLAB_ATOMIC is an alias of GFP_ATOMIC Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
dce116ae |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Get rid of the last remnants of sin_port flipping. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6fbfa9f9 |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Annotate ->inaddr_any(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8cec6b80 |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: We need to be careful when copying to sockaddr_storage. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3f5b3b6 |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Trivial ->ipaddr_h -> ->ipaddr conversions. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5ae955cf |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: sctp_make_asconf_update_ip() and sctp_find_unmatch_addr(). ... switched to taking and returning pointers to net-endian sctp_addr resp. Together, since the only user of sctp_find_unmatch_addr() just passes its value to sctp_make_asconf_update_ip(). sctp_make_asconf_update_ip() is actually endian-agnostic. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6244be4e |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Trivial parts of a_h -> a switch. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c7be55c |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: sctp_has_association() switched to net-endian. Ditto for its only caller (sctp_endpoint_is_peeled_off) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cd4ff034 |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: sctp_endpoint_lookup_assoc() switched to net-endian. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5ab7b859 |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Switch sctp_add_bind_addr() to net-endian. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4bdf4b5f |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Switch sctp_assoc_add_peer() to net-endian. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c9a08505 |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Switch sctp_del_bind_addr() to net-endian. Callers adjusted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
be29681e |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Switch sctp_assoc_lookup_paddr() to net-endian. Callers updated. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7e1e4a2b |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Switch sctp_bind_addr_match() to net-endian. Callers adjusted. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5f242a13 |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Switch ->cmp_addr() and sctp_cmp_addr_exact() to net-endian. instances of ->cmp_addr() are fine with switching both arguments to net-endian; callers other than in sctp_cmp_addr_exact() (both as ->cmp_addr(...) and direct calls of instances) adjusted; sctp_cmp_addr_exact() switched to net-endian itself and adjustment is done in its callers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
09ef7fec |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Beginning of conversion to net-endian for embedded sctp_addr. Part 1: rename sctp_chunk->source, sctp_sockaddr_entry->a, sctp_transport->ipaddr and sctp_transport->saddr (to ..._h) The next patch will reintroduce these fields and keep them as net-endian mirrors of the original (renamed) ones. Split in two patches to make sure that we hadn't forgotten any instanes. Later in the series we'll eliminate uses of host-endian variants (basically switching users to net-endian counterparts as we progress through that mess). Then host-endian ones will die. Other embedded host-endian sctp_addr will be easier to switch directly, so we leave them alone for now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
30330ee0 |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP] bug: endianness problem in sctp_getsockopt_sctp_status() Again, invalid sockaddr passed to userland - host-endiand sin_port. Potential leak, again, but less dramatic than in previous case. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
04afd8b2 |
|
20-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[SCTP]: Beginning of sin_port fixes. That's going to be a long series. Introduced temporary helpers doing copy-and-convert for sctp_addr; they are used to kill flip-in-place in global data structures and will be used to gradually push host-endian uses of sctp_addr out of existence. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4f444308 |
|
30-Oct-2006 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Correctly set IP id for SCTP traffic Make SCTP 1-1 style and peeled-off associations behave like TCP when setting IP id. In both cases, we set the inet_sk(sk)->daddr and initialize inet_sk(sk)->id to a random value. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
23c435f7 |
|
16-Oct-2006 |
Ville Nuorvala <vnuorval@tcs.hut.fi> |
[SCTP]: Fix minor typo Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
331c4ee7 |
|
09-Oct-2006 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Fix receive buffer accounting. When doing receiver buffer accounting, we always used skb->truesize. This is problematic when processing bundled DATA chunks because for every DATA chunk that could be small part of one large skb, we would charge the size of the entire skb. The new approach is to store the size of the DATA chunk we are accounting for in the sctp_ulpevent structure and use that stored value for accounting. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
208edef6 |
|
29-Sep-2006 |
Sridhar Samudrala <sri@us.ibm.com> |
[SCTP]: Enable Nagle algorithm by default. This allows more aggressive bundling of chunks when sending small messages. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
16164366 |
|
18-Sep-2006 |
Adrian Bunk <bunk@stusta.de> |
[SCTP]: Cleanups This patch contains the following cleanups: - make the following needlessly global function static: - socket.c: sctp_apply_peer_addr_params() - add proper prototypes for the several global functions in include/net/sctp/sctp.h Note that this fixes wrong prototypes for the following functions: - sctp_snmp_proc_exit() - sctp_eps_proc_exit() - sctp_assocs_proc_exit() The latter was spotted by the GNU C compiler and reported by David Woodhouse. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3fd091e7 |
|
22-Aug-2006 |
Vladislav Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Remove multiple levels of msecs to jiffies conversions. The SCTP sysctl entries are displayed in milliseconds, but stored internally in jiffies. This results in multiple levels of msecs to jiffies conversion and as a result produces a truncation error. This patch makes things consistent in that we store and display defaults in milliseconds and only convert once for use by association. This patch also adds some sane min/max values so that we don't go off the deep end. Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8abfedd8 |
|
22-Aug-2006 |
Sridhar Samudrala <sri@us.ibm.com> |
[SCTP]: Use the flags value that is passed as an arg to sctp_accept. No need to do multiple dereferences - sk->sk_socket->file->f_flags Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eb5fa39f |
|
22-Aug-2006 |
Vladislav Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Fix IPv6 address flag setting when doing peel-off/accept. During accept/peeloff we try to copy the list of bound addresses from the original endpoint to the new one. However, we forgot to set the flag to say that IPv6 is allowed on the new endpoint. Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1b489e11 |
|
19-Aug-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[SCTP]: Use HMAC template and hash interface This patch converts SCTP to use the new HMAC template and hash interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b9ac8672 |
|
28-Aug-2006 |
Sridhar Samudrala <sri@us.ibm.com> |
[SCTP]: Fix sctp_primitive_ABORT() call in sctp_close(). With the recent fix, the callers of sctp_primitive_ABORT() need to create an ABORT chunk and pass it as an argument rather than msghdr that was passed earlier. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c164a9ba |
|
22-Aug-2006 |
Sridhar Samudrala <sri@us.ibm.com> |
Fix sctp privilege elevation (CVE-2006-3745) sctp_make_abort_user() now takes the msg_len along with the msg so that we don't have to recalculate the bytes in iovec. It also uses memcpy_fromiovec() so that we don't go beyond the length allocated. It is good to have this fix even if verify_iovec() is fixed to return error on overflow. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
#
dc022a98 |
|
21-Jul-2006 |
Sridhar Samudrala <sri@us.ibm.com> |
[SCTP]: ADDIP: Don't use an address as source until it is ASCONF-ACKed This implements Rules D1 and D4 of Sec 4.3 in the ADDIP draft. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37fa6878 |
|
21-Jul-2006 |
Sridhar Samudrala <sri@us.ibm.com> |
[SCTP]: Check for NULL arg to sctp_bucket_destroy(). Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ab3d562 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
5636bef7 |
|
17-Jun-2006 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Reject sctp packets with broadcast addresses. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
402d68c4 |
|
17-Jun-2006 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Limit association max_retrans setting in setsockopt. When using ASSOCINFO socket option, we need to limit the number of maximum association retransmissions to be no greater than the sum of all the path retransmissions. This is specified in Section 7.1.2 of the SCTP socket API draft. However, we only do this if the association has multiple paths. If there is only one path, the protocol stack will use the assoc_max_retrans setting when trying to retransmit packets. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b89498a1 |
|
19-May-2006 |
Vladislav Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Allow linger to abort 1-N style sockets. Enable SO_LINGER functionality for 1-N style sockets. The socket API draft will be clarfied to allow for this functionality. The linger settings will apply to all associations on a given socket. Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
61c9fed4 |
|
19-May-2006 |
Vladislav Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: A better solution to fix the race between sctp_peeloff() and sctp_rcv(). The goal is to hold the ref on the association/endpoint throughout the state-machine process. We accomplish like this: /* ref on the assoc/ep is taken during lookup */ if owned_by_user(sk) sctp_add_backlog(skb, sk); else inqueue_push(skb, sk); /* drop the ref on the assoc/ep */ However, in sctp_add_backlog() we take the ref on assoc/ep and hold it while the skb is on the backlog queue. This allows us to get rid of the sock_hold/sock_put in the lookup routines. Now sctp_backlog_rcv() needs to account for potential association move. In the unlikely event that association moved, we need to retest if the new socket is locked by user. If we don't this, we may have two packets racing up the stack toward the same socket and we can't deal with it. If the new socket is still locked, we'll just add the skb to its backlog continuing to hold the ref on the association. This get's rid of the need to move packets from one backlog to another and it also safe in case new packets arrive on the same backlog queue. The last step, is to lock the new socket when we are moving the association to it. This is needed in case any new packets arrive on the association when it moved. We want these to go to the backlog since we would like to avoid the race between this new packet and a packet that may be sitting on the backlog queue of the old socket toward the same association. Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
8de8c873 |
|
19-May-2006 |
Sridhar Samudrala <sri@us.ibm.com> |
[SCTP]: Set sk_err so that poll wakes up after a non-blocking connect failure. Also fix some other cases where sk_err is not set for 1-1 style sockets. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
f348d70a |
|
25-Mar-2006 |
Davide Libenzi <davidel@xmailserver.org> |
[PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications Implement the half-closed devices notifiation, by adding a new POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the existing POLLHUP handling, that does not report correctly half-closed devices, was feared to be changed, this implementation leaves the current POLLHUP reporting unchanged and simply add a new bit that is set in the few places where it makes sense. The same thing was discussed and conceptually agreed quite some time ago: http://lkml.org/lkml/2003/7/12/116 Since this new event bit is added to the existing Linux poll infrastruture, even the existing poll/select system calls will be able to use it. As far as the existing POLLHUP handling, the patch leaves it as is. The pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing archs and sets the bit in the six relevant files. The other attached diff is the simple change required to sys/epoll.h to add the EPOLLRDHUP definition. There is "a stupid program" to test POLLRDHUP delivery here: http://www.xmailserver.org/pollrdhup-test.c It tests poll(2), but since the delivery is same epoll(2) will work equally. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
81845c21 |
|
30-Jan-2006 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: correct the number of INIT retransmissions We currently count the initial INIT/COOKIE_ECHO chunk toward the retransmit count and thus sends a total of sctp_max_retrans_init chunks. The correct behavior is to retransmit the chunk sctp_max_retrans_init in addition to sending the original. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4d2444e |
|
17-Jan-2006 |
Sridhar Samudrala <sri@us.ibm.com> |
[SCTP]: Fix couple of races between sctp_peeloff() and sctp_rcv(). Validate and update the sk in sctp_rcv() to avoid the race where an assoc/ep could move to a different socket after we get the sk, but before the skb is added to the backlog. Also migrate the skb's in backlog queue to new sk when doing a peeloff. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
8116ffad |
|
17-Jan-2006 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Fix bad sysctl formatting of SCTP timeout values on 64-bit m/cs. Change all the structure members that hold jiffies to be of type unsigned long. This also corrects bad sysctl formating on 64 bit architectures. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
4fc268d2 |
|
11-Jan-2006 |
Randy Dunlap <rdunlap@infradead.org> |
[PATCH] capable/capability.h (net/) net: Use <linux/capability.h> where capable() is used. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
8b3a7005 |
|
11-Jan-2006 |
Kris Katterjohn <kjak@users.sourceforge.net> |
[NET]: Remove more unneeded typecasts on *malloc() This removes more unneeded casts on the return value for kmalloc(), sock_kmalloc(), and vmalloc(). Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7708610b |
|
22-Dec-2005 |
Frank Filz <ffilz@us.ibm.com> |
[SCTP]: Add support for SCTP_DELAYED_ACK_TIME socket option. Signed-off-by: Frank Filz <ffilz@us.ibm.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52ccb8e9 |
|
22-Dec-2005 |
Frank Filz <ffilz@us.ibm.com> |
[SCTP]: Update SCTP_PEER_ADDR_PARAMS socket option to the latest api draft. This patch adds support to set/get heartbeat interval, maximum number of retransmissions, pathmtu, sackdelay time for a particular transport/ association/socket as per the latest SCTP sockets api draft11. Signed-off-by: Frank Filz <ffilz@us.ibm.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9bffc4ac |
|
19-Dec-2005 |
Neil Horman <nhorman@tuxdriver.com> |
[SCTP]: Fix sctp to not return erroneous POLLOUT events. Make sctp_writeable() use sk_wmem_alloc rather than sk_wmem_queued to determine the sndbuf space available. It also removes all the modifications to sk_wmem_queued as it is not currently used in SCTP. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3a880e1 |
|
15-Dec-2005 |
Al Viro <viro@ftp.linux.org.uk> |
[PATCH] Address of void __user * is void __user * *, not void * __user * Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
6736dc35 |
|
02-Dec-2005 |
Neil Horman <nhorman@tuxdriver.com> |
[SCTP]: Return socket errors only if the receive queue is empty. This patch fixes an issue where it is possible to get valid data after a ENOTCONN error. It returns socket errors only after data queued on socket receive queue is consumed. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
049b3ff5 |
|
11-Nov-2005 |
Neil Horman <nhorman@tuxdriver.com> |
[SCTP]: Include ulpevents in socket receive buffer accounting. Also introduces a sysctl option to configure the receive buffer accounting policy to be either at socket or association level. Default is all the associations on the same socket share the receive buffer. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1e7d3d90 |
|
11-Nov-2005 |
Vladislav Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Remove timeouts[] array from sctp_endpoint. The socket level timeout values are maintained in sctp_sock and association level timeouts are in sctp_association. So there is no need for ep->timeouts. Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64a0c1c8 |
|
28-Oct-2005 |
Ivan Skytte Jorgensen <isj-sctp@i1.dk> |
[SCTP] Do not allow unprivileged programs initiating new associations on privileged ports. Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
96a33998 |
|
28-Oct-2005 |
Ivan Skytte Jorgensen <isj-sctp@i1.dk> |
[SCTP] Allow SCTP_MAXSEG to revert to default frag point with a '0' value. Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
a1ab3582 |
|
28-Oct-2005 |
Ivan Skytte Jorgensen <isj-sctp@i1.dk> |
[SCTP] Fix SCTP_SETADAPTION sockopt to use the correct structure. Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
eaa5c54d |
|
28-Oct-2005 |
Ivan Skytte Jorgensen <isj-sctp@i1.dk> |
[SCTP] Rename SCTP specific control message flags. Rename SCTP specific control message flags to use SCTP_ prefix rather than MSG_ prefix as per the latest sctp sockets API draft. Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
|
#
5fe467ee |
|
06-Oct-2005 |
Ivan Skytte J�rgensen <isj-sctp@i1.dk> |
[SCTP] Fix sctp_get{pl}addrs() API to work with 32-bit apps on 64-bit kernels. The old socket options are marked with a _OLD suffix so that the existing 32-bit apps on 32-bit kernels do not break. Signed-off-by: Ivan Skytte J�rgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
573dbd95 |
|
01-Sep-2005 |
Jesper Juhl <jesper.juhl@gmail.com> |
[CRYPTO]: crypto_free_tfm() callers no longer need to check for NULL Since the patch to add a NULL short-circuit to crypto_free_tfm() went in, there's no longer any need for callers of that function to check for NULL. This patch removes the redundant NULL checks and also a few similar checks for NULL before calls to kfree() that I ran into while doing the crypto_free_tfm bits. I've succesfuly compile tested this patch, and a kernel with the patch applied boots and runs just fine. When I posted the patch to LKML (and other lists/people on Cc) it drew the following comments : J. Bruce Fields commented "I've no problem with the auth_gss or nfsv4 bits.--b." Sridhar Samudrala said "sctp change looks fine." Herbert Xu signed off on the patch. So, I guess this is ready to be dropped into -mm and eventually mainline. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8728b834 |
|
09-Aug-2005 |
David S. Miller <davem@davemloft.net> |
[NET]: Kill skb->list Remove the "list" member of struct sk_buff, as it is entirely redundant. All SKB list removal callers know which list the SKB is on, so storing this in sk_buff does nothing other than taking up some space. Two tricky bits were SCTP, which I took care of, and two ATM drivers which Francois Romieu <romieu@fr.zoreil.com> fixed up. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
|
#
79af02c2 |
|
08-Jul-2005 |
David S. Miller <davem@davemloft.net> |
[SCTP]: Use struct list_head for chunk lists, not sk_buff_head. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3f7a87d2 |
|
20-Jun-2005 |
Frank Filz <ffilzlnx@us.ibm.com> |
[SCTP] sctp_connectx() API support Implements sctp_connectx() as defined in the SCTP sockets API draft by tunneling the request through a setsockopt(). Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1e061ab2 |
|
18-Jun-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[SCTP]: Replace spin_lock_irqsave with spin_lock_bh This patch replaces the spin_lock_irqsave call on the receive queue lock in SCTP with spin_lock_bh. Despite the proliferation of spin_lock_irqsave calls in this stack, it is only entered from the IPv4/IPv6 stack and user space. That is, it is never entered from hardirq context. The call in question is only called from recvmsg which means that IRQs aren't disabled. Therefore it is safe to replace it with spin_lock_bh. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4243cac1 |
|
13-Jun-2005 |
Vladislav Yasevich <vladislav.yasevich@hp.com> |
[SCTP]: Fix bug in restart of peeled-off associations. Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4eb701df |
|
28-Apr-2005 |
Neil Horman <nhorman@redhat.com> |
[SCTP] Fix SCTP sendbuffer accouting. - Include chunk and skb sizes in sendbuffer accounting. - 2 policies are supported. 0: per socket accouting, 1: per association accounting DaveM: I've made the default per-socket. Signed-off-by: Neil Horman <nhorman@redhat.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
17337216 |
|
28-Apr-2005 |
Vladislav Yasevich <vladislav.yasevich@hp.com> |
[SCTP] Fix SCTP_ASSOCINFO getsockopt for 1-1 style Signed-off-by: Vladislav Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|