History log of /linux-master/include/linux/dccp.h
Revision Date Author Comments
# ae6084b7 17-Mar-2023 Eric Dumazet <edumazet@google.com>

dccp: preserve const qualifier in dccp_sk()

We can change dccp_sk() to propagate its argument const qualifier,
thanks to container_of_const().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8402a31d 28-Feb-2020 Gustavo A. R. Silva <gustavo@embeddedor.com>

net: dccp: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b2441318 01-Nov-2017 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.

For non */uapi/* files that summary was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139

and resulted in the first patch in this series.

If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:

SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930

and resulted in the second patch in this series.

- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:

SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1

and that resulted in the third patch in this series.

- when the two scanners agreed on the detected license(s), that became
the concluded license(s).

- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.

- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).

- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.

- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct

This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 62f8f4d9 05-Mar-2017 Eric Dumazet <edumazet@google.com>

dccp: fix use-after-free in dccp_feat_activate_values

Dmitry reported crashes in DCCP stack [1]

Problem here is that when I got rid of listener spinlock, I missed the
fact that DCCP stores a complex state in struct dccp_request_sock,
while TCP does not.

Since multiple cpus could access it at the same time, we need to add
protection.

[1]
BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0
net/dccp/feat.c:1541 at addr ffff88003713be68
Read of size 8 by task syz-executor2/8457
CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x292/0x398 lib/dump_stack.c:51
kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
print_address_description mm/kasan/report.c:200 [inline]
kasan_report_error mm/kasan/report.c:289 [inline]
kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
kasan_report mm/kasan/report.c:332 [inline]
__asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541
dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
NF_HOOK include/linux/netfilter.h:257 [inline]
ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
dst_input include/net/dst.h:507 [inline]
ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
NF_HOOK include/linux/netfilter.h:257 [inline]
ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
__netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
process_backlog+0xe5/0x6c0 net/core/dev.c:4839
napi_poll net/core/dev.c:5202 [inline]
net_rx_action+0xe70/0x1900 net/core/dev.c:5267
__do_softirq+0x2fb/0xb7d kernel/softirq.c:284
do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
</IRQ>
do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
do_softirq kernel/softirq.c:176 [inline]
__local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
local_bh_enable include/linux/bottom_half.h:31 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline]
ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123
ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501
inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179
dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141
dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280
dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362
dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
sock_sendmsg_nosec net/socket.c:635 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:645
SYSC_sendto+0x660/0x810 net/socket.c:1687
SyS_sendto+0x40/0x50 net/socket.c:1655
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458b9
RSP: 002b:00007f8ceb77bb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00000000004458b9
RDX: 0000000000000023 RSI: 0000000020e60000 RDI: 0000000000000017
RBP: 00000000006e1b90 R08: 00000000200f9fe1 R09: 0000000000000020
R10: 0000000000008010 R11: 0000000000000282 R12: 00000000007080a8
R13: 0000000000000000 R14: 00007f8ceb77c9c0 R15: 00007f8ceb77c700
Object at ffff88003713be50, in cache kmalloc-64 size: 64
Allocated:
PID = 8446
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
save_stack+0x43/0xd0 mm/kasan/kasan.c:502
set_track mm/kasan/kasan.c:514 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738
kmalloc include/linux/slab.h:490 [inline]
dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467
dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487
__feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741
dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949
dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012
dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423
dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217
dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377
dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606
dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
sk_backlog_rcv include/net/sock.h:893 [inline]
__sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479
dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742
ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
NF_HOOK include/linux/netfilter.h:257 [inline]
ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
dst_input include/net/dst.h:507 [inline]
ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
NF_HOOK include/linux/netfilter.h:257 [inline]
ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
__netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
process_backlog+0xe5/0x6c0 net/core/dev.c:4839
napi_poll net/core/dev.c:5202 [inline]
net_rx_action+0xe70/0x1900 net/core/dev.c:5267
__do_softirq+0x2fb/0xb7d kernel/softirq.c:284
Freed:
PID = 15
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
save_stack+0x43/0xd0 mm/kasan/kasan.c:502
set_track mm/kasan/kasan.c:514 [inline]
kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
slab_free_hook mm/slub.c:1355 [inline]
slab_free_freelist_hook mm/slub.c:1377 [inline]
slab_free mm/slub.c:2954 [inline]
kfree+0xe8/0x2b0 mm/slub.c:3874
dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418
dccp_feat_entry_destructor net/dccp/feat.c:416 [inline]
dccp_feat_list_pop net/dccp/feat.c:541 [inline]
dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543
dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
NF_HOOK include/linux/netfilter.h:257 [inline]
ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
dst_input include/net/dst.h:507 [inline]
ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
NF_HOOK include/linux/netfilter.h:257 [inline]
ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
__netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
__netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
process_backlog+0xe5/0x6c0 net/core/dev.c:4839
napi_poll net/core/dev.c:5202 [inline]
net_rx_action+0xe70/0x1900 net/core/dev.c:5267
__do_softirq+0x2fb/0xb7d kernel/softirq.c:284
Memory state around the buggy address:
ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
^

Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c6119d9 08-Oct-2015 Yaowei Bai <bywxiaobai@163.com>

net/dccp: dccp_list_has_service can be boolean

This patch makes dccp_list_has_service return bool due to this
particular function only using either one or zero as its return
value.

No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 85645bab 22-Mar-2015 Eric Dumazet <edumazet@google.com>

ipv4: dccp: handle ICMP messages on DCCP_NEW_SYN_RECV request sockets

dccp_v4_err() can restrict lookups to ehash table, and not to listeners.

Note this patch creates the infrastructure, but this means that ICMP
messages for request sockets are ignored until complete conversion.

New dccp_req_err() helper is exported so that we can use it in IPv6
in following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 42cb80a2 22-Mar-2015 Eric Dumazet <edumazet@google.com>

inet: remove sk_listener parameter from syn_ack_timeout()

It is not needed, and req->sk_listener points to the listener anyway.
request_sock argument can be const.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 607ca46e 13-Oct-2012 David Howells <dhowells@redhat.com>

UAPI: (Scripted) Disintegrate include/linux

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>


# c72e1183 12-Apr-2012 Eric Dumazet <edumazet@google.com>

inet: makes syn_ack_timeout mandatory

There are two struct request_sock_ops providers, tcp and dccp.

inet_csk_reqsk_queue_prune() can avoid testing syn_ack_timeout being
NULL if we make it non NULL like syn_ack_timeout

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>


# f541fb7e 26-Feb-2012 Samuel Jero <sj323707@ohio.edu>

dccp: fix bug in sequence number validation during connection setup

This fixes a bug in the sequence number validation during the initial handshake.

The code did not treat the initial sequence numbers ISS and ISR as read-only and
did not keep state for GSR and GSS as required by the specification. This causes
problems with retransmissions during the initial handshake, causing the
budding connection to be reset.

This patch now treats ISS/ISR as read-only and tracks GSS/GSR as required.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# a6b7a407 06-Jun-2011 Alexey Dobriyan <adobriyan@gmail.com>

net: remove interrupt.h inclusion from netdevice.h

* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d18046b3 19-Jan-2011 Shan Wei <shanwei@cn.fujitsu.com>

dccp: clean up unused DCCP_STATE_MASK definition

Remove unused DCCP_STATE_MASK macro.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 871a2c16 04-Dec-2010 Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>

dccp: Policy-based packet dequeueing infrastructure

This patch adds a generic infrastructure for policy-based dequeueing of
TX packets and provides two policies:
* a simple FIFO policy (which is the default) and
* a priority based policy (set via socket options).
Both policies honour the tx_qlen sysctl for the maximum size of the write
queue (can be overridden via socket options).

The priority policy uses skb->priority internally to assign an u32 priority
identifier, using the same ranking as SO_PRIORITY. The skb->priority field
is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
data using cmsg(3), the patch also provides the requisite parsing routines.

Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# d83447f0 14-Nov-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-2: Schedule Sync as out-of-band mechanism

The problem with Ack Vectors is that
i) their length is variable and can in principle grow quite large,
ii) it is hard to predict exactly how large they will be.

Due to the second point it seems not a good idea to reduce the MPS; in
particular when on average there is enough room for the Ack Vector and an
increase in length is momentarily due to some burst loss, after which the
Ack Vector returns to its normal/average length.

The solution taken by this patch is to subtract a minimum-expected Ack Vector
length from the MPS, and to defer any larger Ack Vectors onto a separate
Sync - but only if indeed there is no space left on the skb.

This patch provides the infrastructure to schedule Sync-packets for transporting
(urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since
it does not need to wait for new application data.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# dc841e30 27-Oct-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Extend CCID packet dequeueing interface

This extends the packet dequeuing interface of dccp_write_xmit() to allow
1. CCIDs to take care of timing when the next packet may be sent;
2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).

The main purpose is to take CCID-2 out of its polling mode (when it is network-
limited, it tries every millisecond to send, without interruption).

The mode of operation for (2) is as follows:
* new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
* ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
* it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
* dccp_write_xmit() returns without further action;
* after some time the wait-condition for CCID becomes true,
* that CCID schedules the tasklet,
* tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
* since the wait-condition is now true, ccid_hc_tx_packet() returns "send now",
* packet is sent, and possibly more (since dccp_write_xmit() loops).

Code reuse: the taskled function calls dccp_write_xmit(), the timer function
reduces to a wrapper around the same code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a18213d1 19-Sep-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Replace magic CCID-specific numbers by symbolic constants

The constants DCCPO_{MIN,MAX}_CCID_SPECIFIC are nowhere used in the code, but
instead for the CCID-specific options numbers are used.

This patch unifies the use of CCID-specific option numbers, by adding symbolic
names reflecting the definitions in RFC 4340, 10.3.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 883ca833 16-Jan-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Initialisation and type-checking of feature sysctls

This patch takes care of initialising and type-checking sysctls
related to feature negotiation. Type checking is important since some
of the sysctls now directly impact the feature-negotiation process.

The sysctls are initialised with the known default values for each
feature. For the type-checking the value constraints from RFC 4340
are used:

* Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes),
tested and confirmed that it works up to 4294967295 - for Gbps speed;
* Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer);
* CCIDs are between 0 .. 255;
* request_retries, retries1, retries2 also between 0..255 for good measure;
* tx_qlen is checked to be non-negative;
* sync_ratelimit remains as before.

Notes:
------
1. Die s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c.
2. As pointed out by Arnaldo, the pattern of type-checking repeats itself in
other places, sometimes with exactly the same kind of definitions (e.g.
"static int zero;"). It may be a good idea (kernel janitors?) to consolidate
type checking. For the sake of keeping the changeset small and in order not
to affect other subsystems, I have not strived to generalise here.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 792b4878 16-Jan-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Implement both feature-local and feature-remote Sequence Window feature

This adds full support for local/remote Sequence Window feature, from which the
* sequence-number-validity (W) and
* acknowledgment-number-validity (W') windows
derive as specified in RFC 4340, 7.5.3.

Specifically, the following is contained in this patch:
* integrated new socket fields into dccp_sk;
* updated the update_gsr/gss routines with regard to these fields;
* updated handler code: the Sequence Window feature is located at the TX side,
so the local feature is meant if the handler-rx flag is false;
* the initialisation of `rcv_wnd' in reqsk is removed, since
- rcv_wnd is not used by the code anywhere;
- sequence number checks are not done in the LISTEN state (cf. 7.5.3);
- dccp_check_req checks the Ack number validity more rigorously;
* the `struct dccp_minisock' became empty and is now removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f90f92ee 16-Jan-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Initialisation framework for feature negotiation

This initialises feature negotiation from two tables, which are in
turn are initialised from sysctls.

As a novel feature, specifics of the implementation (e.g. that short
seqnos and ECN are not yet available) are advertised for robustness.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6fdd34d4 08-Dec-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-2: Phase out the use of boolean Ack Vector sysctl

This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, as it now is handled fully dynamically via feature negotiation
(i.e. when CCID-2 is enabled, Ack Vectors are automatically enabled as per
RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

if (dccp_msk(sk)->dccpms_send_ack_vector)
/* ... */
with
if (dp->dccps_hc_rx_ackvec != NULL)
/* ... */

The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
* server when the Ack marking the transition RESPOND => OPEN arrives;
* client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been
removed, since
(a) connection establishment implies that the Response has been received;
(b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
this entry will always be ignored;
(c) it can not be used for anything useful - to detect loss for instance, only
packets received after the loss can serve as pseudo-dupacks.

There was a FIXME to change the error code when dccp_ackvec_add() fails.
I removed this after finding out that:
* the check whether ackno < ISN is already made earlier,
* this Response is likely the 1st packet with an Ackno that the client gets,
* so when dccp_ackvec_add() fails, the reason is likely not a packet error.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4098dce5 08-Dec-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Remove manual influence on NDP Count feature

Updating the NDP count feature is handled automatically now:
* for CCID-2 it is disabled, since the code does not use NDP counts;
* for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.

Allowing the user to change NDP values leads to unpredictable and failing
behaviour, since it is then possible to disable NDP counts even when they
are needed (e.g. in CCID-3).

This means that only those user settings are sensible that agree with the
values for Send NDP Count implied by the choice of CCID. But those settings
are already activated by the feature negotiation (CCID dependency tracking),
hence this form of support is redundant.

At startup the initialisation of the NDP count feature uses the default
value of 0, which is done implicitly by the zeroing-out of the socket when
it is allocated. If the choice of CCID or feature negotiation enables NDP
count, this will then be updated via the NDP activation handler.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0049bab5 08-Dec-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Remove obsolete parts of the old CCID interface

The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
case, their value equals initially that of the sysctl, but at the end of
feature negotiation may be something different.

The old interface removed by this patch thus has been replaced by the newer
interface to dynamically query the currently loaded CCIDs.

Also removed are the constructors for the TX CCID and the RX CCID, since the
switch "rx <-> non-rx" is done by the handler in minisocks.c (and the handler
is the only place in the code where CCIDs are loaded).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b20a9c24 23-Nov-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Set per-connection CCIDs via socket options

With this patch, TX/RX CCIDs can now be changed on a per-connection
basis, which overrides the defaults set by the global sysctl variables
for TX/RX CCIDs.

To make full use of this facility, the remaining patches of this patch
set are needed, which track dependencies and activate negotiated
feature values.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dd9c0e36 16-Nov-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Deprecate Ack Ratio sysctl

This patch deprecates the Ack Ratio sysctl, since
* Ack Ratio is entirely ignored by CCID-3 and CCID-4,
* Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
* even if it would work in CCID-2, there is no point for a user to change it:
- Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
- if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
(since waiting for Acks which will never arrive in this window),
- cwnd is not a user-configurable value.

The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.

With this patch Ack Ratio is now under full control of feature negotiation:
* Ack Ratio is resolved as a dependency of the selected CCID;
* if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
Ratio 2 for both endpoints";
* what happens then is part of another patch set, since it concerns the
dynamic update of Ack Ratio while the connection is in full flight.

Thanks to Tomasz Grobelny for discussion leading up to this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 29450559 16-Nov-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Feature negotiation for minimum-checksum-coverage

This provides feature negotiation for server minimum checksum coverage
which so far has been missing.

Since sender/receiver coverage values range only from 0...15, their
type has also been reduced in size from u16 to u4.

Feature-negotiation options are now generated for both sender and receiver
coverage, i.e. when the peer has `forgotten' to enable partial coverage
then feature negotiation will automatically enable (negotiate) the partial
coverage value for this connection.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 49aebc66 16-Nov-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Deprecate old setsockopt framework

The previous setsockopt interface, which passed socket options via struct
dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
ugly code since the old approach did not distinguish between NN and SP values.

This patch removes the old setsockopt interface and replaces it with two new
functions to register NN/SP values for feature negotiation.
These are essentially wrappers around the internal __feat_register functions,
with checking added to avoid

* wrong usage (type);
* changing values while the connection is in progress.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d90ebcbf 12-Nov-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Query supported CCIDs

This provides a data structure to record which CCIDs are locally supported
and three accessor functions:
- a test function for internal use which is used to validate CCID requests
made by the user;
- a copy function so that the list can be used for feature-negotiation;
- documented getsockopt() support so that the user can query capabilities.

The data structure is a table which is filled in at compile-time with the
list of available CCIDs (which in turn depends on the Kconfig choices).

Using the copy function for cloning the list of supported CCIDs is useful for
feature negotiation, since the negotiation is now with the full list of available
CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation
will not fail if the peer requests to use CCID3 instead of CCID2.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac75773c 05-Nov-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Per-socket initialisation of feature negotiation

This provides feature-negotiation initialisation for both DCCP sockets
and DCCP request_sockets, to support feature negotiation during
connection setup.

It also resolves a FIXME regarding the congestion control
initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7d43d1a0 05-Nov-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Implement lookup table for feature-negotiation information

A lookup table for feature-negotiation information, extracted from RFC
4340/42, is provided by this patch. All currently known features can
be found in this table, along with their feature location, their
default value, and type.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 410e27a4 09-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp"
as it accentally contained the wrong set of patches. These will be
submitted separately.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# d6da3511 03-Sep-2008 Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>

dccp: Policy-based packet dequeueing infrastructure

This patch adds a generic infrastructure for policy-based dequeueing of
TX packets and provides two policies:
* a simple FIFO policy (which is the default) and
* a priority based policy (set via socket options).
Both policies honour the tx_qlen sysctl for the maximum size of the write
queue (can be overridden via socket options).

The priority policy uses skb->priority internally to assign an u32 priority
identifier, using the same ranking as SO_PRIORITY. The skb->priority field
is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
data using cmsg(3), the patch also provides the requisite parsing routines.

Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# e7937772 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Extend CCID packet dequeueing interface

This extends the packet dequeuing interface of dccp_write_xmit() to allow
1. CCIDs to take care of timing when the next packet may be sent;
2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).

The main purpose is to take CCID2 out of its polling mode (when it is network-
limited, it tries every millisecond to send, without interruption).
The interface can also be used to support other CCIDs.

The mode of operation for (2) is as follows:
* new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
* ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
* it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
* dccp_write_xmit() returns without further action;
* after some time the wait-condition for CCID becomes true,
* that CCID schedules the tasklet,
* tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
* since the wait-condition is now true, ccid_hc_tx_packet() returns "send now",
* packet is sent, and possibly more (since dccp_write_xmit() loops).

Code reuse: the taskled function calls dccp_write_xmit(), the timer function
reduces to a wrapper around the same code.

If the tasklet finds that the socket is locked, it re-schedules the tasklet
function (not the tasklet) after one jiffy.

Changed DCCP_BUG to dccp_pr_debug when transmit_skb returns an error (e.g. when a
local qdisc is used, NET_XMIT_DROP=1 can be returned for many packets).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# c2f42077 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-2: Schedule Sync as out-of-band mechanism

The problem with Ack Vectors is that

i) their length is variable and can in principle grow quite large,
ii) it is hard to predict exactly how large they will be.

Due to the second point it seems not a good idea to reduce the MPS; in
particular when on average there is enough room for the Ack Vector and an
increase in length is momentarily due to some burst loss, after which the
Ack Vector returns to its normal/average length.

The solution taken by this patch is to subtract a minimum-expected Ack Vector
length from the MPS (previous patch), and to defer any larger Ack Vectors onto
a separate Sync - but only if indeed there is no space left on the skb.

This patch provides the infrastructure to schedule Sync-packets for transporting
(urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since
it does not need to wait for new application data.

It can thus serve other parts of the DCCP code as well.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# f10ecaee 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Replace magic CCID-specific numbers by symbolic constants

The constants DCCPO_{MIN,MAX}_CCID_SPECIFIC are nowhere used in the code, but
instead for the CCID-specific options numbers are used.

This patch unifies the use of CCID-specific option numbers, by adding symbolic
names reflecting the definitions in RFC 4340, 10.3.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 0a482267 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Initialisation and type-checking of feature sysctls

This patch takes care of initialising and type-checking sysctls related to
feature negotiation. Type checking is important since some of the sysctls
now directly act on the feature-negotiation process.

The sysctls are initialised with the known default values for each feature.
For the type-checking the value constraints from RFC 4340 are used:

* Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes),
tested and confirmed that it works up to 4294967295 - for Gbps speed;
* Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer);
* CCIDs are between 0 .. 255;
* request_retries, retries1, retries2 also between 0..255 for good measure;
* tx_qlen is checked to be non-negative;
* sync_ratelimit remains as before.

Further changes:
----------------
Performed s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# 51c7d4fa 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Implement both feature-local and feature-remote Sequence Window feature

This adds full support for local/remote Sequence Window feature, from which the
* sequence-number-validity (W) and
* acknowledgment-number-validity (W') windows
derive as specified in RFC 4340, 7.5.3.

Specifically, the following changes are introduced:
* integrated new socket fields into dccp_sk;
* updated the update_gsr/gss routines with regard to these fields;
* updated handler code: the Sequence Window feature is located at the TX side,
so the local feature is meant if the handler-rx flag is false;
* the initialisation of `rcv_wnd' in reqsk is removed, since
- rcv_wnd is not used by the code anywhere;
- sequence number checks are not done in the LISTEN state (cf. 7.5.3);
- dccp_check_req checks the Ack number validity more rigorously;
* the `struct dccp_minisock' became empty and is now removed.

Until the handshake completes with activating negotiated values, the local/remote
Sequence-Window values are undefined and thus can not reliably be estimated.
This issue is addressed in a separate patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# 5d3dac26 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Initialisation framework for feature negotiation

This initialises feature negotiation from two tables, which are initialised
from sysctls.

As a novel feature, specifics of the implementation (e.g. currently short
seqnos and ECN are not supported) are advertised for robustness.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# b235dc4a 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-2: Phase out the use of boolean Ack Vector sysctl

This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, which is now handled fully dynamically via feature negotiation;
i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per
RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

if (dccp_msk(sk)->dccpms_send_ack_vector)
/* ... */
with
if (dp->dccps_hc_rx_ackvec != NULL)
/* ... */

The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
* server when the Ack marking the transition RESPOND => OPEN arrives;
* client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been
removed, since
(a) connection establishment implies that the Response has been received;
(b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
this entry will always be ignored;
(c) it can not be used for anything useful - to detect loss for instance, only
packets received after the loss can serve as pseudo-dupacks.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# 68e074bf 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Remove manual influence on NDP Count feature

Updating the NDP count feature is handled automatically now:
* for CCID-2 it is disabled, since the code does not use NDP counts;
* for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.

Allowing the user to change NDP values leads to unpredictable and failing
behaviour, since it is then possible to disable NDP counts even when they
are needed (e.g. in CCID-3).

This means that only those user settings are sensible that agree with the
values for Send NDP Count implied by the choice of CCID. But those settings
are already activated by the feature negotiation (CCID dependency tracking),
hence this form of support is redundant.

At startup the initialisation of the NDP count feature is with the default
value of 0, which is done implicitly by the zeroing-out of the socket when
it is allocated. If the choice of CCID or feature negotiation enables NDP
count, this will then be updated via the NDP activation handler.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# 78673e24 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Remove obsolete parts of the old CCID interface

The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
case, their value equals initially that of the sysctl, but at the end of
feature negotiation may be something different.

The old interface removed by this patch thus has been replaced by the newer
interface to dynamically query the currently loaded CCIDs earlier in this
patch set.

Also removed the constructors for the TX CCID and the RX CCID, since the
switch rx/non-rx is done by the handler in minisocks.c (and the handler is
the only place in the code where CCIDs are loaded).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# fade756f 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Set per-connection CCIDs via socket options

With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which
overrides the defaults set by the global sysctl variables for TX/RX CCIDs.

To make full use of this facility, the remaining patches of this patch set are
needed, which track dependencies and activate negotiated feature values.

Note on the maximum number of CCIDs that can be registered:
-----------------------------------------------------------
The maximum number of CCIDs that can be registered on the socket is constrained
by the space in a Confirm/Change feature negotiation option.

The space in these in turn depends on the size of header options as defined
in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from
ackvec.h into linux/dccp.h, clarifying its purpose.

Relative to this size, the maximum number of CCID identifiers that can be
present in a Confirm option (which always consumes 1 byte more than a Change
option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the
CCID-feature-type and one for the selected value.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 17c30b40 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Deprecate Ack Ratio sysctl

This patch deprecates the Ack Ratio sysctl, since
* Ack Ratio is entirely ignored by CCID-3 and CCID-4,
* Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
* even if it would work in CCID-2, there is no point for a user to change it:
- Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
- if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
(since waiting for Acks which will never arrive in this window),
- cwnd is not a user-configurable value.

The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.

With this patch Ack Ratio is now under full control of feature negotiation:
* Ack Ratio is resolved as a dependency of the selected CCID;
* if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
Ratio 2 for both endpoints";
* what happens then is part of another patch set, since it concerns the
dynamic update of Ack Ratio while the connection is in full flight.

Thanks to Tomasz Grobelny for discussion leading up to this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 20f41eee 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Feature negotiation for minimum-checksum-coverage

This provides feature negotiation for server minimum checksum coverage
which so far has been missing.

Since sender/receiver coverage values range only from 0...15, their
type has also been reduced in size from u16 to u4.

Feature-negotiation options are now generated for both sender and receiver
coverage, i.e. when the peer has `forgotten' to enable partial coverage
then feature negotiation will automatically enable (negotiate) the partial
coverage value for this connection.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# 668144f7 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Deprecate old setsockopt framework

The previous setsockopt interface, which passed socket options via struct
dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
ugly code since the old approach did not distinguish between NN and SP values.

This patch removes the old setsockopt interface and replaces it with two new
functions to register NN/SP values for feature negotiation. These are
essentially wrappers around the internal __feat_register functions, with
checking added to avoid
* wrong usage (type);
* changing values while the connection is in progress.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 71bb4959 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Query supported CCIDs

This provides a data structure to record which CCIDs are locally supported
and three accessor functions:
- a test function for internal use which is used to validate CCID requests
made by the user;
- a copy function so that the list can be used for feature-negotiation;
- documented getsockopt() support so that the user can query capabilities.

The data structure is a table which is filled in at compile-time with the
list of available CCIDs (which in turn depends on the Kconfig choices).

Using the copy function for cloning the list of supported CCIDs is useful for
feature negotiation, since the negotiation is now with the full list of available
CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation
will not fail if the peer requests to use CCID3 instead of CCID2.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# 828755ce 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Per-socket initialisation of feature negotiation

This provides feature-negotiation initialisation for both DCCP sockets and
DCCP request_sockets, to support feature negotiation during connection setup.

It also resolves a FIXME regarding the congestion control initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# b4eec206 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Implement lookup table for feature-negotiation information

A lookup table for feature-negotiation information, extracted from RFC 4340/42,
is provided by this patch. All currently known features can be found in this
table, along with their feature location, their default value, and type.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# 5b5d0e70 13-Jul-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Upgrade NDP count from 3 to 6 bytes

RFC 4340, 7.7 specifies up to 6 bytes for the NDP Count option, whereas the code
is currently limited to up to 3 bytes. This seems to be a relict of an earlier
draft version and is brought up to date by the patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# cf6b5fbe 03-Feb-2008 Arnaldo Carvalho de Melo <acme@redhat.com>

[DCCP]: Reorganize struct dccp_sock to save 8 bytes

/home/acme/git/net-2.6/net/dccp/ipv6.c:
struct dccp_sock | -8
struct dccp6_sock | -8
2 structs changed

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b4d4f7c7 12-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Handle timestamps on Request/Response exchange separately

In DCCP, timestamps can occur on packets anytime, CCID3 uses a timestamp(/echo) on the Request/Response
exchange. This patch addresses the following situation:
* timestamps are recorded on the listening socket;
* Responses are sent from dccp_request_sockets;
* suppose two connections reach the listening socket with very small time in between:
* the first timestamp value gets overwritten by the second connection request.

This is not really good, so this patch separates timestamps into
* those which are received by the server during the initial handshake (on dccp_request_sock);
* those which are received by the client or the client after connection establishment.

As before, a timestamp of 0 is regarded as indicating that no (meaningful) timestamp has been
received (in addition, a warning message is printed if hosts send 0-valued timestamps).

The timestamp-echoing now works as follows:
* when a timestamp is present on the initial Request, it is placed into dreq, due to the
call to dccp_parse_options in dccp_v{4,6}_conn_request;
* when a timestamp is present on the Ack leading from RESPOND => OPEN, it is copied over
from the request_sock into the child cocket in dccp_create_openreq_child;
* timestamps received on an (established) dccp_sock are treated as before.

Since Elapsed Time is measured in hundredths of milliseconds (13.2), the new dccp_timestamp()
function is used, as it is expected that the time between receiving the timestamp and
sending the timestamp echo will be very small against the wrap-around time. As a byproduct,
this allows smaller timestamping-time fields.

Furthermore, inserting the Timestamp Echo option has been taken out of the block starting with
'!dccp_packet_without_ack()', since Timestamp Echo can be carried on any packet (5.8 and 13.3).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8b819412 12-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Allow to parse options on Request Sockets

The option parsing code currently only parses on full sk's. This causes a problem for
options sent during the initial handshake (in particular timestamps and feature-negotiation
options). Therefore, this patch extends the option parsing code with an additional argument
for request_socks: if it is non-NULL, options are parsed on the request socket, otherwise
the normal path (parsing on the sk) is used.

Subsequent patches, which implement feature negotiation during connection setup, make use
of this facility.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b8599d20 12-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Support for server holding timewait state

This adds a socket option and signalling support for the case where the server
holds timewait state on closing the connection, as described in RFC 4340, 8.3.

Since holding timewait state at the server is the non-usual case, it is enabled
via a socket option. Documentation for this socket option has been added.

The setsockopt statement has been made resilient against different possible cases
of expressing boolean `true' values using a suggestion by Ian McDonald.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c869620 28-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Integrate state transitions for passive-close

This adds the necessary state transitions for the two forms of passive-close

* PASSIVE_CLOSE - which is entered when a host receives a Close;
* PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq.

Here is a detailed account of what the patch does in each state.

1) Receiving CloseReq

The pseudo-code in 8.5 says:

Step 13: Process CloseReq
If P.type == CloseReq and S.state < CLOSEREQ,
Generate Close
S.state := CLOSING
Set CLOSING timer.

This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN.

* CLOSED: silently ignore - it may be a late or duplicate CloseReq;
* LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client);
* REQUEST: perform Step 13 directly (no need to enqueue packet);
* OPEN/PARTOPEN: enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data.

When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored.
I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where
the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it
gets the Reset, so ignoring the CloseReq while in state CLOSING is sane.

2) Receiving Close

The code below from 8.5 is unconditional.

Step 14: Process Close
If P.type == Close,
Generate Reset(Closed)
Tear down connection
Drop packet and return

Thus we need to consider all states:
* CLOSED: silently ignore, since this can happen when a retransmitted or late Close arrives;
* LISTEN: dccp_rcv_state_process() will generate a Reset ("No Connection");
* REQUEST: perform Step 14 directly (no need to enqueue packet);
* RESPOND: dccp_check_req() will generate a Reset ("Packet Error") -- left it at that;
* OPEN/PARTOPEN: enter PASSIVE_CLOSE so that application has a chance to process unread data;
* CLOSEREQ: server performed active-close -- perform Step 14;
* CLOSING: simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment);
* PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close);
* TIMEWAIT: packet is ignored.

Note that the condition of receiving a packet in state CLOSED here is different from the condition "there
is no socket for such a connection": the socket still exists, but its state indicates it is unusable.

Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that
sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f11135a3 28-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Dedicated auxiliary states to support passive-close

This adds two auxiliary states to deal with passive closes:
* PASSIVE_CLOSE (reached from OPEN via reception of Close) and
* PASSIVE_CLOSEREQ (reached from OPEN via reception of CloseReq)
as internal intermediate states.

These states are used to allow a receiver to process unread data before
acknowledging the received connection-termination-request (the Close/CloseReq).

Without such support, it will happen that passively-closed sockets enter CLOSED
state while there is still unprocessed data in the queue; leading to unexpected
and erratic API behaviour.

PASSIVE_CLOSE has been mapped into TCPF_CLOSE_WAIT, so that the code will
seamlessly work with inet_accept() (which tests for this state).

The state names are thanks to Arnaldo, who suggested this naming scheme
following an earlier revision of this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9b91ad27 20-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Make PARTOPEN an autonomous state

This decouples PARTOPEN from TCP-specific stream-states.

It thus addresses the FIXME.

The code has been checked with regard to dependency on PARTOPEN and FIN_WAIT1
states (to which PARTOPEN previously was mapped): there is no difference, as
PARTOPEN is always referred to directly (i.e. not via the mapping to TCP
state).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d8ef2c29 24-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Convert Reset code into socket error number

This adds support for converting the 11 currently defined Reset codes into system
error numbers, which are stored in sk_err for further interpretation.

This makes the externally visible API behaviour similar to TCP, since a client
connecting to a non-existing port will experience ECONNREFUSED.

* Code 0, Unspecified, is interpreted as non-error (0);
* Code 1, Closed (normal termination), also maps into 0;
* Code 2, Aborted, maps into "Connection reset by peer" (ECONNRESET);
* Code 3, No Connection and
Code 7, Connection Refused, map into "Connection refused" (ECONNREFUSED);
* Code 4, Packet Error, maps into "No message of desired type" (ENOMSG);
* Code 5, Option Error, maps into "Illegal byte sequence" (EILSEQ);
* Code 6, Mandatory Error, maps into "Operation not supported on transport endpoint" (EOPNOTSUPP);
* Code 8, Bad Service Code, maps into "Invalid request code" (EBADRQC);
* Code 9, Too Busy, maps into "Too many users" (EUSERS);
* Code 10, Bad Init Cookie, maps into "Invalid request descriptor" (EBADR);
* Code 11, Aggression Penalty, maps into "Quota exceeded" (EDQUOT)
which makes sense in terms of using more than the `fair share' of bandwidth.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# fde20105 24-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Retrieve packet sequence number for error reporting

This fixes a problem when analysing erroneous packets in dccp_v{4,6}_err:
* dccp_hdr_seq currently takes an skb
* however, the transport headers in the skb are shifted, due to the
preceding IPv4/v6 header.
Fixed for v4 and v6 by changing dccp_hdr_seq to take a struct dccp_hdr as
argument. Verified that the correct sequence number is now reported in the
error handler.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 451bc047 04-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Tidy-up -- minisock initialisation

This

* removes a declaration of a non-existent function
__dccp_minisock_init;

* shifts the initialisation function dccp_minisock_init() from
options.c to minisocks.c, where it is more naturally expected to
be.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7c559a9e 04-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Add socket option to query the current MPS

This enables applications to query the current value of the Maximum
Packet Size via a socket option, suggested as a SHOULD in (RFC 4340,
p. 102).

This socket option is useful to avoid the annoying bail-out via
`-EMSGSIZE'. In particular, as fragmentation is not currently
supported (and its use is partly discouraged in RFC 4340).

With this option, it is possible to size buffers accordingly, e.g.

int buflen = dccp_get_cur_mps(sockfd);

/* or */
if (msgsize > dccp_get_cur_mps(sockfd))
die("message is too large for this path");

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a94f0f97 26-Sep-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Rate-limit DCCP-Syncs

This implements a SHOULD from RFC 4340, 7.5.4:
"To protect against denial-of-service attacks, DCCP implementations SHOULD
impose a rate limit on DCCP-Syncs sent in response to sequence-invalid packets,
such as not more than eight DCCP-Syncs per second."

The rate-limit is maintained on a per-socket basis. This is a more stringent
policy than enforcing the rate-limit on a per-source-address basis and
protects against attacks with forged source addresses.

Moreover, the mechanism is deliberately kept simple. In contrast to
xrlim_allow(), bursts of Sync packets in reply to sequence-invalid packets
are not supported. This foils such attacks where the receipt of a Sync
triggers further sequence-invalid packets. (I have tested this mechanism against
xrlim_allow algorithm for Syncs, permitting bursts just increases the problems.)

In order to keep flexibility, the timeout parameter can be set via sysctl; and
the whole mechanism can even be disabled (which is however not recommended).

The algorithm in this patch has been improved with regard to wrapping issues
thanks to a suggestion by Arnaldo.

Commiter note: Rate limited the step 6 DCCP_WARN too, as it says we're
sending a sync.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>


# 0430ee34 26-Sep-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Add Support for Data 1 .. 3 fields of Reset packets

This adds fields to support the informational Data 1..3 fields of the
DCCP-Reset packets (RFC 4340, 5.6), and makes minor cosmetic changes
to documentation.
Code which fills in these fields follows in subsequent patches, it is
primarily used for reporting option-processing and feature-negotiation
errors.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>


# 8fb8354a 19-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP]: Nuke dccp_timestamp and dccps_epoch, not used anymore

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 19ac2146 19-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP]: Convert dccps_timestamp_time to ktime_t

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 91d73c15 20-Apr-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Complete documentation of dccp_sock

This fills in missing documentation for dccp_sock fields.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 89560b53 20-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Sample RTT from SYN exchange

Function:


# 9c70220b 25-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[SK_BUFF]: Introduce skb_transport_header(skb)

For the places where we need a pointer to the transport header, it is
still legal to touch skb->h.raw directly if just adding to,
subtracting from or setting it to another layer header.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# badff6d0 13-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[SK_BUFF]: Introduce skb_reset_transport_header(skb)

For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple cases:

skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()

The next ones will handle the slightly more "complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8109b02b 10-Dec-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Whitespace cleanups

That accumulated over the last months hackaton, shame on me for not
using git-apply whitespace helping hand, will do that from now on.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 5aed3243 28-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Tidy up unused structures

This removes and cleans up unused variables and structures which have become
unnecessary following the introduction of the EWMA patch to automatically track
the CCID 3 receiver/sender packet sizes `s'.

It deprecates the PACKET_SIZE socket option by returning an error code and
printing a deprecation warning if an application tries to read or write this
socket option.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 9981a0e3 14-Nov-2006 Al Viro <viro@zeniv.linux.org.uk>

[NET]: Annotate checksums in on-the-wire packets.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 09dbc389 13-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Miscellaneous code tidy-ups

This patch does not change code; it performs some trivial clean/tidy-ups:

* removal of a `debug_prefix' string in favour of the
already existing dccp_role(sk)

* add documentation of structures and constants

* separated out the cases for invalid packets (step 1
of the packet validation)

* removing duplicate statements

* combining declaration & initialisation

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# c02fdc0e 13-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Make feature negotiation more readable

This patch replaces cryptic feature negotiation messages of type

Oct 31 15:42:20 kernel: dccp_feat_change: feat change type=32 feat=1
Oct 31 15:42:21 kernel: dccp_feat_change: feat change type=34 feat=1
Oct 31 15:42:21 kernel: dccp_feat_change: feat change type=32 feat=5

into ones of type:

Nov 2 13:54:45 kernel: dccp_feat_change: ChangeL(CCID (1), 3)
Nov 2 13:54:45 kernel: dccp_feat_change: ChangeR(CCID (1), 3)
Nov 2 13:54:45 kernel: dccp_feat_change: ChangeL(Ack Ratio (5), 2)

Also,
* completed the feature number list wrt RFC 4340 sec. 6.4
* annotating which ones have been implemented so far
* implemented rudimentary sanity checking in feat.c (FIXMEs)
* some minor fixes

Commiter note: uninlined dccp_feat_name and dccp_feat_typename, for
consistency with dccp_{state,packet}_name, that, BTW,
should be compiled only if CONFIG_IP_DCCP_DEBUG is
selected, leaving this to another cset tho. Also
shortened dccp_feat_negotiation_debug to dccp_feat_debug.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 6f4e5fff 10-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Support for partial checksums (RFC 4340, sec. 9.2)

This patch does the following:
a) introduces variable-length checksums as specified in [RFC 4340, sec. 9.2]
b) provides necessary socket options and documentation as to how to use them
c) basic support and infrastructure for the Minimum Checksum Coverage feature
[RFC 4340, sec. 9.2.1]: acceptability tests, user notification and user
interface

In addition, it

(1) fixes two bugs in the DCCPv4 checksum computation:
* pseudo-header used checksum_len instead of skb->len
* incorrect checksum coverage calculation based on dccph_x
(2) removes dccp_v4_verify_checksum() since it reduplicates code of the
checksum computation; code calling this function is updated accordingly.
(3) now uses skb_checksum(), which is safer than checksum_partial() if the
sk_buff has is a non-linear buffer (has pages attached to it).
(4) fixes an outstanding TODO item:
* If P.CsCov is too large for the packet size, drop packet and return.

The code has been tested with applications, the latest version of tcpdump now
comes with support for partial DCCP checksums.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 9b42078e 10-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Combine allocating & zeroing header space on skb

This is a code simplification:
it combines three often recurring operations into one inline function,

* allocate `len' bytes header space in skb
* fill these `len' bytes with zeroes
* cast the start of this header space as dccp_hdr

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 1acc04cd 10-Oct-2006 Al Viro <viro@ftp.linux.org.uk>

[PATCH] dccp __user annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# b83eff64 21-Sep-2006 Ian McDonald <ian.mcdonald@jandi.co.nz>

[DCCP]: Introduce constants for CCID numbers

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 00e4d116 22-Sep-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Allow default/fallback service code.

This has been discussed on dccp@vger and removes the necessity for applications
to supply service codes in each and every case.

If an application does not want to provide a service code, that's fine, it will
be given 0. Otherwise, service codes can be set via socket options as before.

This patch has been tested using various client/server configurations
(including listening on multiple service codes).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 97e5848d 26-Aug-2006 Ian McDonald <ian.mcdonald@jandi.co.nz>

[DCCP]: Introduce tx buffering

This adds transmit buffering to DCCP.

I have tested with CCID2/3 and with loss and rate limiting.

Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a4bf3902 20-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP] minisock: Rename struct dccp_options to struct dccp_minisock

This will later be included in struct dccp_request_sock so that we can
have per connection feature negotiation state while in the 3way
handshake, when we clone the DCCP_ROLE_LISTEN socket (in
dccp_create_openreq_child) we'll just copy this state from
dreq_minisock to dccps_minisock.

Also the feature negotiation and option parsing code will mostly touch
dccps_minisock, which will simplify some stuff.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 93ce2092 20-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Make CCID2 be the default

As per the draft. This fixes the build when netfilter dccp components
are built and dccp isn't. Thanks to Reuben Farrelly for reporting
this.

The following changesets will introduce /proc/sys/net/dccp/defaults/
to give more flexibility to DCCP developers and testers while apps
doesn't use setsockopt to specify the desired CCID, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 60fe62e7 20-Mar-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk>

[DCCP]: sparse endianness annotations

This also fixes the layout of dccp_hdr short sequence numbers, problem
was not fatal now as we only support long (48 bits) sequence numbers.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 91f0ebf7 20-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP] CCID: Improve CCID infrastructure

1. No need for ->ccid_init nor ->ccid_exit, this is what module_{init,exit}
does and anynways neither ccid2 nor ccid3 were using it.

2. Rename struct ccid to struct ccid_operations and introduce struct ccid
with a pointer to ccid_operations and rigth after it the rx or tx
private state.

3. Remove the pointer to the state of the half connections from struct
dccp_sock, now its derived thru ccid_priv() from the ccid pointer.

Now we also can implement the setsockopt for changing the CCID easily as
no ccid init routines can affect struct dccp_sock in any way that prevents
other CCIDs from working if a CCID switch operation is asked by apps.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ba66c6e8 20-Mar-2006 Ian McDonald <imcdnzl@gmail.com>

[DCCP]: Set the default CCID according to kernel config selection

Now CCID2 is the default, as stated in the RFC drafts, but we allow
a config where just CCID3 is built, where CCID3 becomes the default.

Signed-off-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# afe00251 20-Mar-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk>

[DCCP]: Initial feature negotiation implementation

Still needs more work, but boots and doesn't crashes, even
does some negotiation!

18:38:52.174934 127.0.0.1.43458 > 127.0.0.1.5001: request <change_l ack_ratio 2, change_r ccid 2, change_l ccid 2>
18:38:52.218526 127.0.0.1.5001 > 127.0.0.1.43458: response <nop, nop, change_l ack_ratio 2, confirm_r ccid 2 2, confirm_l ccid 2 2, confirm_r ack_ratio 2>
18:38:52.185398 127.0.0.1.43458 > 127.0.0.1.5001: <nop, confirm_r ack_ratio 2, ack_vector0 0x00, elapsed_time 212>

:-)

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2a91aa39 20-Mar-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk>

[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation

Original work by Andrea Bittau, Arnaldo Melo cleaned up and fixed several
issues on the merge process.

For now CCID2 was turned the default for all SOCK_DCCP connections, but this
will be remedied soon with the merge of the feature negotiation code.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 14c85021 26-Dec-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h

To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d83d8461 14-Dec-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[IP_SOCKGLUE]: Remove most of the tcp specific calls

As DCCP needs to be called in the same spots.

Now we have a member in inet_sock (is_icsk), set at sock creation time from
struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and
DCCP) to see if a struct sock instance is a inet_connection_sock for places
like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if
sk_type was SOCK_STREAM, that is insufficient because we now use the same code
for DCCP, that has sk_type SOCK_DCCP.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 88f964db 18-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Introduce CCID getsockopt for the CCIDs

Allocation for the optnames is similar to the DCCP options, with a
range for rx and tx half connection CCIDs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 561713cf 18-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Don't use necessarily the same CCID for tx and rx

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae31c339 18-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Move the ack vector code to net/dccp/ackvec.[ch]

Isolating it, that will be used when we introduce a CCID2 (TCP-Like)
implementation.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 67e6b629 16-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Introduce DCCP_SOCKOPT_SERVICE

As discussed in the dccp@vger mailing list:

Now applications have to use setsockopt(DCCP_SOCKOPT_SERVICE, service[s]),
prior to calling listen() and connect().

An array of unsigned ints can be passed meaning that the listening sock accepts
connection requests for several services.

With this we can ditch struct sockaddr_dccp and use only sockaddr_in (and
sockaddr_in6 in the future).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b0e56780 08-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP] Introduce dccp_timestamp

To start the timestamps with 0.0ms, easing the integer maths in the CCIDs, this
probably will be reworked to use the to be introduced struct timeval_offset
infrastructure out of skb_get_timestamp, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 507d37cf 08-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID] Only call the HC insert_options methods when requested

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# a84ffe43 28-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Introduce DCCP_SOCKOPT_PACKET_SIZE

So that applications can set dccp_sock->dccps_pkt_size, that in turn
is used in the CCID3 half connection init routines to set
ccid3hc[tr]x_s and use it in its rate calculations.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1bc09869 19-Aug-2005 Ian McDonald <iam4@cs.waikato.ac.nz>

[DCCP]: Fix the timestamp options

This changes timestamp, timestamp echo, and elapsed time to use units of 10
usecs as per DCCP spec. This has been tested to verify that times are correct.
Also fixed up length and used hton/ntoh more.

Still to add in later patches:
- actually use elapsed time to adjust RTT
(commented out as was prior to this patch)
- send options at times more closely following the spec
(content is now correct)

Signed-off-by: Ian McDonald <iam4@cs.waikato.ac.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 64cf1e5d 09-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Finish the TIMEWAIT minisock support

Using most of the infrastructure TCP uses, with a dccp_death_row,
etc. As per my current interpretation of the draft what we have with
this changeset seems to be all we need (or very close to it 8)).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1d3de414 09-Aug-2005 Harald Welte <laforge@netfilter.org>

[NETFILTER]: New iptables DCCP protocol header match

Using this new iptables DCCP protocol header match, it is possible to
create simplistic stateless packet filtering rules for DCCP. It
permits matching of port numbers, packet type and options.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e2e26866 09-Aug-2005 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP]: Fix struct sockaddr_dccp definition

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5a47a470 09-Aug-2005 Harald Welte <laforge@netfilter.org>

[DCCP]: make <linux/dccp.h> include-able from userspace

The protocol header files in <linux/foo.h> are usually structured in a
way to be included by userspace code. The top section consists of
general protocol structure definitions, typedefs, enums - followed by
an #ifdef __KERNEL__ section.

Currently <linux/dccp.h> doesn't follow that convention and can
therefore not be used from userspace. However, for example iptables'
libipt_dccp.c actually needs various definitions from there.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7c657876 09-Aug-2005 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP]: Initial implementation

Development to this point was done on a subversion repository at:

http://oops.ghostprotocols.net:81/cgi-bin/viewcvs.cgi/dccp-2.6/

This repository will be kept at this site for the foreseable future,
so that interested parties can see the history of this code,
attributions, etc.

If I ever decide to take this offline I'll provide the full history at
some other suitable place.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>