History log of /linux-master/net/dccp/ccids/ccid3.c
Revision Date Author Comments
# 0b609b55 27-Oct-2020 Andrew Lunn <andrew@lunn.ch>

net: dccp: Fix most of the kerneldoc warnings

net/dccp/ccids/ccid2.c:190: warning: Function parameter or member 'hc' not described in 'ccid2_update_used_window'
net/dccp/ccids/ccid2.c:190: warning: Function parameter or member 'new_wnd' not described in 'ccid2_update_used_window'
net/dccp/ccids/ccid2.c:360: warning: Function parameter or member 'sk' not described in 'ccid2_rtt_estimator'
net/dccp/ccids/ccid3.c:112: warning: Function parameter or member 'sk' not described in 'ccid3_hc_tx_update_x'
net/dccp/ccids/ccid3.c:159: warning: Function parameter or member 'hc' not described in 'ccid3_hc_tx_update_s'
net/dccp/ccids/ccid3.c:268: warning: Function parameter or member 'sk' not described in 'ccid3_hc_tx_send_packet'
net/dccp/ccids/ccid3.c:667: warning: Function parameter or member 'sk' not described in 'ccid3_first_li'
net/dccp/ccids/ccid3.c:85: warning: Function parameter or member 'hc' not described in 'ccid3_update_send_interval'
net/dccp/ccids/lib/loss_interval.c:85: warning: Function parameter or member 'lh' not described in 'tfrc_lh_update_i_mean'
net/dccp/ccids/lib/loss_interval.c:85: warning: Function parameter or member 'skb' not described in 'tfrc_lh_update_i_mean'
net/dccp/ccids/lib/packet_history.c:392: warning: Function parameter or member 'h' not described in 'tfrc_rx_hist_sample_rtt'
net/dccp/ccids/lib/packet_history.c:392: warning: Function parameter or member 'skb' not described in 'tfrc_rx_hist_sample_rtt'
net/dccp/feat.c:1003: warning: Function parameter or member 'dreq' not described in 'dccp_feat_server_ccid_dependencies'
net/dccp/feat.c:1040: warning: Function parameter or member 'array_len' not described in 'dccp_feat_prefer'
net/dccp/feat.c:1040: warning: Function parameter or member 'array' not described in 'dccp_feat_prefer'
net/dccp/feat.c:1040: warning: Function parameter or member 'preferred_value' not described in 'dccp_feat_prefer'
net/dccp/output.c:151: warning: Function parameter or member 'dp' not described in 'dccp_determine_ccmps'
net/dccp/output.c:242: warning: Function parameter or member 'sk' not described in 'dccp_xmit_packet'
net/dccp/output.c:305: warning: Function parameter or member 'sk' not described in 'dccp_flush_write_queue'
net/dccp/output.c:305: warning: Function parameter or member 'time_budget' not described in 'dccp_flush_write_queue'
net/dccp/output.c:378: warning: Function parameter or member 'sk' not described in 'dccp_retransmit_skb'
net/dccp/qpolicy.c:88: warning: Function parameter or member '' not described in 'dccp_qpolicy_operations'
net/dccp/qpolicy.c:88: warning: Function parameter or member '{' not described in 'dccp_qpolicy_operations'
net/dccp/qpolicy.c:88: warning: Function parameter or member 'params' not described in 'dccp_qpolicy_operations'

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201028011412.931250-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# 1aecbf18 22-Aug-2020 Miaohe Lin <linmiaohe@huawei.com>

net: dccp: Convert to use the preferred fallthrough macro

Convert the uses of fallthrough comments to fallthrough macro.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 266f3128 13-Jul-2020 Alexander A. Klimov <grandmaster@al2klimov.de>

dccp: Replace HTTP links with HTTPS ones

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7931287d 23-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 132

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify it
under the terms of the gnu general public license as published by the
free software foundation either version 2 of the license or at your
option any later version this program is distributed in the hope that it
will be useful but without any warranty without even the implied warranty
of merchantability or fitness for a particular purpose see the gnu
general public license for more details you should have received a copy
of the gnu general public license along with this program if not write to
the free software foundation inc 675 mass ave cambridge ma 02139 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 4 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190524100843.499675784@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 0ce4e70f 22-Jun-2018 Eric Dumazet <edumazet@google.com>

net: dccp: switch rx_tstamp_last_feedback to monotonic clock

To compute delays, better not use time of the day which can
be changed by admins or malicious programs.

Also change ccid3_first_li() to use s64 type for delta variable
to avoid potential overflows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>


# 74174fe5 22-Jun-2018 Eric Dumazet <edumazet@google.com>

net: dccp: avoid crash in ccid3_hc_rx_send_feedback()

On fast hosts or malicious bots, we trigger a DCCP_BUG() which
seems excessive.

syzbot reported :

BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback()
CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline]
ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793
ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline]
dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180
dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378
dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654
sk_backlog_rcv include/net/sock.h:914 [inline]
__sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517
dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875
ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215
NF_HOOK include/linux/netfilter.h:287 [inline]
ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256
dst_input include/net/dst.h:450 [inline]
ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396
NF_HOOK include/linux/netfilter.h:287 [inline]
ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492
__netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628
__netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
process_backlog+0x219/0x760 net/core/dev.c:5373
napi_poll net/core/dev.c:5771 [inline]
net_rx_action+0x7da/0x1980 net/core/dev.c:5837
__do_softirq+0x2e8/0xb17 kernel/softirq.c:284
run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
kthread+0x345/0x410 kernel/kthread.c:240
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>


# 839a6094 24-Oct-2017 Kees Cook <keescook@chromium.org>

net: dccp: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Adds a pointer back to the sock.

Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: dccp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7b07f8eb 15-Aug-2012 Mathias Krause <minipli@googlemail.com>

dccp: fix info leak via getsockopt(DCCP_SOCKOPT_CCID_TX_INFO)

The CCID3 code fails to initialize the trailing padding bytes of struct
tfrc_tx_info added for alignment on 64 bit architectures. It that for
potentially leaks four bytes kernel stack via the getsockopt() syscall.
Add an explicit memset(0) before filling the structure to avoid the
info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2c53040f 10-Jul-2012 Ben Hutchings <bhutchings@solarflare.com>

net: Fix (nearly-)kernel-doc comments for various functions

Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 95c96174 14-Apr-2012 Eric Dumazet <eric.dumazet@gmail.com>

net: cleanup unsigned to unsigned int

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 793734b5 27-Feb-2012 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: replace incorrect BUG_ON

This replaces an unjustified BUG_ON(), which could get triggered under normal
conditions: X_calc can be 0 when p > 0. X would in this case be set to the
minimum, s/t_mbi. Its replacement avoids t_ipi = 0 (unbounded sending rate).

Thanks to Jordi, Victor and Xavier who reported this.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.uk>


# eb939922 19-Dec-2011 Rusty Russell <rusty@rustcorp.com.au>

module_param: make bool parameters really bool (net & drivers/net)

module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.

(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fe84f414 27-Oct-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Return-value convention of hc_tx_send_packet()

This patch reorganises the return value convention of the CCID TX sending
function, to permit more flexible schemes, as required by subsequent patches.

Currently the convention is
* values < 0 mean error,
* a value == 0 means "send now", and
* a value x > 0 means "send in x milliseconds".

The patch provides symbolic constants and a function to interpret return values.

In addition, it caps the maximum positive return value to 0xFFFF milliseconds,
corresponding to 65.535 seconds. This is possible since in CCID-3/4 the
maximum possible inter-packet gap is fixed at t_mbi = 64 sec.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# baf9e782 11-Oct-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: remove unused argument in CCID tx function

This removes the argument `more' from ccid_hc_tx_packet_sent, since it was
nowhere used in the entire code.

(Btw, this argument was not even used in the original KAME code where the
function initially came from; compare the variable moreToSend in the
freebsd61-dccp-kame-28.08.2006.patch kept by Emmanuel Lochin.)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 536bb20b 19-Sep-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Remove redundant 'options_received' struct

The `options_received' struct is redundant, since it re-duplicates the existing
`p' and `x_recv' fields. This patch removes the sub-struct and migrates the
format conversion operations to ccid3_hc_tx_parse_options().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 792e6d33 19-Sep-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp tfrc/ccid-3: computing the loss rate from the Loss Event Rate

This adds a function to take care of the following, separate cases occurring in
the computation of the Loss Rate p:

* 1/(2^32-1) is mapped into 0% as per RFC 4342, 8.5;
* 1/0 is mapped into 100%, the maximum;
* to avoid that p = 1/x is rounded down to 0 when x is very large, since this
means accidentally re-entering slow-start indicated by p == 0, the minimum
resolution value of p is now returned instead;
* a bug in ccid3_hc_rx_getsockopt is fixed: 1/0 was mapped into ~0U.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 80763dfb 19-Sep-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: remove dead states

This patch is thanks to an investigation by Leandro Sales de Melo and his
colleagues. They worked out two state diagrams which highlight the fact that
the xxx_TERM states in CCID-3/4 are in fact not necessary.

And this can be confirmed by in turn looking at the code: the xxx_TERM states
are only ever set in ccid3_hc_{rx,tx}_exit(): when CCID-3 sets the state
to xxx_TERM, it is at a time where no more processing should be going on,
hence it is not necessary to introduce a dedicated exit state - this is already
implied by unloading the CCID.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 4874c131 19-Sep-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Add packet type information to CCID-specific option parsing

This
1. adds packet type information to ccid_hc_{rx,tx}_parse_options(). This is
necessary, since table 3 in RFC 4340, 5.8 leaves it to the CCIDs to state
which options may (not) appear on what packet type.

2. adds such a check for CCID-3's {Loss Event, Receive} Rate as specified in
RFC 4340 8.3 ("Receive Rate options MUST NOT be sent on DCCP-Data packets")
and 8.5 ("Loss Event Rate options MUST NOT be sent on DCCP-Data packets").

3. removes an unused argument `idx' from ccid_hc_{rx,tx}_parse_options(). This
is also no longer necessary, since the CCID-specific option-parsing routines
are passed every single parameter of the type-length-value option encoding.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 37efb03f 14-Sep-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Simplify and consolidate tx_parse_options

This simplifies and consolidates the TX option-parsing code:

1. The Loss Intervals option is not currently used, so dead code related to
this option is removed. I am aware of no plans to support the option, but
if someone wants to implement it (e.g. for inter-op tests), it is better
to start afresh than having to also update currently unused code.

2. The Loss Event and Receive Rate options have a lot of code in common (both
are 32 bit, both have same length etc.), so this is consolidated.

3. The test against GSR is not necessary, because
- on first loading CCID3, ccid_new() zeroes out all fields in the socket;
- ccid3_hc_tx_packet_recv() treats 0 and ~0U equivalently, due to

pinv = opt_recv->ccid3or_loss_event_rate;
if (pinv == ~0U || pinv == 0)
hctx->p = 0;

- as a result, the sequence number field is removed from opt_recv.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# d2c72630 14-Sep-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: remove buggy RTT-sampling history lookup

This removes the RTT-sampling function tfrc_tx_hist_rtt(), since

1. it suffered from complex passing of return values (the return value both
indicated successful lookup while the value doubled as RTT sample);

2. when for some odd reason the sample value equalled 0, this triggered a bug
warning about "bogus Ack", due to the ambiguity of the return value;

3. on a passive host which has not sent anything the TX history is empty and
thus will lead to unwanted "bogus Ack" warnings such as
ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-28197148
ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-26641606.

The fix is to replace the implicit encoding by performing the steps manually.

Furthermore, the "bogus Ack" warning has been removed, since it can actually be
triggered due to several reasons (network reordering, old packet, (3) above),
hence it is not very useful.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 20cbd3e1 14-Sep-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: A lower bound for the inter-packet scheduling algorithm

This fixes a subtle bug in the calculation of the inter-packet gap and shows
that t_delta, as it is currently used, is not needed.

The algorithm from RFC 5348, 8.3 below continually computes a send time t_nom,
which is initialised with the current time t_now; t_gran = 1E6 / HZ specifies
the scheduling granularity, s the packet size, and X the sending rate:

t_distance = t_nom - t_now; // in microseconds
t_delta = min(t_ipi, t_gran) / 2; // `delta' parameter in microseconds

if (t_distance >= t_delta) {
reschedule after (t_distance / 1000) milliseconds;
} else {
t_ipi = s / X; // inter-packet interval in usec
t_nom += t_ipi; // compute the next send time
send packet now;
}

Problem:
--------
Rescheduling requires a conversion into milliseconds (sk_reset_timer()). The
highest jiffy resolution with HZ=1000 is 1 millisecond, so using a higher
granularity does not make much sense here.

As a consequence, values of t_distance < 1000 are truncated to 0. This issue
has so far been resolved by using instead

if (t_distance >= t_delta + 1000)
reschedule after (t_distance / 1000) milliseconds;

This is unnecessarily large, a lower bound is t_delta' = max(t_delta, 1000).
And it implies a further simplification:

a) when HZ >= 500, then t_delta <= t_gran/2 = 10^6/(2*HZ) <= 1000, so that
t_delta' = MAX(1000, t_delta) = 1000 (constant value);

b) when HZ < 500, then t_delta = 1/2*MIN(rtt, t_ipi, t_gran) <= t_gran/2,
so that 1000 <= t_delta' <= t_gran/2.

The maximum error of using a constant t_delta in (b) is less than half a jiffy.

Fix:
----
The patch replaces t_delta with a constant, whose value depends on CONFIG_HZ,
changing the above algorithm to:

if (t_distance >= t_delta')
reschedule after (t_distance / 1000) milliseconds;

where t_delta' = 10^6/(2*HZ) if HZ < 500, and t_delta' = 1000 otherwise.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 89858ad1 29-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: use per-route RTO or TCP RTO as fallback

This makes RTAX_RTO_MIN also available to CCID-3, replacing the compile-time
RTO lower bound with a per-route tunable value.

The original Kconfig option solved the problem that a very low RTT (in the
order of HZ) can trigger too frequent and unnecessary reductions of the
sending rate.

This tunable does not affect the initial RTO value of 2 seconds specified in
RFC 5348, section 4.2 and Appendix B. But like the hardcoded Kconfig value,
it allows to adapt to network conditions.

The same effect as the original Kconfig option of 100ms is now achieved by

> ip route replace to unicast 192.168.0.0/24 rto_min 100j dev eth0

(assuming HZ=1000).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 51c22bb5 22-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: No more CCID control blocks in LISTEN state

The CCIDs are activated as last of the features, at the end of the handshake,
were the LISTEN state of the master socket is inherited into the server
state of the child socket. Thus, the only states visible to CCIDs now are
OPEN/PARTOPEN, and the closing states.

This allows to remove tests which were previously necessary to protect
against referencing a socket in the listening state (in CCID-3), but which
now have become redundant.

As a further byproduct of enabling the CCIDs only after the connection has been
fully established, several typecast-initialisations of ccid3_hc_{rx,tx}_sock
can now be eliminated:
* the CCID is loaded, so it is not necessary to test if it is NULL,
* if it is possible to load a CCID and leave the private area NULL, then this
is a bug, which should crash loudly - and earlier,
* the test for state==OPEN || state==PARTOPEN now reduces only to the closing
phase (e.g. when the node has received an unexpected Reset).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 67b67e36 22-Aug-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

ccid: ccid-2/3 code cosmetics

This patch collects cosmetics-only changes to separate these from
code changes:
* update with regard to CodingStyle and whitespace changes,
* documentation:
- adding/revising comments,
- remove CCID-3 RX socket documentation which is either
duplicate or refers to fields that no longer exist,
* expand embedded tfrc_tx_info struct inline for consistency,
removing indirections via #define.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a7d13fbf 21-Jun-2010 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: remove unused function argument

This removes an unused 'sk' argument from several option-inserting functions.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b1383380 24-Mar-2010 Frans Pop <elendil@planet.nl>

net: remove trailing space in messages

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 996ccf49 04-Oct-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Remove CCID naming redundancy 2/2

This continues the previous patch, by applying the same change to CCID-3.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 388d5e99 04-Oct-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Overhaul CCID naming convention 2/2

This implements the new naming scheme also for CCID-3.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aa1b1ff0 12-Sep-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk>

net-next-2.6 [PATCH 1/1] dccp: ccids whitespace-cleanup / CodingStyle

No code change, cosmetical changes only:

* whitespace cleanup via scripts/cleanfile,
* remove self-references to filename at top of files,
* fix coding style (extraneous brackets),
* fix documentation style (kernel-doc-nano-HOWTO).

Thanks are due to Ivo Augusto Calado who raised these issues by
submitting good-quality patches.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 36cbd3dc 05-Aug-2009 Jan Engelhardt <jengelh@medozas.de>

net: mark read-only arrays as const

String literals are constant, and usually, we can also tag the array
of pointers const too, moving it to the .rodata section.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ddebc973 04-Jan-2009 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Lockless integration of CCID congestion-control plugins

Based on Arnaldo's earlier patch, this patch integrates the standardised
CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:

* enables a faster connection path by eliminating the need to always go
through the CCID registration lock;

* updates the implementation to use only a single array whose size equals
the number of configured CCIDs instead of the maximum (256);

* since the CCIDs are now fixed array elements, synchronization is no
longer needed, simplifying use and implementation.

CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 410e27a4 09-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp"
as it accentally contained the wrong set of patches. These will be
submitted separately.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# a3cbdde8 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Preventing Oscillations

This implements [RFC 3448, 4.5], which performs congestion avoidance behaviour
by reducing the transmit rate as the queueing delay (measured in terms of
long-term RTT) increases.

Oscillation can be turned on/off via a module option (do_osc_prev) and via sysfs
(using mode 0644), the default is off.

Overflow analysis:
------------------
* oscillation prevention is done after update_x(), so that t_ipi <= 64000;
* hence the multiplication "t_ipi * sqrt(R_sample)" needs 64 bits;
* done using u64 for sqrt_sample and explicit typecast of t_ipi;
* the divisor, R_sqmean, is non-zero because oscillation prevention is first
called when receiving the second feedback packet, and tfrc_scaled_rtt() > 0.

A detailed discussion of the algorithm (with plots) is on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3/sender_notes/oscillation_prevention/

The algorithm has negative side effects:
* when allowing to decrease t_ipi (leads to a large RTT) and
* when using it during slow-start;
both uses are therefore disabled.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 53ac9570 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Simplify computing and range-checking of t_ipi

This patch simplifies the computation of t_ipi, avoiding expensive computations
to enforce the minimum sending rate.

Both RFC 3448 and rfc3448bis (revision #06), as well as RFC 4342 sec 5., require
at various stages that at least one packet must be sent per t_mbi = 64 seconds.
This requires frequent divisions of the type X_min = s/t_mbi, which are later
converted back into an inter-packet-interval t_ipi_max = s/X_min = t_mbi.

The patch removes the expensive indirection; in the unlikely case of having
a sending rate less than one packet per 64 seconds, it also re-adjusts X.

The following cases document conformance with RFC 3448 / rfc3448bis-06:
1) Time until receiving the first feedback packet:
* if the sender has no initial RTT sample then X = s/1 Bps > s/t_mbi;
* if the sender has an initial RTT sample or when the first feedback
packet is received, X = W_init/R > s/t_mbi.

2) Slow-start (p == 0 and feedback packets come in):
* RFC 3448 (current code) enforces a minimum of s/R > s/t_mbi;
* rfc3448bis (future code) enforces an even higher minimum of W_init/R.

3) Congestion avoidance with no absence of feedback (p > 0):
* when X_calc or X_recv/2 are too low, the minimum of X_min = s/t_mbi
is enforced in update_x() when calling update_send_interval();
* update_send_interval() is, as before, only called when X changes
(i.e. either when increasing or decreasing, not when in equilibrium).

4) Reduction of X without prior feedback or during slow-start (p==0):
* both RFC 3448 and rfc3448bis here halve X directly;
* the associated constraint X >= s/t_mbi is nforced here by send_interval().

5) Reduction of X when p > 0:
* X is modified indirectly via X_recv (RFC 3448) or X_recv_set (rfc3448bis);
* in both cases, control goes back to section 4.3 (in both documents);
* since p > 0, both documents use X = max(min(...), s/t_mbi), which is
enforced in this patch by calling send_interval() from update_x().

I think that this analysis is exhaustive. Should I have forgotten a case,
the worst-case consideration arises when X sinks below s/t_mbi, and is then
increased back up to this minimum value. Even under this assumption, the
behaviour is correct, since all lower limits of X in RFC 3448 / rfc3448bis
are either equal to or greater than s/t_mbi.

Note on the condition X >= s/t_mbi <==> t_ipi = s/X <= t_mbi: since X is
scaled by 64, and all time units are in microseconds, the coded condition is:

t_ipi = s * 64 * 10^6 usec / X <= 64 * 10^6 usec

This simplifies to s / X <= 1 second <==> X * 1 second >= s > 0.
(A zero `s' is not allowed by the CCID-3 code).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# c8f41d50 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Measuring the packet size s with regard to rfc3448bis-06

rfc3448bis allows three different ways of tracking the packet size `s':

1. using the MSS/MPS (at initialisation, 4.2, and in 4.1 (1));
2. using the average of `s' (in 4.1);
3. using the maximum of `s' (in 4.2).

Instead of hard-coding a single interpretation of rfc3448bis, this implements
a choice of all three alternatives and suggests the first as default, since it
is the option which is most consistent with other parts of the specification.

The patch further deprecates the update of t_ipi whenever `s' changes. The
gains of doing this are only small since a change of s takes effect at the
next instant X is updated:
* when the next feedback comes in (within one RTT or less);
* when the nofeedback timer expires (within at most 4 RTTs).

Further, there are complications caused by updating t_ipi whenever s changes:
* if t_ipi had previously been updated to effect oscillation prevention (4.5),
then it is impossible to make the same adjustment to t_ipi again, thus
counter-acting the algorithm;
* s may be updated any time and a modification of t_ipi depends on the current
state (e.g. no oscillation prevention is done in the absence of feedback);
* in rev-06 of rfc3448bis, there are more possible cases, depending on whether
the sender is in slow-start (t_ipi <= R/W_init), or in congestion-avoidance,
limited by X_recv or the throughput equation (t_ipi <= t_mbi).

Thus there are side effects of always updating t_ipi as s changes. These may not
be desirable. The only case I can think of where such an update makes sense is
to recompute X_calc when p > 0 and when s changes (not done by this patch).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 9d497a2c 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Implement rfc3448bis change to initial-rate computation

The patch updates CCID-3 with regard to the latest rfc3448bis-06:
* in the first revisions of the draft, MSS was used for the RFC 3390 window;
* then (from revision #1 to revision #2), it used the packet size `s';
* now, in this revision (and apparently final), the value is back to MSS.

This change has an implication for the case when no RTT sample is available,
at the time of sending the first packet:

* with RTT sample, 2*MSS/RTT <= initial_rate <= 4*MSS/RTT;
* without RTT sample, the initial rate is one packet (s bytes) per second
(sec. 4.2), but using s instead of MSS here creates an imbalance, since
this would further reduce the initial sending rate.

Hence the patch uses MSS (called MPS in RFC 4340) in all places.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 88e97a93 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Update the RX history records in one place

This patch is a requirement for enabling ECN support later on. With that change
in mind, the following preparations are done:
* renamed handle_loss() into congestion_event() since it returns true when a
congestion event happens (it will eventually also take care of ECN packets);
* lets tfrc_rx_congestion_event() always update the RX history records, since
this routine needs to be called for each non-duplicate packet anyway;
* made all involved boolean-type functions to have return type `bool';

Updating the RX history records is now only necessary for the packets received
up to sending the first feedback. The receiver code becomes again simpler.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 68c89ee5 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Update the computation of X_recv

This updates the computation of X_recv with regard to Errata 610/611 for
RFC 4342 and draft rfc3448bis-06, ensuring that at least an interval of 1
RTT is used to compute X_recv. The change is wrapped into a new function
ccid3_hc_rx_x_recv().

Further changes:
----------------
* feedback is not sent when no data packets arrived (bytes_recv == 0), as per
rfc3448bis-06, 6.2;
* take the timestamp for the feedback /after/ dccp_send_ack() returns, to avoid
taking the transmission time into account (in case layer-2 is busy);
* clearer handling of failure in ccid3_first_li().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 2b81143a 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Always perform receiver RTT sampling

This updates the CCID-3 receiver in part with regard to errata 610 and 611
(http://www.rfc-editor.org/errata_list.php), which change RFC 4342 to use the
Receive Rate as specified in rfc3448bis, requiring to constantly sample the
RTT (or use a sender RTT).

Doing this requires reusing the RX history structure after dealing with a loss.

The patch does not resolve how to compute X_recv if the interval is less
than 1 RTT. A FIXME has been added (and is resolved in subsequent patch).

Furthermore, since this is all TFRC-based functionality, the RTT estimation
is now also performed by the dccp_tfrc_lib module. This further simplifies
the CCID-3 code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 2f3e3bba 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Remove duplicate RX states

The only state information that the CCID-3 receiver keeps is whether initial
feedback has been sent or not. Further, this overlaps with use of feedback:

* state == TFRC_RSTATE_NO_DATA as long as no feedback has been sent;
* state == TFRC_RSTATE_DATA as soon as the first feedback has been sent.

This patch reduces the duplication, by memorising the type of the last feedback.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 34a081be 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp tfrc: Let dccp_tfrc_lib do the sampling work

This migrates more TFRC-related code into the dccp_tfrc_lib:
* sampling of the packet size `s' (which is only needed until the first
loss interval is computed (ccid3_first_li));
* updating the byte-counter `bytes_recvd' in between sending feedbacks.
The result is a better separation of CCID-3 specific and TFRC specific
code, which aids future integration with ECN and e.g. CCID-4.

Further changes:
----------------
* replaced magic number of 536 with equivalent constant TCP_MIN_RCVMSS;
(this constant is also used when no estimate for `s' is available).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 3ca7aea0 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp tfrc: Return type of update_i_mean is void

This changes the return type of tfrc_lh_update_i_mean() to void, since that
function returns always `false'. This is due to

len = dccp_delta_seqno(cur->li_seqno, DCCP_SKB_CB(skb)->dccpd_seq) + 1;

if (len - (s64)cur->li_length <= 0) /* duplicate or reordered */
return 0;

which means that update_i_mean can only increase the length of the open loss
interval I_0, and hence the value of I_tot0 (RFC 3448, 5.4). Consequently the
test `i_mean < old_i_mean' at the end of the function always evaluates to false.

There is no known way by which a loss interval can suddenly become shorter,
therefore the return type of the function is changed to void. (That is, under
the given circumstances step (3) in RFC 3448, 6.1 will not occur.)

Further changes:
----------------
* the function is now called from tfrc_rx_handle_loss, which is equivalent
to the previous way of calling from rx_packet_recv (it was called whenever
there was no new or pending loss, now it is also updated when there is
a pending loss - this increases the accuracy a bit);
* added a FIXME to possibly consider NDP counting as per RFC 4342 (this is
not implemented yet).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# d20ed95f 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp tfrc: Perform early loss detection

This enables the TFRC code to begin loss detection (as soon as the module
is loaded), using the latest updates from rfc3448bis-06, 6.3.1:

* when the first data packet(s) are lost or marked, set
* X_target = s/(2*R) => f(p) = s/(R * X_target) = 2,
* corresponding to a loss rate of ~ 20.64%.

The handle_loss() function is now called right at the begin of rx_packet_recv()
and thus no longer protected against duplicates: hence a call to rx_duplicate()
has been added. Such a call makes sense now, as the previous patch initialises
the first entry with a sequence number of GSR.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 24b8d343 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp tfrc: Receiver history initialisation routine

This patch
1) separates history allocation and initialisation, to facilitate early
loss detection (implemented by a subsequent patch);

2) removes duplication by using the existing tfrc_rx_hist_purge() if the
allocation fails. This is now possible, since the initialisation routine
3) zeroes out the entire history before using it.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# d0c05fe4 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Simplified handling of TX states

Since CCIDs are only used during the established phase of a connection,
they have very little internal state; this specifically reduces to:

* "no packet sent" if and only if s == 0, for the TX packet size s;

* when the first packet has been sent (i.e. `s' > 0), the question is whether
or not feedback has been received:
- if a feedback packet is received, "feedback = yes" is set,
- if the nofeedback timer expires, "feedback = no" is set.

Thus the CCID only needs to remember state about whether or not feedback
has been received. This is now implemented using a boolean flag, which is
toggled when a feedback packet arrives or the nofeedback timer expires.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# f76fd327 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Runtime verification of timer resolution

The DCCP base time resolution is 10 microseconds (RFC 4340, 13.1 ... 13.3).

Using a timer with a lower resolution was found to trigger the following
bug warnings/problems on high-speed networks (e.g. local loopback):
* RTT samples are rounded down to 0 if below resolution;
* in some cases, negative RTT samples were observed;
* the CCID-3 feedback timer complains that the feedback interval is 0,
since the feedback interval is in the order of 1 RTT or less and RTT
measurement rounded this down to 0;
On an Intel computer this will for instance happen when using a
boot-time parameter of "clocksource=jiffies".

The following system log messages were observed:
11:24:00 kernel: BUG: delta (0) <= 0 at ccid3_hc_rx_send_feedback()
11:26:12 kernel: BUG: delta (0) <= 0 at ccid3_hc_rx_send_feedback()
11:26:30 kernel: dccp_sample_rtt: unusable RTT sample 0, using min
11:26:30 last message repeated 5 times

This patch defines a global constant for the time resolution, adds this in
timer.c, and checks the available clock resolution at CCID-3 module load time.

When the resolution is worse than 10 microseconds, module loading exits with
a message "socket type not supported".

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# f4a66ca4 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Return-value convention of hc_tx_send_packet()

This patch reorganises the return value convention of the CCID TX sending
function, to permit more flexible schemes, as required by subsequent patches.

Currently the convention is
* values < 0 mean error,
* a value == 0 means "send now", and
* a value x > 0 means "send in x milliseconds".

The patch provides symbolic constants and a function to interpret return values.
In addition, it caps the maximum positive return value to 0xFFFF milliseconds,
corresponding to 65.535 seconds.

This is possible since in CCID-3 the maximum inter-packet gap is t_mbi = 64 sec.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# d0995e6a 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Remove dead states

This patch is thanks to an investigation by Leandro Sales de Melo and his
colleagues. They worked out two state diagrams which highlight the fact that
the xxx_TERM states in CCID-3/4 are in fact not necessary.

And this can be confirmed by in turn looking at the code: the xxx_TERM states
are only ever set in ccid3_hc_{rx,tx}_exit(). These two functions are part
of the following call chain:

* ccid_hc_{tx,rx}_exit() are called from ccid_delete() only;
* ccid_delete() invokes ccid_hc_{tx,rx}_exit() in the way of a destructor:
after calling ccid_hc_{tx,rx}_exit(), the CCID is released from memory;
* ccid_delete() is in turn called only by ccid_hc_{tx,rx}_delete();
* ccid_hc_{tx,rx}_delete() is called only if
- feature negotiation failed (dccp_feat_activate_values()),
- when changing the RX/TX CCID (to eject the current CCID),
- when destroying the socket (in dccp_destroy_sock()).

In other words, when CCID-3 sets the state to xxx_TERM, it is at a time where
no more processing should be going on, hence it is not necessary to introduce
a dedicated exit state - this is implicit when unloading the CCID.

The patch removes this state, one switch-statement collapses as a result.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# c506d91d 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Unused argument in CCID tx function

This removes the argument `more' from ccid_hc_tx_packet_sent, since it was
nowhere used in the entire code.

(Anecdotally, this argument was not even used in the original KAME code where
the function originally came from; compare the variable moreToSend in the
freebsd61-dccp-kame-28.08.2006.patch now maintained by Emmanuel Lochin.)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# ce177ae2 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Remove redundant 'options_received' struct

The `options_received' struct is redundant, since it re-duplicates the existing
`p' and `x_recv' fields. This patch removes the sub-struct and migrates the
format conversion operations (cf. below) to ccid3_hc_tx_parse_options().

Why the fields are redundant
----------------------------
The Loss Event Rate p and the Receive Rate x_recv are initially 0 when first
loading CCID-3, as ccid_new() zeroes out the entire ccid3_hc_tx_sock.

When Loss Event Rate or Receive Rate options are received, they are stored by
ccid3_hc_tx_parse_options() into the fields `ccid3or_loss_event_rate' and
`ccid3or_receive_rate' of the sub-struct `options_received' in ccid3_hc_tx_sock.

After parsing (considering only the established state - dccp_rcv_established()),
the packet is passed on to ccid_hc_tx_packet_recv(). This calls the CCID-3
specific routine ccid3_hc_tx_packet_recv(), which performs the following copy
operations between fields of ccid3_hc_tx_sock:

* hctx->options_received.ccid3or_receive_rate is copied into hctx->x_recv,
after scaling it for fixpoint arithmetic, by 2^64;
* hctx->options_received.ccid3or_loss_event_rate is copied into hctx->p,
considering the above special cases; in addition, a value of 0 here needs to
be mapped into p=0 (when no Loss Event Rate option has been received yet).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 535c55df 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp tfrc/ccid-3: Computing Loss Rate from Loss Event Rate

This adds a function to take care of the following cases occurring in the
computation of the Loss Rate p:

* 1/(2^32-1) is mapped into 0% as per RFC 4342, 8.5;
* 1/0 is mapped into the maximum of 100%;
* we want to avoid that p = 1/x is rounded down to 0 when x is very large,
since this means accidentally re-entering slow-start (indicated by p==0).

In the last case, the minimum-resolution value of p is returned.

Furthermore, a bug in ccid3_hc_rx_getsockopt is fixed (1/0 was mapped into ~0U),
which now allows to consistently print the scaled p-values as

printf("Loss Event Rate = %u.%04u %%\n", rx_info.tfrcrx_p / 10000,
rx_info.tfrcrx_p % 10000);

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 3306c781 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Add packet type information to CCID-specific option parsing

This patch ...
1. adds packet type information to ccid_hc_{rx,tx}_parse_options(). This is
necessary, since table 3 in RFC 4340, 5.8 leaves it to the CCIDs to state
which options may (not) appear on what packet type.

2. adds such a check for CCID-3's {Loss Event, Receive} Rate as specified in
RFC 4340 8.3 ("Receive Rate options MUST NOT be sent on DCCP-Data packets")
and 8.5 ("Loss Event Rate options MUST NOT be sent on DCCP-Data packets").

3. removes an unused argument `idx' from ccid_hc_{rx,tx}_parse_options(). This
is also no longer necessary, since the CCID-specific option-parsing routines
are passed every single parameter of the type-length-value option encoding.

Also added documentation and made argument naming scheme consistent.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 47a61e7b 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Simplify and consolidate tx_parse_options

This simplifies and consolidates the TX option-parsing code:

1. The Loss Intervals option is not currently used, so dead code related to
this option is removed. I am aware of no plans to support the option, but
if someone wants to implement it (e.g. for inter-op tests), it is better
to start afresh than having to also update currently unused code.

2. The Loss Event and Receive Rate options have a lot of code in common (both
are 32 bit, both have same length etc.), so this is consolidated.

3. The test against GSR is not necessary, because
- on first loading CCID3, ccid_new() zeroes out all fields in the socket;
- ccid3_hc_tx_packet_recv() treats 0 and ~0U equivalently, due to

pinv = opt_recv->ccid3or_loss_event_rate;
if (pinv == ~0U || pinv == 0)
hctx->p = 0;

- as a result, the sequence number field is removed from opt_recv.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 63b3a73b 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Remove ugly RTT-sampling history lookup

This removes the RTT-sampling function tfrc_tx_hist_rtt(), since

1. it suffered from complex passing of return values (the return value both
indicated successful lookup while the value doubled as RTT sample);

2. when for some odd reason the sample value equalled 0, this triggered a bug
warning about "bogus Ack", due to the ambiguity of the return value;

3. on a passive host which has not sent anything the TX history is empty and
thus will lead to unwanted "bogus Ack" warnings such as
ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-28197148
ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-26641606.

The fix is to replace the implicit encoding by performing the steps manually.

Furthermore, the "bogus Ack" warning has been removed, since it can actually be
triggered due to several reasons (network reordering, old packet, (3) above),
hence it is not very useful.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# de6f2b59 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Bug fix for the inter-packet scheduling algorithm

This fixes a subtle bug in the calculation of the inter-packet gap and shows
that t_delta, as it is currently used, is not needed. And hence replaced.

The algorithm from RFC 3448, 4.6 below continually computes a send time t_nom,
which is initialised with the current time t_now; t_gran = 1E6 / HZ specifies
the scheduling granularity, s the packet size, and X the sending rate:

t_distance = t_nom - t_now; // in microseconds
t_delta = min(t_ipi, t_gran) / 2; // `delta' parameter in microseconds

if (t_distance >= t_delta) {
reschedule after (t_distance / 1000) milliseconds;
} else {
t_ipi = s / X; // inter-packet interval in usec
t_nom += t_ipi; // compute the next send time
send packet now;
}


1) Description of the bug
-------------------------
Rescheduling requires a conversion into milliseconds, due to this call chain:

* ccid3_hc_tx_send_packet() returns a timeout in milliseconds,
* this value is converted by msecs_to_jiffies() in dccp_write_xmit(),
* and finally used as jiffy-expires-value for sk_reset_timer().

The highest jiffy resolution with HZ=1000 is 1 millisecond, so using a higher
granularity does not make much sense here.

As a consequence, values of t_distance < 1000 are truncated to 0. This issue
has so far been resolved by using instead

if (t_distance >= t_delta + 1000)
reschedule after (t_distance / 1000) milliseconds;

The bug is in artificially inflating t_delta to t_delta' = t_delta + 1000. This
is unnecessarily large, a more adequate value is t_delta' = max(t_delta, 1000).


2) Consequences of using the corrected t_delta'
-----------------------------------------------
Since t_delta <= t_gran/2 = 10^6/(2*HZ), we have t_delta <= 1000 as long as
HZ >= 500. This means that t_delta' = max(1000, t_delta) is constant at 1000.

On the other hand, when using a coarse HZ value of HZ < 500, we have three
sub-cases that can all be reduced to using another constant of t_gran/2.

(a) The first case arises when t_ipi > t_gran. Here t_delta' is the constant
t_delta' = max(1000, t_gran/2) = t_gran/2.

(b) If t_ipi <= 2000 < t_gran = 10^6/HZ usec, then t_delta = t_ipi/2 <= 1000,
so that t_delta' = max(1000, t_delta) = 1000 < t_gran/2.

(c) If 2000 < t_ipi <= t_gran, we have t_delta' = max(t_delta, 1000) = t_ipi/2.

In the second and third cases we have delay values less than t_gran/2, which is
in the order of less than or equal to half a jiffy.

How these are treated depends on how fractions of a jiffy are handled: they
are either always rounded down to 0, or always rounded up to 1 jiffy (assuming
non-zero values). In both cases the error is on average in the order of 50%.

Thus we are not increasing the error when in the second/third case we replace
a value less than t_gran/2 with 0, by setting t_delta' to the constant t_gran/2.


3) Summary
----------
Fixing (1) and considering (2), the patch replaces t_delta with a constant,
whose value depends on CONFIG_HZ, changing the above algorithm to:

if (t_distance >= t_delta')
reschedule after (t_distance / 1000) milliseconds;

where t_delta' = 10^6/(2*HZ) if HZ < 500, and t_delta' = 1000 otherwise.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# b2e317f4 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: No more CCID control blocks in LISTEN state

The CCIDs are activated as last of the features, at the end of the handshake,
were the LISTEN state of the master socket is inherited into the server
state of the child socket. Thus, the only states visible to CCIDs now are
OPEN/PARTOPEN, and the closing states.

This allows to remove tests which were previously necessary to protect
against referencing a socket in the listening state (in CCID3), but which
now have become redundant.

As a further byproduct of enabling the CCIDs only after the connection has been
fully established, several typecast-initialisations of ccid3_hc_{rx,tx}_sock
can now be eliminated:
* the CCID is loaded, so it is not necessary to test if it is NULL,
* if it is possible to load a CCID and leave the private area NULL, then this
is a bug, which should crash loudly - and earlier,
* the test for state==OPEN || state==PARTOPEN now reduces only to the closing
phase (e.g. when the node has received an unexpected Reset).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# 842d1ef1 03-Sep-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Remove ccid3hc{tx,rx}_ prefixes

This patch does the same for CCID-3 as the previous patch for CCID-2:

s#ccid3hctx_##g;
s#ccid3hcrx_##g;

plus manual editing to retain consistency.

Please note: expanded the fields of the `struct tfrc_tx_info' in the hc_tx_sock,
since using short #define identifiers is not a good idea. The only place where
this embedded struct was used is ccid3_hc_tx_getsockopt().

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 43264991 23-Aug-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Toggle debug output without module unloading

This sets the sysfs permissions so that root can toggle the `debug'
parameter available for nearly every DCCP module. This is useful
since there are various module inter-dependencies. The debug flag
can now be toggled at runtime using

echo 1 > /sys/module/dccp/parameters/dccp_debug
echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug
echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug
echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug

The last is not very useful yet, since no code at the moment calls
the tfrc_debug() macro.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 157439fa 23-Aug-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Toggle debug output without module unloading

This sets the sysfs permissions so that root can toggle the `debug'
parameter available for nearly every DCCP module. This is useful
since there are various module inter-dependencies. The debug flag
can now be toggled at runtime using

echo 1 > /sys/module/dccp/parameters/dccp_debug
echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug
echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug
echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug

The last is not very useful yet, since no code at the moment calls
the tfrc_debug() macro.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# b552c623 13-Jul-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Fix a loss detection bug

This fixes a bug in the logic of the TFRC loss detection:
* new_loss_indicated() should not be called while a loss is pending;
* but the code allows this;
* thus, for two subsequent gaps in the sequence space, when loss_count
has not yet reached NDUPACK=3, the loss_count is falsely reduced to 1.

To avoid further and similar problems, all loss handling and loss detection is
now done inside tfrc_rx_hist_handle_loss(), using an appropriate routine to
track new losses.

Further changes:
----------------
* added a reminder that no RX history operations should be performed when
rx_handle_loss() has identified a (new) loss, since the function takes
care of packet reordering during loss detection;
* made tfrc_rx_hist_loss_pending() bool (thanks to an earlier suggestion
by Arnaldo);
* removed unused functions.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 5b5d0e70 13-Jul-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Upgrade NDP count from 3 to 6 bytes

RFC 4340, 7.7 specifies up to 6 bytes for the NDP Count option, whereas the code
is currently limited to up to 3 bytes. This seems to be a relict of an earlier
draft version and is brought up to date by the patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 1e2f0e5e 11-Jun-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Fix sparse warnings

This patch fixes the following sparse warnings:
* nested min(max()) expression:
net/dccp/ccids/ccid3.c:91:21: warning: symbol '__x' shadows an earlier one
net/dccp/ccids/ccid3.c:91:21: warning: symbol '__y' shadows an earlier one

* Declaration of function prototypes in .c instead of .h file, resulting in
"should it be static?" warnings.

* Declared "struct dccpw" static (local to dccp_probe).

* Disabled dccp_delayed_ack() - not fully removed due to RFC 4340, 11.3
("Receivers SHOULD implement delayed acknowledgement timers ...").

* Used a different local variable name to avoid
net/dccp/ackvec.c:293:13: warning: symbol 'state' shadows an earlier one
net/dccp/ackvec.c:238:33: originally declared here

* Removed unused functions `dccp_ackvector_print' and `dccp_ackvec_print'.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 3294f202 11-Jun-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Bug-Fix - Zero RTT is possible

In commit $(825de27d9e40b3117b29a79d412b7a4b78c5d815) (from 27th May, commit
message `dccp ccid-3: Fix "t_ipi explosion" bug'), the CCID-3 window counter
computation was fixed to cope with RTTs < 4 microseconds.

Such RTTs can be found e.g. when running CCID-3 over loopback. The fix removed
a check against RTT < 4, but introduced a divide-by-zero bug.

All steady-state RTTs in DCCP are filtered using dccp_sample_rtt(), which
ensures non-zero samples. However, a zero RTT is possible on initialisation,
when there is no RTT sample from the Request/Response exchange.

The fix is to use the fallback-RTT from RFC 4340, 3.4.

This is also better than just fixing update_win_count() since it allows other
parts of the code to always assume that the RTT is non-zero during the time
that the CCID is used.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>


# 825de27d 27-May-2008 Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-3: Fix "t_ipi explosion" bug

The identification of this bug is thanks to Cheng Wei and Tomasz
Grobelny.

To avoid divide-by-zero, the implementation previously ignored RTTs
smaller than 4 microseconds when performing integer division RTT/4.

When the RTT reached a value less than 4 microseconds (as observed on
loopback), this prevented the Window Counter CCVal value from
advancing. As a result, the receiver stopped sending feedback. This in
turn caused non-ending expiries of the nofeedback timer at the sender,
so that the sending rate was progressively reduced until reaching the
minimum of one packet per 64 seconds.

The patch fixes this bug by handling integer division more
intelligently. Due to consistent use of dccp_sample_rtt(),
divide-by-zero-RTT is avoided.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 84994e16 02-May-2008 Harvey Harrison <harvey.harrison@gmail.com>

dccp: ccid2.c, ccid3.c use clamp(), clamp_t()

Makes the intention of the nested min/max clear.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c4e18dad 06-Jan-2008 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

[CCID3]: Kill some bloat

Without a number of CONFIG.*DEBUG:

net/dccp/ccids/ccid3.c:
ccid3_hc_tx_update_x | -170
ccid3_hc_tx_packet_sent | -175
ccid3_hc_tx_packet_recv | -169
ccid3_hc_tx_no_feedback_timer | -192
ccid3_hc_tx_send_packet | -144
5 functions changed, 850 bytes removed, diff: -850

net/dccp/ccids/ccid3.c:
ccid3_update_send_interval | +191
1 function changed, 191 bytes added, diff: +191

net/dccp/ccids/ccid3.o:
6 functions changed, 191 bytes added, 850 bytes removed, diff: -659

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 52515e77 16-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Nofeedback timer according to rfc3448bis

This implements the changes to the nofeedback timer handling suggested
in draft rfc3448bis00, section 4.4. In particular, these changes mean:

* better handling of the lossless case (p == 0)
* the timestamp for computing t_ld becomes obsolete
* much more recent document (RFC 3448 is almost 5 years old)
* concepts in rfc3448bis arose from a real, working implementation
(cf. sec. 12)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d8d1252f 16-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Implement rfc3448bis changes to feedback reception

This implements the algorithm to update the allowed sending rate X upon
receiving feedback packets, as described in draft rfc3448bis, 4.2/4.3.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5bd370a6 17-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Remove two irrelevant states in TX feedback handling

* the NO_SENT state is only triggered in bidirectional mode,
costing unnecessary processing.
* the TERM (terminating) state is irrelevant.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8e138e79 17-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Use a function to update p_inv, and p is never used

This patch
1) concentrates previously scattered computation of p_inv into one function;
2) removes the `p' element of the CCID3 RX sock (it is redundant);
3) makes the tfrc_rx_info structure standalone, only used on demand.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 84a97b0a 13-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID]: More informative registration

The patch makes the registration messages of CCID 2/3 a bit more
informative: instead of repeating the CCID number as currently done,

"CCID: Registered CCID 2 (ccid2)" or
"CCID: Registered CCID 3 (ccid3)",

the descriptive names of the CCID's (from RFCs) are now used:

"CCID: Registered CCID 2 (TCP-like)" and
"CCID: Registered CCID 3 (TCP-Friendly Rate Control)".

To allow spaces in the name, the slab name string has been changed to
refer to the numeric CCID identifier, using the same format as before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 954c2db8 12-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Interface CCID3 code with newer Loss Intervals Database

This hooks up the TFRC Loss Interval database with CCID 3 packet reception.
In addition, it makes the CCID-specific computation of the first loss
interval (which requires access to all the guts of CCID3) local to ccid3.c.

The patch also fixes an omission in the DCCP code, that of a default /
fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while
at it, the upper bound of 4 seconds for an RTT sample has been reduced to
match the initial TCP RTO value of 3 seconds from[RFC 1122, 4.2.3.1].

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# db641960 12-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Redundant debugging output / documentation

Each time feedback is sent two lines are printed:

ccid3_hc_rx_send_feedback: client ... - entry
ccid3_hc_rx_send_feedback: Interval ...usec, X_recv=..., 1/p=...

The first line is redundant and thus removed.

Further, documentation of ccid3_hc_rx_sock (capitalisation) is made consistent.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 385ac2e3 08-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: HC-receiver should not insert timestamps as HC-sender doesn't uses it

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b84a2189 06-Dec-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[TFRC]: New rx history code

Credit here goes to Gerrit Renker, that provided the initial implementation for
this new codebase.

I modified it just to try to make it closer to the existing API, renaming some
functions, add namespacing and fix one bug where the tfrc_rx_hist_alloc was not
freeing the allocated ring entries on the error path.

Original changeset comment from Gerrit:
-----------
This provides a new, self-contained and generic RX history service for TFRC
based protocols.

Details:
* new data structure, initialisation and cleanup routines;
* allocation of dccp_rx_hist entries local to packet_history.c,
as a service exported by the dccp_tfrc_lib module.
* interface to automatically track highest-received seqno;
* receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1);
* a generic function to test for `data packets' as per RFC 4340, sec. 7.7.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 30a0eacd 05-Dec-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: The receiver of a half-connection does not set window counter values

Only the sender sets window counters [RFC 4342, sections 5 and 8.1].

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d58d1af0 05-Dec-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[TFRC]: Rename dccp_rx_ to tfrc_rx_

This is in preparation for merging the new rx history code written by Gerrit Renker.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 34a9e7ea 05-Dec-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[TFRC]: Make the rx history slab be global

This is in preparation for merging the new rx history code written by Gerrit Renker.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9108d5f4 29-Nov-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[TFRC]: Hide tx history details from the CCIDs

Based on a previous patch by Gerrit Renker.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 276f2edc 28-Nov-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[TFRC]: Migrate TX history to singly-linked lis

This patch was based on another made by Gerrit Renker, his changelog was:

------------------------------------------------------
The patch set migrates TFRC TX history to a singly-linked list.

The details are:
* use of a consistent naming scheme (all TFRC functions now begin with `tfrc_');
* allocation and cleanup are taken care of internally;
* provision of a lookup function, which is used by the CCID TX infrastructure
to determine the time a packet was sent (in turn used for RTT sampling);
* integration of the new interface with the present use in CCID3.
------------------------------------------------------

Simplifications I did:

. removing the tfrc_tx_hist_head that had a pointer to the list head and
another for the slabcache.
. No need for creating a slabcache for each CCID that wants to use the TFRC
tx history routines, create a single slabcache when the dccp_tfrc_lib module
init routine is called.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c3ada46a 20-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Inline for moving average

The moving average computation occurs so frequently in the CCID 3 code that
it merits an inline function of its own. This is uses a suggestion by
Arnaldo as per http://www.mail-archive.com/dccp@vger.kernel.org/msg01662.html

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a5358fdc 20-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Accurately determine idle & application-limited periods

This fixes/updates the handling of idle and application-limited periods in CCID3,
which currently is broken: there is no detection as to how long a sender has been
idle - there is only one flag which is toggled in between function calls.

Being obsolete now, the `idle' flag is removed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eb279b79 20-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Ignore trivial amounts of elapsed time

This patch fixes a previously undiscovered bug; the problem is in computing
the elapsed time as the time between `receiving' the packet (i.e. skb enters
CCID module) and sending feedback:

- there is no layer-processing, queueing, or delay involved,
- hence the elapsed time is in the order of 1 function call
- this is in the dimension of maximally 50..100usec
- which renders the use of elapsed time almost entirely useless.

The fix is simply to ignore such trivial amounts of elapsed time.

As a further advantage, the now useless elapsed_time field can be removed from
the socket, which reduces the socket structure by another four bytes.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6c08b2cf 20-Nov-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Revert use of MSS instead of s

This updates the CCID3 code with regard to two instances of using `MSS' in place of `s':

1. The RFC3390-based initial rate: both rfc3448bis as well as the Faster Restart
draft now consistently use `s' instead of MSS.

2. Now agrees with section 4.2 of rfc3448bis: "If the sender is ready to send data when
it does not yet have a round trip sample, the value of X is set to s bytes per
second, for segment size s [...]"

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b24b8a24 23-Jan-2008 Pavel Emelyanov <xemul@openvz.org>

[NET]: Convert init_timer into setup_timer

Many-many code in the kernel initialized the timer->function
and timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5e8e034c 20-Dec-2007 Joe Perches <joe@perches.com>

[DCCP]: Spelling fixes

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 24c667db 24-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID2/3]: Initialisation assignments of 0 are redundant

Assigning initial values of `0' is redundant when loading a new CCID structure,
since in net/dccp/ccid.c the entire CCID structure is zeroed out prior to
initialisation in ccid_new():

struct ccid {
struct ccid_operations *ccid_ops;
char ccid_priv[0];
};

// ...
if (rx) {
memset(ccid + 1, 0, ccid_ops->ccid_hc_rx_obj_size);
if (ccid->ccid_ops->ccid_hc_rx_init != NULL &&
ccid->ccid_ops->ccid_hc_rx_init(ccid, sk) != 0)
goto out_free_ccid;
} else {
memset(ccid + 1, 0, ccid_ops->ccid_hc_tx_obj_size);
/* analogous to the rx case */
}

This patch therefore removes the redundant assignments. Thanks to Arnaldo for
the inspiration.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 76fd1e87 24-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Unaligned pointer access

This fixes `unaligned (read) access' errors of the type

Kernel unaligned access at TPC[100f970c] dccp_parse_options+0x4f4/0x7e0 [dccp]
Kernel unaligned access at TPC[1011f2e4] ccid3_hc_tx_parse_options+0x1ac/0x380 [dccp_ccid3]
Kernel unaligned access at TPC[100f9898] dccp_parse_options+0x680/0x880 [dccp]

by using the get_unaligned macro for parsing options.

Commiter note: Preserved the sparse __be{16,32} annotations.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# 042d18f9 04-Oct-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Make all `debug' parameters bool

This just sets the parameter to bool, since debugging messages are
either on or off.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2e86908f 26-Sep-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Move NULL-protection into function

This moves several instances of testing against NULL into the function which is
used to de-reference the CCID-private data.

Committer note: Made the BUG_ON depend on having CONFIG_IP_DCCP_CCID3_DEBUG, as it
is too much to have this on production code. Also made sure that
the macro is used only after checking if sk_state is not LISTEN,
to make it equivalent to what we had before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>


# 3393da82 25-Sep-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Simplify interface of dccp_sample_rtt

The third parameter of dccp_sample_rtt now becomes useless and is removed.

Also combined the subtraction of the timestamp echo and the elapsed time.
This is safe, since (a) presence of timestamp echo is tested first and (b)
elapsed time is either present and non-zero or it is not set and equals 0
due to the memset in dccp_parse_options.

To avoid measuring option-processing time, the timestamp for measuring the
initial Request/Response RTT sample is taken directly when the function is
called (the Linux implementation always adds a timestamp on the Request,
so there is no loss in doing this).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aa97efd9 25-Sep-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Reuse ktime_get_real() calls again

This patch reduces the number of timestamps taken in the receive path
for each packet.

The ccid3_hc_tx_update_x() routine is called in
* the receive path for each CCID3-controlled packet
* for the nofeedback timer (if no feedback arrives during 4 RTT)

Currently, when there is no loss, each packet gets timestamped twice.
The patch resolves this by recycling the first timestamp taken on packet
reception for RTT sampling.

When the no_feedback_timer() is called, then the timestamp argument is
simply set to NULL - so that ccid3_hc_tx_update_x() takes care of the logic.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0740d49c 19-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP] packet_history: Convert dccphtx_tstamp to ktime_t

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e7c23357 19-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP] packet_history: convert dccphrx_tstamp to ktime_t

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 668348a4 19-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP] CCID3: Stop using dccp_timestamp

Now to convert the ackvec code to ktime_t so that we can get rid of
dccp_timestamp and the epoch thing in dccp_sock.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9823b7b5 19-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP]: Convert dccp_sample_rtt to ktime_t

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e7a81c6d 19-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP]: Convert ccid3hcrx_tstamp_last_feedback to ktime_t

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1faf0a1f 19-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP]: Convert ccid3hcrx_tstamp_last_ack to ktime_t

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 23f062af 19-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP]: Convert ccid3hctx_t_ld to ktime_t

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac198ea8 19-Aug-2007 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP]: Make ccid3_hc_tx_update_x get a timestamp if needed

The code was too complicated, if p > 0 in ccid3_hc_tx_no_feedback_timer the
timestamp was being obtained to be passed to ccid3_hc_tx_update_x, where only
if p > 0 the timestamp was needed, so just leave it to ccid3_hc_tx_update_x to
obtain the timestamp if needed.

This will help in the upcoming changesets where we'll convert t_ld to ktime_t.
We'll eventually try to reuse ktime_get_real() calls again.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 49d66a70 16-Jun-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Fix a bug in the send time processing

ccid3_hc_tx_send_packet currently returns 0 when the time difference between
current time and t_nom is less than 1000 microseconds.

In this case the packet is sent immediately; but, unlike other packets that can
be emitted on first attempt, it will not have its window counter updated and
its options set as required. This is a bug.

Fix: Require the time difference to be at least 1000 microseconds. The
algorithm then converges: time differences > 1000 microseconds trigger the
timer in dccp_write_xmit; after timer expiry this function is tried again; when
the time difference is less than 1000, the packet will have its options added
and window counter updated as required.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>


# 8132da4d 16-Jun-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Sending time: update to ktime_t

This updates the computation of t_nom and t_last_win_count to use the newer
gettimeofday interface.

Committer note: used ktime_to_timeval to set the 'now' variable to t_ld in
ccid3hctx_no_feedback_timer

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>


# cc4d6a3a 28-May-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

loss_interval: Nuke dccp_li_hist

It had just a slab cache, so, for the sake of simplicity just make
dccp_trfc_lib module init routine create the slab cache, no need for users of
the lib to create a private loss_interval object.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# cc0a910b 14-Jun-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[DCCP] loss_interval: Move ccid3_hc_rx_update_li to loss_interval

Renaming it to dccp_li_update_li.

Also based on previous work by Ian McDonald.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>


# 878ac600 13-Jun-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

[CCID3]: Pass ccid3_li_hist to ccid3_hc_rx_update_li

Now ccid3_hc_rx_update_li is ready to be moved to
net/dccp/ccids/lib/loss_interval, it uses the same interface as the other
functions there.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>


# d83258a3 28-May-2007 Arnaldo Carvalho de Melo <acme@redhat.com>

Remove accesses to ccid3_hc_rx_sock in ccid3_hc_rx_{update,calc_first}_li

This is a preparatory patch for moving these loss interval functions from
net/dccp/ccids/ccid3.c to net/dccp/ccids/lib/loss_interval.c.

Based on a patch by Ian McDonald.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>


# 6bc7efe8 28-May-2007 Ian McDonald <ian.mcdonald@jandi.co.nz>

loss_interval: Fix timeval initialisation

When compiling with EXTRA_CFLAGS=-W noticed that tstamp is not initialised
correctly in dccp_li_calc_first_li.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>


# b2f41ff4 27-May-2007 Ian McDonald <ian.mcdonald@jandi.co.nz>

ccid3: Update copyrights

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>


# 1b07a95a 23-May-2007 David S. Miller <davem@sunset.davemloft.net>

[DCCP]: Fix build warning when debugging is disabled.

Signed-off-by: David S. Miller <davem@davemloft.net>


# b2449fdc 20-Apr-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Fix bug in the calculation of very low sending rates

This fixes an error in the calculation of t_ipi when X converges towards
very low sending rates (between 1 and 64 bytes per second).

Although this case may not sound likely, it can be reproduced by connecting,
hitting enter (1 byte sent) and waiting for some time, during which the
nofeedback timer halves the sending rate until finally it reaches the region
1..64 bytes/sec. Computing X is handled correctly (tested separately); but by
dividing X _before_ entering the calculation of t_ipi, X becomes zero as
a result. This in turn triggers a BUG condition caught in scaled_div().

Fixed by replacing with equivalent statement and explicit typecast for good
measure.

Calculation verified and effect of patch tested - reduced never below 1 byte
per 64 seconds afterwards, i.e. not allowing divide-by-zero.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 30833ffe 20-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Use initial RTT sample from SYN exchange

The patch follows the following recommendation made in an erratum to RFC 4342:

"Senders MAY additionally make use of other available RTT measurements,
including those from the initial Request-Response packet exchange."

It implements larger initial windows with regard to this inital RTT measurement,
using the mechanism suggested in draft-ietf-dccp-rfc3448bis, section 4.2.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7dfee1a9 20-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Use function for RTT sampling

This replaces the existing occurrences of RTT sampling with
the use of the new function dccp_sample_rtt.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0c150efb 20-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Handle Idle and Application-Limited periods

This updates the code with regard to handling idle and application-limited
periods as specified in [RFC 4342, 5.1].

Background:


# a21f9f96 20-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Wrap computation of RFC3390-initial rate into separate function

The CCID 3 and TFRC specs (RFC 4342, RFC 3448, draft-3448bis) make frequent
reference to the computation of the RFC-3390 initial sending rate:

1. Initial sending rate when RTT is known (RFC 4342, p. 6)
2. Response to Idle/Application-Limited periods (RFC 4342, 5.1)

This warrants putting the code into its own function, for later code reuse.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1761f7d7 20-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Remove build warnings for 64bit

This clears the following sparc64 build warnings:
1) warning: format "%ld" expects type "long int", but argument 3 has type "suseconds_t"
2) warning: format "%llu" expects type "long long unsigned int", but argument 3 has type "__u64"
Fixed by using typecast to unsigned. This is argued to be safe, since the quantities, after
de-scaling (factor 2^6) fit all in u32.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1266adee 20-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Remove race condition and update t_ipi when `s' changes

This:

1. removes a race condition in the access to the scheduled send time t_nom which
results from allowing asynchronous r/w access to t_nom without locks;

2. updates the inter-packet interval t_ipi = s/X when `s' changes, following a
suggestion by Ian McDonald.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8699be7d 20-Mar-2007 Ian McDonald <ian.mcdonald@jandi.co.nz>

[CCID3]: More verbose debugging

This adds a few debugging statements to ccid3.c

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 371fe777 20-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Use MSS for larger initial windows

This improves the slow-start phase by using the MSS
(as suggested in RFC 4342, sec. 5) instead of the packet size s.
Also figured out that __u32 is ample resource enough.

After applying, I got the following in the logs:

ccid3_hc_tx_packet_recv: client(f7421700), s=6, MSS=1424, w_init=4380, R_sample=176us, X=24886363

Had the previous variant been used, w_init would have been as low as 24.

Committer note: removed unneeded cast to unsigned long long that was
causing a compiler warning on 64bit architectures.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9bf17475 20-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Re-order CCID 3 source file

No code change at all.
This splits ccid3.c into a RX and a TX section, so that the file has an
organisation similar to the other ones (e.g. packet_history.{h,c}).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 353b13e1 20-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[CCID3]: Remove redundant `len' test

Since CCID3 avoids sending 0-byte data packets (cf. ccid3_hc_tx_send_packet),
testing for zero-payload length, as performed by ccid3_hc_tx_update_s, is
redundant - hence removed by this patch.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 151a9931 07-Mar-2007 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Revert patch which disables bidirectional mode

This reverts an earlier patch which disabled bidirectional mode, meaning that
a listening (passive) socket was not allowed to write to the other (active)
end of the connection.

This mode had been disabled when there were problems with CCID3, but it
imposes a constraint on socket programming and thus hinders deployment.

A change is included to ignore RX feedback received by the TX CCID3 module.

Many thanks to Andre Noll for pointing out this issue.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c9eaf173 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NET] DCCP: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0f08461e 05-Feb-2007 Andrew Morton <akpm@linux-foundation.org>

[DCCP]: Warning fixes.

net/dccp/ccids/ccid3.c: In function `ccid3_hc_rx_packet_recv':
net/dccp/ccids/ccid3.c:1007: warning: long int format, different type arg (arg 3)
net/dccp/ccids/ccid3.c:1007: warning: long int format, different type arg (arg 4)

opaque types must be suitably cast for printing.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 832e3ca6 11-Dec-2006 Ian McDonald <ian.mcdonald@jandi.co.nz>

[DCCP] ccid3: return value in ccid3_hc_rx_calc_first_li

In a recent patch we introduced invalid return codes which will result in the
opposite of what is intended (i.e. send more packets in face of peculiar
network conditions).

This fixes it by returning ~0 which means not calculated as per
dccp_li_hist_calc_i_mean.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 8109b02b 10-Dec-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Whitespace cleanups

That accumulated over the last months hackaton, shame on me for not
using git-apply whitespace helping hand, will do that from now on.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 1fba78b6 10-Dec-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP] ccid3: Fixup some type conversions related to rtts

Spotted by David Miller when compiling on sparc64, I reproduced it here on
parisc64, that are the only platforms to define __kernel_suseconds_t as an
'int', all the others, x86_64 and x86 included typedef it as a 'long', but from
the definition of suseconds_t it should just be an 'int' on platforms where it
is >= 32bits, it would not require all the castings from suseconds_t to (int)
when printking variables of this type, that are not needed on parisc64 and
sparc64.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 9e8efc82 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: BUG-FIX - conversion errors

This fixes conversion errors which arose by not properly type-casting
from u32 to __u64. Fixed by explicitly casting each type which is not
__u64, or by performing operation after assignment.

The patch further adds missing debug information to track the current
value of X_recv.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# a9672411 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Make debug output consistent

This patch does not alter any algorithm, just the debug message format:

* s#%s, sk=%p#%s(%p)#g

* when a statename is present, it now uses %s(%p, state=%s)

* when only function entry is debugged, it adds an `- entry'

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# c5a1ae9a 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Perform history operations only after packet has been sent

This migrates all packet history operations into the routine
ccid3_hc_tx_packet_sent, thereby removing synchronization problems
that occur when, as before, the operations are spread over multiple
routines.
The following minor simplifications are also applied:
* several simplifications now follow from this change - several tests
are now no longer required
* removal of one unnecessary variable (dp)

Justification:

Currently packet history operations span two different routines,
one of which is likely to pass through several iterations of sleeping
and awakening.
The first routine, ccid3_hc_tx_send_packet, allocates an entry and
sets a few fields. The remaining fields are filled in when the second
routine (which is not within a sleeping context), ccid3_hc_tx_packet_sent,
is called. This has several strong drawbacks:
* it is not necessary to split history operations - all fields can be
filled in by the second routine
* the first routine is called multiple times, until a packet can be sent,
and sleeps meanwhile - this causes a lot of difficulties with regard to
keeping the list consistent
* since both routines do not have a producer-consumer like synchronization,
it is very difficult to maintain data across calls to these routines
* the fact that the routines are called in different contexts (sleeping, not
sleeping) adds further problems

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# e312d100 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: TX history - remove unused field

This removes the `dccphtx_ccval' field since it is nowhere used in the code and
in fact not necessary for the accounting.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 9f8681db 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Shift window counter computation

This puts the window counter computation [RFC 4342, 8.1] into a separate
function which is called whenever a new packet is ready for immediate
transmission in ccid3_hc_tx_send_packet.

Justification:

The window counter update was previously computed after the packet was sent. This has
two drawbacks, both fixed by this patch:
1) re-compute another timestamp almost directly after the packet was sent (expensive),
2) the CCVal for the window counter is needed at the instant the packet is sent.

Further details:

The initialisation of the window counter is left in the state NO_SENT, as before.
The algorithm will do nothing if either RTT is initialised to 0 (which is ok) or if
the RTT value remains below 4 microseconds (which is almost pathological).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# de553c18 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Sanity-check RTT samples

CCID3 performance depends much on the accuracy of RTT samples. If RTT
samples grow too large, performance can be catastrophically poor.

To limit the amount of possible damage in such cases, the patch
* introduces an upper limit which identifies a maximum `sane' RTT value;
* uses a macro to enforce this upper limit.

Using a macro was given preference, since it is necessary to identify the
calling function in the warning message. Since exceeding this threshold
identifies a critical condition, DCCP_CRIT is used and not DCCP_WARN.

Many thanks to Ian McDonald for collaboration on this issue.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# fe0499ae 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Initialise RTT values

In both the sender and the receiver it is possible that the stored
RTT value is accessed before an actual RTT estimate has been computed.

This patch
* initialises the sender RTT to 0
- the sender always accesses the RTT in ccid3_hc_tx_packet_sent
- the RTT is further needed for the window counter algorithm

* replaces the receiver initialisation of 5msec with 0
- which has the same effect and removes an `XXX'
- the RTT value is needed in ccid3_hc_rx_packet_recv as rtt_prev

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 65d6c2b4 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid: Deprecate ccid_hc_tx_insert_options

The function ccid3_hc_tx_insert_options only does a redundant no-op,
as the operation

DCCP_SKB_CB(skb)->dccpd_ccval = hctx->ccid3hctx_last_win_count;

is already performed _unconditionally_ in ccid3_hc_tx_send_packet.

Since there is further no current need for this function, it is removed
entirely. Since furthermore, there is actually no present need for the
entire interface function ccid_hc_tx_insert_options, it was decided to
remove it also, to clean up the interface.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# bf58a381 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Only deliver to the CCID rx side in charge

This is an optimisation to reduce CPU load. The received feedback is now
only directed to the active CCID component, without requiring processing
also by the inactive one.

As a consequence, a similar test in ccid3.c is now redundant and is
also removed.

Justification:

Currently DCCP works as a unidirectional service, i.e. a listening server
is not at the same time a connecting client.
As far as I can see, several modifications are necessary until that
becomes possible.
At the present time, received feedback is both fed to the rx/tx CCID
modules. In unidirectional service, only one of these is active at any
one time.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 0f9e5b57 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Debug timeval operations

Problem:

Most target types in the CCID3 code are u32, so subtle conversion errors
can occur if signed time calculations yield negative results: the original
values are lost in the conversion to unsigned, calculation errors go undetected.

This patch therefore
* sets all critical time types from unsigned to suseconds_t
* avoids comparison between signed/unsigned via type-casting
* provides ample warning messages in case time calculations are negative

These warning messages can be removed at a later stage when the code
has undergone more testing.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# bfe24a6c 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Simplify calculation for reverse lookup of p

This simplifies the calculation of a value p for a given fval when the
first loss interval is computed (RFC 3448, 6.3.1). It makes use of the
two new functions scaled_div/scaled_div32 to provide overflow protection.

Additionally, protection against divide-by-zero is extended - in this
case the function will return the maximally possible value of p=100%.

Background:

The maximum fval, f(100%), is approximately 244, i.e. the scaled value of fval
should never exceed 244E6, which fits easily into u32. The problem is the scaling
by 10^6, since additionally R(TT) is in microseconds.
This is resolved by breaking the division into two stages: the first stage
computes fval=(s*10^6)/R, stores that into u64; the second stage computes
fval = (fval*10^6)/X_recv and complains if overflow is reached for u32.
This case is safe since the TFRC reverse-lookup routine then returns p=100%.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# b9039a2a 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Replace scaled division operations

This replaces the remaining uses of usecs_div with scaled_div32, which
internally uses 64bit division and produces a warning on overflow.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 1a21e49a 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Finer-grained resolution of sending rates

This patch
* resolves a bug where packets smaller than 32/64 bytes resulted in sending rates of 0
* supports all sending rates from 1/64 bytes/second up to 4Gbyte/second
* simplifies the present overflow problems in calculations

Current sending rate X and the cached value X_recv of the receiver-estimated
sending rate are both scaled by 64 (2^6) in order to
* cope with low sending rates (minimally 1 byte/second)
* allow upgrading to use a packets-per-second implementation of CCID 3
* avoid calculation errors due to integer arithmetic cut-off

The patch implements a revised strategy from
http://www.mail-archive.com/dccp@vger.kernel.org/msg01040.html

The only difference with regard to that strategy is that t_ipi is already
used in the calculation of the nofeedback timeout, which saves one division.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 179ebc9f 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Fix two bugs in sending rate computation

This fixes
1) a bug in the recomputation of the sending rate by the nofeedback
timer when no feedback at all has so far been sent by the receiver:
min_t was used instead of max_t, which is wrong (cf. RFC 3448, p. 10);

2) an error in the computation of larger initial windows: instead of
min(... max()) (cf. RFC 4342, 5.), the code had used max(... max()).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# ff586298 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Two optimisations for sending rate recomputation

This performs two optimisations for the recomputation of the sending rate.

1) Currently the target sending rate X_calc is recalculated whenever
a) the nofeedback timer expires, or
b) a feedback packet is received.
In the (a) case, recomputing X_calc is redundant, since

* the parameters p and RTT do not change in between the
reception of feedback packets;

* the parameter X_recv is either modified from received
feedback or via the nofeedback timer;

* a test (`p == 0') in the nofeedback timer avoids using
a stale/undefined value of X_calc if p was previously 0.

2) The nofeedback timer now only recomputes a timestamp when p == 0.
This is according to step (4) of [RFC 3448, 4.3] and avoids
unnecessarily determining a timestamp.

A debug statement about not updating X is also removed - it helps very
little in debugging and just clutters the logs.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 45393a66 09-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Check against too large p

This patch follows a suggestion by Ian McDonald and ensures that in
the current code the value of p can not exceed 100%. Such a value is
illegal and would consequently cause a bug condition in tfrc_calc_x().

The receiver case is also tested, and a warning message is added.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 54e6ecb2 06-Dec-2006 Christoph Lameter <clameter@sgi.com>

[PATCH] slab: remove SLAB_ATOMIC

SLAB_ATOMIC is an alias of GFP_ATOMIC

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 44158306 03-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Deprecate TFRC_SMALLEST_P

This patch deprecates the existing use of an arbitrary value TFRC_SMALLEST_P
for low-threshold values of p. This avoids masking low-resolution errors.
Instead, the code now checks against real boundaries (implemented by preceding
patch) and provides warnings whenever a real value falls below the threshold.

If such messages are observed, it is a better solution to take this as an
indication that the lookup table needs to be re-engineered.

Changelog:
----------
This patch
* makes handling all TFRC resolution errors local to the TFRC library

* removes unnecessary test whether X_calc is 'infinity' due to p==0 -- this
condition is already caught by tfrc_calc_x()

* removes setting ccid3hctx_p = TFRC_SMALLEST_P in ccid3_hc_tx_packet_recv
since this is now done by the TFRC library

* updates BUG_ON test in ccid3_hc_tx_no_feedback_timer to take into account
that p now is either 0 (and then X_calc is irrelevant), or it is > 0; since
the handling of TFRC_SMALLEST_P is now taken care of in the tfrc library

Justification:
--------------
The TFRC code uses a lookup table which has a bounded resolution.
The lowest possible value of the loss event rate `p' which can be
resolved is currently 0.0001. Substituting this lower threshold for
p when p is less than 0.0001 results in a huge, exponentially-growing
error. The error can be computed by the following formula:

(f(0.0001) - f(p))/f(p) * 100 for p < 0.0001

Currently the solution is to use an (arbitrary) value
TFRC_SMALLEST_P = 40 * 1E-6 = 0.00004
and to consider all values below this value as `virtually zero'. Due to
the exponentially growing resolution error, this is not a good idea, since
it hides the fact that the table can not resolve practically occurring cases.
Already at p == TFRC_SMALLEST_P, the error is as high as 58.19%!

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 26af3072 03-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Fix warning message about illegal ACK

This avoids a (harmless) warning message being printed at the DCCP server
(the receiver of a DCCP half connection).

Incoming packets are both directed to

* ccid_hc_rx_packet_recv() for the server half
* ccid_hc_tx_packet_recv() for the client half

The message gets printed since on a server the client half is currently not
sending data packets.
This is resolved for the moment by checking the DCCP-role first. In future
times (bidirectional DCCP connections), this test may have to be more
sophisticated.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 5c3fbb6a 03-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Fix bug in calculation of send rate

The main object of this patch is the following bug:
==> In ccid3_hc_tx_packet_recv, the parameters p and X_recv were updated
_after_ the send rate was calculated. This is clearly an error and is
resolved by re-ordering statements.

In addition,
* r_sample is converted from u32 to long to check whether the time difference
was negative (it would otherwise be converted to a large u32 value)
* protection against RTT=0 (this is possible) is provided in a further patch
* t_elapsed is also converted to long, to match the type of r_sample
* adds a a more debugging information regarding current send rates
* various trivial comment/documentation updates

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 76d12777 03-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Fix BUG in retransmission delay calculation

This bug resulted in ccid3_hc_tx_send_packet returning negative
delay values, which in turn triggered silently dequeueing packets in
dccp_write_xmit. As a result, only a few out of the submitted packets made
it at all onto the network. Occasionally, when dccp_wait_for_ccid was
involved, this also triggered a bug warning since ccid3_hc_tx_send_packet
returned a negative value (which in reality was a negative delay value).

The cause for this bug lies in the comparison

if (delay >= hctx->ccid3hctx_delta)
return delay / 1000L;

The type of `delay' is `long', that of ccid3hctx_delta is `u32'. When comparing
negative long values against u32 values, the test returned `true' whenever delay
was smaller than 0 (meaning the packet was overdue to send).

The fix is by casting, subtracting, and then testing the difference with
regard to 0.

This has been tested and shown to work.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 8a508ac2 03-Dec-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Use higher RTO default for CCID3

The TFRC nofeedback timer normally expires after the maximum of 4
RTTs and twice the current send interval (RFC 3448, 4.3). On LANs
with a small RTT this can mean a high processing load and reduced
performance, since then the nofeedback timer is triggered very
frequently.

This patch provides a configuration option to set the bound for the
nofeedback timer, using as default 100 milliseconds.

By setting the configuration option to 0, strict RFC 3448 behaviour
can be enforced for the nofeedback timer.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 6b57c93d 28-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Use `unsigned' for packet lengths

This patch implements a suggestion by Ian McDonald and

1) Avoids tests against negative packet lengths by using unsigned int
for packet payload lengths in the CCID send_packet()/packet_sent() routines

2) As a consequence, it removes an now unnecessary test with regard to `len > 0'
in ccid3_hc_tx_packet_sent: that condition is always true, since
* negative packet lengths are avoided
* ccid3_hc_tx_send_packet flags an error whenever the payload length is 0.
As a consequence, ccid3_hc_tx_packet_sent is never called as all errors
returned by ccid_hc_tx_send_packet are caught in dccp_write_xmit

3) Removes the third argument of ccid_hc_tx_send_packet (the `len' parameter),
since it is currently always set to skb->len. The code is updated with regard
to this parameter change.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# a79ef76f 28-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Larger initial windows

This implements the larger-initial-windows feature for CCID 3, as described in
section 5 of RFC 4342. When the first feedback packet arrives, the sender can
send up to 2..4 packets per RTT, instead of just one.

The patch further
* reduces the number of timestamping calls by passing the timestamp value
(which is computed in one of the calling functions anyway) as argument

* renames one constant with a very long name into one which is shorter and
resembles the one in RFC 3448 (t_mbi)

* simplifies some of the min_t/max_t cases where both `x', `y' have the same
type

Commiter note: renamed TFRC_t_mbi to TFRC_T_MBI, to follow Linux coding style.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 78ad713d 28-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Track RX/TX packet size `s' using moving-average

Problem:


# 2a1fda6f 28-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Set NoFeedback Timeout according to RFC 3448

This corrects the setting of the nofeedback timer with regard to RFC
3448 - previously it was not set to max(4*R, 2*s/X) as specified. Using
the maximum of 1 second as upper bound (as it was done before) can have
detrimental effects, especially if R is small.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 5d0dbc4a 27-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Consolidate handling of t_RTO

This patch
* removes setting t_RTO in ccid3_hc_tx_init (per [RFC 3448, 4.2], t_RTO is
undefined until feedback has been received);

* makes some trivial changes (updates of comments);

* performs a small optimisation by exploiting that the feedback timeout
uses the value of t_ipi. The way it is done is safe, because the timeouts
appear after the changes to t_ipi, ensuring that up-to-date values are used;

* in ccid3_hc_tx_packet_recv, moves the t_rto statement closer to the calculation
of the next_tmout. This makes the code clearer to read and is also safe, since
t_rto is not updated until the next call of ccid3_hc_tx_packet_recv, and is not
read by the functions called via ccid_wait_for_ccid();

* removes a `max' statement in sk_reset_timer, this is not needed since the timeout
value is always greater than 1E6 microseconds.

* adds `XXX'es to highlight that currently the nofeedback timer is set
in a non-standard way

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 17893bc1 27-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Consistently update t_nom, t_ipi, t_delta

This patch:

* consolidates updating of parameters (t_nom, t_ipi, t_delta) which
need to be updated at the same time, since they are inter-dependent

* removes two inline functions which are no longer needed as a result of
the above consolidation

* resolves a FIXME regarding the re-calculation of t_ipi within the nofeedback
timer, in the state where no feedback has previously been received

* ties updating these parameters to updating the sending rate X, exploiting
that all three parameters in turn depend on X; and using a small optimisation
which can reduce the number of required instructions: only update the three
parameters when X really changes

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 48e03eee 27-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Consolidate timer resets

This patch concerns updating the value of the nofeedback timer when no feedback
has been received so far.

Since in this case the value of R is still undefined according to [RFC 3448,
4.2], we can not perform step (3) of [RFC 3448, 4.3]. A clarification is
provided in [RFC 4342, sec. 5], which states that in these cases the nofeedback
timer (still) expires "after two seconds".

Many thanks to Ian McDonald for pointing this out and providing the
clarification.

The patch
* implements [RFC 4342, sec. 5] with regard to the above case
* consolidates handling timer restart by
- adding an appropriate jump label and
- initialising the timeout value

Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 5e19e3fc 26-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Resolve small FIXME

This considers the case - ACK received while no packet has been sent
so far. Resolved by printing a (rate-limited) warning message.

Further removes an unnecessary BUG_ON in ccid3_hc_tx_packet_recv,
received feedback on a terminating connection is simply ignored.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 70dbd5b0 26-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Remove redundant statements in ccid3_hc_tx_packet_sent

This patch removes a switch statement which is redundant since,
* nothing is done in states TFRC_SSTATE_NO_SENT/TFRC_SSTATE_NO_FBACK
* it is impossible that the function is called in the state TFRC_SSTATE_TERM, since
--the function is called, in dccp_write_xmit, after ccid3_hc_tx_send_packet
--if ccid3_hc_tx_send_packet is called in state TFRC_SSTATE_TERM, it returns
-EINVAL, which means that ccid3_hc_tx_packet_sent will not be called
(compare dccp_write_xmit)
--> therefore, this case is logically impossible
* the remaining state is TFRC_SSTATE_FBACK which conditionally updates t_ipi, t_nom,
and t_delta. This is a no-op, since
--t_ipi only changes when feedback is received
--however, when feedback arrives via ccid3_hc_tx_packet_recv, there is an identical
code block which performs the same set of operations
--performing the same set of operations again in ccid3_hc_tx_packet_sent therefore
does not change anything, since between the time of receiving the last feedback
(and therefore update of t_ipi, t_nom, and t_delta), the value of t_ipi has not
changed
--since t_ipi has not changed, the values of t_delta and t_nom also do not change,
they depend fully on t_ipi

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# da335baf 26-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Avoid congestion control on zero-sized data packets

This resolves an `XXX' in ccid3_hc_tx_send_packet().

The function is only called on Data and DataAck packets and returns a negative
result on zero-sized messages. This is a reasonable policy since CCID 3 is a
congestion-control module and congestion control on zero-sized Data(Ack)
packets is in a way pathological.

The patch uses a more suitable error code for this case, it returns the Posix.1
code `EBADMSG' ("Not a data message") instead of `ENOTCONN'.

As a result of ignoring zero-sized packets, a the condition for a warning
"First packet is data" in ccid3_hc_tx_packet_sent is always satisfied; this
message has been removed since it will always be printed.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 7da7f456 26-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Simplify control flow of ccid3_hc_tx_send_packet

This makes some logically equivalent simplifications, by replacing
rc - values plus goto's with direct return statements.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 91cf5a17 26-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Fix calculation of t_ipi time of scheduled transmission

Problem:


# f5c2d636 26-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Simplify control flow in the calculation of t_ipi

This patch performs a simplifying (performance) optimisation:

In each call of the inline function ccid3_calc_new_t_ipi(), the state is
tested against TFRC_SSTATE_NO_FBACK. This is expensive when the function
is called very often. A simpler solution, implemented by this patch, is
to adapt the control flow.

Background:


# 90feeb95 26-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP] ccid3: Fix bug in calculation of first t_nom and first t_ipi

Problem:


# 45543173 20-Nov-2006 Ian McDonald <ian.mcdonald@jandi.co.nz>

[DCCP] CCID3: Remove non-referenced variable

This removes a non-referenced variable.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 59348b19 20-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Simplified conditions due to use of enum:8 states

This reaps the benefit of the earlier patch, which changed the type of
CCID 3 states to use enums, in that many conditions are now simplified
and the number of possible (unexpected) values is greatly reduced.

In a few instances, this also allowed to simplify pre-conditions; where
care has been taken to retain logical equivalence.

[DCCP]: Introduce a consistent BUG/WARN message scheme

This refines the existing set of DCCP messages so that
* BUG(), BUG_ON(), WARN_ON() have meaningful DCCP-specific counterparts
* DCCP_CRIT (for severe warnings) is not rate-limited
* DCCP_WARN() is introduced as rate-limited wrapper

Using these allows a faster and cleaner transition to their original
counterparts once the code has matured into a full DCCP implementation.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 56724aa4 20-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Add CCID3 debug support to Kconfig

This adds a CCID3 debug option to the configuration menu
which is missing in Kconfig, but already used by the code.

CCID 2 already provides such an entry.

To enable debugging, set CONFIG_IP_DCCP_CCID3_DEBUG=y

NOTE: The use of ccid3_{t,r}x_state_name is safe, since
now only enum values can appear.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 3c695262 15-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Introduce DCCP_{BUG{_ON},CRIT} macros, use enum:8 for the ccid3 states

This patch tackles the following problem:
* the ccid3_hc_{t,r}x_sock define ccid3hc{t,r}x_state as `u8', but
in reality there can only be a few, pre-defined enum names
* this necessitates addiditional checking for unexpected values
which would otherwise be caught by the compiler

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 0e64e94e 24-Oct-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk>

[DCCP]: Update documentation references.

Updates the references to spec documents throughout the code, taking into
account that

* the DCCP, CCID 2, and CCID 3 drafts all became RFCs in March this year

* RFC 1063 was obsoleted by RFC 1191

* draft-ietf-tcpimpl-pmtud-0x.txt was published as an Informational
RFC, RFC 2923 on 2000-09-22.

All references verified.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3dd9a7c3 21-Sep-2006 Ian McDonald <ian.mcdonald@jandi.co.nz>

[DCCP]: Use constants for CCIDs

With constants for CCID numbers this now uses them in some places.

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# fc747e82 29-Aug-2006 Ian McDonald <ian.mcdonald@jandi.co.nz>

[DCCP]: Tidyup CCID3 list handling

As Arnaldo Carvalho de Melo points out I should be using list_entry in case
the structure changes in future. Current code functions but is reliant
on position and requires type cast.

Noticed when doing this that I have one more variable than I needed so
removing that also.

Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 66a377c5 27-Aug-2006 Ian McDonald <ian.mcdonald@jandi.co.nz>

[DCCP]: Fix CCID3

This fixes CCID3 to give much closer performance to RFC4342.

CCID3 is meant to alter sending rate based on RTT and loss.

The performance was verified against:
http://wand.net.nz/~perry/max_download.php

For example I tested with netem and had the following parameters:
Delayed Acks 1, MSS 256 bytes, RTT 105 ms, packet loss 5%.

This gives a theoretical speed of 71.9 Kbits/s. I measured across three
runs with this patch set and got 70.1 Kbits/s. Without this patchset the
average was 232 Kbits/s which means Linux can't be used for CCID3 research
properly.

I also tested with netem turned off so box just acting as router with 1.2
msec RTT. The performance with this is the same with or without the patch
at around 30 Mbit/s.

Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e6bccd35 26-Aug-2006 Ian McDonald <ian.mcdonald@jandi.co.nz>

[DCCP]: Update contact details and copyright

Just updating copyright and contacts

Signed off by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6ab3d562 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de>

Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>


# 2d0817d1 20-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP] options: Make dccp_insert_options & friends yell on error

And not the silly LIMIT_NETDEBUG and silently return without inserting
the option requested.

Also drop some old debugging messages associated to option insertion.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 72478873 20-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP] ipv6: Add missing ipv6 control socket

I guess I forgot to add it, nah, now it just works:

18:04:33.274066 IP6 ::1.1476 > ::1.5001: request (service=0)
18:04:33.334482 IP6 ::1.5001 > ::1.1476: reset (code=bad_service_code)

Ditched IP_DCCP_UNLOAD_HACK, as now we would have to do it for both
IPv6 and IPv4, so I'll come up with another way for freeing the
control sockets in upcoming changesets.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c25a18ba 20-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Uninline some functions

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 60fe62e7 20-Mar-2006 Andrea Bittau <a.bittau@cs.ucl.ac.uk>

[DCCP]: sparse endianness annotations

This also fixes the layout of dccp_hdr short sequence numbers, problem
was not fatal now as we only support long (48 bits) sequence numbers.

Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 91f0ebf7 20-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP] CCID: Improve CCID infrastructure

1. No need for ->ccid_init nor ->ccid_exit, this is what module_{init,exit}
does and anynways neither ccid2 nor ccid3 were using it.

2. Rename struct ccid to struct ccid_operations and introduce struct ccid
with a pointer to ccid_operations and rigth after it the rx or tx
private state.

3. Remove the pointer to the state of the half connections from struct
dccp_sock, now its derived thru ccid_priv() from the ccid pointer.

Now we also can implement the setsockopt for changing the CCID easily as
no ccid init routines can affect struct dccp_sock in any way that prevents
other CCIDs from working if a CCID switch operation is asked by apps.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# aa5d7df3 20-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP] CCID3: Set the no_feedback_timer fields near init_timer

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 41144701 20-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP] CCID: Allow ccid_{init,exit} to be NULL

Testing if the ccid being instantiated has these methods in
ccid_init().

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c0996660 03-Mar-2006 Ian McDonald <imcdnzl@gmail.com>

[DCCP] ccid3: Divide by zero fix

In rare circumstances 0 is returned by dccp_li_hist_calc_i_mean which
leads to a divide by zero in ccid3_hc_rx_packet_recv. Explicitly check
for zero return now. Update copyright notice at same time.

Found by Arnaldo.

Signed-off-by: Ian McDonald <imcdnzl@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 88f964db 18-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Introduce CCID getsockopt for the CCIDs

Allocation for the optnames is similar to the DCCP options, with a
range for rx and tx half connection CCIDs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 59c2353d 12-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Listen socks doesn't have a private CCID block

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 59d203f9 09-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3] Cleanup ccid3 debug calls

Also use some BUG_ON where appropriate and use LIMIT_NETDEBUG for the unlikely
cases where we, at this stage, want to know about, that in my tests hasn't
appeared in the radar.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# d7e0fb98 09-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3] Initialize ccid3hctx_t_ipi to 250ms

To match more closely what is described in RFC 3448.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Ian McDonald <iam4@cs.waikato.ac.nz>


# 59725dc2 08-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3] Introduce ccid3_hc_[rt]x_sk() for overal consistency

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# b0e56780 08-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP] Introduce dccp_timestamp

To start the timestamps with 0.0ms, easing the integer maths in the CCIDs, this
probably will be reworked to use the to be introduced struct timeval_offset
infrastructure out of skb_get_timestamp, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 954ee31f 08-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3] Initialize more fields in ccid3_hc_rx_init

The initialization of ccid3hcrx_rtt to 5ms is just a bandaid, I'll continue
auditing the CCID3 HC rx codebase to fix this properly, probably I'll add a
feedback timer as suggested in the CCID3 draft.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# b3a3077d 08-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3] Make the ccid3hcrx_rtt calc look more like the ccid3hctx_rtt one

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 1a28599a 08-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3] Use ELAPSED_TIME in the HC TX RTT estimation

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 27ae543e 08-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3] Calculate ccid3hcrx_x_recv using usecs_div

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 507d37cf 08-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID] Only call the HC insert_options methods when requested

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 0ba7a3ba 08-Sep-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3] Avoid unsigned integer overflows in usecs_div

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# c530cfb1 28-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Call sk->sk_write_space(sk) when receiving a feedback packet

This makes the send rate calculations behave way more closely to what
is specified, with the jitter previously seen on x and x_recv
disappearing completely on non lossy setups.

This resembles the tcp_data_snd_check code, that possibly we'll end up
using in DCCP as well, perhaps moving this code to
inet_connection_sock.

For now I'm doing the simplest implementation tho.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a84ffe43 28-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Introduce DCCP_SOCKOPT_PACKET_SIZE

So that applications can set dccp_sock->dccps_pkt_size, that in turn
is used in the CCID3 half connection init routines to set
ccid3hc[tr]x_s and use it in its rate calculations.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 29e4f8b3 27-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Move ccid3_hc_rx_detect_loss to packet_history.c

Renaming it to dccp_rx_hist_detect_loss.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 072ab6c6 27-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Move ccid3_hc_rx_add_hist to packet_history.c

Renaming it to dccp_rx_hist_add_packet.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 36729c1a 27-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Move the calc_X routines to dccp_tfrc_lib

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4524b259 27-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Just move packet_history.[ch] to net/dccp/ccids/lib/

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae6706f0 27-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Move the loss interval code to loss_interval.[ch]

And put this into net/dccp/ccids/lib/, where packet_history.[ch] will also be
moved and then we'll have a tfrc_lib.ko module that will be used by
dccp_ccid3.ko and other CCIDs that are variations of TFRC (RFC 3448).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cfc3c525 27-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Move the CCID3 defines to ccid3.h

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6b5e633a 27-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Introduce usecs_div

To avoid open coding this all over the place.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b6ee3d4a 27-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Reorganise timeval handling

Introducing functions to add to or subtract from a timeval variable
and renaming now_delta to timeval_new_delta that calls do_gettimeofday
and then timeval_delta, that should be used when there are several
deltas made relative to the current time or setting variables to it,
so as to avoid calling do_gettimeofday excessively.

I'm leaving these "timeval_" prefixed funcions internal to DCCP for a
while till we're sure there are no subtle bugs in it.

It also is more correct as it checks if the number of usecs added to
or subtracted from a tv_usec field is more than 2 seconds.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1f2333ae 27-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Reflow to mostly fit under 80 columns

No code changes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d6809c12 27-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Introduce dccp_wait_for_ccid and use it in dccp_write_xmit

This is not quite what I think we should have long term but improves
performance for now, so lets use it till we get CCID3 working well,
then we can think about using sk_write_queue, perhaps using some ideas
from Juwen Lai's old stack for 2.4.20.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ba89966c 26-Aug-2005 Eric Dumazet <dada1@cosmosbay.com>

[NET]: use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers

This patch puts mostly read only data in the right section
(read_mostly), to help sharing of these data between CPUS without
memory ping pongs.

On one of my production machine, tcp_statistics was sitting in a
heavily modified cache line, so *every* SNMP update had to force a
reload.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2babe1f6 23-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Introduce dccp_get_info

And also hc_tx and hc_rx get_info functions for the CCIDs to fill in
information that is specific to them.

For now reusing struct tcp_info, later I'll try to figure out a better
solution, for now its really nice to get this kind of info:

[root@qemu ~]# ./ss -danemi
State Recv-Q Send-Q Local Addr:Port Peer Addr:Port
LISTEN 0 0 *:5001 *:* ino:628 sk:c1340040
mem:(r0,w0,f0,t0) cwnd:0 ssthresh:0
ESTAB 0 0 172.20.0.2:5001 172.20.0.1:32785 ino:629 sk:c13409a0
mem:(r0,w0,f0,t0) ts rto:1000 rtt:0.004/0 cwnd:0 ssthresh:0 rcv_rtt:61.377

This, for instance, shows that we're not congestion controlling ACKs,
as the above output is in the ttcp receiving host, and ttcp is a one
way app, i.e. the received never calls sendmsg, so
ccid_hc_tx_send_packet is never called, so the TX half connection
stays in TFRC_SSTATE_NO_SENT state and hctx_rtt is never calculated,
stays with the value set in ccid3_hc_tx_init, 4us, as show above in
milliseconds (0.004ms), upcoming patches will fix this.

rcv_rtt seems sane tho, matching ping results :-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4fded33b 23-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Calculate the RTT in the RX half connection

Using TIMESTAMP_ECHO and ELAPSED_TIME options received.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c68e64cf 21-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Reintroduce ccid3hctx_t_rto

CCID3 keeps this variable in usecs, inet_connection_socks in jiffies,
so to avoid Mars orbiter losses lets reintroduce ccid3hctx_t_rto 8)

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1bc09869 19-Aug-2005 Ian McDonald <iam4@cs.waikato.ac.nz>

[DCCP]: Fix the timestamp options

This changes timestamp, timestamp echo, and elapsed time to use units of 10
usecs as per DCCP spec. This has been tested to verify that times are correct.
Also fixed up length and used hton/ntoh more.

Still to add in later patches:
- actually use elapsed time to adjust RTT
(commented out as was prior to this patch)
- send options at times more closely following the spec
(content is now correct)

Signed-off-by: Ian McDonald <iam4@cs.waikato.ac.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a10cedd4 14-Aug-2005 Patrick McHardy <kaber@trash.net>

[DCCP]: Fix compiler warnings

may be a false warning if there always is something on ccid3hcrx_hist:

net/dccp/ccids/ccid3.c: In function 'ccid3_hc_rx_packet_recv':
net/dccp/ccids/ccid3.c:1634: warning: 'tstamp.tv_usec' may be used uninitialized in this function
net/dccp/ccids/ccid3.c:1634: warning: 'tstamp.tv_sec' may be used uninitialized in this function

const on inline functions doesn't have any effect:

net/dccp/dccp.h:64: warning: type qualifiers ignored on function return type
net/dccp/dccp.h:70: warning: type qualifiers ignored on function return type
net/dccp/dccp.h:76: warning: type qualifiers ignored on function return type

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a1d3a355 13-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Fix sparse warnings

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 725ba8ee 13-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Introduce the DCCP Kernel hacking menu

Only available if CONFIG_DEBUG_KERNEL is enabled in the "Kernel
Hacking" Menu.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c1734376 13-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[PACKET_HISTORY]: Add dccphtx_rtt and rename the win_count fields

As requested by Ian.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Ian McDonald <iam4@cs.waikato.ac.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cef07fd6 10-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Ditch USEC_IN_SEC as time.h has USEC_PER_SEC

That is equivalent, no need to have a private one.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 8c60f3fa 09-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[CCID3]: Separate most of the packet history code

This also changes the list_for_each_entry_safe_continue behaviour to match its
kerneldoc comment, that is, to start after the pos passed.

Also adds several helper functions from previously open coded fragments, making
the code more clear.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 27258ee5 09-Aug-2005 Arnaldo Carvalho de Melo <acme@mandriva.com>

[DCCP]: Introduce dccp_write_xmit from code in dccp_sendmsg

This way it gets closer to the TCP flow, where congestion window
checks are done, it seems we can map ccid_hc_tx_send_packet in
dccp_write_xmit to tcp_snd_wnd_test in tcp_write_xmit, a CCID2
decision should just fit in here as well...

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 757f612e 09-Aug-2005 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[CCID3]: Reenable list_for_each_entry_safe_continue usage

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7c657876 09-Aug-2005 Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[DCCP]: Initial implementation

Development to this point was done on a subversion repository at:

http://oops.ghostprotocols.net:81/cgi-bin/viewcvs.cgi/dccp-2.6/

This repository will be kept at this site for the foreseable future,
so that interested parties can see the history of this code,
attributions, etc.

If I ever decide to take this offline I'll provide the full history at
some other suitable place.

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>