History log of /linux-master/net/sctp/outqueue.c
Revision Date Author Comments
# 9789c1c6 19-Apr-2023 Xin Long <lucien.xin@gmail.com>

sctp: delete the nested flexible array variable

This patch deletes the flexible-array variable[] from the structure
sctp_sackhdr and sctp_errhdr to avoid some sparse warnings:

# make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
net/sctp/sm_statefuns.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h):
./include/linux/sctp.h:451:28: warning: nested flexible array
./include/linux/sctp.h:393:29: warning: nested flexible array

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f201ae1 04-Nov-2022 Xin Long <lucien.xin@gmail.com>

sctp: clear out_curr if all frag chunks of current msg are pruned

A crash was reported by Zhen Chen:

list_del corruption, ffffa035ddf01c18->next is NULL
WARNING: CPU: 1 PID: 250682 at lib/list_debug.c:49 __list_del_entry_valid+0x59/0xe0
RIP: 0010:__list_del_entry_valid+0x59/0xe0
Call Trace:
sctp_sched_dequeue_common+0x17/0x70 [sctp]
sctp_sched_fcfs_dequeue+0x37/0x50 [sctp]
sctp_outq_flush_data+0x85/0x360 [sctp]
sctp_outq_uncork+0x77/0xa0 [sctp]
sctp_cmd_interpreter.constprop.0+0x164/0x1450 [sctp]
sctp_side_effects+0x37/0xe0 [sctp]
sctp_do_sm+0xd0/0x230 [sctp]
sctp_primitive_SEND+0x2f/0x40 [sctp]
sctp_sendmsg_to_asoc+0x3fa/0x5c0 [sctp]
sctp_sendmsg+0x3d5/0x440 [sctp]
sock_sendmsg+0x5b/0x70

and in sctp_sched_fcfs_dequeue() it dequeued a chunk from stream
out_curr outq while this outq was empty.

Normally stream->out_curr must be set to NULL once all frag chunks of
current msg are dequeued, as we can see in sctp_sched_dequeue_done().
However, in sctp_prsctp_prune_unsent() as it is not a proper dequeue,
sctp_sched_dequeue_done() is not called to do this.

This patch is to fix it by simply setting out_curr to NULL when the
last frag chunk of current msg is dequeued from out_curr stream in
sctp_prsctp_prune_unsent().

Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: Zhen Chen <chenzhen126@huawei.com>
Tested-by: Caowangbao <caowangbao@huawei.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9f0b7732 04-Nov-2022 Xin Long <lucien.xin@gmail.com>

sctp: remove the unnecessary sinfo_stream check in sctp_prsctp_prune_unsent

Since commit 5bbbbe32a431 ("sctp: introduce stream scheduler foundations"),
sctp_stream_outq_migrate() has been called in sctp_stream_init/update to
removes those chunks to streams higher than the new max. There is no longer
need to do such check in sctp_prsctp_prune_unsent().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# e3d37210 03-Apr-2022 Jamie Bainbridge <jamie.bainbridge@gmail.com>

sctp: count singleton chunks in assoc user stats

Singleton chunks (INIT, HEARTBEAT PMTU probes, and SHUTDOWN-
COMPLETE) are not counted in SCTP_GET_ASOC_STATS "sas_octrlchunks"
counter available to the assoc owner.

These are all control chunks so they should be counted as such.

Add counting of singleton chunks so they are properly accounted for.

Fixes: 196d67593439 ("sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/c9ba8785789880cf07923b8a5051e174442ea9ee.1649029663.git.jamie.bainbridge@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 70331909 24-Nov-2021 Xin Long <lucien.xin@gmail.com>

sctp: make the raise timer more simple and accurate

Currently, the probe timer is reused as the raise timer when PLPMTUD is in
the Search Complete state. raise_count was introduced to count how many
times the probe timer has timed out. When raise_count reaches to 30, the
raise timer handler will be triggered.

During the whole processing above, the timer keeps timing out every probe_
interval. It is a waste for the Search Complete state, as the raise timer
only needs to time out after 30 * probe_interval.

Since the raise timer and probe timer are never used at the same time, it
is no need to keep probe timer 'alive' in the Search Complete state. This
patch to introduce sctp_transport_reset_raise_timer() to start the timer
as the raise timer when entering the Search Complete state. When entering
the other states, sctp_transport_reset_probe_timer() will still be called
to reset the timer to the probe timer.

raise_count can be removed from sctp_transport as no need to count probe
timer timeout for raise timer timeout. last_rtx_chunks can be removed as
sctp_transport_reset_probe_timer() can be called in the place where asoc
rtx_data_chunks is changed.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/edb0e48988ea85997488478b705b11ddc1ba724a.1637781974.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# fe59379b 22-Jun-2021 Xin Long <lucien.xin@gmail.com>

sctp: do the basic send and recv for PLPMTUD probe

This patch does exactly what rfc8899#section-6.2.1.2 says:

The SCTP sender needs to be able to determine the total size of a
probe packet. The HEARTBEAT chunk could carry a Heartbeat
Information parameter that includes, besides the information
suggested in [RFC4960], the probe size to help an implementation
associate a HEARTBEAT ACK with the size of probe that was sent. The
sender could also use other methods, such as sending a nonce and
verifying the information returned also contains the corresponding
nonce. The length of the PAD chunk is computed by reducing the
probing size by the size of the SCTP common header and the HEARTBEAT
chunk.

Note that HB ACK chunk will carry back whatever HB chunk carried, including
the probe_size we put it in; We also check hbinfo->probe_size in the HB ACK
against link->pl.probe_size to validate this HB ACK chunk.

v1->v2:
- Remove the unused 'sp' and add static for sctp_packet_bundle_pad().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8ff0b1f0 18-Mar-2021 Xin Long <lucien.xin@gmail.com>

sctp: move sk_route_caps check and set into sctp_outq_flush_transports

The sk's sk_route_caps is set in sctp_packet_config, and later it
only needs to change when traversing the transport_list in a loop,
as the dst might be changed in the tx path.

So move sk_route_caps check and set into sctp_outq_flush_transports
from sctp_packet_transmit. This also fixes a dst leak reported by
Chen Yi:

https://bugzilla.kernel.org/show_bug.cgi?id=212227

As calling sk_setup_caps() in sctp_packet_transmit may also set the
sk_route_caps for the ctrl sock in a netns. When the netns is being
deleted, the ctrl sock's releasing is later than dst dev's deleting,
which will cause this dev's deleting to hang and dmesg error occurs:

unregister_netdevice: waiting for xxx to become free. Usage count = 1

Reported-by: Chen Yi <yiche@redhat.com>
Fixes: bcd623d8e9fa ("sctp: call sk_setup_caps in sctp_packet_transmit instead")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# f398efc1 27-Dec-2019 Kevin Kou <qdkevin.kou@gmail.com>

sctp: add enabled check for path tracepoint loop.

sctp_outq_sack is the main function handles SACK, it is called very
frequently. As the commit "move trace_sctp_probe_path into sctp_outq_sack"
added below code to this function, sctp tracepoint is disabled most of time,
but the loop of transport list will be always called even though the
tracepoint is disabled, this is unnecessary.

+ /* SCTP path tracepoint for congestion control debugging. */
+ list_for_each_entry(transport, transport_list, transports) {
+ trace_sctp_probe_path(transport, asoc);
+ }

This patch is to add tracepoint enabled check at outside of the loop of
transport list, and avoid traversing the loop when trace is disabled,
it is a small optimization.

Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f643ee29 25-Dec-2019 Kevin Kou <qdkevin.kou@gmail.com>

sctp: move trace_sctp_probe_path into sctp_outq_sack

The original patch bringed in the "SCTP ACK tracking trace event"
feature was committed at Dec.20, 2017, it replaced jprobe usage
with trace events, and bringed in two trace events, one is
TRACE_EVENT(sctp_probe), another one is TRACE_EVENT(sctp_probe_path).
The original patch intended to trigger the trace_sctp_probe_path in
TRACE_EVENT(sctp_probe) as below code,

+TRACE_EVENT(sctp_probe,
+
+ TP_PROTO(const struct sctp_endpoint *ep,
+ const struct sctp_association *asoc,
+ struct sctp_chunk *chunk),
+
+ TP_ARGS(ep, asoc, chunk),
+
+ TP_STRUCT__entry(
+ __field(__u64, asoc)
+ __field(__u32, mark)
+ __field(__u16, bind_port)
+ __field(__u16, peer_port)
+ __field(__u32, pathmtu)
+ __field(__u32, rwnd)
+ __field(__u16, unack_data)
+ ),
+
+ TP_fast_assign(
+ struct sk_buff *skb = chunk->skb;
+
+ __entry->asoc = (unsigned long)asoc;
+ __entry->mark = skb->mark;
+ __entry->bind_port = ep->base.bind_addr.port;
+ __entry->peer_port = asoc->peer.port;
+ __entry->pathmtu = asoc->pathmtu;
+ __entry->rwnd = asoc->peer.rwnd;
+ __entry->unack_data = asoc->unack_data;
+
+ if (trace_sctp_probe_path_enabled()) {
+ struct sctp_transport *sp;
+
+ list_for_each_entry(sp, &asoc->peer.transport_addr_list,
+ transports) {
+ trace_sctp_probe_path(sp, asoc);
+ }
+ }
+ ),

But I found it did not work when I did testing, and trace_sctp_probe_path
had no output, I finally found that there is trace buffer lock
operation(trace_event_buffer_reserve) in include/trace/trace_events.h:

static notrace void \
trace_event_raw_event_##call(void *__data, proto) \
{ \
struct trace_event_file *trace_file = __data; \
struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
struct trace_event_buffer fbuffer; \
struct trace_event_raw_##call *entry; \
int __data_size; \
\
if (trace_trigger_soft_disabled(trace_file)) \
return; \
\
__data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
\
entry = trace_event_buffer_reserve(&fbuffer, trace_file, \
sizeof(*entry) + __data_size); \
\
if (!entry) \
return; \
\
tstruct \
\
{ assign; } \
\
trace_event_buffer_commit(&fbuffer); \
}

The reason caused no output of trace_sctp_probe_path is that
trace_sctp_probe_path written in TP_fast_assign part of
TRACE_EVENT(sctp_probe), and it will be placed( { assign; } ) after the
trace_event_buffer_reserve() when compiler expands Macro,

entry = trace_event_buffer_reserve(&fbuffer, trace_file, \
sizeof(*entry) + __data_size); \
\
if (!entry) \
return; \
\
tstruct \
\
{ assign; } \

so trace_sctp_probe_path finally can not acquire trace_event_buffer
and return no output, that is to say the nest of tracepoint entry function
is not allowed. The function call flow is:

trace_sctp_probe()
-> trace_event_raw_event_sctp_probe()
-> lock buffer
-> trace_sctp_probe_path()
-> trace_event_raw_event_sctp_probe_path() --nested
-> buffer has been locked and return no output.

This patch is to remove trace_sctp_probe_path from the TP_fast_assign
part of TRACE_EVENT(sctp_probe) to avoid the nest of entry function,
and trigger sctp_probe_path_trace in sctp_outq_sack.

After this patch, you can enable both events individually,
# cd /sys/kernel/debug/tracing
# echo 1 > events/sctp/sctp_probe/enable
# echo 1 > events/sctp/sctp_probe_path/enable

Or, you can enable all the events under sctp.

# echo 1 > events/sctp/enable

Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4e7696d9 08-Dec-2019 Xin Long <lucien.xin@gmail.com>

sctp: get netns from asoc and ep base

Commit 312434617cb1 ("sctp: cache netns in sctp_ep_common") set netns
in asoc and ep base since they're created, and it will never change.
It's a better way to get netns from asoc and ep base, comparing to
calling sock_net().

This patch is to replace them.

v1->v2:
- no change.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 47505b8b 23-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 104

Based on 1 normalized pattern(s):

this sctp implementation is free software you can redistribute it
and or modify it under the terms of the gnu general public license
as published by the free software foundation either version 2 or at
your option any later version this sctp implementation is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with gnu cc see the file copying if not see
http www gnu org licenses

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 42 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190523091649.683323110@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 7efba10d 28-Jan-2019 Xin Long <lucien.xin@gmail.com>

sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_scheduler and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_scheduler,
it's compatible with 0.

SCTP_CURRENT_ASSOC is supported for SCTP_STREAM_SCHEDULER in this
patch. It also adds default_ss in sctp_sock to support
SCTP_FUTURE_ASSOC.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 12480e3b 03-Nov-2018 Xin Long <lucien.xin@gmail.com>

sctp: define SCTP_SS_DEFAULT for Stream schedulers

According to rfc8260#section-4.3.2, SCTP_SS_DEFAULT is required to
defined as SCTP_SS_FCFS or SCTP_SS_RR.

SCTP_SS_FCFS is used for SCTP_SS_DEFAULT's value in this patch.

Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations")
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 605c0ac1 16-Oct-2018 Xin Long <lucien.xin@gmail.com>

sctp: count both sk and asoc sndbuf with skb truesize and sctp_chunk size

Now it's confusing that asoc sndbuf_used is doing memory accounting with
SCTP_DATA_SNDSIZE(chunk) + sizeof(sk_buff) + sizeof(sctp_chunk) while sk
sk_wmem_alloc is doing that with skb->truesize + sizeof(sctp_chunk).

It also causes sctp_prsctp_prune to count with a wrong freed memory when
sndbuf_policy is not set.

To make this right and also keep consistent between asoc sndbuf_used, sk
sk_wmem_alloc and sk_wmem_queued, use skb->truesize + sizeof(sctp_chunk)
for them.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2cc543f5 02-Oct-2018 Gustavo A. R. Silva <gustavo@embeddedor.com>

sctp: fix fall-through annotation

Replace "fallthru" with a proper "fall through" annotation.

This fix is part of the ongoing efforts to enabling
-Wimplicit-fallthrough

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 05364ca0 10-Aug-2018 Konstantin Khorenko <khorenko@virtuozzo.com>

net/sctp: Make wrappers for accessing in/out streams

This patch introduces wrappers for accessing in/out streams indirectly.
This will enable to replace physically contiguous memory arrays
of streams with flexible arrays (or maybe any other appropriate
mechanism) which do memory allocation on a per-page basis.

Signed-off-by: Oleg Babin <obabin@virtuozzo.com>
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5884f35f 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: checkpatch fixups

A collection of fixups from previous patches, left for later to not
introduce unnecessary changes while moving code around.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e136e965 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: add asoc and packet to sctp_flush_ctx

Pre-compute these so the compiler won't reload them (due to
no-strict-aliasing).

Changes since v2:
- Do not replace a return with a break in sctp_outq_flush_data

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bb543847 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: add sctp_flush_ctx, a context struct on outq_flush routines

With this struct we avoid passing lots of variables around and taking care
of updating the current transport/packet.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4fdbb0ef 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: rework switch cases in sctp_outq_flush_data

Remove an inner one, which tended to be error prone due to the cascading
and it can be replaced by a simple if ().

Rework the outer one so that the actual flush code is not inside it. Now
we first validate if we can or cannot send data, return if not, and then
the flush code.

Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6605f694 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: make use of gfp on retransmissions

Retransmissions may be triggered when in user context, so lets make use
of gfp.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4bf21b61 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: move transport flush code out of sctp_outq_flush

To the new sctp_outq_flush_transports.

Comment on Nagle is outdated and removed. Nagle is performed earlier, while
checking if the chunk fits the packet: if the outq length is not enough to
fill the packet, it returns SCTP_XMIT_DELAY.

So by when it gets to sctp_outq_flush_transports, it has to go through all
enlisted transports.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cb93cc5d 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: move flushing of data chunks out of sctp_outq_flush

To the new sctp_outq_flush_data. Again, smaller functions and with well
defined objectives.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 96e0418e 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: move outq data rtx code out of sctp_outq_flush

This patch renames current sctp_outq_flush_rtx to __sctp_outq_flush_rtx
and create a new sctp_outq_flush_rtx, with the code that was on
sctp_outq_flush. Again, the idea is to have functions with small and
defined objectives.

Yes, there is an open-coded path selection in the now sctp_outq_flush_rtx.
That is kept as is for now because it may be very different when we
implement retransmission path selection algorithms for CMT-SCTP.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7a0b9df6 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: move the flush of ctrl chunks into its own function

Named sctp_outq_flush_ctrl and, with that, keep the contexts contained.

One small fix embedded is the reset of one_packet at every iteration.
This allows bundling of some control chunks in case they were preceeded by
another control chunk that cannot be bundled.

Other than this, it has the same behavior.

Changes since v2:
- Fixed panic reported by kbuild test robot if building with
only up to this patch applied, due to bad parameter to
sctp_outq_select_transport and by not initializing packet after
calling sctp_outq_flush_ctrl.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0d634b0c 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: factor out sctp_outq_select_transport

We had two spots doing such complex operation and they were very close to
each other, a bit more tailored to here or there.

This patch unifies these under the same function,
sctp_outq_select_transport, which knows how to handle control chunks and
original transmissions (but not retransmissions).

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b9fd6839 14-May-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: add sctp_packet_singleton

Factor out the code for generating singletons. It's used only once, but
helps to keep the context contained.

The const variables are to ease the reading of subsequent calls in there.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 51446780 24-Apr-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: fix identification of new acks for SFR-CACC

It's currently written as:

if (!tchunk->tsn_gap_acked) { [1]
tchunk->tsn_gap_acked = 1;
...
}

if (TSN_lte(tsn, sack_ctsn)) {
if (!tchunk->tsn_gap_acked) {
/* SFR-CACC processing */
...
}
}

Which causes the SFR-CACC processing on ack reception to never process,
as tchunk->tsn_gap_acked is always true by then. Block [1] was
moved to that position by the commit marked below.

This patch fixes it by doing SFR-CACC processing earlier, before
tsn_gap_acked is set to true.

Fixes: 31b02e154940 ("sctp: Failover transmitted list on transport delete")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 37f47bc9 11-Jan-2018 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: avoid compiler warning on implicit fallthru

These fall-through are expected.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8e0c3b73 14-Dec-2017 Xin Long <lucien.xin@gmail.com>

sctp: implement generate_ftsn for sctp_stream_interleave

generate_ftsn is added as a member of sctp_stream_interleave, used to
create fwdtsn or ifwdtsn chunk according to abandoned chunks, called
in sctp_retransmit and sctp_outq_sack.

sctp_generate_iftsn works for ifwdtsn, and sctp_generate_fwdtsn is
still used for making fwdtsn.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 779edd73 25-Nov-2017 Xin Long <lucien.xin@gmail.com>

sctp: do not abandon the other frags in unsent outq if one msg has outstanding frags

Now for the abandoned chunks in unsent outq, it would just free the chunks.
Because no tsn is assigned to them yet, there's no need to send fwd tsn to
peer, unlike for the abandoned chunks in sent outq.

The problem is when parts of the msg have been sent and the other frags
are still in unsent outq, if they are abandoned/dropped, the peer would
never get this msg reassembled.

So these frags in unsent outq can't be dropped if this msg already has
outstanding frags.

This patch does the check in sctp_chunk_abandoned and
sctp_prsctp_prune_unsent.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e5f61296 25-Nov-2017 Xin Long <lucien.xin@gmail.com>

sctp: abandon the whole msg if one part of a fragmented message is abandoned

As rfc3758#section-3.1 demands:

A3) When a TSN is "abandoned", if it is part of a fragmented message,
all other TSN's within that fragmented message MUST be abandoned
at the same time.

Besides, if it couldn't handle this, the rest frags would never get
assembled in peer side.

This patch supports it by adding abandoned flag in sctp_datamsg, when
one chunk is being abandoned, set chunk->msg->abandoned as well. Next
time when checking for abandoned, go checking chunk->msg->abandoned
first.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d30fc512 25-Nov-2017 Xin Long <lucien.xin@gmail.com>

sctp: only update outstanding_bytes for transmitted queue when doing prsctp_prune

Now outstanding_bytes is only increased when appending chunks into one
packet and sending it at 1st time, while decreased when it is about to
move into retransmit queue. It means outstanding_bytes value is already
decreased for all chunks in retransmit queue.

However sctp_prsctp_prune_sent is a common function to check the chunks
in both transmitted and retransmit queue, it decrease outstanding_bytes
when moving a chunk into abandoned queue from either of them.

It could cause outstanding_bytes underflow, as it also decreases it's
value for the chunks in retransmit queue.

This patch fixes it by only updating outstanding_bytes for transmitted
queue when pruning queues for prsctp prio policy, the same fix is also
needed in sctp_check_transmitted.

Fixes: 8dbdf1f5b09c ("sctp: implement prsctp PRIO policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5bbbbe32 03-Oct-2017 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: introduce stream scheduler foundations

This patch introduces the hooks necessary to do stream scheduling, as
per RFC Draft ndata. It also introduces the first scheduler, which is
what we do today but now factored out: first come first served (FCFS).

With stream scheduling now we have to track which chunk was enqueued on
which stream and be able to select another other than the in front of
the main outqueue. So we introduce a list on sctp_stream_out_ext
structure for this purpose.

We reuse sctp_chunk->transmitted_list space for the list above, as the
chunk cannot belong to the two lists at the same time. By using the
union in there, we can have distinct names for these moments.

sctp_sched_ops are the operations expected to be implemented by each
scheduler. The dequeueing is a bit particular to this implementation but
it is to match how we dequeue packets today. We first dequeue and then
check if it fits the packet and if not, we requeue it at head. Thus why
we don't have a peek operation but have dequeue_done instead, which is
called once the chunk can be safely considered as transmitted.

The check removed from sctp_outq_flush is now performed by
sctp_stream_outq_migrate, which is only called during assoc setup.
(sctp_sendmsg() also checks for it)

The only operation that is foreseen but not yet added here is a way to
signalize that a new packet is starting or that the packet is done, for
round robin scheduler per packet, but is intentionally left to the
patch that actually implements it.

Support for I-DATA chunks, also described in this RFC, with user message
interleaving is straightforward as it just requires the schedulers to
probe for the feature and ignore datamsg boundaries when dequeueing.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f952be79 03-Oct-2017 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: introduce struct sctp_stream_out_ext

With the stream schedulers, sctp_stream_out will become too big to be
allocated by kmalloc and as we need to allocate with BH disabled, we
cannot use __vmalloc in sctp_stream_init().

This patch moves out the stats from sctp_stream_out to
sctp_stream_out_ext, which will be allocated only when the application
tries to sendmsg something on it.

Just the introduction of sctp_stream_out_ext would already fix the issue
described above by splitting the allocation in two. Moving the stats
to it also reduces the pressure on the allocator as we will ask for less
memory atomically when creating the socket and we will use GFP_KERNEL
later.

Then, for stream schedulers, we will just use sctp_stream_out_ext.

Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 86b36f2a 05-Aug-2017 Xin Long <lucien.xin@gmail.com>

sctp: remove the typedef sctp_xmit_t

This patch is to remove the typedef sctp_xmit_t, and
replace with enum sctp_xmit in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 125c2982 05-Aug-2017 Xin Long <lucien.xin@gmail.com>

sctp: remove the typedef sctp_retransmit_reason_t

This patch is to remove the typedef sctp_retransmit_reason_t, and
replace with enum sctp_retransmit_reason in the places where it's
using this typedef.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# afd93b7b 22-Jul-2017 Xin Long <lucien.xin@gmail.com>

sctp: remove the typedef sctp_sack_variable_t

This patch is to remove the typedef sctp_sack_variable_t, and
replace with union sctp_sack_variable in the places where it's
using this typedef.

It is also to fix some indents in sctp_acked().

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 63354797 30-Jun-2017 Reshetova, Elena <elena.reshetova@intel.com>

net: convert sk_buff.users from atomic_t to refcount_t

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cee360ab 31-May-2017 Xin Long <lucien.xin@gmail.com>

sctp: define the member stream as an object instead of pointer in asoc

As Marcelo's suggestion, stream is a fixed size member of asoc and would
not grow with more streams. To avoid an allocation for it, this patch is
to define it as an object instead of pointer and update the places using
it, also create sctp_stream_update() called in sctp_assoc_update() to
migrate the stream info from one stream to another.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d229d48d 01-Apr-2017 Xin Long <lucien.xin@gmail.com>

sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctp

Before when implementing sctp prsctp, SCTP_PR_STREAM_STATUS wasn't
added, as it needs to save abandoned_(un)sent for every stream.

After sctp stream reconf is added in sctp, assoc has structure
sctp_stream_out to save per stream info.

This patch is to add SCTP_PR_STREAM_STATUS by putting the prsctp
per stream statistics into sctp_stream_out.

v1->v2:
fix an indent issue.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# afe89962 31-Mar-2017 Xin Long <lucien.xin@gmail.com>

sctp: use right in and out stream cnt

Since sctp reconf was added in sctp, the real cnt of in/out stream
have not been c.sinit_max_instreams and c.sinit_num_ostreams any
more.

This patch is to replace them with stream->in/outcnt.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 23bb09cf 18-Mar-2017 Xin Long <lucien.xin@gmail.com>

sctp: out_qlen should be updated when pruning unsent queue

This patch is to fix the issue that sctp_prsctp_prune_sent forgot
to update q->out_qlen when removing a chunk from unsent queue.

Fixes: 8dbdf1f5b09c ("sctp: implement prsctp PRIO policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c86a773c 06-Feb-2017 Julian Anastasov <ja@ssi.bg>

sctp: add dst_pending_confirm flag

Add new transport flag to allow sockets to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
The flag is propagated from transport to every packet.
It is reset when cached dst is reset.

Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effee8fd ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bedf35d ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7f9d68ac 17-Jan-2017 Xin Long <lucien.xin@gmail.com>

sctp: implement sender-side procedures for SSN Reset Request Parameter

This patch is to implement sender-side procedures for the Outgoing
and Incoming SSN Reset Request Parameter described in rfc6525 section
5.1.2 and 5.1.3.

It is also add sockopt SCTP_RESET_STREAMS in rfc6525 section 6.3.2
for users.

Note that the new asoc member strreset_outstanding is to make sure
only one reconf request chunk on the fly as rfc6525 section 5.1.1
demands.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eb004603 10-Jan-2017 Colin Ian King <colin.king@canonical.com>

sctp: Fix spelling mistake: "Atempt" -> "Attempt"

Trivial fix to spelling mistake in WARN_ONCE message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cc6ac9bc 07-Oct-2016 Xin Long <lucien.xin@gmail.com>

sctp: reuse sent_count to avoid retransmitted chunks for RTT measurements

Now sctp uses chunk->resent to record if a chunk is retransmitted, for
RTT measurements with retransmitted DATA chunks. chunk->sent_count was
introduced to record how many times one chunk has been sent for prsctp
RTX policy before. We actually can know if one chunk is retransmitted
by checking chunk->sent_count is greater than 1.

This patch is to remove resent from sctp_chunk and reuse sent_count
to avoid retransmitted chunks for RTT measurements.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# be4947bf 28-Sep-2016 Xin Long <lucien.xin@gmail.com>

sctp: change to check peer prsctp_capable when using prsctp polices

Now before using prsctp polices, sctp uses asoc->prsctp_enable to
check if prsctp is enabled. However asoc->prsctp_enable is set only
means local host support prsctp, sctp should not abandon packet if
peer host doesn't enable prsctp.

So this patch is to use asoc->peer.prsctp_capable to check if prsctp
is enabled on both side, instead of asoc->prsctp_enable, as asoc's
peer.prsctp_capable is set only when local and peer both enable prsctp.

Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0605483f 28-Sep-2016 Xin Long <lucien.xin@gmail.com>

sctp: remove prsctp_param from sctp_chunk

Now sctp uses chunk->prsctp_param to save the prsctp param for all the
prsctp polices, we didn't need to introduce prsctp_param to sctp_chunk.
We can just use chunk->sinfo.sinfo_timetolive for RTX and BUF polices,
and reuse msg->expires_at for TTL policy, as the prsctp polices and old
expires policy are mutual exclusive.

This patch is to remove prsctp_param from sctp_chunk, and reuse msg's
expires_at for TTL and chunk's sinfo.sinfo_timetolive for RTX and BUF
polices.

Note that sctp can't use chunk's sinfo.sinfo_timetolive for TTL policy,
as it needs a u64 variables to save the expires_at time.

This one also fixes the "netperf-Throughput_Mbps -37.2% regression"
issue.

Fixes: a6c2f792873a ("sctp: implement prsctp TTL policy")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a3007446 20-Sep-2016 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: fix the handling of SACK Gap Ack blocks

sctp_acked() is using 32bit arithmetics on 16bits vars, via TSN_lte()
macros, which is weird and confusing.

Once the offset to ctsn is calculated, all wrapping is already handled
and thus to verify the Gap Ack blocks we can just use pure
less/big-or-equal than checks.

Also, rename gap variable to tsn_offset, so it's more meaningful, as
it doesn't point to any gap at all.

Even so, I don't think this discrepancy resulted in any practical bug.

This patch is a preparation for the next one, which will introduce
typecheck() for TSN_lte() macros and would cause a compile error here.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 83dbc3d4 13-Sep-2016 Xin Long <lucien.xin@gmail.com>

sctp: make sctp_outq_flush/tail/uncork return void

sctp_outq_flush return value is meaningless now, this patch is
to make sctp_outq_flush return void, as well as sctp_outq_fail
and sctp_outq_uncork.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 64519440 13-Sep-2016 Xin Long <lucien.xin@gmail.com>

sctp: save transmit error to sk_err in sctp_outq_flush

Every time when sctp calls sctp_outq_flush, it sends out the chunks of
control queue, retransmit queue and data queue. Even if some trunks are
failed to transmit, it still has to flush all the transports, as it's
the only chance to clean that transmit_list.

So the latest transmit error here should be returned back. This transmit
error is an internal error of sctp stack.

I checked all the places where it uses the transmit error (the return
value of sctp_outq_flush), most of them are actually just save it to
sk_err.

Except for sctp_assoc/endpoint_bh_rcv, they will drop the chunk if
it's failed to send a REPLY, which is actually incorrect, as we can't
be sure the error that sctp_outq_flush returns is from sending that
REPLY.

So it's meaningless for sctp_outq_flush to return error back.

This patch is to save transmit error to sk_err in sctp_outq_flush, the
new error can update the old value. Eventually, sctp_wait_for_* would
check for it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b61c654f 13-Sep-2016 Xin Long <lucien.xin@gmail.com>

sctp: free msg->chunks when sctp_primitive_SEND return err

Last patch "sctp: do not return the transmit err back to sctp_sendmsg"
made sctp_primitive_SEND return err only when asoc state is unavailable.
In this case, chunks are not enqueued, they have no chance to be freed if
we don't take care of them later.

This Patch is actually to revert commit 1cd4d5c4326a ("sctp: remove the
unused sctp_datamsg_free()"), commit 69b5777f2e57 ("sctp: hold the chunks
only after the chunk is enqueued in outq") and commit 8b570dc9f7b6 ("sctp:
only drop the reference on the datamsg after sending a msg"), to use
sctp_datamsg_free to free the chunks of current msg.

Fixes: 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg after sending a msg")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2c89791e 13-Sep-2016 Xin Long <lucien.xin@gmail.com>

sctp: remove the unnecessary state check in sctp_outq_tail

Data Chunks are only sent by sctp_primitive_SEND, in which sctp checks
the asoc's state through statetable before calling sctp_outq_tail. So
there's no need to check the asoc's state again in sctp_outq_tail.

Besides, sctp_do_sm is protected by lock_sock, even if sending msg is
interrupted by timer events, the event's processes still need to acquire
lock_sock first. It means no others CMDs can be enqueue into side effect
list before CMD_SEND_MSG to change asoc->state, so it's safe to remove it.

This patch is to remove redundant asoc->state check from sctp_outq_tail.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8dbdf1f5 09-Jul-2016 Xin Long <lucien.xin@gmail.com>

sctp: implement prsctp PRIO policy

prsctp PRIO policy is a policy to abandon lower priority chunks when
asoc doesn't have enough snd buffer, so that the current chunk with
higher priority can be queued successfully.

Similar to TTL/RTX policy, we will set the priority of the chunk to
prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy().
So if PRIO policy is enabled, msg->expire_at won't work.

asoc->sent_cnt_removable will record how many chunks can be checked to
remove. If priority policy is enabled, when the chunk is queued into
the out_queue, we will increase sent_cnt_removable. When the chunk is
moved to abandon_queue or dequeue and free, we will decrease
sent_cnt_removable.

In sctp_sendmsg, we will check if there is enough snd buffer for current
msg and if sent_cnt_removable is not 0. Then try to abandon chunks in
sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and
free chunks from out_queue in right order until the abandon+free size >
msg_len - sctp_wfree. For the abandon size, we have to wait until it
sends FORWARD TSN, receives the sack and the chunks are really freed.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ba6f5e33 06-Apr-2016 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: avoid refreshing heartbeat timer too often

Currently on high rate SCTP streams the heartbeat timer refresh can
consume quite a lot of resources as timer updates are costly and it
contains a random factor, which a) is also costly and b) invalidates
mod_timer() optimization for not editing a timer to the same value.
It may even cause the timer to be slightly advanced, for no good reason.

As suggested by David Laight this patch now removes this timer update
from hot path by leaving the timer on and re-evaluating upon its
expiration if the heartbeat is still needed or not, similarly to what is
done for TCP. If it's not needed anymore the timer is re-scheduled to
the new timeout, considering the time already elapsed.

For this, we now record the last tx timestamp per transport, updated in
the same spots as hb timer was restarted on tx. Also split up
sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
heartbeat one.

On loopback with MTU of 65535 and data chunks with 1636, so that we
have a considerable amount of chunks without stressing system calls,
netperf -t SCTP_STREAM -l 30, perf looked like this before:

Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
Overhead Command Shared Object Symbol
+ 6,15% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
- 5,43% netperf [kernel.vmlinux] [k] _raw_write_unlock_irqrestore
- _raw_write_unlock_irqrestore
- 96,54% _raw_spin_unlock_irqrestore
- 36,14% mod_timer
+ 97,24% sctp_transport_reset_timers
+ 2,76% sctp_do_sm
+ 33,65% __wake_up_sync_key
+ 28,77% sctp_ulpq_tail_event
+ 1,40% del_timer
- 1,84% mod_timer
+ 99,03% sctp_transport_reset_timers
+ 0,97% sctp_do_sm
+ 1,50% sctp_ulpq_tail_event

And after this patch, now with netperf -l 60:

Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
Overhead Command Shared Object Symbol
+ 5,65% netperf [kernel.vmlinux] [k] memcpy_erms
+ 5,59% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string
- 5,05% netperf [kernel.vmlinux] [k] _raw_spin_unlock_irqrestore
- _raw_spin_unlock_irqrestore
+ 49,89% __wake_up_sync_key
+ 45,68% sctp_ulpq_tail_event
- 2,85% mod_timer
+ 76,51% sctp_transport_reset_t3_rtx
+ 23,49% sctp_do_sm
+ 1,55% del_timer
+ 2,50% netperf [sctp] [k] sctp_datamsg_from_user
+ 2,26% netperf [sctp] [k] sctp_sendmsg

Throughput-wise, from 6800mbps without the patch to 7050mbps with it,
~3.7%.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 31b055ef 18-Mar-2016 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: do not leak chunks that are sent to unconfirmed paths

Currently, if a chunk is scheduled to be sent through a transport that
is currently unconfirmed, it will be leaked as it is dequeued from outq
and is not re-queued nor freed.

As I'm not aware of any situation that may lead to this situation, I'm
fixing this by freeing the chunk and also logging a trace so that we can
fix the other bug if it ever happens.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cea8768f 10-Mar-2016 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

sctp: allow sctp_transmit_packet and others to use gfp

Currently sctp_sendmsg() triggers some calls that will allocate memory
with GFP_ATOMIC even when not necessary. In the case of
sctp_packet_transmit it will allocate a linear skb that will be used to
construct the packet and this may cause sends to fail due to ENOMEM more
often than anticipated specially with big MTUs.

This patch thus allows it to inherit gfp flags from upper calls so that
it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
similar. All others, like retransmits or flushes started from BH, are
still allocated using GFP_ATOMIC.

In netperf tests this didn't result in any performance drawbacks when
memory is not too fragmented and made it trigger ENOMEM way less often.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8a0d19c5 05-Dec-2015 lucien <lucien.xin@gmail.com>

sctp: start t5 timer only when peer rwnd is 0 and local state is SHUTDOWN_PENDING

when A sends a data to B, then A close() and enter into SHUTDOWN_PENDING
state, if B neither claim his rwnd is 0 nor send SACK for this data, A
will keep retransmitting this data until t5 timeout, Max.Retrans times
can't work anymore, which is bad.

if B's rwnd is not 0, it should send abort after Max.Retrans times, only
when B's rwnd == 0 and A's retransmitting beyonds Max.Retrans times, A
will start t5 timer, which is also commit f8d960524328 ("sctp: Enforce
retransmission limit during shutdown") means, but it lacks the condition
peer rwnd == 0.

so fix it by adding a bit (zero_window_announced) in peer to record if
the last rwnd is 0. If it was, zero_window_announced will be set. and use
this bit to decide if start t5 timer when local.state is SHUTDOWN_PENDING.

Fixes: commit f8d960524328 ("sctp: Enforce retransmission limit during shutdown")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 69b5777f 05-Dec-2015 lucien <lucien.xin@gmail.com>

sctp: hold the chunks only after the chunk is enqueued in outq

When a msg is sent, sctp will hold the chunks of this msg and then try
to enqueue them. But if the chunks are not enqueued in sctp_outq_tail()
because of the invalid state, sctp_cmd_interpreter() may still return
success to sctp_sendmsg() after calling sctp_outq_flush(), these chunks
will become orphans and will leak.

So we fix them by moving sctp_chunk_hold() to sctp_outq_tail(), where we
are sure that the chunk is going to get queued.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 526cbef7 22-Jul-2014 David Laight <David.Laight@ACULAB.COM>

net: sctp: Rename SCTP_XMIT_NAGLE_DELAY to SCTP_XMIT_DELAY

MSG_MORE and 'corking' a socket would require that the transmit of
a data chunk be delayed.
Rename the return value to be less specific.

Signed-off-by: David Laight <david.laight@aculab.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 138ce910 14-Jul-2014 Fabian Frederick <fabf@skynet.be>

net: sctp: remove unnecessary break after return/goto

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 619a60ee 02-Jan-2014 Vlad Yasevich <vyasevich@gmail.com>

sctp: Remove outqueue empty state

The SCTP outqueue structure maintains a data chunks
that are pending transmission, the list of chunks that
are pending a retransmission and a length of data in
flight. It also tries to keep the emtpy state so that
it can performe shutdown sequence or notify user.

The problem is that the empy state is inconsistently
tracked. It is possible to completely drain the queue
without sending anything when using PR-SCTP. In this
case, the empty state will not be correctly state as
report by Jamal Hadi Salim <jhs@mojatatu.com>. This
can cause an association to be perminantly stuck in the
SHUTDOWN_PENDING state.

Additionally, SCTP is incredibly inefficient when setting
the empty state. Even though all the data is availaible
in the outqueue structure, we ignore it and walk a list
of trasnports.

In the end, we can completely remove the extra empty
state and figure out if the queue is empty by looking
at 3 things: length of pending data, length of in-flight
data, and exisiting of retransmit data. All of these
are already in the strucutre.

Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cb3f837b 22-Dec-2013 wangweidong <wangweidong1@huawei.com>

sctp: fix checkpatch errors with space required or prohibited

fix checkpatch errors while the space is required or prohibited
to the "=,()++..."

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4b2f13a2 06-Dec-2013 Jeff Kirsher <jeffrey.t.kirsher@intel.com>

sctp: Fix FSF address in file headers

Several files refer to an old address for the Free Software Foundation
in the file header comment. Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.

CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6eabca54 24-Nov-2013 Xufeng Zhang <xufeng.zhang@windriver.com>

sctp: Restore 'resent' bit to avoid retransmitted chunks for RTT measurements

Currently retransmitted DATA chunks could also be used for
RTT measurements since there are no flag to identify whether
the transmitted DATA chunk is a new one or a retransmitted one.
This problem is introduced by commit ae19c5486 ("sctp: remove
'resent' bit from the chunk") which inappropriately removed the
'resent' bit completely, instead of doing this, we should set
the resent bit only for the retransmitted DATA chunks.

Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d6c41614 21-Nov-2013 Chang Xiangzhong <changxiangzhong@gmail.com>

net: sctp: find the correct highest_new_tsn in sack

Function sctp_check_transmitted(transport t, ...) would iterate all of
transport->transmitted queue and looking for the highest __newly__ acked tsn.
The original algorithm would depend on the order of the assoc->transport_list
(in function sctp_outq_sack line 1215 - 1226). The result might not be the
expected due to the order of the tranport_list.

Solution: checking if the exising is smaller than the new one before assigning

Signed-off-by: Chang Xiangzhong <changxiangzhong@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 477143e3 06-Aug-2013 Daniel Borkmann <daniel@iogearbox.net>

net: sctp: trivial: update bug report in header comment

With the restructuring of the lksctp.org site, we only allow bug
reports through the SCTP mailing list linux-sctp@vger.kernel.org,
not via SF, as SF is only used for web hosting and nothing more.
While at it, also remove the obvious statement that bugs will be
fixed and incooperated into the kernel.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 91705c61 23-Jul-2013 Daniel Borkmann <daniel@iogearbox.net>

net: sctp: trivial: update mailing list address

The SCTP mailing list address to send patches or questions
to is linux-sctp@vger.kernel.org and not
lksctp-developers@lists.sourceforge.net anymore. Therefore,
update all occurences.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8c2f414a 09-Jul-2013 Daniel Borkmann <daniel@iogearbox.net>

net: sctp: confirm route during forward progress

This fix has been proposed originally by Vlad Yasevich. He says:

When SCTP makes forward progress (receives a SACK that acks new chunks,
renegs, or answeres 0-window probes) or when HB-ACK arrives, mark
the route as confirmed so we don't unnecessarily send NUD probes.

Having a simple SCTP client/server that exchange data chunks every 1sec,
without this patch ARP requests are sent periodically every 40-60sec.
With this fix applied, an ARP request is only done once right at the
"session" beginning. Also, when clearing the related ARP cache entry
manually during the session, a new request is correctly done. I have
only "backported" this to net-next and tested that it works, so full
credit goes to Vlad.

Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e02010ad 01-Jul-2013 Daniel Borkmann <daniel@iogearbox.net>

net: sctp: get rid of SCTP_DBG_TSNS entirely

After having reworked the debugging framework, Neil and Vlad agreed to
get rid of the leftover SCTP_DBG_TSNS code for a couple of reasons:

We can use systemtap scripts to investigate these things, we now have
pr_debug() helpers that make life easier, and if we really need anything
else besides those tools, we will be forced to come up with something
better than we have there. Therefore, get rid of this ifdef debugging
code entirely for now.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bb33381d 28-Jun-2013 Daniel Borkmann <daniel@iogearbox.net>

net: sctp: rework debugging framework to use pr_debug and friends

We should get rid of all own SCTP debug printk macros and use the ones
that the kernel offers anyway instead. This makes the code more readable
and conform to the kernel code, and offers all the features of dynamic
debbuging that pr_debug() et al has, such as only turning on/off portions
of debug messages at runtime through debugfs. The runtime cost of having
CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
is negligible [1]. If kernel debugging is completly turned off, then these
statements will also compile into "empty" functions.

While we're at it, we also need to change the Kconfig option as it /now/
only refers to the ifdef'ed code portions in outqueue.c that enable further
debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
was enabled with this Kconfig option and has now been removed, we
transform those code parts into WARNs resp. where appropriate BUG_ONs so
that those bugs can be more easily detected as probably not many people
have SCTP debugging permanently turned on.

To turn on all SCTP debugging, the following steps are needed:

# mount -t debugfs none /sys/kernel/debug
# echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control

This can be done more fine-grained on a per file, per line basis and others
as described in [2].

[1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
[2] Documentation/dynamic-debug-howto.txt

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c5c7774d 12-Jun-2013 Neil Horman <nhorman@tuxdriver.com>

sctp: fully initialize sctp_outq in sctp_outq_init

In commit 2f94aabd9f6c925d77aecb3ff020f1cc12ed8f86
(refactor sctp_outq_teardown to insure proper re-initalization)
we modified sctp_outq_teardown to use sctp_outq_init to fully re-initalize the
outq structure. Steve West recently asked me why I removed the q->error = 0
initalization from sctp_outq_teardown. I did so because I was operating under
the impression that sctp_outq_init would properly initalize that value for us,
but it doesn't. sctp_outq_init operates under the assumption that the outq
struct is all 0's (as it is when called from sctp_association_init), but using
it in __sctp_outq_teardown violates that assumption. We should do a memset in
sctp_outq_init to ensure that the entire structure is in a known state there
instead.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: "West, Steve (NSN - US/Fort Worth)" <steve.west@nsn.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: netdev@vger.kernel.org
CC: davem@davemloft.net
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dacda32e 16-Apr-2013 Daniel Borkmann <daniel@iogearbox.net>

net: sctp: outqueue: simplify sctp_outq_uncork function

Just a minor edit to simplify the function. No need for this
error variable here.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 165a4c31 16-Apr-2013 Daniel Borkmann <daniel@iogearbox.net>

net: sctp: sctp_outq: remove 'malloced' from its struct

sctp_outq is embedded into sctp_association, and thus never
kmalloced in any way. Also, malloced is always 0, thus kfree()
is never called. Therefore, remove that dead piece of code.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25cc4ae9 03-Feb-2013 Ying Xue <ying.xue@windriver.com>

net: remove redundant check for timer pending state before del_timer

As in del_timer() there has already placed a timer_pending() function
to check whether the timer to be deleted is pending or not, it's
unnecessary to check timer pending state again before del_timer() is
called.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f94aabd 17-Jan-2013 Neil Horman <nhorman@tuxdriver.com>

sctp: refactor sctp_outq_teardown to insure proper re-initalization

Jamie Parsons reported a problem recently, in which the re-initalization of an
association (The duplicate init case), resulted in a loss of receive window
space. He tracked down the root cause to sctp_outq_teardown, which discarded
all the data on an outq during a re-initalization of the corresponding
association, but never reset the outq->outstanding_data field to zero. I wrote,
and he tested this fix, which does a proper full re-initalization of the outq,
fixing this problem, and hopefully future proofing us from simmilar issues down
the road.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Jamie Parsons <Jamie.Parsons@metaswitch.com>
Tested-by: Jamie Parsons <Jamie.Parsons@metaswitch.com>
CC: Jamie Parsons <Jamie.Parsons@metaswitch.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 196d6759 30-Nov-2012 Michele Baldessari <michele@acksyn.org>

sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call

The current SCTP stack is lacking a mechanism to have per association
statistics. This is an implementation modeled after OpenSolaris'
SCTP_GET_ASSOC_STATS.

Userspace part will follow on lksctp if/when there is a general ACK on
this.
V4:
- Move ipackets++ before q->immediate.func() for consistency reasons
- Move sctp_max_rto() at the end of sctp_transport_update_rto() to avoid
returning bogus RTO values
- return asoc->rto_min when max_obs_rto value has not changed

V3:
- Increase ictrlchunks in sctp_assoc_bh_rcv() as well
- Move ipackets++ to sctp_inq_push()
- return 0 when no rto updates took place since the last call

V2:
- Implement partial retrieval of stat struct to cope for future expansion
- Kill the rtxpackets counter as it cannot be precise anyway
- Rename outseqtsns to outofseqtsns to make it clearer that these are out
of sequence unexpected TSNs
- Move asoc->ipackets++ under a lock to avoid potential miscounts
- Fold asoc->opackets++ into the already existing asoc check
- Kill unneeded (q->asoc) test when increasing rtxchunks
- Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0)
- Don't count SHUTDOWNs as SACKs
- Move SCTP_GET_ASSOC_STATS to the private space API
- Adjust the len check in sctp_getsockopt_assoc_stats() to allow for
future struct growth
- Move association statistics in their own struct
- Update idupchunks when we send a SACK with dup TSNs
- return min_rto in max_rto when RTO has not changed. Also return the
transport when max_rto last changed.

Signed-off: Michele Baldessari <michele@acksyn.org>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>


# edfee033 02-Oct-2012 Nicolas Dichtel <nicolas.dichtel@6wind.com>

sctp: check src addr when processing SACK to update transport state

Suppose we have an SCTP connection with two paths. After connection is
established, path1 is not available, thus this path is marked as inactive. Then
traffic goes through path2, but for some reasons packets are delayed (after
rto.max). Because packets are delayed, the retransmit mechanism will switch
again to path1. At this time, we receive a delayed SACK from path2. When we
update the state of the path in sctp_check_transmitted(), we do not take into
account the source address of the SACK, hence we update the wrong path.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 54a27924 03-Sep-2012 Wei Yongjun <yongjun_wei@trendmicro.com.cn>

sctp: use list_move_tail instead of list_del/list_add_tail

Using list_move_tail() instead of list_del() + list_add_tail().

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b01a2407 06-Aug-2012 Eric W. Biederman <ebiederm@xmission.com>

sctp: Make the mib per network namespace

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5aa93bcf 21-Jul-2012 Neil Horman <nhorman@tuxdriver.com>

sctp: Implement quick failover draft from tsvwg

I've seen several attempts recently made to do quick failover of sctp transports
by reducing various retransmit timers and counters. While its possible to
implement a faster failover on multihomed sctp associations, its not
particularly robust, in that it can lead to unneeded retransmits, as well as
false connection failures due to intermittent latency on a network.

Instead, lets implement the new ietf quick failover draft found here:
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

This will let the sctp stack identify transports that have had a small number of
errors, and avoid using them quickly until their reliability can be
re-established. I've tested this out on two virt guests connected via multiple
isolated virt networks and believe its in compliance with the above draft and
works well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
CC: joe@perches.com
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 95c96174 14-Apr-2012 Eric Dumazet <eric.dumazet@gmail.com>

net: cleanup unsigned to unsigned int

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a76c0adf 18-Dec-2011 Thomas Graf <tgraf@redhat.com>

sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd

When checking whether a DATA chunk fits into the estimated rwnd a
full sizeof(struct sk_buff) is added to the needed chunk size. This
quickly exhausts the available rwnd space and leads to packets being
sent which are much below the PMTU limit. This can lead to much worse
performance.

The reason for this behaviour was to avoid putting too much memory
pressure on the receiver. The concept is not completely irational
because a Linux receiver does in fact clone an skb for each DATA chunk
delivered. However, Linux also reserves half the available socket
buffer space for data structures therefore usage of it is already
accounted for.

When proposing to change this the last time it was noted that this
behaviour was introduced to solve a performance issue caused by rwnd
overusage in combination with small DATA chunks.

Trying to reproduce this I found that with the sk_buff overhead removed,
the performance would improve significantly unless socket buffer limits
are increased.

The following numbers have been gathered using a patched iperf
supporting SCTP over a live 1 Gbit ethernet network. The -l option
was used to limit DATA chunk sizes. The numbers listed are based on
the average of 3 test runs each. Default values have been used for
sk_(r|w)mem.

Chunk
Size Unpatched No Overhead
-------------------------------------
4 15.2 Kbit [!] 12.2 Mbit [!]
8 35.8 Kbit [!] 26.0 Mbit [!]
16 95.5 Kbit [!] 54.4 Mbit [!]
32 106.7 Mbit 102.3 Mbit
64 189.2 Mbit 188.3 Mbit
128 331.2 Mbit 334.8 Mbit
256 537.7 Mbit 536.0 Mbit
512 766.9 Mbit 766.6 Mbit
1024 810.1 Mbit 808.6 Mbit

Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f207c050 15-Jun-2011 Michio Honda <micchie@sfc.wide.ad.jp>

sctp: HEARTBEAT negotiation after ASCONF

This patch fixes BUG that the ASCONF receiver transmits DATA chunks
to the newly added UNCONFIRMED destination.

Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f8d96052 06-Jul-2011 Thomas Graf <tgraf@infradead.org>

sctp: Enforce retransmission limit during shutdown

When initiating a graceful shutdown while having data chunks
on the retransmission queue with a peer which is in zero
window mode the shutdown is never completed because the
retransmission error count is reset periodically by the
following two rules:

- Do not timeout association while doing zero window probe.
- Reset overall error count when a heartbeat request has
been acknowledged.

The graceful shutdown will wait for all outstanding TSN to
be acknowledged before sending the SHUTDOWN request. This
never happens due to the peer's zero window not acknowledging
the continuously retransmitted data chunks. Although the
error counter is incremented for each failed retransmission,
the receiving of the SACK announcing the zero window clears
the error count again immediately. Also heartbeat requests
continue to be sent periodically. The peer acknowledges these
requests causing the error counter to be reset as well.

This patch changes behaviour to only reset the overall error
counter for the above rules while not in shutdown. After
reaching the maximum number of retransmission attempts, the
T5 shutdown guard timer is scheduled to give the receiver
some additional time to recover. The timer is stopped as soon
as the receiver acknowledges any data.

The issue can be easily reproduced by establishing a sctp
association over the loopback device, constantly queueing
data at the sender while not reading any at the receiver.
Wait for the window to reach zero, then initiate a shutdown
by killing both processes simultaneously. The association
will never be freed and the chunks on the retransmission
queue will be retransmitted indefinitely.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8a07eb0a 26-Apr-2011 Michio Honda <micchie@sfc.wide.ad.jp>

sctp: Add ASCONF operation on the single-homed host

In this case, the SCTP association transmits an ASCONF packet
including addition of the new IP address and deletion of the old
address. This patch implements this functionality.
In this case, the ASCONF chunk is added to the beginning of the
queue, because the other chunks cannot be transmitted in this state.

Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4c6a6f42 19-Apr-2011 Wei Yongjun <yjwei@cn.fujitsu.com>

sctp: move chunk from retransmit queue to abandoned list

If there is still data waiting to retransmit and remain in
retransmit queue, while doing the next retransmit, if the
chunk is abandoned, we should move it to abandoned list.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0b8f9e25 19-Apr-2011 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: remove completely unsed EMPTY state

SCTP does not SCTP_STATE_EMPTY and we can never be in
that state. Remove useless code.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f246a7b7 18-Apr-2011 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: teach CACC algorithm about removed transports

When we have have to remove a transport due to ASCONF, we move
the data to a new active path. This can trigger CACC algorithm
to not mark that data as missing when SACKs arrive. This is
because the transport passed to the CACC algorithm is the one
this data is sitting on, not the one it was sent on (that one
may be gone). So, by sending the original transport (even if
it's NULL), we may start marking data as missing.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# efea2c6b 04-Mar-2011 Hagen Paul Pfeifer <hagen@jauu.net>

sctp: several declared/set but unused fixes

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 145ce502 24-Aug-2010 Joe Perches <joe@perches.com>

net/sctp: Use pr_fmt and pr_<level>

Change SCTP_DEBUG_PRINTK and SCTP_DEBUG_PRINTK_IPADDR to
use do { print } while (0) guards.
Add SCTP_DEBUG_PRINTK_CONT to fix errors in log when
lines were continued.
Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Add a missing newline in "Failed bind hash alloc"

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3fa21e07 18-May-2010 Joe Perches <joe@perches.com>

net: Remove unnecessary returns from void function()s

This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bfa0d984 30-Apr-2010 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Optimize computation of highest new tsn in SACK.

Right now, if the highest tsn in the SACK doesn't change, we'll
end up scanning the transmitted lists on the transports twice:
once for locating the highest _new_ tsn, and once for actually
tagging chunks as acked. This is a waste, since we can record
the highest _new_ tsn at the same time as tagging chunks. Long
ago this was not possible because we would try to mark chunks
as missing at the same time as tagging them acked and this approach
didn't work. Now that the two steps are separate, we can re-use
the old approach.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# ea862c8d 30-Apr-2010 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: correctly mark missing chunks in fast recovery

According to RFC 4960 Section 7.2.4:
If an endpoint is in Fast
Recovery and a SACK arrives that advances the Cumulative TSN Ack
Point, the miss indications are incremented for all TSNs reported
missing in the SACK.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# d9efc223 30-Apr-2010 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Do not force T3 timer on fast retransmissions.

We don't need to force the T3 timer any more and it's
actually wrong to do as it causes too long of a delay.
The timer will be started if one is not running, but if
one is running, we leave it alone.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# ae19c548 30-Apr-2010 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: remove 'resent' bit from the chunk

The 'resent' bit is used to make sure that we don't update
rto estimate based on retransmitted chunks. However, we already
have the 'rto_pending' bit that we test when need to update rto,
so 'resent' bit is just extra. Additionally, we currently have
a bug in that we always set a 'resent' bit and thus rto estimate
is only updated by Heartbeats.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# ec7b9519 30-Apr-2010 Shan Wei <shanwei@cn.fujitsu.com>

sctp: use sctp_chunk_is_data macro to decide a chunk is data chunk

sctp_chunk_is_data macro is defined to decide that
whether a chunk is data chunk or not.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# bc4f841a 30-Apr-2010 Wei Yongjun <yjwei@cn.fujitsu.com>

sctp: fix to retranmit at least one DATA chunk

While doing retranmit, if control chunk exists, such as
FORWARD TSN chunk, and the DATA chunk can not be bundled with
this control chunk because of PMTU limit, no DATA chunk
will be retranmitted in the current implementation. This
patch makes sure to retranmit at least one DATA chunk in this case.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# bd69b981 30-Apr-2010 Wei Yongjun <yjwei@cn.fujitsu.com>

sctp: assure at least one T3-rtx timer is running if a FORWARD TSN is sent

PR-SCTP extension section 3.5 Sender Side Implementation of PR-SCTP:
C5) If a FORWARD TSN is sent, the sender MUST assure that at
least one T3-rtx timer is running.

So this patch fix to assure at least one T3-rtx timer is running
if a FORWARD TSN is or will to sent.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# f64f9e71 29-Nov-2009 Joe Perches <joe@perches.com>

net: Move && and || to end of previous line

Not including net/atm/

Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5fdd4bae 29-Nov-2009 Andrei Pelinescu-Onciul <andrei@iptel.org>

sctp: on T3_RTX retransmit all the in-flight chunks

When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
T3-rtx timer expired but did not fit in one MTU (rule E3 above)
should be marked for retransmission and sent as soon as cwnd
allows (normally, when a SACK arrives). ".

This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout => it will wait until the first heartbeat).

Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).

This commit reverts d0ce92910bc04e107b2f3f2048f07e94f570035d and
also removes the now unused transport->last_rto, introduced in
b6157d8e03e1e780660a328f7183bcbfa4a93a19.

p.s The problem is not only when multiple paths are there. It
can happen in a single homed environment. If the application
stops sending data, it possible to have a hung association.

Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 46d5a808 23-Nov-2009 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Update max.burst implementation

Current implementation of max.burst ends up limiting new
data during cwnd decay period. The decay is happening becuase
the connection is idle and we are allowed to fill the congestion
window. The point of max.burst is to limit micro-bursts in response
to large acks. This still happens, as max.burst is still applied
to each transmit opportunity. It will also apply if a very large
send is made (greater then allowed by burst).

Tested-by: Florian Niederbacher <florian.niederbacher@student.uibk.ac.at>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# b93d6471 23-Nov-2009 Wei Yongjun <yjwei@cn.fujitsu.com>

sctp: implement the sender side for SACK-IMMEDIATELY extension

This patch implement the sender side for SACK-IMMEDIATELY
extension.

Section 4.1. Sender Side Considerations

Whenever the sender of a DATA chunk can benefit from the
corresponding SACK chunk being sent back without delay, the sender
MAY set the I-bit in the DATA chunk header.

Reasons for setting the I-bit include

o The sender is in the SHUTDOWN-PENDING state.

o The application requests to set the I-bit of the last DATA chunk
of a user message when providing the user message to the SCTP
implementation.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# 31b02e15 04-Sep-2009 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Failover transmitted list on transport delete

Add-IP feature allows users to delete an active transport. If that
transport has chunks in flight, those chunks need to be moved to another
transport or association may get into unrecoverable state.

Reported-by: Rafael Laufer <rlaufer@cisco.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# 76595024 12-Mar-2009 Wei Yongjun <yjwei@cn.fujitsu.com>

sctp: fix to send FORWARD-TSN chunk only if peer has such capable

RFC3758 Section 3.3.1. Sending Forward-TSN-Supported param in INIT

Note that if the endpoint chooses NOT to include the parameter, then
at no time during the life of the association can it send or process
a FORWARD TSN.

If peer does not support PR-SCTP capable, don't send FORWARD-TSN chunk
to peer.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f61f6f82 02-Mar-2009 Wei Yongjun <yjwei@cn.fujitsu.com>

sctp: use time_before or time_after for comparing jiffies

The functions time_before or time_after are more robust
for comparing jiffies against other values.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6574df9a 22-Jan-2009 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Correctly start rtx timer on new packet transmissions.

Commit 62aeaff5ccd96462b7077046357a6d7886175a57
(sctp: Start T3-RTX timer when fast retransmitting lowest TSN)
introduced a regression where it was possible to forcibly
restart the sctp retransmit timer at the transmission of any
new chunk. This resulted in much longer timeout times and
sometimes hung sctp connections.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c226ef9b 24-Jul-2008 Neil Horman <nhorman@tuxdriver.com>

sctp: reduce memory footprint of sctp_chunk structure

sctp_chunks should be put on a diet. This is some of the low hanging
fruit that we can strip out. Changes all the __s8/__u8 flags to
bitfields. Saves 12 bytes per chunk.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# 845b8eda 23-Jun-2008 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Retransmit list is ineligable for missing indications

Chunks placed on the retransmit list are marked as inelegible
for fast retrasnmission. Since missing indications determine
when fast reransmission is done, there is not point in calling
sctp_mark_missing() on the retransmit list since those chunks
will not be marked.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# ab5216a5 19-Jun-2008 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Optimize SFR-CACC transport list walking during SACK processing

There is a possibility of walking the transport list twice during
SACK processing when doing SFR-CACC algorithm. We can restructure
the code to only do this once.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# 2cd9b822 19-Jun-2008 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Only mark chunks as missing when there are gaps

Frist small step in optimizing SACK processing. Do not call
sctp_mark_missing() when there are no gaps reported and thus
not missing chunks.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# abd0b198 22-Jul-2008 Adrian Bunk <bunk@kernel.org>

sctp: make sctp_outq_flush() static

sctp_outq_flush() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2e3216cd 19-Jun-2008 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Follow security requirement of responding with 1 packet

RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts

When an SCTP stack receives a packet containing multiple control or
DATA chunks and the processing of the packet requires the sending of
multiple chunks in response, the sender of the response chunk(s) MUST
NOT send more than one packet. If bundling is supported, multiple
response chunks that fit into a single packet MAY be bundled together
into one single response packet. If bundling is not supported, then
the sender MUST NOT send more than one response chunk and MUST
discard all other responses. Note that this rule does NOT apply to a
SACK chunk, since a SACK chunk is, in itself, a response to DATA and
a SACK does not require a response of more DATA.

We implement this by not servicing our outqueue until we reach the end
of the packet. This enables maximum bundling. We also identify
'response' chunks and make sure that we only send 1 packet when sending
such chunks.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8b750ce5 04-Jun-2008 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Flush the queue only once during fast retransmit.

When fast retransmit is triggered by a sack, we should flush the queue
only once so that only 1 retransmit happens. Also, since we could
potentially have non-fast-rtx chunks on the retransmit queue, we need
make sure any chunks eligable for fast retransmit are sent first
during fast retransmission.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 62aeaff5 04-Jun-2008 Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Start T3-RTX timer when fast retransmitting lowest TSN

When we are trying to fast retransmit the lowest outstanding TSN, we
need to restart the T3-RTX timer, so that subsequent timeouts will
correctly tag all the packets necessary for retransmissions.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8b73a07c 17-Apr-2008 Gui Jianfeng <guijianfeng@cn.fujitsu.com>

SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.

According to RFC4960 7.2.2,
When all of the data transmitted by the sender has
been acknowledged by the recerver, partial_bytes_acked is initialized to 0.

This patch conforms to rfc requirement.
Without this fix, cwnd might be error incremented.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9dbc15f0 12-Apr-2008 Robert P. J. Day <rpjday@crashcourse.ca>

[SCTP]: "list_for_each()" -> "list_for_each_entry()" where appropriate.

Replacing (almost) all invocations of list_for_each() with
list_for_each_entry() tightens up the code and allows for the deletion
of numerous list iterator variables that are no longer necessary.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f4ad85ca 12-Apr-2008 Gui Jianfeng <guijianfeng@cn.fujitsu.com>

[SCTP]: Fix protocol violation when receiving an error lenght INIT-ACK

When receiving an error length INIT-ACK during COOKIE-WAIT,
a 0-vtag ABORT will be responsed. This action violates the
protocol apparently. This patch achieves the following things.
1 If the INIT-ACK contains all the fixed parameters, use init-tag
recorded from INIT-ACK as vtag.
2 If the INIT-ACK doesn't contain all the fixed parameters,
just reflect its vtag.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0dc47877 05-Mar-2008 Harvey Harrison <harvey.harrison@gmail.com>

net: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 58fbbed4 29-Feb-2008 Neil Horman <nhorman@tuxdriver.com>

[SCTP]: extend exported data in /proc/net/sctp/assoc

RFC 3873 specifies several MIB objects that can't be obtained by the
current data set exported by /proc/sys/net/sctp/assoc. This patch
adds the missing pieces of data that allow us to compute all the
objects in the sctpAssocTable object.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5f9646c3 05-Feb-2008 Vlad Yasevich <vladislav.yasevich@hp.com>

[SCTP]: Make sure the chunk is off the transmitted list prior to freeing.

In a few instances, we need to remove the chunk from the transmitted list
prior to freeing it. This is because the free code doesn't do that any
more and so we need to do it manually.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# 60c778b2 11-Jan-2008 Vlad Yasevich <vladislav.yasevich@hp.com>

[SCTP]: Stop claiming that this is a "reference implementation"

I was notified by Randy Stewart that lksctp claims to be
"the reference implementation". First of all, "the
refrence implementation" was the original implementation
of SCTP in usersapce written ty Randy and a few others.
Second, after looking at the definiton of 'reference implementation',
we don't really meet the requirements.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# a08de64d 20-Dec-2007 Vlad Yasevich <vladislav.yasevich@hp.com>

[SCTP]: Update ASCONF processing to conform to spec.

The processing of the ASCONF chunks has changed a lot in the
spec. New items are:
1. A list of ASCONF-ACK chunks is now cached
2. The source of the packet is used in response.
3. New handling for unexpect ASCONF chunks.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7d54dc68 09-Nov-2007 Vlad Yasevich <vladislav.yasevich@hp.com>

SCTP: Always flush the queue when uncorcking.

When the code calls uncork, trigger a queue flush, even
if the queue was not corked. Most callers that explicitely
cork the queue will have additinal checks to see if they
corked it. Callers who do not cork the queue expect packets
to flow when they call uncork.

The scneario that showcased this bug happend when we were not
able to bundle DATA with outgoing COOKIE-ECHO. As a result
the data just sat in the outqueue and did not get transmitted.
The application expected a response, but nothing happened.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# b6157d8e 24-Oct-2007 Vlad Yasevich <vladislav.yasevich@hp.com>

SCTP: Fix difference cases of retransmit.

Commit d0ce92910bc04e107b2f3f2048f07e94f570035d broke several retransmit
cases including fast retransmit. The reason is that we should
only delay by rto while doing retranmists as a result of a timeout.
Retransmit as a result of path mtu discover, fast retransmit, or
other evernts that should trigger immidiate retransmissions got broken.

Also, since rto is doubled prior to marking of packets elegable for
retransmission, we never marked correct chunks anyway.

The fix is provide a reason for a given retransmission so that we
can mark chunks appropriately and to save the old rto value to do
comparisons against.

All regressions tests passed with this code.

Spotted by Wei Yongjun <yjwei@cn.fujitsu.com>

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# 64b0812b 14-Oct-2007 Wei Yongjun <yjwei@cn.fujitsu.com>

SCTP : Fix bad formatted comment in outqueue.c

Just fix the bad format of the comment in outqueue.c.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# d0ce9291 24-Aug-2007 Vlad Yasevich <vladislav.yasevich@hp.com>

SCTP: Do not retransmit chunks that are newer then rtt.

When performing a retransmit, do not include the chunk if
it was sent less then 1 rtt ago. The reason is that we
may receive the SACK very soon and wouldn't retransmit.
Suggested by Randy Stewart.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>


# 3ff50b79 20-Apr-2007 Stephen Hemminger <shemminger@linux-foundation.org>

[NET]: cleanup extra semicolons

Spring cleaning time...

There seems to be a lot of places in the network code that have
extra bogus semicolons after conditionals. Most commonly is a
bogus semicolon after: switch() { }

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8c4a2d41 21-Feb-2007 Vlad Yasevich <vladislav.yasevich@hp.com>

[SCTP]: Fix connection hang/slowdown with PR-SCTP

The problem that this patch corrects happens when all of the following
conditions are satisfisfied:
1. PR-SCTP is used and the timeout on the chunks is set below RTO.Max.
2. One of the paths on a multihomed associations is brought down.

In this scenario, data will expire within the rto of the initial
transmission and will never be retransmitted. However this data still
fills the send buffer and is counted against the association as outstanding
data. This causes any new data not to be sent and retransmission to not
happen.

The fix is to discount the abandoned data from the outstanding count and
peers rwnd estimation. This allows new data to be sent and a retransmission
timer restarted. Even though this new data will most likely expire within
the rto, the timer still counts as a strike against the transport and forces
the FORWARD-TSN chunk to be retransmitted as well.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d808ad9a 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NET] SCTP: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 34bcca28 20-Nov-2006 Al Viro <viro@zeniv.linux.org.uk>

[SCTP]: Even more trivial sctp annotations.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9f81bcd9 20-Nov-2006 Al Viro <viro@zeniv.linux.org.uk>

[SCTP]: More trivial sctp annotations.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cd497885 29-Sep-2006 Sridhar Samudrala <sri@us.ibm.com>

[SCTP]: Include sk_buff overhead while updating the peer's receive window.

Currently if the sender is sending small messages, it can cause a receiver
to run out of receive buffer space even when the advertised receive window
is still open and results in packet drops and retransmissions. Including
a overhead while updating the sender's view of peer receive window will
reduce the chances of receive buffer space overshooting the receive window.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ac0b0462 22-Aug-2006 Sridhar Samudrala <sri@us.ibm.com>

[SCTP]: Extend /proc/net/sctp/snmp to provide more statistics.

This patch adds more statistics info under /proc/net/sctp/snmp
that should be useful for debugging. The additional events that
are counted now include timer expirations, retransmits, packet
and data chunk discards.

The Data chunk discards include all the cases where a data chunk
is discarded including high tsn, bad stream, dup tsn and the most
useful one(out of receive buffer/rwnd).

Also moved the SCTP MIB data structures from the generic include
directories to include/sctp/sctp.h.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ad8fec17 21-Jul-2006 Sridhar Samudrala <sri@us.ibm.com>

[SCTP]: Verify all the paths to a peer via heartbeat before using them.

This patch implements Path Initialization procedure as described in
Sec 2.36 of RFC4460.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4c9f5d53 17-Jun-2006 Vlad Yasevich <vladislav.yasevich@hp.com>

[SCTP] Reset rtt_in_progress for the chunk when processing its sack.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 27852c26 02-Feb-2006 Vlad Yasevich <vladislav.yasevich@hp.com>

[SCTP]: Fix 'fast retransmit' to send a TSN only once.

SCTP used to "fast retransmit" a TSN every time we hit the number
of missing reports for the TSN. However the Implementers Guide
specifies that we should only "fast retransmit" a given TSN once.
Subsequent retransmits should be timeouts only. Also change the
number of missing reports to 3 as per the latest IG(similar to TCP).

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 79af02c2 08-Jul-2005 David S. Miller <davem@davemloft.net>

[SCTP]: Use struct list_head for chunk lists, not sk_buff_head.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f7a87d2 20-Jun-2005 Frank Filz <ffilzlnx@us.ibm.com>

[SCTP] sctp_connectx() API support

Implements sctp_connectx() as defined in the SCTP sockets API draft by
tunneling the request through a setsockopt().

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!