History log of /linux-master/net/tipc/bcast.c
Revision Date Author Comments
# 0c6de0c9 28-Jun-2021 Menglong Dong <dong.menglong@zte.com.cn>

net: tipc: fix FB_MTU eat two pages

FB_MTU is used in 'tipc_msg_build()' to alloc smaller skb when memory
allocation fails, which can avoid unnecessary sending failures.

The value of FB_MTU now is 3744, and the data size will be:

(3744 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \
SKB_DATA_ALIGN(BUF_HEADROOM + BUF_TAILROOM + 3))

which is larger than one page(4096), and two pages will be allocated.

To avoid it, replace '3744' with a calculation:

(PAGE_SIZE - SKB_DATA_ALIGN(BUF_OVERHEAD) - \
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

What's more, alloc_skb_fclone() will call SKB_DATA_ALIGN for data size,
and it's not necessary to make alignment for buf_size in
tipc_buf_acquire(). So, just remove it.

Fixes: 4c94cc2d3d57 ("tipc: fall back to smaller MTU if allocation of local send skb fails")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ec78e318 15-Oct-2020 Hoang Huu Le <hoang.h.le@dektech.com.au>

tipc: fix incorrect setting window for bcast link

In commit 16ad3f4022bb
("tipc: introduce variable window congestion control"), we applied
the algorithm to select window size from minimum window to the
configured maximum window for unicast link, and, besides we chose
to keep the window size for broadcast link unchanged and equal (i.e
fix window 50)

However, when setting maximum window variable via command, the window
variable was re-initialized to unexpect value (i.e 32).

We fix this by updating the fix window for broadcast as we stated.

Fixes: 16ad3f4022bb ("tipc: introduce variable window congestion control")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 75cee397 15-Oct-2020 Hoang Huu Le <hoang.h.le@dektech.com.au>

tipc: re-configure queue limit for broadcast link

The queue limit of the broadcast link is being calculated base on initial
MTU. However, when MTU value changed (e.g manual changing MTU on NIC
device, MTU negotiation etc.,) we do not re-calculate queue limit.
This gives throughput does not reflect with the change.

So fix it by calling the function to re-calculate queue limit of the
broadcast link.

Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# cad2929d 17-Jun-2020 Hoang Huu Le <hoang.h.le@dektech.com.au>

tipc: update a binding service via broadcast

Currently, updating binding table (add service binding to
name table/withdraw a service binding) is being sent over replicast.
However, if we are scaling up clusters to > 100 nodes/containers this
method is less affection because of looping through nodes in a cluster one
by one.

It is worth to use broadcast to update a binding service. This way, the
binding table can be updated on all peer nodes in one shot.

Broadcast is used when all peer nodes, as indicated by a new capability
flag TIPC_NAMED_BCAST, support reception of this message type.

Four problems need to be considered when introducing this feature.
1) When establishing a link to a new peer node we still update this by a
unicast 'bulk' update. This may lead to race conditions, where a later
broadcast publication/withdrawal bypass the 'bulk', resulting in
disordered publications, or even that a withdrawal may arrive before the
corresponding publication. We solve this by adding an 'is_last_bulk' bit
in the last bulk messages so that it can be distinguished from all other
messages. Only when this message has arrived do we open up for reception
of broadcast publications/withdrawals.

2) When a first legacy node is added to the cluster all distribution
will switch over to use the legacy 'replicast' method, while the
opposite happens when the last legacy node leaves the cluster. This
entails another risk of message disordering that has to be handled. We
solve this by adding a sequence number to the broadcast/replicast
messages, so that disordering can be discovered and corrected. Note
however that we don't need to consider potential message loss or
duplication at this protocol level.

3) Bulk messages don't contain any sequence numbers, and will always
arrive in order. Hence we must exempt those from the sequence number
control and deliver them unconditionally. We solve this by adding a new
'is_bulk' bit in those messages so that they can be recognized.

4) Legacy messages, which don't contain any new bits or sequence
numbers, but neither can arrive out of order, also need to be exempt
from the initial synchronization and sequence number check, and
delivered unconditionally. Therefore, we add another 'is_not_legacy' bit
to all new messages so that those can be distinguished from legacy
messages and the latter delivered directly.

v1->v2:
- fix warning issue reported by kbuild test robot <lkp@intel.com>
- add santiy check to drop the publication message with a sequence
number that is lower than the agreed synch point

Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 03b6fefd 26-May-2020 Tuong Lien <tuong.t.lien@dektech.com.au>

tipc: add support for broadcast rcv stats dumping

This commit enables dumping the statistics of a broadcast-receiver link
like the traditional 'broadcast-link' one (which is for broadcast-
sender). The link dumping can be triggered via netlink (e.g. the
iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the
indicator.

The name of a broadcast-receiver link of a specific peer will be in the
format: 'broadcast-link:<peer-id>'.

For example:

Link <broadcast-link:1001002>
Window:50 packets
RX packets:7841 fragments:2408/440 bundles:0/0
TX packets:0 fragments:0/0 bundles:0/0
RX naks:0 defs:124 dups:0
TX naks:21 acks:0 retrans:0
Congestion link:0 Send queue max:0 avg:0

In addition, the broadcast-receiver link statistics can be reset in the
usual way via netlink by specifying that link name in command.

Note: the 'tipc_link_name_ext()' is removed because the link name can
now be retrieved simply via the 'l->name'.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a91d55d1 26-May-2020 Tuong Lien <tuong.t.lien@dektech.com.au>

tipc: enable broadcast retrans via unicast

In some environment, broadcast traffic is suppressed at high rate (i.e.
a kind of bandwidth limit setting). When it is applied, TIPC broadcast
can still run successfully. However, when it comes to a high load, some
packets will be dropped first and TIPC tries to retransmit them but the
packet retransmission is intentionally broadcast too, so making things
worse and not helpful at all.

This commit enables the broadcast retransmission via unicast which only
retransmits packets to the specific peer that has really reported a gap
i.e. not broadcasting to all nodes in the cluster, so will prevent from
being suppressed, and also reduce some overheads on the other peers due
to duplicates, finally improve the overall TIPC broadcast performance.

Note: the functionality can be turned on/off via the sysctl file:

echo 1 > /proc/sys/net/tipc/bc_retruni
echo 0 > /proc/sys/net/tipc/bc_retruni

Default is '0', i.e. the broadcast retransmission still works as usual.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d7626b5a 26-May-2020 Tuong Lien <tuong.t.lien@dektech.com.au>

tipc: introduce Gap ACK blocks for broadcast link

As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput
by Gap ACK blocks"), we apply the same mechanism for the broadcast link
as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
consist of two parts built for both the broadcast and unicast types:

31 16 15 0
+-------------+-------------+-------------+-------------+
| bgack_cnt | ugack_cnt | len |
+-------------+-------------+-------------+-------------+ -
| gap | ack | |
+-------------+-------------+-------------+-------------+ > bc gacks
: : : |
+-------------+-------------+-------------+-------------+ -
| gap | ack | |
+-------------+-------------+-------------+-------------+ > uc gacks
: : : |
+-------------+-------------+-------------+-------------+ -

which is "automatically" backward-compatible.

We also increase the max number of Gap ACK blocks to 128, allowing upto
64 blocks per type (total buffer size = 516 bytes).

Besides, the 'tipc_link_advance_transmq()' function is refactored which
is applicable for both the unicast and broadcast cases now, so some old
functions can be removed and the code is optimized.

With the patch, TIPC broadcast is more robust regardless of packet loss
or disorder, latency, ... in the underlying network. Its performance is
boost up significantly.
For example, experiment with a 5% packet loss rate results:

$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real 0m 42.46s
user 0m 1.16s
sys 0m 17.67s

Without the patch:

$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real 8m 27.94s
user 0m 0.55s
sys 0m 2.38s

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dca4a17d 10-Dec-2019 Tuong Lien <tuong.t.lien@dektech.com.au>

tipc: fix potential hanging after b/rcast changing

In commit c55c8edafa91 ("tipc: smooth change between replicast and
broadcast"), we allow instant switching between replicast and broadcast
by sending a dummy 'SYN' packet on the last used link to synchronize
packets on the links. The 'SYN' message is an object of link congestion
also, so if that happens, a 'SOCK_WAKEUP' will be scheduled to be sent
back to the socket...
However, in that commit, we simply use the same socket 'cong_link_cnt'
counter for both the 'SYN' & normal payload message sending. Therefore,
if both the replicast & broadcast links are congested, the counter will
be not updated correctly but overwritten by the latter congestion.
Later on, when the 'SOCK_WAKEUP' messages are processed, the counter is
reduced one by one and eventually overflowed. Consequently, further
activities on the socket will only wait for the false congestion signal
to disappear but never been met.

Because sending the 'SYN' message is vital for the mechanism, it should
be done anyway. This commit fixes the issue by marking the message with
an error code e.g. 'TIPC_ERR_NO_PORT', so its sending should not face a
link congestion, there is no need to touch the socket 'cong_link_cnt'
either. In addition, in the event of any error (e.g. -ENOBUFS), we will
purge the entire payload message queue and make a return immediately.

Fixes: c55c8edafa91 ("tipc: smooth change between replicast and broadcast")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 16ad3f40 09-Dec-2019 Jon Maloy <jon.maloy@ericsson.com>

tipc: introduce variable window congestion control

We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, covering both 'slow
start', 'congestion avoidance', and 'fast recovery' modes.

- We introduce hard lower and upper window limits per link, still
different and configurable per bearer type.

- We introduce a 'slow start theshold' variable, initially set to
the maximum window size.

- We let a link start at the minimum congestion window, i.e. in slow
start mode, and then let is grow rapidly (+1 per rceived ACK) until
it reaches the slow start threshold and enters congestion avoidance
mode.

- In congestion avoidance mode we increment the congestion window for
each window-size number of acked packets, up to a possible maximum
equal to the configured maximum window.

- For each non-duplicate NACK received, we drop back to fast recovery
mode, by setting the both the slow start threshold to and the
congestion window to (current_congestion_window / 2).

- If the timeout handler finds that the transmit queue has not moved
since the previous timeout, it drops the link back to slow start
and forces a probe containing the last sent sequence number to the
sent to the peer, so that this can discover the stale situation.

This change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window max size for the UDP bearer.
For now, we also choose to keep the limits for the broadcast link
unchanged and equal.

This algorithm seems to give a 50-100% throughput improvement for
messages larger than MTU.

Suggested-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ba5f6a86 20-Nov-2019 Hoang Le <hoang.h.le@dektech.com.au>

tipc: update replicast capability for broadcast send link

When setting up a cluster with non-replicast/replicast capability
supported. This capability will be disabled for broadcast send link
in order to be backwards compatible.

However, when these non-support nodes left and be removed out the cluster.
We don't update this capability on broadcast send link. Then, some of
features that based on this capability will also disabling as unexpected.

In this commit, we make sure the broadcast send link capabilities will
be re-calculated as soon as a node removed/rejoined a cluster.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fc1b6d6d 07-Nov-2019 Tuong Lien <tuong.t.lien@dektech.com.au>

tipc: introduce TIPC encryption & authentication

This commit offers an option to encrypt and authenticate all messaging,
including the neighbor discovery messages. The currently most advanced
algorithm supported is the AEAD AES-GCM (like IPSec or TLS). All
encryption/decryption is done at the bearer layer, just before leaving
or after entering TIPC.

Supported features:
- Encryption & authentication of all TIPC messages (header + data);
- Two symmetric-key modes: Cluster and Per-node;
- Automatic key switching;
- Key-expired revoking (sequence number wrapped);
- Lock-free encryption/decryption (RCU);
- Asynchronous crypto, Intel AES-NI supported;
- Multiple cipher transforms;
- Logs & statistics;

Two key modes:
- Cluster key mode: One single key is used for both TX & RX in all
nodes in the cluster.
- Per-node key mode: Each nodes in the cluster has one specific TX key.
For RX, a node requires its peers' TX key to be able to decrypt the
messages from those peers.

Key setting from user-space is performed via netlink by a user program
(e.g. the iproute2 'tipc' tool).

Internal key state machine:

Attach Align(RX)
+-+ +-+
| V | V
+---------+ Attach +---------+
| IDLE |---------------->| PENDING |(user = 0)
+---------+ +---------+
A A Switch| A
| | | |
| | Free(switch/revoked) | |
(Free)| +----------------------+ | |Timeout
| (TX) | | |(RX)
| | | |
| | v |
+---------+ Switch +---------+
| PASSIVE |<----------------| ACTIVE |
+---------+ (RX) +---------+
(user = 1) (user >= 1)

The number of TFMs is 10 by default and can be changed via the procfs
'net/tipc/max_tfms'. At this moment, as for simplicity, this file is
also used to print the crypto statistics at runtime:

echo 0xfff1 > /proc/sys/net/tipc/max_tfms

The patch defines a new TIPC version (v7) for the encryption message (-
backward compatibility as well). The message is basically encapsulated
as follows:

+----------------------------------------------------------+
| TIPCv7 encryption | Original TIPCv2 | Authentication |
| header | packet (encrypted) | Tag |
+----------------------------------------------------------+

The throughput is about ~40% for small messages (compared with non-
encryption) and ~9% for large messages. With the support from hardware
crypto i.e. the Intel AES-NI CPU instructions, the throughput increases
upto ~85% for small messages and ~55% for large messages.

By default, the new feature is inactive (i.e. no encryption) until user
sets a key for TIPC. There is however also a new option - "TIPC_CRYPTO"
in the kernel configuration to enable/disable the new code when needed.

MAINTAINERS | add two new files 'crypto.h' & 'crypto.c' in tipc

Acked-by: Ying Xue <ying.xue@windreiver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e654f9f5 15-Aug-2019 Jon Maloy <jon.maloy@ericsson.com>

tipc: clean up skb list lock handling on send path

The policy for handling the skb list locks on the send and receive paths
is simple.

- On the send path we never need to grab the lock on the 'xmitq' list
when the destination is an exernal node.

- On the receive path we always need to grab the lock on the 'inputq'
list, irrespective of source node.

However, when transmitting node local messages those will eventually
end up on the receive path of a local socket, meaning that the argument
'xmitq' in tipc_node_xmit() will become the 'ínputq' argument in the
function tipc_sk_rcv(). This has been handled by always initializing
the spinlock of the 'xmitq' list at message creation, just in case it
may end up on the receive path later, and despite knowing that the lock
in most cases never will be used.

This approach is inaccurate and confusing, and has also concealed the
fact that the stated 'no lock grabbing' policy for the send path is
violated in some cases.

We now clean up this by never initializing the lock at message creation,
instead doing this at the moment we find that the message actually will
enter the receive path. At the same time we fix the four locations
where we incorrectly access the spinlock on the send/error path.

This patch also reverts commit d12cffe9329f ("tipc: ensure head->lock
is initialised") which has now become redundant.

CC: Eric Dumazet <edumazet@google.com>
Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6c9081a3 06-Aug-2019 John Rutherford <john.rutherford@dektech.com.au>

tipc: add loopback device tracking

Since node internal messages are passed directly to the socket, it is not
possible to observe those messages via tcpdump or wireshark.

We now remedy this by making it possible to clone such messages and send
the clones to the loopback interface. The clones are dropped at reception
and have no functional role except making the traffic visible.

The feature is enabled if network taps are active for the loopback device.
pcap filtering restrictions require the messages to be presented to the
receiving side of the loopback device.

v3 - Function dev_nit_active used to check for network taps.
- Procedure netif_rx_ni used to send cloned messages to loopback device.

Signed-off-by: John Rutherford <john.rutherford@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a7dc51ad 25-Jun-2019 Jon Maloy <jon.maloy@ericsson.com>

tipc: rename function msg_get_wrapped() to msg_inner_hdr()

We rename the inline function msg_get_wrapped() to the more
comprehensible msg_inner_hdr().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e1279ff7 03-Apr-2019 Hoang Le <hoang.h.le@dektech.com.au>

tipc: add NULL pointer check

skb somehow dequeued out of inputq before processing, it causes to
NULL pointer and kernel crashed.

Add checking skb valid before using.

Fixes: c55c8edafa9 ("tipc: smooth change between replicast and broadcast")
Reported-by: Tuong Lien Tong <tuong.t.lien@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6da88a82 25-Mar-2019 Wei Yongjun <weiyongjun1@huawei.com>

tipc: fix return value check in tipc_mcast_send_sync()

Fix the return value check which testing the wrong variable
in tipc_mcast_send_sync().

Fixes: c55c8edafa91 ("tipc: smooth change between replicast and broadcast")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 08e046c8 21-Mar-2019 Hoang Le <hoang.h.le@dektech.com.au>

tipc: fix a null pointer deref

In commit c55c8edafa91 ("tipc: smooth change between replicast and
broadcast") we introduced new method to eliminate the risk of message
reordering that happen in between different nodes.
Unfortunately, we forgot checking at receiving side to ignore intra node.

We fix this by checking and returning if arrived message from intra node.

syzbot report:

==================================================================
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 7820 Comm: syz-executor418 Not tainted 5.0.0+ #61
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
RIP: 0010:tipc_mcast_filter_msg+0x21b/0x13d0 net/tipc/bcast.c:782
Code: 45 c0 0f 84 39 06 00 00 48 89 5d 98 e8 ce ab a5 fa 49 8d bc
24 c8 00 00 00 48 b9 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03
<80> 3c 08 00 0f 85 9a 0e 00 00 49 8b 9c 24 c8 00 00 00 48 be 00 00
RSP: 0018:ffff8880959defc8 EFLAGS: 00010202
RAX: 0000000000000019 RBX: ffff888081258a48 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff86cab862 RDI: 00000000000000c8
RBP: ffff8880959df030 R08: ffff8880813d0200 R09: ffffed1015d05bc8
R10: ffffed1015d05bc7 R11: ffff8880ae82de3b R12: 0000000000000000
R13: 000000000000002c R14: 0000000000000000 R15: ffff888081258a48
FS: 000000000106a880(0000) GS:ffff8880ae800000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020001cc0 CR3: 0000000094a20000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tipc_sk_filter_rcv+0x182d/0x34f0 net/tipc/socket.c:2168
tipc_sk_enqueue net/tipc/socket.c:2254 [inline]
tipc_sk_rcv+0xc45/0x25a0 net/tipc/socket.c:2305
tipc_sk_mcast_rcv+0x724/0x1020 net/tipc/socket.c:1209
tipc_mcast_xmit+0x7fe/0x1200 net/tipc/bcast.c:410
tipc_sendmcast+0xb36/0xfc0 net/tipc/socket.c:820
__tipc_sendmsg+0x10df/0x18d0 net/tipc/socket.c:1358
tipc_sendmsg+0x53/0x80 net/tipc/socket.c:1291
sock_sendmsg_nosec net/socket.c:651 [inline]
sock_sendmsg+0xdd/0x130 net/socket.c:661
___sys_sendmsg+0x806/0x930 net/socket.c:2260
__sys_sendmsg+0x105/0x1d0 net/socket.c:2298
__do_sys_sendmsg net/socket.c:2307 [inline]
__se_sys_sendmsg net/socket.c:2305 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2305
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4401c9
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8
48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05
<48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd887fa9d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004401c9
RDX: 0000000000000000 RSI: 0000000020002140 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401a50
R13: 0000000000401ae0 R14: 0000000000000000 R15: 0000000000000000
Modules linked in:
---[ end trace ba79875754e1708f ]---

Reported-by: syzbot+be4bdf2cc3e85e952c50@syzkaller.appspotmail.com
Fixes: c55c8eda ("tipc: smooth change between replicast and broadcast")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c55c8eda 19-Mar-2019 Hoang Le <hoang.h.le@dektech.com.au>

tipc: smooth change between replicast and broadcast

Currently, a multicast stream may start out using replicast, because
there are few destinations, and then it should ideally switch to
L2/broadcast IGMP/multicast when the number of destinations grows beyond
a certain limit. The opposite should happen when the number decreases
below the limit.

To eliminate the risk of message reordering caused by method change,
a sending socket must stick to a previously selected method until it
enters an idle period of 5 seconds. Means there is a 5 seconds pause
in the traffic from the sender socket.

If the sender never makes such a pause, the method will never change,
and transmission may become very inefficient as the cluster grows.

With this commit, we allow such a switch between replicast and
broadcast without any need for a traffic pause.

Solution is to send a dummy message with only the header, also with
the SYN bit set, via broadcast or replicast. For the data message,
the SYN bit is set and sending via replicast or broadcast (inverse
method with dummy).

Then, at receiving side any messages follow first SYN bit message
(data or dummy message), they will be held in deferred queue until
another pair (dummy or data message) arrived in other link.

v2: reverse christmas tree declaration

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 02ec6caf 19-Mar-2019 Hoang Le <hoang.h.le@dektech.com.au>

tipc: support broadcast/replicast configurable for bc-link

Currently, a multicast stream uses either broadcast or replicast as
transmission method, based on the ratio between number of actual
destinations nodes and cluster size.

However, when an L2 interface (e.g., VXLAN) provides pseudo
broadcast support, this becomes very inefficient, as it blindly
replicates multicast packets to all cluster/subnet nodes,
irrespective of whether they host actual target sockets or not.

The TIPC multicast algorithm is able to distinguish real destination
nodes from other nodes, and hence provides a smarter and more
efficient method for transferring multicast messages than
pseudo broadcast can do.

Because of this, we now make it possible for users to force
the broadcast link to permanently switch to using replicast,
irrespective of which capabilities the bearer provides,
or pretend to provide.
Conversely, we also make it possible to force the broadcast link
to always use true broadcast. While maybe less useful in
deployed systems, this may at least be useful for testing the
broadcast algorithm in small clusters.

We retain the current AUTOSELECT ability, i.e., to let the broadcast link
automatically select which algorithm to use, and to switch back and forth
between broadcast and replicast as the ratio between destination
node number and cluster size changes. This remains the default method.

Furthermore, we make it possible to configure the threshold ratio for
such switches. The default ratio is now set to 10%, down from 25% in the
earlier implementation.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9cc1bf39 03-Sep-2018 Zhenbo Gao <zhenbo.gao@windriver.com>

tipc: correct spelling errors for struct tipc_bc_base's comment

Trivial fix for two spelling mistakes.

Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a0732548 27-Jul-2018 Jia-Ju Bai <baijiaju1990@gmail.com>

net: tipc: bcast: Replace GFP_ATOMIC with GFP_KERNEL in tipc_bcast_init()

tipc_bcast_init() is never called in atomic context.
It calls kzalloc() with GFP_ATOMIC, which is not necessary.
GFP_ATOMIC can be replaced with GFP_KERNEL.

This is found by a static analysis tool named DCNS written by myself.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c9efb15a 05-Mar-2018 Gustavo A. R. Silva <garsilva@embeddedor.com>

tipc: bcast: use true and false for boolean values

Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4c94cc2d 30-Nov-2017 Jon Maloy <jon.maloy@ericsson.com>

tipc: fall back to smaller MTU if allocation of local send skb fails

When sending node local messages the code is using an 'mtu' of 66060
bytes to avoid unnecessary fragmentation. During situations of low
memory tipc_msg_build() may sometimes fail to allocate such large
buffers, resulting in unnecessary send failures. This can easily be
remedied by falling back to a smaller MTU, and then reassemble the
buffer chain as if the message were arriving from a remote node.

At the same time, we change the initial MTU setting of the broadcast
link to a lower value, so that large messages always are fragmented
into smaller buffers even when we run in single node mode. Apart from
obtaining the same advantage as for the 'fallback' solution above, this
turns out to give a significant performance improvement. This can
probably be explained with the __pskb_copy() operation performed on the
buffer for each recipient during reception. We found the optimal value
for this, considering the most relevant skb pool, to be 3744 bytes.

Acked-by: Ying Xue <ying.xue@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a80ae530 13-Oct-2017 Jon Maloy <jon.maloy@ericsson.com>

tipc: improve destination linked list

We often see a need for a linked list of destination identities,
sometimes containing a port number, sometimes a node identity, and
sometimes both. The currently defined struct u32_list is not generic
enough to cover all cases, so we extend it to contain two u32 integers
and rename it to struct tipc_dest_list.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3382605f 07-Oct-2017 Jon Maloy <jon.maloy@ericsson.com>

tipc: correct initialization of skb list

We change the initialization of the skb transmit buffer queues
in the functions tipc_bcast_xmit() and tipc_rcast_xmit() to also
initialize their spinlocks. This is needed because we may, during
error conditions, need to call skb_queue_purge() on those queues
further down the stack.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 01fd12bb 18-Jan-2017 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: make replicast a user selectable option

If the bearer carrying multicast messages supports broadcast, those
messages will be sent to all cluster nodes, irrespective of whether
these nodes host any actual destinations socket or not. This is clearly
wasteful if the cluster is large and there are only a few real
destinations for the message being sent.

In this commit we extend the eligibility of the newly introduced
"replicast" transmit option. We now make it possible for a user to
select which method he wants to be used, either as a mandatory setting
via setsockopt(), or as a relative setting where we let the broadcast
layer decide which method to use based on the ratio between cluster
size and the message's actual number of destination nodes.

In the latter case, a sending socket must stick to a previously
selected method until it enters an idle period of at least 5 seconds.
This eliminates the risk of message reordering caused by method change,
i.e., when changes to cluster size or number of destinations would
otherwise mandate a new method to be used.

Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a853e4c6 18-Jan-2017 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: introduce replicast as transport option for multicast

TIPC multicast messages are currently carried over a reliable
'broadcast link', making use of the underlying media's ability to
transport packets as L2 broadcast or IP multicast to all nodes in
the cluster.

When the used bearer is lacking that ability, we can instead emulate
the broadcast service by replicating and sending the packets over as
many unicast links as needed to reach all identified destinations.
We now introduce a new TIPC link-level 'replicast' service that does
this.

Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2ae0b8af 18-Jan-2017 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: add functionality to lookup multicast destination nodes

As a further preparation for the upcoming 'replicast' functionality,
we add some necessary structs and functions for looking up and returning
a list of all nodes that host destinations for a given multicast message.

Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9999974a 18-Jan-2017 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: add function for checking broadcast support in bearer

As a preparation for the 'replicast' functionality we are going to
introduce in the next commits, we need the broadcast base structure to
store whether bearer broadcast is available at all from the currently
used bearer or bearers.

We do this by adding a new function tipc_bearer_bcast_support() to
the bearer layer, and letting the bearer selection function in
bcast.c use this to give a new boolean field, 'bcast_support' the
appropriate value.

Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 365ad353 03-Jan-2017 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: reduce risk of user starvation during link congestion

The socket code currently handles link congestion by either blocking
and trying to send again when the congestion has abated, or just
returning to the user with -EAGAIN and let him re-try later.

This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. During the time the link issues a wakeup signal, until the
socket wakes up and re-attempts sending, other senders may have come
in between and occupied the free buffer space in the link. This in turn
may lead to a socket having to make many send attempts before it is
successful. In extremely loaded systems we have observed latency times
of several seconds before a low-priority socket is able to send out a
message.

In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When a message is attempted sent via a
congested link, we now let it be added to the link's backlog queue
anyway, thus permitting an oversubscription of one message per source
socket. We still create a wakeup item and return an error code, hence
instructing the sender to block or stop sending. Only when enough space
has been freed up in the link's backlog queue do we issue a wakeup event
that allows the sender to continue with the next message, if any.

The fact that a socket now can consider a message sent even when the
link returns a congestion code means that the sending socket code can
be simplified. Also, since this is a good opportunity to get rid of the
obsolete 'mtu change' condition in the three socket send functions, we
now choose to refactor those functions completely.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 06bd2b1e 27-Oct-2016 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: fix broadcast link synchronization problem

In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization
criteria") we tried to fix a problem with the initial synchronization
of broadcast link acknowledge values. Unfortunately that solution is
not sufficient to solve the issue.

We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
initialization, NAME_DISTRIBUTOR and other STATE packets with invalid
broadcast acknowledge numbers, leading to premature opening of the
broadcast link. When the bypassed packets finally arrive, they are
inadvertently accepted, and the already correctly initialized
acknowledge number in the broadcast receive link is overwritten by
the invalid (zero) value of the said packets. After this the broadcast
link goes stale.

We now fix this by marking the packets where we know the acknowledge
value is or may be invalid, and then ignoring the acks from those.

To this purpose, we claim an unused bit in the header to indicate that
the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
packets, plus those LINK_PROTOCOL packets sent out before the broadcast
links are fully synchronized.

This minor protocol update is fully backwards compatible.

Reported-by: John Thompson <thompa.atl@gmail.com>
Tested-by: John Thompson <thompa.atl@gmail.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 02d11ca2 01-Sep-2016 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: transfer broadcast nacks in link state messages

When we send broadcasts in clusters of more 70-80 nodes, we sometimes
see the broadcast link resetting because of an excessive number of
retransmissions. This is caused by a combination of two factors:

1) A 'NACK crunch", where loss of broadcast packets is discovered
and NACK'ed by several nodes simultaneously, leading to multiple
redundant broadcast retransmissions.

2) The fact that the NACKS as such also are sent as broadcast, leading
to excessive load and packet loss on the transmitting switch/bridge.

This commit deals with the latter problem, by moving sending of
broadcast nacks from the dedicated BCAST_PROTOCOL/NACK message type
to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused
bits in word 8 of the said message for this purpose, and introduce a
new capability bit, TIPC_BCAST_STATE_NACK in order to keep the change
backwards compatible.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e74a386d 03-Mar-2016 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: remove pre-allocated message header in link struct

Until now, we have kept a pre-allocated protocol message header
aggregated into struct tipc_link. Apart from adding unnecessary
footprint to the link instances, this requires extra code both to
initialize and re-initialize it.

We now remove this sub-optimization. This change also makes it
possible to clean up the function tipc_build_proto_msg() and remove
a couple of small functions that were accessing the mentioned header.
In particular, we can replace all occurrences of the local function
call link_own_addr(link) with the generic tipc_own_addr(net).

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 38206d59 19-Nov-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: narrow down interface towards struct tipc_link

We move the definition of struct tipc_link from link.h to link.c in
order to minimize its exposure to the rest of the code.

When needed, we define new functions to make it possible for external
entities to access and set data in the link.

Apart from the above, there are no functional changes.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2af5ae37 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: clean up unused code and structures

After the previous changes in this series, we can now remove some
unused code and structures, both in the broadcast, link aggregation
and link code.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c72fa872 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: eliminate link's reference to owner node

With the recent commit series, we have established a one-way dependency
between the link aggregation (struct tipc_node) instances and their
pertaining tipc_link instances. This has enabled quite significant code
and structure simplifications.

In this commit, we eliminate the field 'owner', which points to an
instance of struct tipc_node, from struct tipc_link, and replace it with
a pointer to struct net, which is the only external reference now needed
by a link instance.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 959e1781 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: introduce jumbo frame support for broadcast

Until now, we have only been supporting a fix MTU size of 1500 bytes
for all broadcast media, irrespective of their actual capability.

We now make the broadcast MTU adaptable to the carrying media, i.e.,
we use the smallest MTU supported by any of the interfaces attached
to TIPC.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b06b281e 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: simplify bearer level broadcast

Until now, we have been keeping track of the exact set of broadcast
destinations though the help structure tipc_node_map. This leads us to
have to maintain a whole infrastructure for supporting this, including
a pseudo-bearer and a number of functions to manipulate both the bearers
and the node map correctly. Apart from the complexity, this approach is
also limiting, as struct tipc_node_map only can support cluster local
broadcast if we want to avoid it becoming excessively large. We want to
eliminate this limitation, in order to enable introduction of scoped
multicast in the future.

A closer analysis reveals that it is unnecessary maintaining this "full
set" overview; it is sufficient to keep a counter per bearer, indicating
how many nodes can be reached via this bearer at the moment. The protocol
is now robust enough to handle transitional discrepancies between the
nominal number of reachable destinations, as expected by the broadcast
protocol itself, and the number which is actually reachable at the
moment. The initial broadcast synchronization, in conjunction with the
retransmission mechanism, ensures that all packets will eventually be
acknowledged by the correct set of destinations.

This commit introduces these changes.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 52666986 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: let broadcast packet reception use new link receive function

The code path for receiving broadcast packets is currently distinct
from the unicast path. This leads to unnecessary code and data
duplication, something that can be avoided with some effort.

We now introduce separate per-peer tipc_link instances for handling
broadcast packet reception. Each receive link keeps a pointer to the
common, single, broadcast link instance, and can hence handle release
and retransmission of send buffers as if they belonged to the own
instance.

Furthermore, we let each unicast link instance keep a reference to both
the pertaining broadcast receive link, and to the common send link.
This makes it possible for the unicast links to easily access data for
broadcast link synchronization, as well as for carrying acknowledges for
received broadcast packets.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fd556f20 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: introduce capability bit for broadcast synchronization

Until now, we have tried to support both the newer, dedicated broadcast
synchronization mechanism along with the older, less safe, RESET_MSG/
ACTIVATE_MSG based one. The latter method has turned out to be a hazard
in a highly dynamic cluster, so we find it safer to disable it completely
when we find that the former mechanism is supported by the peer node.

For this purpose, we now introduce a new capabability bit,
TIPC_BCAST_SYNCH, to inform any peer nodes that dedicated broadcast
syncronization is supported by the present node. The new bit is conveyed
between peers in the 'capabilities' field of neighbor discovery messages.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2f566124 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: let broadcast transmission use new link transmit function

This commit simplifies the broadcast link transmission function, by
leveraging previous changes to the link transmission function and the
broadcast transmission link life cycle.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 32301906 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: use explicit allocation of broadcast send link

The broadcast link instance (struct tipc_link) used for sending is
currently aggregated into struct tipc_bclink. This means that we cannot
use the regular tipc_link_create() function for initiating the link, but
do instead have to initiate numerous fields directly from the
bcast_init() function.

We want to reduce dependencies between the broadcast functionality
and the inner workings of tipc_link. In this commit, we introduce
a new function tipc_bclink_create() to link.c, and allocate the
instance of the link separately using this function.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5fd9fd63 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: create broadcast transmission link at namespace init

The broadcast transmission link is currently instantiated when the
network subsystem is started, i.e., on order from user space via netlink.

This forces the broadcast transmission code to do unnecessary tests for
the existence of the transmission link, as well in single mode node as
in network mode.

In this commit, we do instead create the link during initialization of
the name space, and remove it when it is stopped. The fact that the
transmission link now has a guaranteed longer life cycle than any of its
potential clients paves the way for further code simplifcations
and optimizations.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0043550b 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: move broadcast link lock to struct tipc_net

The broadcast lock will need to be acquired outside bcast.c in a later
commit. For this reason, we move the lock to struct tipc_net. Consistent
with the changes in the previous commit, we also introducee two new
functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is
currently using tipc_bclink_lock()/unlock() will be phased out during
the coming commits in this series.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6beb19a6 22-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: move bcast definitions to bcast.c

Currently, a number of structure and function definitions related
to the broadcast functionality are unnecessarily exposed in the file
bcast.h. This obscures the fact that the external interface towards
the broadcast link in fact is very narrow, and causes unnecessary
recompilations of other files when anything changes in those
definitions.

In this commit, we move as many of those definitions as is currently
possible to the file bcast.c.

We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both
since the name does not correctly describe the contents of this
struct, and will do so even less in the future, and because we want
to use the term 'link' more appropriately in the functionality
introduced later in this series.

Finally, we rename a couple of functions, such as tipc_bclink_xmit()
and others that will be kept in the future, to include the term 'bcast'
instead.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 53387c4e 19-Oct-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: extend broadcast link window size

The default fix broadcast window size is currently set to 20 packets.
This is a very low value, set at a time when we were still testing on
10 Mb/s hubs, and a change to it is long overdue.

Commit 7845989cb4b3da1db ("net: tipc: fix stall during bclink wakeup procedure")
revealed a problem with this low value. For messages of importance LOW,
the backlog queue limit will be calculated to 30 packets, while a
single, maximum sized message of 66000 bytes, carried across a 1500 MTU
network consists of 46 packets.

This leads to the following scenario (among others leading to the same
situation):

1: Msg 1 of 46 packets is sent. 20 packets go to the transmit queue, 26
packets to the backlog queue.
2: Msg 2 of 46 packets is attempted sent, but rejected because there is
no more space in the backlog queue at this level. The sender is added
to the wakeup queue with a "pending packets chain size" number of 46.
3: Some packets in the transmit queue are acked and released. We try to
wake up the sender, but the pending size of 46 is bigger than the LOW
wakeup limit of 30, so this doesn't happen.
5: Subsequent acks releases all the remaining buffers. Each time we test
for the wakeup criteria and find that 46 still is larger than 30,
even after both the transmit and the backlog queues are empty.
6: The sender is never woken up and given a chance to send its message.
He is stuck.

We could now loosen the wakeup criteria (used by link_prepare_wakeup())
to become equal to the send criteria (used by tipc_link_xmit()), i.e.,
by ignoring the "pending packets chain size" value altogether, or we can
just increase the queue limits so that the criteria can be satisfied
anyway. There are good reasons (potentially multiple waiting senders) to
not opt for the former solution, so we choose the latter one.

This commit fixes the problem by giving the broadcast link window a
default value of 50 packets. We also introduce a new minimum link
window size BCLINK_MIN_WIN of 32, which is enough to always avoid the
described situation. Finally, in order to not break any existing users
which may set the window explicitly, we enforce that the window is set
to the new minimum value in case the user is trying to set it to
anything lower.

Fixes: 7845989cb4b3da1db ("net: tipc: fix stall during bclink wakeup procedure")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7845989c 07-Sep-2015 Kolmakov Dmitriy <kolmakov.dmitriy@huawei.com>

net: tipc: fix stall during bclink wakeup procedure

If an attempt to wake up users of broadcast link is made when there is
no enough place in send queue than it may hang up inside the
tipc_sk_rcv() function since the loop breaks only after the wake up
queue becomes empty. This can lead to complete CPU stall with the
following message generated by RCU:

INFO: rcu_sched self-detected stall on CPU { 0} (t=2101 jiffies
g=54225 c=54224 q=11465)
Task dump for CPU 0:
tpch R running task 0 39949 39948 0x0000000a
ffffffff818536c0 ffff88181fa037a0 ffffffff8106a4be 0000000000000000
ffffffff818536c0 ffff88181fa037c0 ffffffff8106d8a8 ffff88181fa03800
0000000000000001 ffff88181fa037f0 ffffffff81094a50 ffff88181fa15680
Call Trace:
<IRQ> [<ffffffff8106a4be>] sched_show_task+0xae/0x120
[<ffffffff8106d8a8>] dump_cpu_task+0x38/0x40
[<ffffffff81094a50>] rcu_dump_cpu_stacks+0x90/0xd0
[<ffffffff81097c3b>] rcu_check_callbacks+0x3eb/0x6e0
[<ffffffff8106e53f>] ? account_system_time+0x7f/0x170
[<ffffffff81099e64>] update_process_times+0x34/0x60
[<ffffffff810a84d1>] tick_sched_handle.isra.18+0x31/0x40
[<ffffffff810a851c>] tick_sched_timer+0x3c/0x70
[<ffffffff8109a43d>] __run_hrtimer.isra.34+0x3d/0xc0
[<ffffffff8109aa95>] hrtimer_interrupt+0xc5/0x1e0
[<ffffffff81030d52>] ? native_smp_send_reschedule+0x42/0x60
[<ffffffff81032f04>] local_apic_timer_interrupt+0x34/0x60
[<ffffffff810335bc>] smp_apic_timer_interrupt+0x3c/0x60
[<ffffffff8165a3fb>] apic_timer_interrupt+0x6b/0x70
[<ffffffff81659129>] ? _raw_spin_unlock_irqrestore+0x9/0x10
[<ffffffff8107eb9f>] __wake_up_sync_key+0x4f/0x60
[<ffffffffa313ddd1>] tipc_write_space+0x31/0x40 [tipc]
[<ffffffffa313dadf>] filter_rcv+0x31f/0x520 [tipc]
[<ffffffffa313d699>] ? tipc_sk_lookup+0xc9/0x110 [tipc]
[<ffffffff81659259>] ? _raw_spin_lock_bh+0x19/0x30
[<ffffffffa314122c>] tipc_sk_rcv+0x2dc/0x3e0 [tipc]
[<ffffffffa312e7ff>] tipc_bclink_wakeup_users+0x2f/0x40 [tipc]
[<ffffffffa313ce26>] tipc_node_unlock+0x186/0x190 [tipc]
[<ffffffff81597c1c>] ? kfree_skb+0x2c/0x40
[<ffffffffa313475c>] tipc_rcv+0x2ac/0x8c0 [tipc]
[<ffffffffa312ff58>] tipc_l2_rcv_msg+0x38/0x50 [tipc]
[<ffffffff815a76d3>] __netif_receive_skb_core+0x5a3/0x950
[<ffffffff815a98d3>] __netif_receive_skb+0x13/0x60
[<ffffffff815a993e>] netif_receive_skb_internal+0x1e/0x90
[<ffffffff815aa138>] napi_gro_receive+0x78/0xa0
[<ffffffffa07f93f4>] tg3_poll_work+0xc54/0xf40 [tg3]
[<ffffffff81597c8c>] ? consume_skb+0x2c/0x40
[<ffffffffa07f9721>] tg3_poll_msix+0x41/0x160 [tg3]
[<ffffffff815ab0f2>] net_rx_action+0xe2/0x290
[<ffffffff8104b92a>] __do_softirq+0xda/0x1f0
[<ffffffff8104bc26>] irq_exit+0x76/0xa0
[<ffffffff81004355>] do_IRQ+0x55/0xf0
[<ffffffff8165a12b>] common_interrupt+0x6b/0x6b
<EOI>

The issue occurs only when tipc_sk_rcv() is used to wake up postponed
senders:

tipc_bclink_wakeup_users()
// wakeupq - is a queue which consists of special
// messages with SOCK_WAKEUP type.
tipc_sk_rcv(wakeupq)
...
while (skb_queue_len(inputq)) {
filter_rcv(skb)
// Here the type of message is checked
// and if it is SOCK_WAKEUP then
// it tries to wake up a sender.
tipc_write_space(sk)
wake_up_interruptible_sync_poll()
}

After the sender thread is woke up it can gather control and perform
an attempt to send a message. But if there is no enough place in send
queue it will call link_schedule_user() function which puts a message
of type SOCK_WAKEUP to the wakeup queue and put the sender to sleep.
Thus the size of the queue actually is not changed and the while()
loop never exits.

The approach I proposed is to wake up only senders for which there is
enough place in send queue so the described issue can't occur.
Moreover the same approach is already used to wake up senders on
unicast links.

I have got into the issue on our product code but to reproduce the
issue I changed a benchmark test application (from
tipcutils/demos/benchmark) to perform the following scenario:
1. Run 64 instances of test application (nodes). It can be done
on the one physical machine.
2. Each application connects to all other using TIPC sockets in
RDM mode.
3. When setup is done all nodes start simultaneously send
broadcast messages.
4. Everything hangs up.

The issue is reproducible only when a congestion on broadcast link
occurs. For example, when there are only 8 nodes it works fine since
congestion doesn't occur. Send queue limit is 40 in my case (I use a
critical importance level) and when 64 nodes send a message at the
same moment a congestion occurs every time.

Signed-off-by: Dmitry S Kolmakov <kolmakov.dmitriy@huawei.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d999297c 16-Jul-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: reduce locking scope during packet reception

We convert packet/message reception according to the same principle
we have been using for message sending and timeout handling:

We move the function tipc_rcv() to node.c, hence handling the initial
packet reception at the link aggregation level. The function grabs
the node lock, selects the receiving link, and accesses it via a new
call tipc_link_rcv(). This function appends buffers to the input
queue for delivery upwards, but it may also append outgoing packets
to the xmit queue, just as we do during regular message sending. The
latter will happen when buffers are forwarded from the link backlog,
or when retransmission is requested.

Upon return of this function, and after having released the node lock,
tipc_rcv() delivers/tranmsits the contents of those queues, but it may
also perform actions such as link activation or reset, as indicated by
the return flags from the link.

This reduces the number of cpu cycles spent inside the node spinlock,
and reduces contention on that lock.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d3504c34 16-Jul-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: clean up definitions and usage of link flags

The status flag LINK_STOPPED is not needed any more, since the
mechanism for delayed deletion of links has been removed.
Likewise, LINK_STARTED and LINK_START_EVT are unnecessary,
because we can just as well start the link timer directly from
inside tipc_link_create().

We eliminate these flags in this commit.

Instead of the above flags, we now introduce three new link modes,
TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values
indicate whether, and in the case of TIPC_LINK_TUNNEL, which, messages
the link is allowed to receive in this state. TIPC_LINK_BLOCKED also
blocks timer-driven protocol messages to be sent out, and any change
to the link FSM. Since the modes are mutually exclusive, we convert
them to state values, and rename the 'flags' field in struct tipc_link
to 'exec_mode'.

Finally, we move the #defines for link FSM states and events from link.h
into enums inside the file link.c, which is the real usage scope of
these definitions.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 22d85c79 16-Jul-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: change sk_buffer handling in tipc_link_xmit()

When the function tipc_link_xmit() is given a buffer list for
transmission, it currently consumes the list both when transmission
is successful and when it fails, except for the special case when
it encounters link congestion.

This behavior is inconsistent, and needs to be corrected if we want
to avoid problems in later commits in this series.

In this commit, we change this to let the function consume the list
only when transmission is successful, and leave the list with the
sender in all other cases. We also modifiy the socket code so that
it adapts to this change, i.e., purges the list when a non-congestion
error code is returned.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9d13ec65 16-Jul-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: introduce link entry structure to struct tipc_node

struct 'tipc_node' currently contains two arrays for link attributes,
one for the link pointers, and one for the usable link MTUs.

We now group those into a new struct 'tipc_link_entry', and intoduce
one single array consisting of such enties. Apart from being a cosmetic
improvement, this is a starting point for the strict master-slave
relation between node and link that we will introduce in the following
commits.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7d967b67 28-Jun-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: purge backlog queue counters when broadcast link is reset

In commit 1f66d161ab3d8b518903fa6c3f9c1f48d6919e74
("tipc: introduce starvation free send algorithm")
we introduced a counter per priority level for buffers
in the link backlog queue. We also introduced a new
function tipc_link_purge_backlog(), to reset these
counters to zero when the link is reset.

Unfortunately, we missed to call this function when
the broadcast link is reset, with the result that the
values of these counters might be permanently skewed
when new nodes are attached. This may in the worst case
lead to permananent, but spurious, broadcast link
congestion, where no broadcast packets can be sent at
all.

We fix this bug with this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# dd3f9e70 14-May-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: add packet sequence number at instant of transmission

Currently, the packet sequence number is updated and added to each
packet at the moment a packet is added to the link backlog queue.
This is wasteful, since it forces the code to traverse the send
packet list packet by packet when adding them to the backlog queue.
It would be better to just splice the whole packet list into the
backlog queue when that is the right action to do.

In this commit, we do this change. Also, since the sequence numbers
cannot now be assigned to the packets at the moment they are added
the backlog queue, we do instead calculate and add them at the moment
of transmission, when the backlog queue has to be traversed anyway.
We do this in the function tipc_link_push_packet().

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a97b9d3f 14-May-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: rename fields in struct tipc_link

We rename some fields in struct tipc_link, in order to give them more
descriptive names:

next_in_no -> rcv_nxt
next_out_no-> snd_nxt
fsm_msg_cnt-> silent_intv_cnt
cont_intv -> keepalive_intv
last_retransmitted -> last_retransm

There are no functional changes in this commit.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 670f4f88 06-May-2015 Richard Alpe <richard.alpe@ericsson.com>

tipc: add broadcast link window set/get to nl api

Add the ability to get or set the broadcast link window through the
new netlink API. The functionality was unintentionally missing from
the new netlink API. Adding this means that we also fix the breakage
in the old API when coming through the compat layer.

Fixes: 37e2d4843f9e (tipc: convert legacy nl link prop set to nl compat)
Reported-by: Tomi Ollila <tomi.ollila@iki.fi>
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ed193ece 02-Apr-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: simplify link mtu negotiation

When a link is being established, the two endpoints advertise their
respective interface MTU in the transmitted RESET and ACTIVATE messages.
If there is any difference, the lower of the two MTUs will be selected
for use by both endpoints.

However, as a remnant of earlier attempts to introduce TIPC level
routing. there also exists an MTU discovery mechanism. If an intermediate
node has a lower MTU than the two endpoints, they will discover this
through a bisectional approach, and finally adopt this MTU for common use.

Since there is no TIPC level routing, and probably never will be,
this mechanism doesn't make any sense, and only serves to make the
link level protocol unecessarily complex.

In this commit, we eliminate the MTU discovery algorithm,and fall back
to the simple MTU advertising approach. This change is fully backwards
compatible.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8a0f6ebe 26-Mar-2015 Ying Xue <ying.xue@windriver.com>

tipc: involve reference counter for node structure

TIPC node hash node table is protected with rcu lock on read side.
tipc_node_find() is used to look for a node object with node address
through iterating the hash node table. As the entire process of what
tipc_node_find() traverses the table is guarded with rcu read lock,
it's safe for us. However, when callers use the node object returned
by tipc_node_find(), there is no rcu read lock applied. Therefore,
this is absolutely unsafe for callers of tipc_node_find().

Now we introduce a reference counter for node structure. Before
tipc_node_find() returns node object to its caller, it first increases
the reference counter. Accordingly, after its caller used it up,
it decreases the counter again. This can prevent a node being used by
one thread from being freed by another thread.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b952b2be 26-Mar-2015 Ying Xue <ying.xue@windriver.com>

tipc: fix potential deadlock when all links are reset

[ 60.988363] ======================================================
[ 60.988754] [ INFO: possible circular locking dependency detected ]
[ 60.989152] 3.19.0+ #194 Not tainted
[ 60.989377] -------------------------------------------------------
[ 60.989781] swapper/3/0 is trying to acquire lock:
[ 60.990079] (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[ 60.990743]
[ 60.990743] but task is already holding lock:
[ 60.991106] (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[ 60.991738]
[ 60.991738] which lock already depends on the new lock.
[ 60.991738]
[ 60.992174]
[ 60.992174] the existing dependency chain (in reverse order) is:
[ 60.992174]
-> #1 (&(&bclink->lock)->rlock){+.-...}:
[ 60.992174] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[ 60.992174] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[ 60.992174] [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[ 60.992174] [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
[ 60.992174] [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
[ 60.992174] [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
[ 60.992174] [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
[ 60.992174] [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
[ 60.992174] [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[ 60.992174] [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[ 60.992174] [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[ 60.992174] [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[ 60.992174] [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[ 60.992174] [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[ 60.992174] [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[ 60.992174] [<ffffffff81647b60>] net_rx_action+0x150/0x360
[ 60.992174] [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[ 60.992174] [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[ 60.992174] [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[ 60.992174] [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[ 60.992174] [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[ 60.992174] [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[ 60.992174] [<ffffffff81033cda>] start_secondary+0x13a/0x150
[ 60.992174]
-> #0 (&(&n_ptr->lock)->rlock){+.-...}:
[ 60.992174] [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
[ 60.992174] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[ 60.992174] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[ 60.992174] [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
[ 60.992174] [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
[ 60.992174] [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
[ 60.992174] [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
[ 60.992174] [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
[ 60.992174] [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
[ 60.992174] [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
[ 60.992174] [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
[ 60.992174] [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
[ 60.992174] [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
[ 60.992174] [<ffffffff81647b60>] net_rx_action+0x150/0x360
[ 60.992174] [<ffffffff8105ec43>] __do_softirq+0x123/0x360
[ 60.992174] [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
[ 60.992174] [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
[ 60.992174] [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
[ 60.992174] [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
[ 60.992174] [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
[ 60.992174] [<ffffffff81033cda>] start_secondary+0x13a/0x150
[ 60.992174]
[ 60.992174] other info that might help us debug this:
[ 60.992174]
[ 60.992174] Possible unsafe locking scenario:
[ 60.992174]
[ 60.992174] CPU0 CPU1
[ 60.992174] ---- ----
[ 60.992174] lock(&(&bclink->lock)->rlock);
[ 60.992174] lock(&(&n_ptr->lock)->rlock);
[ 60.992174] lock(&(&bclink->lock)->rlock);
[ 60.992174] lock(&(&n_ptr->lock)->rlock);
[ 60.992174]
[ 60.992174] *** DEADLOCK ***
[ 60.992174]
[ 60.992174] 3 locks held by swapper/3/0:
[ 60.992174] #0: (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
[ 60.992174] #1: (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
[ 60.992174] #2: (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
[ 60.992174]

The correct the sequence of grabbing n_ptr->lock and bclink->lock
should be that the former is first held and the latter is then taken,
which exactly happened on CPU1. But especially when the retransmission
of broadcast link is failed, bclink->lock is first held in
tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure()
called by tipc_link_retransmit() subsequently, which is demonstrated on
CPU0. As a result, deadlock occurs.

If the order of holding the two locks happening on CPU0 is reversed, the
deadlock risk will be relieved. Therefore, the node lock taken in
link_retransmit_failure() originally is moved to tipc_bclink_rcv()
so that it's obtained before bclink lock. But the precondition of
the adjustment of node lock is that responding to bclink reset event
must be moved from tipc_bclink_unlock() to tipc_node_unlock().

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1f66d161 24-Mar-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: introduce starvation free send algorithm

Currently, we only use a single counter; the length of the backlog
queue, to determine whether a message should be accepted to the queue
or not. Each time a message is being sent, the queue length is compared
to a threshold value for the message's importance priority. If the queue
length is beyond this threshold, the message is rejected. This algorithm
implies a risk of starvation of low importance senders during very high
load, because it may take a long time before the backlog queue has
decreased enough to accept a lower level message.

We now eliminate this risk by introducing a counter for each importance
priority. When a message is sent, we check only the queue level for that
particular message's priority. If that is ok, the message can be added
to the backlog, irrespective of the queue level for other priorities.
This way, each level is guaranteed a certain portion of the total
bandwidth, and any risk of starvation is eliminated.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc14b8d6 25-Mar-2015 Ying Xue <ying.xue@windriver.com>

tipc: fix a link reset issue due to retransmission failures

When a node joins a cluster while we are transmitting a fragment
stream over the broadcast link, it's missing the preceding fragments
needed to build a meaningful message. As a result, the node has to
drop it. However, as the fragment message is not acknowledged to
its sender before it's dropped, it accidentally causes link reset
of retransmission failure on the node.

Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e3eea1eb 13-Mar-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: clean up handling of message priorities

Messages transferred by TIPC are assigned an "importance priority", -an
integer value indicating how to treat the message when there is link or
destination socket congestion.

There is no separate header field for this value. Instead, the message
user values have been chosen in ascending order according to perceived
importance, so that the message user field can be used for this.

This is not a good solution. First, we have many more users than the
needed priority levels, so we end up with treating more priority
levels than necessary. Second, the user field cannot always
accurately reflect the priority of the message. E.g., a message
fragment packet should really have the priority of the enveloped
user data message, and not the priority of the MSG_FRAGMENTER user.
Until now, we have been working around this problem in different ways,
but it is now time to implement a consistent way of handling such
priorities, although still within the constraint that we cannot
allocate any more bits in the regular data message header for this.

In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE,
that will be the only one used apart from the four (lower) user data
levels. All non-data messages map down to this priority. Furthermore,
we take some free bits from the MSG_FRAGMENTER header and allocate
them to store the priority of the enveloped message. We then adjust
the functions msg_importance()/msg_set_importance() so that they
read/set the correct header fields depending on user type.

This small protocol change is fully compatible, because the code at
the receiving end of a link currently reads the importance level
only from user data messages, where there is no change.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 05dcc5aa 13-Mar-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: split link outqueue

struct tipc_link contains one single queue for outgoing packets,
where both transmitted and waiting packets are queued.

This infrastructure is hard to maintain, because we need
to keep a number of fields to keep track of which packets are
sent or unsent, and the number of packets in each category.

A lot of code becomes simpler if we split this queue into a transmission
queue, where sent/unacknowledged packets are kept, and a backlog queue,
where we keep the not yet sent packets.

In this commit we do this separation.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2cdf3918 13-Mar-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: eliminate unnecessary call to broadcast ack function

The unicast packet header contains a broadcast acknowledge sequence
number, that may need to be conveyed to the broadcast link for proper
treatment. Currently, the function tipc_rcv(), which is on the most
critical data path, calls the function tipc_bclink_acknowledge() to
have this done. This call is made for each received packet, and results
in the unconditional grabbing of the broadcast link spinlock.

This is unnecessary, since we can see directly from tipc_rcv() if
the acknowledged number differs from what has been previously acked
from the node in question. In the vast majority of cases the numbers
won't differ, and there is nothing to update.

We now make the call to tipc_bclink_acknowledge() conditional
to that the two ack values differ.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f2b3b2d4 09-Feb-2015 Richard Alpe <richard.alpe@ericsson.com>

tipc: convert legacy nl link stat to nl compat

Add functionality for safely appending string data to a TLV without
keeping write count in the caller.

Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bfb3e5dd 09-Feb-2015 Richard Alpe <richard.alpe@ericsson.com>

tipc: move and rename the legacy nl api to "nl compat"

The new netlink API is no longer "v2" but rather the standard API and
the legacy API is now "nl compat". We split them into separate
start/stop and put them in different files in order to further
distinguish them.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# cb1b7280 05-Feb-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: eliminate race condition at multicast reception

In a previous commit in this series we resolved a race problem during
unicast message reception.

Here, we resolve the same problem at multicast reception. We apply the
same technique: an input queue serializing the delivery of arriving
buffers. The main difference is that here we do it in two steps.
First, the broadcast link feeds arriving buffers into the tail of an
arrival queue, which head is consumed at the socket level, and where
destination lookup is performed. Second, if the lookup is successful,
the resulting buffer clones are fed into a second queue, the input
queue. This queue is consumed at reception in the socket just like
in the unicast case. Both queues are protected by the same lock, -the
one of the input queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3c724acd 05-Feb-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: simplify socket multicast reception

The structure 'tipc_port_list' is used to collect port numbers
representing multicast destination socket on a receiving node.
The list is not based on a standard linked list, and is in reality
optimized for the uncommon case that there are more than one
multicast destinations per node. This makes the list handling
unecessarily complex, and as a consequence, even the socket
multicast reception becomes more complex.

In this commit, we replace 'tipc_port_list' with a new 'struct
tipc_plist', which is based on a standard list. We give the new
list stack (push/pop) semantics, someting that simplifies
the implementation of the function tipc_sk_mcast_rcv().

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c637c103 05-Feb-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: resolve race problem at unicast message reception

TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.

We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.

This solves the sequentiality problem, and seems to cause no measurable
performance degradation.

A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c5898636 05-Feb-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: reduce usage of context info in socket and link

The most common usage of namespace information is when we fetch the
own node addess from the net structure. This leads to a lot of
passing around of a parameter of type 'struct net *' between
functions just to make them able to obtain this address.

However, in many cases this is unnecessary. The own node address
is readily available as a member of both struct tipc_sock and
tipc_link, and can be fetched from there instead.
The fact that the vast majority of functions in socket.c and link.c
anyway are maintaining a pointer to their respective base structures
makes this option even more compelling.

In this commit, we introduce the inline functions tsk_own_node()
and link_own_node() to make it easy for functions to fetch the node
address from those structs instead of having to pass along and
dereference the namespace struct.

In particular, we make calls to the msg_xx() functions in msg.{h,c}
context independent by directly passing them the own node address
as parameter when needed. Those functions should be regarded as
leaves in the code dependency tree, and it is hence desirable to
keep them namspace unaware.

Apart from a potential positive effect on cache behavior, these
changes make it easier to introduce the changes that will follow
later in this series.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 34747539 09-Jan-2015 Ying Xue <ying.xue@windriver.com>

tipc: make tipc node address support net namespace

If net namespace is supported in tipc, each namespace will be treated
as a separate tipc node. Therefore, every namespace must own its
private tipc node address. This means the "tipc_own_addr" global
variable of node address must be moved to tipc_net structure to
satisfy the requirement. It's turned out that users also can assign
node address for every namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1da46568 09-Jan-2015 Ying Xue <ying.xue@windriver.com>

tipc: make tipc broadcast link support net namespace

TIPC broadcast link is statically established and its relevant states
are maintained with the global variables: "bcbearer", "bclink" and
"bcl". Allowing different namespace to own different broadcast link
instances, these variables must be moved to tipc_net structure and
broadcast link instances would be allocated and initialized when
namespace is created.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7f9f95d9 09-Jan-2015 Ying Xue <ying.xue@windriver.com>

tipc: make bearer list support net namespace

Bearer list defined as a global variable is used to store bearer
instances. When tipc supports net namespace, bearers created in
one namespace must be isolated with others allocated in other
namespaces, which requires us that the bearer list(bearer_list)
must be moved to tipc_net structure. As a result, a net namespace
pointer has to be passed to functions which access the bearer list.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f2f9800d 09-Jan-2015 Ying Xue <ying.xue@windriver.com>

tipc: make tipc node table aware of net namespace

Global variables associated with node table are below:
- node table list (node_htable)
- node hash table list (tipc_node_list)
- node table lock (node_list_lock)
- node number counter (tipc_num_nodes)
- node link number counter (tipc_num_links)

To make node table support namespace, above global variables must be
moved to tipc_net structure in order to keep secret for different
namespaces. As a consequence, these variables are allocated and
initialized when namespace is created, and deallocated when namespace
is destroyed. After the change, functions associated with these
variables have to utilize a namespace pointer to access them. So
adding namespace pointer as a parameter of these functions is the
major change made in the commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c93d3baa 09-Jan-2015 Ying Xue <ying.xue@windriver.com>

tipc: involve namespace infrastructure

Involve namespace infrastructure, make the "tipc_net_id" global
variable aware of per namespace, and rename it to "net_id". In
order that the conversion can be successfully done, an instance
of networking namespace must be passed to relevant functions,
allowing them to access the "net_id" variable of per namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Tero Aho <Tero.Aho@coriant.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 703068ee 09-Jan-2015 Jon Maloy <jon.maloy@ericsson.com>

tipc: fix bug in broadcast retransmit code

In commit 58dc55f25631178ee74cd27185956a8f7dcb3e32 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.

In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.

In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 16416779 07-Jan-2015 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: fix bug in broadcast retransmit code

In commit 58dc55f25631178ee74cd27185956a8f7dcb3e32 ("tipc: use generic
SKB list APIs to manage link transmission queue") we replace all list
traversal loops with the macros skb_queue_walk() or
skb_queue_walk_safe(). While the previous loops were based on the
assumption that the list was NULL-terminated, the standard macros
stop when the iterator reaches the list head, which is non-NULL.

In the function bclink_retransmit_pkt() this macro replacement has
lead to a bug. When we receive a BCAST STATE_MSG we unconditionally
call the function bclink_retransmit_pkt(), whether there really is
anything to retransmit or not, assuming that the sequence number
comparisons will lead to the correct behavior. However, if the
transmission queue is empty, or if there are no eligible buffers in
the transmission queue, we will by mistake pass the list head pointer
to the function tipc_link_retransmit(). Since the list head is not a
valid sk_buff, this leads to a crash.

In this commit we fix this by only calling tipc_link_retransmit()
if we actually found eligible buffers in the transmission queue.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4988bb4a 03-Dec-2014 Erik Hugne <erik.hugne@ericsson.com>

tipc: fix missing spinlock init and nullptr oops

commit 908344cdda80 ("tipc: fix bug in multicast congestion
handling") introduced two bugs with the bclink wakeup
function. This commit fixes the missing spinlock init for the
waiting_sks list. We also eliminate the race condition
between the waiting_sks length check/dequeue operations in
tipc_bclink_wakeup_users by simply removing the redundant
length check.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Tero Aho <Tero.Aho@coriant.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a6ca1094 25-Nov-2014 Ying Xue <ying.xue@windriver.com>

tipc: use generic SKB list APIs to manage TIPC outgoing packet chains

Use standard SKB list APIs associated with struct sk_buff_head to
manage socket outgoing packet chain and name table outgoing packet
chain, having relevant code simpler and more readable.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bc6fecd4 25-Nov-2014 Ying Xue <ying.xue@windriver.com>

tipc: use generic SKB list APIs to manage deferred queue of link

Use standard SKB list APIs associated with struct sk_buff_head to
manage link's deferred queue, simplifying relevant code.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 58dc55f2 25-Nov-2014 Ying Xue <ying.xue@windriver.com>

tipc: use generic SKB list APIs to manage link transmission queue

Use standard SKB list APIs associated with struct sk_buff_head to
manage link transmission queue, having relevant code more clean.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 47b4c9a8 25-Nov-2014 Ying Xue <ying.xue@windriver.com>

tipc: clean up the process of link pushing packets

In original tipc_link_push_packet(), it pushes messages from protocol
message queue, retransmission queue and next_out queue. But as the two
first queues are removed, we can simplify its relevant code through
deleting tipc_link_push_queue().

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d8182804 24-Nov-2014 Richard Alpe <richard.alpe@ericsson.com>

tipc: fix sparse warnings in new nl api

Fix sparse warnings about non-static declaration of static functions
in the new tipc netlink API.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7be57fc6 20-Nov-2014 Richard Alpe <richard.alpe@ericsson.com>

tipc: add link get/dump to new netlink api

Add TIPC_NL_LINK_GET command to the new tipc netlink API.

This command supports dumping all information about all links
(including the broadcast link) or getting all information about a
specific link (not the broadcast link).

The information about a link includes name, transmission info,
properties and link statistics.

As the tipc broadcast link is special we unfortunately have to treat
it specially. It is a deliberate decision not to abstract the
broadcast link on this (API) level.

Netlink logical layout of link response message:
-> port
-> name
-> MTU
-> RX
-> TX
-> up flag
-> active flag
-> properties
-> priority
-> tolerance
-> window
-> statistics
-> rx_info
-> rx_fragments
-> rx_fragmented
-> rx_bundles
-> rx_bundled
-> tx_info
-> tx_fragments
-> tx_fragmented
-> tx_bundles
-> tx_bundled
-> msg_prof_tot
-> msg_len_cnt
-> msg_len_tot
-> msg_len_p0
-> msg_len_p1
-> msg_len_p2
-> msg_len_p3
-> msg_len_p4
-> msg_len_p5
-> msg_len_p6
-> rx_states
-> rx_probes
-> rx_nacks
-> rx_deferred
-> tx_states
-> tx_probes
-> tx_nacks
-> tx_acks
-> retransmitted
-> duplicates
-> link_congs
-> max_queue
-> avg_queue

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 908344cd 07-Oct-2014 Jon Maloy <jon.maloy@ericsson.com>

tipc: fix bug in multicast congestion handling

One aim of commit 50100a5e39461b2a61d6040e73c384766c29975d ("tipc:
use pseudo message to wake up sockets after link congestion") was
to handle link congestion abatement in a uniform way for both unicast
and multicast transmit. However, the latter doesn't work correctly,
and has been broken since the referenced commit was applied.

If a user now sends a burst of multicast messages that is big
enough to cause broadcast link congestion, it will be put to sleep,
and not be waked up when the congestion abates as it should be.

This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS,
is set correctly, but in the wrong field. Instead of setting it in the
'action_flags' field of the arrival node struct, it is by mistake set
in the dummy node struct that is owned by the broadcast link, where it
will never tested for. Second, we cannot use the same flag for waking
up unicast and multicast users, since the function tipc_node_unlock()
needs to pick the wakeup pseudo messages to deliver from different
queues. It must hence be able to distinguish between the two cases.

This commit solves this problem by adding a new flag
TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user().
The latter is to be called by tipc_node_unlock() when the named flag,
now set in the correct field, is encountered.

v2: using explicit 'unsigned int' declaration instead of 'uint', as
per comment from David Miller.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2e84c60b 22-Aug-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: remove include file port.h

We move the inline functions in the file port.h to socket.c, and modify
their names accordingly.

We move struct tipc_port and some macros to socket.h.

Finally, we remove the file port.h.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 50100a5e 22-Aug-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: use pseudo message to wake up sockets after link congestion

The current link implementation keeps a linked list of blocked ports/
sockets that is populated when there is link congestion. The purpose
of this is to let the link know which users to wake up when the
congestion abates.

This adds unnecessary complexity to the data structure and the code,
since it forces us to involve the link each time we want to delete
a socket. It also forces us to grab the spinlock port_lock within
the scope of node_lock. We want to get rid of this direct dependence,
as well as the deadlock hazard resulting from the usage of port_lock.

In this commit, we instead let the link keep list of a "wakeup" pseudo
messages for use in such situations. Those messages are sent to the
pending sockets via the ordinary message reception path, and wake up
the socket's owner when they are received.

This enables us to get rid of the 'waiting_ports' linked lists in struct
tipc_port that manifest this direct reference. As a consequence, we can
eliminate another BH entry into the socket, and hence the need to grab
port_lock. This is a further step in our effort to remove port_lock
altogether.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6f92ee54 16-Jul-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: ensure sequential message delivery across dual bearers

When we run broadcast packets over dual bearers/interfaces, the
current transmission code is flipping bearers between each sent
packet, with the purpose of leveraging the double bandwidth
available. The receiving bclink is resequencing the packets if
needed, so all messages are delivered upwards from the broadcast
link in the correct order, even if they may arrive in concurrent
interrupts.

However, at the moment of delivery upwards to the socket, we release
all spinlocks (bclink_lock, node_lock), so it is still possible
that arriving messages bypass each other before they reach the socket
queue.

We fix this by applying the same technique we are using for unicast
traffic. We use a link selector (i.e., the last bit of sending port
number) to ensure that messages from the same sender socket always are
sent over the same bearer. This guarantees sequential delivery between
socket pairs, which is sufficient to satisfy the protocol spec, as well
as all known user requirements.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 9fbfb8b1 16-Jul-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: rename temporarily named functions

After the previous commit, we can now give the functions with temporary
names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper
names.

There are no functional changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c4116e10 16-Jul-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: remove unreferenced functions

We can now remove a number of functions which have become obsolete
and unreferenced through this commit series. There are no functional
changes in this commit.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0abd8ff2 16-Jul-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: start using the new multicast functions

In this commit, we convert the socket multicast send function to
directly call the new multicast/broadcast function (tipc_bclink_xmit2())
introduced in the previous commit. We do this instead of letting the
call go via the now obsolete tipc_port_mcast_xmit(), hence saving
a call level and some code complexity.

We also remove the initial destination lookup at the message sending
side, and replace that with an unconditional lookup at the receiving
side, including on the sending node itself. This makes the destination
lookup and message transfer more uniform than before.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 078bec82 16-Jul-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: add new functions for multicast and broadcast distribution

We add a new broadcast link transmit function in bclink.c and a new
receive function in socket.c. The purpose is to move the branching
between external and internal destination down to the link layer,
just as we have done with unicast in earlier commits. We also make
use of the new link-independent fragmentation support that was
introduced in an earlier commit series.

This gives a shorter and simpler code path, and makes it possible
to obtain copy-free buffer delivery to all node local destination
sockets.

The new transmission code is added in parallel with the existing one,
and will be used by the socket multicast send function in the next
commit in this series.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 99941754 11-Jul-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: clear 'next'-pointer of message fragments before reassembly

If the 'next' pointer of the last fragment buffer in a message is not
zeroed before reassembly, we risk ending up with a corrupt message,
since the reassembly function itself isn't doing this.

Currently, when a buffer is retrieved from the deferred queue of the
broadcast link, the next pointer is not cleared, with the result as
described above.

This commit corrects this, and thereby fixes a bug that may occur when
long broadcast messages are transmitted across dual interfaces. The bug
has been present since 40ba3cdf542a469aaa9083fa041656e59b109b90 ("tipc:
message reassembly using fragment chain")

This commit should be applied to both net and net-next.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bad93e9d 11-Jun-2014 Octavian Purdila <octavian.purdila@intel.com>

net: add __pskb_copy_fclone and pskb_copy_for_clone

There are several instances where a pskb_copy or __pskb_copy is
immediately followed by an skb_clone.

Add a couple of new functions to allow the copy skb to be allocated
from the fclone cache and thus speed up subsequent skb_clone calls.

Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Arvid Brodin <arvid.brodin@alten.se>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 37e22164 14-May-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: rename and move message reassembly function

The function tipc_link_frag_rcv() is in reality a re-entrant generic
message reassemby function that has nothing in particular to do with
the link, where it is defined now. This becomes obvious when we see
the need to call the function from other places in the code.

In this commit rename it to tipc_buf_append() and move it to the file
msg.c. We also simplify its signature by moving the tail pointer to
the control block of the head buffer, hence making the head buffer
self-contained.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3f5a12bd 04-May-2014 Ying Xue <ying.xue@windriver.com>

tipc: avoid to asynchronously reset all links

Postpone the actions of resetting all links until after bclink
lock is released, avoiding to asynchronously reset all links.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# eb8b00f5 04-May-2014 Ying Xue <ying.xue@windriver.com>

tipc: convert allocations of global variables associated with bclink

Convert allocations of global variables associated with bclink from
static way to dynamical way for the convenience of bclink instance
initialisation. Meanwhile, this also helps TIPC support name space
in the future easily.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d69afc90 04-May-2014 Ying Xue <ying.xue@windriver.com>

tipc: define new functions to operate bc_lock

As we are going to do more jobs when bc_lock is released, the two
operations of holding/releasing the lock should be encapsulated with
functions. In addition, we move bc_lock spin lock into tipc_bclink
structure avoiding to define the global variable.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 28dd9418 20-Apr-2014 Ying Xue <ying.xue@windriver.com>

tipc: use bc_lock to protect node map in bearer structure

The node map variable - 'nodes' in bearer structure is only used by
bclink. When bclink accesses it, bc_lock is held. But when change it,
for instance, in tipc_bearer_add_dest() or tipc_bearer_remove_dest()
the bc_lock is not taken at all. To avoid any inconsistent data, we
should always grab bc_lock while accessing node map variable.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7216cd94 20-Apr-2014 Ying Xue <ying.xue@windriver.com>

tipc: purge tipc_net_lock lock

Now tipc routing hierarchy comprises the structures 'node', 'link'and
'bearer'. The whole hierarchy is protected by a big read/write lock,
tipc_net_lock, to ensure that nothing is added or removed while code
is accessing any of these structures. Obviously the locking policy
makes node, link and bearer components closely bound together so that
their relationship becomes unnecessarily complex. In the worst case,
such locking policy not only has a negative influence on performance,
but also it's prone to lead to deadlock occasionally.

In order o decouple the complex relationship between bearer and node
as well as link, the locking policy is adjusted as follows:

- Bearer level
RTNL lock is used on update side, and RCU is used on read side.
Meanwhile, all bearer instances including broadcast bearer are
saved into bearer_list array.

- Node and link level
All node instances are saved into two tipc_node_list and node_htable
lists. The two lists are protected by node_list_lock on write side,
and they are guarded with RCU lock on read side. All members in node
structure including link instances are protected by node spin lock.

- The relationship between bearer and node
When link accesses bearer, it first needs to find the bearer with
its bearer identity from the bearer_list array. When bearer accesses
node, it can iterate the node_htable hash list with the node
address to find the corresponding node.

In the new locking policy, every component has its private locking
solution and the relationship between bearer and node is very simple,
that is, they can find each other with node address or bearer identity
from node_htable hash list or bearer_list array.

Until now above all changes have been done, so tipc_net_lock can be
removed safely.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7a2f7d18 20-Apr-2014 Ying Xue <ying.xue@windriver.com>

tipc: decouple the relationship between bearer and link

Currently on both paths of message transmission and reception, the
read lock of tipc_net_lock must be held before bearer is accessed,
while the write lock of tipc_net_lock has to be taken before bearer
is configured. Although it can ensure that bearer is always valid on
the two data paths, link and bearer is closely bound together.

So as the part of effort of removing tipc_net_lock, the locking
policy of bearer protection will be adjusted as below: on the two
data paths, RCU is used, and on the configuration path of bearer,
RTNL lock is applied.

Now RCU just covers the path of message reception. To make it possible
to protect the path of message transmission with RCU, link should not
use its stored bearer pointer to access bearer, but it should use the
bearer identity of its attached bearer as index to get bearer instance
from bearer_list array, which can help us decouple the relationship
between bearer and link. As a result, bearer on the path of message
transmission can be safely protected by RCU when we access bearer_list
array within RCU lock protection.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f8322dfc 20-Apr-2014 Ying Xue <ying.xue@windriver.com>

tipc: convert bearer_list to RCU list

Convert bearer_list to RCU list. It's protected by RTNL lock on
update side, and RCU read lock is applied to read side.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 987b58be 26-Mar-2014 Ying Xue <ying.xue@windriver.com>

tipc: make broadcast bearer store in bearer_list array

Now unicast bearer is dynamically allocated and placed into its
identity specified slot of bearer_list array. When we search
bearer_list array with a bearer identity, the corresponding bearer
instance can be found. But broadcast bearer is statically allocated
and it is not located in the bearer_list array yet. So we decide to
enlarge bearer_list array into MAX_BEARERS + 1 slots, and its last
slot stores the broadcast bearer so that the broadcast bearer can
be found from bearer_list array with MAX_BEARERS as index. The
change will help us reduce the complex relationship between bearer
and link in the future.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f47de12b 26-Mar-2014 Ying Xue <ying.xue@windriver.com>

tipc: remove active flag from tipc_bearer structure

After the allocation of tipc_bearer structure instance is converted
from statical way to dynamical way, we identify whether a certain
tipc_bearer structure pointer is valid by checking whether the pointer
is NULL or not. So the active flag in tipc_bearer structure becomes
redundant.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3874ccbb 26-Mar-2014 Ying Xue <ying.xue@windriver.com>

tipc: convert tipc_bearers array to pointer list

As part of the effort to introduce RCU protection for the bearer
list, we first need to change it to a list of pointers.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 247f0f3c 18-Feb-2014 Ying Xue <ying.xue@windriver.com>

tipc: align tipc function names with common naming practice in the network

Rename the following functions, which are shorter and more in line
with common naming practice in the network subsystem.

tipc_bclink_send_msg->tipc_bclink_xmit
tipc_bclink_recv_pkt->tipc_bclink_rcv
tipc_disc_recv_msg->tipc_disc_rcv
tipc_link_send_proto_msg->tipc_link_proto_xmit
link_recv_proto_msg->tipc_link_proto_rcv
link_send_sections_long->tipc_link_iovec_long_xmit
tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
tipc_link_send_sync->tipc_link_sync_xmit
tipc_link_recv_sync->tipc_link_sync_rcv
tipc_link_send_buf->__tipc_link_xmit
tipc_link_send->tipc_link_xmit
tipc_link_send_names->tipc_link_names_xmit
tipc_named_recv->tipc_named_rcv
tipc_link_recv_bundle->tipc_link_bundle_rcv
tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
link_send_long_buf->tipc_link_frag_xmit

tipc_multicast->tipc_port_mcast_xmit
tipc_port_recv_mcast->tipc_port_mcast_rcv
tipc_port_reject_sections->tipc_port_iovec_reject
tipc_port_recv_proto_msg->tipc_port_proto_rcv
tipc_connect->tipc_port_connect
__tipc_connect->__tipc_port_connect
__tipc_disconnect->__tipc_port_disconnect
tipc_disconnect->tipc_port_disconnect
tipc_shutdown->tipc_port_shutdown
tipc_port_recv_msg->tipc_port_rcv
tipc_port_recv_sections->tipc_port_iovec_rcv

release->tipc_release
accept->tipc_accept
bind->tipc_bind
get_name->tipc_getname
poll->tipc_poll
send_msg->tipc_sendmsg
send_packet->tipc_send_packet
send_stream->tipc_send_stream
recv_msg->tipc_recvmsg
recv_stream->tipc_recv_stream
connect->tipc_connect
listen->tipc_listen
shutdown->tipc_shutdown
setsockopt->tipc_setsockopt
getsockopt->tipc_getsockopt

Above changes have no impact on current users of the functions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a8304529 13-Feb-2014 Ying Xue <ying.xue@windriver.com>

tipc: remove bearer_lock from tipc_bearer struct

After the earlier commits ("tipc: remove 'links' list from
tipc_bearer struct") and ("tipc: introduce new spinlock to protect
struct link_req"), there is no longer any need to protect struct
link_req or or any link list by use of bearer_lock. Furthermore,
we have eliminated the need for using bearer_lock during downcalls
(send) from the link to the bearer, since we have ensured that
bearers always have a longer life cycle that their associated links,
and always contain valid data.

So, the only need now for a lock protecting bearers is for guaranteeing
consistency of the bearer list itself. For this, it is sufficient, at
least for the time being, to continue applying 'net_lock´ in write mode.

By removing bearer_lock we also pre-empt introduction of issue b) descibed
in the previous commit "tipc: remove 'links' list from tipc_bearer struct":

"b) When the outer protection from net_lock is gone, taking
bearer_lock and node_lock in opposite order of method 1) and 2)
will become an obvious deadlock hazard".

Therefore, we now eliminate the bearer_lock spinlock.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 03b92017 13-Feb-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: stricter behavior of message reassembly function

The function tipc_link_recv_fragment(struct sk_buff **buf) currently
leaves the value of the input buffer pointer undefined when it returns,
except when the return code indicates that the reassembly is complete.
This despite the fact that it always consumes the input buffer.

Here, we enforce a stricter behavior by this function, ensuring that
the returned buffer pointer is non-NULL if and only if the reassembly
is complete. This makes it possible to test for the buffer pointer as
criteria for successful reassembly.

We also rename the function to tipc_link_frag_rcv(), which is both
shorter and more in line with common naming practice in the network
subsystem.

Apart from the new name, these changes have no impact on current
users of the function, but makes it more practical for use in some
planned future commits.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 581465fa 07-Jan-2014 Jon Paul Maloy <jon.maloy@ericsson.com>

tipc: make link start event synchronous

When a link is created we delay the start event by launching it
to be executed later in a tasklet. As we hold all the
necessary locks at the moment of creation, and there is no risk
of deadlock or contention, this delay serves no purpose in the
current code.

We remove this obsolete indirection step, and the associated function
link_start(). At the same time, we rename the function tipc_link_stop()
to the more appropriate tipc_link_purge_queues().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 512137ee 06-Dec-2013 Erik Hugne <erik.hugne@ericsson.com>

tipc: remove interface state mirroring in bearer

struct 'tipc_bearer' is a generic representation of the underlying
media type, and exists in a one-to-one relationship to each interface
TIPC is using. The struct contains a 'blocked' flag that mirrors the
operational and execution state of the represented interface, and is
updated through notification calls from the latter. The users of
tipc_bearer are checking this flag before each attempt to send a
packet via the interface.

This state mirroring serves no purpose in the current code base. TIPC
links will not discover a media failure any faster through this
mechanism, and in reality the flag only adds overhead at packet
sending and reception.

Furthermore, the fact that the flag needs to be protected by a spinlock
aggregated into tipc_bearer has turned out to cause a serious and
completely unnecessary deadlock problem.

CPU0 CPU1
---- ----
Time 0: bearer_disable() link_timeout()
Time 1: spin_lock_bh(&b_ptr->lock) tipc_link_push_queue()
Time 2: tipc_link_delete() tipc_bearer_blocked(b_ptr)
Time 3: k_cancel_timer(&req->timer) spin_lock_bh(&b_ptr->lock)
Time 4: del_timer_sync(&req->timer)

I.e., del_timer_sync() on CPU0 never returns, because the timer handler
on CPU1 is waiting for the bearer lock.

We eliminate the 'blocked' flag from struct tipc_bearer, along with all
tests on this flag. This not only resolves the deadlock, but also
simplifies and speeds up the data path execution of TIPC. It also fits
well into our ongoing effort to make the locking policy simpler and
more manageable.

An effect of this change is that we can get rid of functions such as
tipc_bearer_blocked(), tipc_continue() and tipc_block_bearer().
We replace the latter with a new function, tipc_reset_bearer(), which
resets all links associated to the bearer immediately after an
interface goes down.

A user might notice one slight change in link behaviour after this
change. When an interface goes down, (e.g. through a NETDEV_DOWN
event) all attached links will be reset immediately, instead of
leaving it to each link to detect the failure through a timer-driven
mechanism. We consider this an improvement, and see no obvious risks
with the new behavior.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <Paul.Gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 40ba3cdf 06-Nov-2013 Erik Hugne <erik.hugne@ericsson.com>

tipc: message reassembly using fragment chain

When the first fragment of a long data data message is received on a link, a
reassembly buffer large enough to hold the data from this and all subsequent
fragments of the message is allocated. The payload of each new fragment is
copied into this buffer upon arrival. When the last fragment is received, the
reassembled message is delivered upwards to the port/socket layer.

Not only is this an inefficient approach, but it may also cause bursts of
reassembly failures in low memory situations. since we may fail to allocate
the necessary large buffer in the first place. Furthermore, after 100 subsequent
such failures the link will be reset, something that in reality aggravates the
situation.

To remedy this problem, this patch introduces a different approach. Instead of
allocating a big reassembly buffer, we now append the arriving fragments
to a reassembly chain on the link, and deliver the whole chain up to the
socket layer once the last fragment has been received. This is safe because
the retransmission layer of a TIPC link always delivers packets in strict
uninterrupted order, to the reassembly layer as to all other upper layers.
Hence there can never be more than one fragment chain pending reassembly at
any given time in a link, and we can trust (but still verify) that the
fragments will be chained up in the correct order.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 528f6f4b 06-Nov-2013 Erik Hugne <erik.hugne@ericsson.com>

tipc: don't reroute message fragments

When a message fragment is received in a broadcast or unicast link,
the reception code will append the fragment payload to a big reassembly
buffer through a call to the function tipc_recv_fragm(). However, after
the return of that call, the logics goes on and passes the fragment
buffer to the function tipc_net_route_msg(), which will simply drop it.
This behavior is a remnant from the now obsolete multi-cluster
functionality, and has no relevance in the current code base.

Although currently harmless, this unnecessary call would be fatal
after applying the next patch in this series, which introduces
a completely new reassembly algorithm. So we change the code to
eliminate the redundant call.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# ae8509c4 17-Jun-2013 Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: cosmetic realignment of function arguments

No runtime code changes here. Just a realign of the function
arguments to start where the 1st one was, and fit as many args
as can be put in an 80 char line.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 488fc9af 30-Apr-2013 Gerlando Falauto <gerlando.falauto@keymile.com>

tipc: pskb_copy() buffers when sending on more than one bearer

When sending packets, TIPC bearers use skb_clone() before writing their
hardware header. This will however NOT copy the data buffer.
So when the same packet is sent over multiple bearers (to reach multiple
nodes), the same socket buffer data will be treated by multiple
tipc_media drivers which will write their own hardware header through
dev_hard_header().
Most of the time this is not a problem, because by the time the
packet is processed by the second media, it has already been sent over
the first one. However, when the first transmission is delayed (e.g.
because of insufficient bandwidth or through a shaper), the next bearer
will overwrite the hardware header, resulting in the packet being sent:
a) with the wrong source address, when bearers of the same type,
e.g. ethernet, are involved
b) with a completely corrupt header, or even dropped, when bearers of
different types are involved.

So when the same socket buffer is to be sent multiple times, send a
pskb_copy() instead (from the second instance on), and release it
afterwards (the bearer will skb_clone() it anyway).

Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 77861d9c 30-Apr-2013 Gerlando Falauto <gerlando.falauto@keymile.com>

tipc: tipc_bcbearer_send(): simplify bearer selection

Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e6160710 30-Apr-2013 Gerlando Falauto <gerlando.falauto@keymile.com>

tipc: cosmetic: clean up comments and break a long line

Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8aeb89f2 17-Apr-2013 Patrick McHardy <kaber@trash.net>

tipc: move bcast_addr from struct tipc_media to struct tipc_bearer

Some network protocols, like InfiniBand, don't have a fixed broadcast
address but one that depends on the configuration. Move the bcast_addr
to struct tipc_bearer and initialize it with the broadcast address of
the network device when the bearer is enabled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c5c73dca 14-Feb-2013 Erik Hugne <erik.hugne@ericsson.com>

tipc: fix missing spinlock init in broadcast code

After commit 3c294cb3 "tipc: remove the bearer congestion mechanism",
we try to grab the broadcast bearer lock when sending multicast
messages over the broadcast link. This will cause an oops because
the lock is never initialized. This is an old bug, but the lock
was never actually used before commit 3c294cb3, so that why it was
not visible until now. The oops will look something like:

BUG: spinlock bad magic on CPU#2, daemon/147
lock: bcast_bearer+0x48/0xffffffffffffd19a [tipc],
.magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
Pid: 147, comm: daemon Not tainted 3.8.0-rc3+ #206
Call Trace:
spin_dump+0x8a/0x8f
spin_bug+0x21/0x26
do_raw_spin_lock+0x114/0x150
_raw_spin_lock_bh+0x19/0x20
tipc_bearer_blocked+0x1f/0x40 [tipc]
tipc_link_send_buf+0x82/0x280 [tipc]
? __alloc_skb+0x9f/0x2b0
tipc_bclink_send_msg+0x77/0xa0 [tipc]
tipc_multicast+0x11b/0x1b0 [tipc]
send_msg+0x225/0x530 [tipc]
sock_sendmsg+0xca/0xe0

The above can be triggered by running the multicast demo program.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 389dd9bc 15-Nov-2012 Ying Xue <ying.xue@windriver.com>

tipc: rename supported flag to recv_permitted

Rename the "supported" flag in bclink structure to "recv_permitted"
to better reflect what it is used for. When this flag is set for a
given node, we are permitted to receive and acknowledge broadcast
messages from that node. Convert it to a bool at the same time,
since it is not used to store any numerical values.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 3c294cb3 14-Nov-2012 Ying Xue <ying.xue@windriver.com>

tipc: remove the bearer congestion mechanism

Currently at the TIPC bearer layer there is the following congestion
mechanism:

Once sending packets has failed via that bearer, the bearer will be
flagged as being in congested state at once. During bearer congestion,
all packets arriving at link will be queued on the link's outgoing
buffer. When we detect that the state of bearer congestion has
relaxed (e.g. some packets are received from the bearer) we will try
our best to push all packets in the link's outgoing buffer until the
buffer is empty, or until the bearer is congested again.

However, in fact the TIPC bearer never receives any feedback from the
device layer whether a send was successful or not, so it must always
assume it was successful. Therefore, the bearer congestion mechanism
as it exists currently is of no value.

But the bearer blocking state is still useful for us. For example,
when the physical media goes down/up, we need to change the state of
the links bound to the bearer. So the code maintaing the state
information is not removed.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# dc1aed37 28-Jun-2012 Erik Hugne <erik.hugne@ericsson.com>

tipc: phase out most of the struct print_buf usage

The tipc_printf is renamed to tipc_snprintf, as the new name
describes more what the function actually does. It is also
changed to take a buffer and length parameter and return
number of characters written to the buffer. All callers of
this function that used to pass a print_buf are updated.

Final removal of the struct print_buf itself will be done
synchronously with the pending removal of the deprecated
logging code that also was using it.

Functions that build up a response message with a list of
ports, nametable contents etc. are changed to return the number
of characters written to the output buffer. This information
was previously hidden in a field of the print_buf struct, and
the number of chars written was fetched with a call to
tipc_printbuf_validate. This function is removed since it
is no longer referenced nor needed.

A generic max size ULTRA_STRING_MAX_LEN is defined, named
in keeping with the existing TIPC_TLV_ULTRA_STRING, and the
various definitions in port, link and nametable code that
largely duplicated this information are removed. This means
that amount of link statistics that can be returned is now
increased from 2k to 32k.

The buffer overflow check is now done just before the reply
message is passed over netlink or TIPC to a remote node and
the message indicating a truncated buffer is changed to a less
dramatic one (less CAPS), placed at the end of the message.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 2cf8aa19 28-Jun-2012 Erik Hugne <erik.hugne@ericsson.com>

tipc: use standard printk shortcut macros (pr_err etc.)

All messages should go directly to the kernel log. The TIPC
specific error, warning, info and debug trace macro's are
removed and all references replaced with pr_err, pr_warn,
pr_info and pr_debug.

Commonly used sub-strings are explicitly declared as a const
char to reduce .text size.

Note that this means the debug messages (changed to pr_debug),
are now enabled through dynamic debugging, instead of a TIPC
specific Kconfig option (TIPC_DEBUG). The latter will be
phased out completely

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: use pr_fmt as suggested by Joe Perches <joe@perches.com>]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 2c53040f 10-Jul-2012 Ben Hutchings <bhutchings@solarflare.com>

net: Fix (nearly-)kernel-doc comments for various functions

Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 617d3c7a 30-Apr-2012 Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: compress out gratuitous extra carriage returns

Some of the comment blocks are floating in limbo between two
functions, or between blocks of code. Delete the extra line
feeds between any comment and its associated following block
of code, to be consistent with the majority of the rest of
the kernel. Also delete trailing newlines at EOF and fix
a couple trivial typos in existing comments.

This is a 100% cosmetic change with no runtime impact. We get
rid of over 500 lines of non-code, and being blank line deletes,
they won't even show up as noise in git blame.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 5f6d9123 04-Nov-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Eliminate trivial buffer manipulation helper routines

Gets rid of two inlined routines that simply call existing sk_buff
manipulation routines, since there is no longer any extra processing
done by the helper routines.

Note that these changes are essentially cosmetic in nature, and have
no impact on the actual operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 63e7f1ac 27-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Prevent loss of fragmented messages over broadcast link

Modifies broadcast link so that an incoming fragmented message is not
lost if reassembly cannot begin because there currently is no buffer
big enough to hold the entire reassembled message. The broadcast link
now ignores the first fragment completely, which causes the sending node
to retransmit the first fragment so that reassembly can be re-attempted.

Previously, the sender would have had no reason to retransmit the 1st
fragment, so we would never have a chance to re-try the allocation.

To do this cleanly without duplicaton, a new bclink_accept_pkt()
function is introduced.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 1ec2bb08 27-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Remove obsolete broadcast tag capability

Eliminates support for the broadcast tag field, which is no longer
used by broadcast link NACK messages.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 7a54d4a9 27-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Major redesign of broadcast link ACK/NACK algorithms

Completely redesigns broadcast link ACK and NACK mechanisms to prevent
spurious retransmit requests in dual LAN networks, and to prevent the
broadcast link from stalling due to the failure of a receiving node to
acknowledge receiving a broadcast message or request its retransmission.

Note: These changes only impact the timing of when ACK and NACK messages
are sent, and not the basic broadcast link protocol itself, so inter-
operability with nodes using the "classic" algorithms is maintained.

The revised algorithms are as follows:

1) An explicit ACK message is still sent after receiving 16 in-sequence
messages, and implicit ACK information continues to be carried in other
unicast link message headers (including link state messages). However,
the timing of explicit ACKs is now based on the receiving node's absolute
network address rather than its relative network address to ensure that
the failure of another node does not delay the ACK beyond its 16 message
target.

2) A NACK message is now typically sent only when a message gap persists
for two consecutive incoming link state messages; this ensures that a
suspected gap is not confirmed until both LANs in a dual LAN network have
had an opportunity to deliver the message, thereby preventing spurious NACKs.
A NACK message can also be generated by the arrival of a single link state
message, if the deferred queue is so big that the current message gap
cannot be the result of "normal" mis-ordering due to the use of dual LANs
(or one LAN using a bonded interface). Since link state messages typically
arrive at different nodes at different times the problem of multiple nodes
issuing identical NACKs simultaneously is inherently avoided.

3) Nodes continue to "peek" at NACK messages sent by other nodes. If
another node requests retransmission of a message gap suspected (but not
yet confirmed) by the peeking node, the peeking node forgets about the
gap and does not generate a duplicate retransmit request. (If the peeking
node subsequently fails to receive the lost message, later link state
messages will cause it to rediscover and confirm the gap and send another
NACK.)

4) Message gap "equality" is now determined by the start of the gap only.
This is sufficient to deal with the most common cases of message loss,
and eliminates the need for complex end of gap computations.

5) A peeking node no longer tries to determine whether it should send a
complementary NACK, since the most common cases of message loss don't
require it to be sent. Consequently, the node no longer examines the
"broadcast tag" field of a NACK message when peeking.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# b98158e3 26-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Add missing locks in broadcast link statistics accumulation

Ensures that all attempts to update broadcast link statistics are done
only while holding the lock that protects the link's main data structures,
to prevent interference by simultaneous updates caused by messages
arriving on other interfaces.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 0232c5a5 26-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Fix bug in broadcast link duplicate message statistics

Modifies broadcast link so that it increments the "received duplicate
message" count if an incoming message cannot be added to the deferred
message queue because it is already present in the queue. (The aligns
broadcast link behavior with that of TIPC's unicast links.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 8a275a6a 26-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Fix node lock reclamation issues in broadcast link reception

Fixes a pair of problems in broadcast link message reception code
relating to the reclamation of the node lock after consuming an
in-sequence message.

1) Now retests to see if the sending node is still up after reclaiming
the node lock, and bails out if it is non-operational.

2) Now manipulates the node's deferred message queue only after
reclaiming the node lock, rather than using queue head pointer
information that was cached previously.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 57732560 26-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Add missing broadcast link lock when sending NACK

Ensures that any attempt to send a NACK message over TIPC's broadcast
link has exclusive access to the link's main data structures, to prevent
interference with a simultaneous attempt to send other broadcast link
traffic (such as application-generated multicast messages).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# a18c4bc3 29-Dec-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: rename struct link* to struct tipc_link*

This converts the following:

struct link -> struct tipc_link
struct link_req -> struct tipc_link_req
struct link_name -> struct tipc_link_name

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 7f9ab6ac 29-Dec-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: rename struct bcbearer* to tipc_bcbearer*

This changes both the struct bcbearer and struct bcbearer_pair to
have the "tipc_" prefix. Runtime behaviour is unchanged.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 6765fd67 29-Dec-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: rename struct bclink to struct tipc_bclink

Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 4584310b 29-Dec-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: rename struct port_list to struct tipc_port_list

Make this rename so that it is consistent with the majority
of the other tipc structs and to assist in removing any
ambiguity with other similar names in other subsystems.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 358a0d1c 29-Dec-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

tipc: rename struct media to struct tipc_media

Give it a meaningful prefix, as suggested by DaveM, so that it
is consistent with things like struct tipc_bearer, and so it isn't
confused with anything else. This has no impact on the actual
runtime code behaviour.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# f905730c 24-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Allow use of buf_seqno() helper routine by unicast links

Migrates the buf_seqno() helper routine from broadcast link level to
unicast link level so that it can be used both types of TIPC links.
This is a cosmetic change only, and does not affect the operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 36559591 24-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Ignore broadcast acknowledgements that are out-of-range

Adds checks to TIPC's broadcast link so that it ignores any
acknowledgement message containing a sequence number that does not
correspond to an unacknowledged message currently in the broadcast
link's transmit queue.

This change prevents the broadcast link from becoming stalled if a
newly booted node receives stale broadcast link acknowledgement
information from another node that has not yet fully synchronized
its end of the broadcast link to reflect the current state of the
new node's end.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 10745cd5 24-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Flush unsent broadcast messages when contact with last node is lost

Adds code to release any unsent broadcast messages in the broadcast link
transmit queue if TIPC loses contact with its only neighboring node.
Previously, a broadcast link that was in the congested state would hold
on to the unsent messages, even though the messages were now undeliverable.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 9157bafb 24-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Minor optimization of broadcast link transmit queue statistic

The two broadcast link statistics fields that are used to derive the
average length of that link's transmit queue are now updated only after
a successful attempt to send a broadcast message, since there is no need
to update these values when an unsuccessful send attempt leaves the
queue unchanged.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 2b78f9a0 24-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Handle broadcast attempt when no neighboring nodes exist

Adds a check to detect when an attempt is made to send a message
via the broadcast link and no neighboring nodes are currently available
to receive it. Rather than wasting effort passing the message to the
broadcast link and broadcast bearer, who will only throw it away,
TIPC now frees the message immediately and reports success (i.e. the
message has been delivered to all available destinations).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# cd3decdf 24-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Ensure broadcast link spinlock is held when updating node map

Fixes oversight that allowed broadcast link node map to be updated without
first taking the broadcast link spinlock that protects the map. As part
of this fix the node map has been incorporated into the broadcast link
structure to make the need for such protection more evident.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# c47e9b91 24-Oct-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Eliminate dynamic allocation of broadcast link data structures

Creates global variables to hold the broadcast link's pseudo-bearer and
pseudo-link structures, rather than allocating them dynamically. There
is only a single instance of each structure, and changing over to static
allocation allows elimination of code to handle the cases where dynamic
allocation was unsuccessful.

The memset in the teardown code may look like they aren't used, but
the same teardown code is run when there is a non-fatal error at
init-time, so that stale data isn't present when the user fixes the
cause of the soft error.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 0f38513d 07-Apr-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Remove obsolete congestion handling when sending a broadcast NACK

Eliminates obsolete code that handles broadcast bearer congestion when
the broadast link sends a NACK message, since the broadcast pseudo-bearer
never becomes blocked.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 9f6bdcd4 07-Apr-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Discard incoming broadcast messages that are unexpected

Modifies TIPC's incoming broadcast packet handler to discard messages
that cannot legally be sent over the broadcast link, including:

- broadcast protocol messages that do no contain state information
- payload messages that are not named multicast messages
- any other form of message except for bundled messages, fragmented
messages, and name distribution messages.

These checks are needed to prevent TIPC from handing an unexpected
message to a routine that isn't prepared to handle it, which could
lead to incorrect processing (up to and including invalid memory
references caused by attempts to access message fields that aren't
present in the message).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 693d03ae 07-Apr-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Remove deferred queue head caching during broadcast message reception

Modifies TIPC's incoming broadcast packet handler so that it no longer
pre-reads information about the deferred packet queue, since the cached
value is unreliable once the associated node lock has been released.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 5d3c488d 07-Apr-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Fix node lock problems during broadcast message reception

Modifies TIPC's incoming broadcast packet handler to ensure that the
node lock associated with the sender of the packet is held whenever
node-related data structure fields are accessed. The routine is also
restructured with a single exit point, making it easier to ensure
the node lock is properly released and the incoming packet is properly
disposed of.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 23f0ff90 07-Apr-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Remove non-executable code to handle broadcast bearer congestion

Eliminates code associated with the sending of unsent broadcast link
traffic when the broadcast pseudo-bearer becomes unblocked following a
temporary congestion situation. This code is non-executable because the
broadcast pseudo-bearer never becomes blocked [see tipc_bcbearer_send()].

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 2ff9f924 07-Apr-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Cosmetic changes to broadcast bearer send routine

Updates the comments in the broadcast bearer send routine to more
accurately describe the processing done by the routine. Also replaces
the improper use of a TIPC payload message error status symbol (in a place
that has nothing to do with such errors) with its numeric equivalent.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 2e2d9be8 07-Apr-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Update obsolete references to multicast link

Updates TIPC's broadcast link in a couple of places that were missed
during the transition from its former name ("multicast-link") to its
current name ("broadcast-link"). These changes are essentially cosmetic
and do not affect the overall operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 5e726900 23-May-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Convert fatal broadcast sanity check to non-fatal check

Modifies the existing broadcast link sanity check that detects an
attempt to send a message off-node when there are no available
destinations so that it no longer causes a kernel panic; instead,
the check now issues a warning and stack trace and then returns
without sending the message anywhere.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# bebc55ae 19-Apr-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Fix sk_buff leaks when link congestion is detected

Modifies a TIPC send routine that did not discard the outgoing sk_buff
if it was not transmitted because of link congestion; this eliminates
the potential for buffer leakage in the many callers who did not clean up
the unsent buffer. (The two routines that previously did discard the unsent
buffer have been updated to eliminate their now-redundant clean up.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 7775bcc7 12-Apr-2011 Allan Stephens <allan.stephens@windriver.com>

tipc: Remove code to emulate loss of broadcast messages

Eliminates optional code used to test TIPC's ability to recover
from lost broadcast messages. This code duplicates functionality
already provided by the network stack's QoS option "network emulator".

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# bf781ecf 25-Jan-2011 Allan Stephens <Allan.Stephens@windriver.com>

tipc: Avoid reliable broadcast preparation for NACK messages

Enhance TIPC to skip unnecessary (and, in some cases, redundant)
preparation work when sending a broadcast link NACK message, since this
preparation is only required for broadcast messages that are sent in a
reliable manner. This change also fixes a bug that caused NACK messages
to be improperly counted as "TX packets" in TIPC's broadcast link
statistics.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 9bd80b60 18-Jan-2011 Allan Stephens <Allan.Stephens@windriver.com>

tipc: Improve accuracy of link transmit queue maximum size statistic

Enhances TIPC's unicast and broadcast link code to update the transmit
queue maximum size counter in a single place, namely the routine that
adds messages to the queue. This ensures that the maximum size statistic
reported for unicast links is completely accurate, rather than being
partially based on statistical sampling.

The changes to link.h are just documenting the roles of the variables.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 01d83edd 18-Jan-2011 Allan Stephens <Allan.Stephens@windriver.com>

tipc: Clean up tracking of node requesting a broadcast retransmit

Allows the broadcast link to track the node that is requesting a retransmit
in a new field dedicated to that purpose. This replaces the existing
mechanism that (ab)uses an existing node structure linked list field to do
the tracking.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# 2d627b92 07-Jan-2011 Allan Stephens <Allan.Stephens@windriver.com>

tipc: Combine bearer structure with tipc_bearer structure

Combines two distinct structures containing information about a TIPC bearer
into a single structure. The structures were previously kept separate so
that public information about a bearer could be made available to plug-in
media types using TIPC's native API, while the remaining information was
kept private for use by TIPC itself. However, now that the native API has
been removed there is no longer any need for this arrangement.

Since one of the structures was already embedded within the other, the
change largely involves replacing instances of "publ.foo" with "foo".
The changes do not otherwise alter the operation of TIPC bearers.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# a016892c 31-Dec-2010 Allan Stephens <Allan.Stephens@windriver.com>

tipc: remove extraneous braces from single statements

Cleans up TIPC's source code to eliminate the presence of unnecessary
use of {} around single statements.

These changes are purely cosmetic and do not alter the operation of TIPC
in any way.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# e3ec9c7d 31-Dec-2010 Allan Stephens <Allan.Stephens@windriver.com>

tipc: remove zeroing assignments to static global variables

Cleans up TIPC's source code to eliminate the needless initialization
of static variables to zero.

These changes are purely cosmetic and do not alter the operation of TIPC
in any way.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0e65967e 31-Dec-2010 Allan Stephens <Allan.Stephens@windriver.com>

tipc: cleanup various cosmetic whitespace issues

Cleans up TIPC's source code to eliminate deviations from generally
accepted coding conventions relating to leading/trailing white space
and white space around commas, braces, cases, and sizeof.

These changes are purely cosmetic and do not alter the operation of TIPC
in any way.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 886ef52a 31-Dec-2010 Allan Stephens <Allan.Stephens@windriver.com>

tipc: remove redundant #includes

Eliminates a number of #include statements that no longer serve any
useful purpose.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8d64a5ba 31-Dec-2010 Allan Stephens <Allan.Stephens@windriver.com>

tipc: Prune down link-specific debugging code

Eliminates most link-specific debugging code in TIPC, which is now
largely unnecessary. All calls to the link-specific debugging macros
have been removed, as are the macros themselves; in addition, the optional
allocation of print buffers to hold debugging information for each link
endpoint has been removed. The ability for TIPC to print out helpful
diagnostic information when link retransmit failures occur has been
retained for the time being, as an aid in tracking down the cause of
such failures.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b29f1428 31-Dec-2010 Allan Stephens <Allan.Stephens@windriver.com>

tipc: remove calls to dbg() and msg_dbg()

Eliminates obsolete calls to two of TIPC's main debugging macros, as well
as a pair of associated debugging routines that are no longer required.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8f92df6a 31-Dec-2010 Allan Stephens <Allan.Stephens@windriver.com>

tipc: Remove prototype code for supporting multiple clusters

Eliminates routines, data structures, and files that were intended
to allow TIPC to support a network containing multiple clusters.
Currently, TIPC supports only networks consisting of a single cluster
within a single zone, so this code is unnecessary.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c8026282 29-Nov-2010 Allan Stephens <allan.stephens@windriver.com>

tipc: Remove obsolete inclusions of header files

Gets rid of #include statements that are no longer required as a
result of the merging of obsolete native API header file content
into other TIPC include files.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d265fef6 29-Nov-2010 Allan Stephens <allan.stephens@windriver.com>

tipc: Remove obsolete native API files and exports

As part of the removal of TIPC's native API support it is no longer
necessary for TIPC to export symbols for routines that can be called
by kernel-based applications, nor for it to have header files that
kernel-based applications can include to access the declarations for
those routines. This commit eliminates the exporting of symbols by
TIPC and migrates the contents of each obsolete native API include
file into its corresponding non-native API equivalent.

The code which was migrated in this commit was migrated intact, in
that there are no technical changes combined with the relocation.

Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 31e3c3f6 13-Oct-2010 stephen hemminger <shemminger@vyatta.com>

tipc: cleanup function namespace

Do some cleanups of TIPC based on make namespacecheck
1. Don't export unused symbols
2. Eliminate dead code
3. Make functions and variables local
4. Rename buf_acquire to tipc_buf_acquire since it is used in several files

Compile tested only.
This make break out of tree kernel modules that depend on TIPC routines.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a02cec21 22-Sep-2010 Eric Dumazet <eric.dumazet@gmail.com>

net: return operator cleanup

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 0048b826 17-Aug-2010 Allan Stephens <allan.stephens@windriver.com>

tipc: Fix bug in broadcast link transmit statistics computation

Modify TIPC's broadcast link so that it counts each piece of a
fragmented message individually, rather than as treating the group
as a single message. This ensures that proper correlation of sent
and received traffic can be done when the broadcast link statistics
are displayed, and is consistent with the way fragments are counted
by TIPC's unicast links.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5b1f7bde 17-Aug-2010 Allan Stephens <allan.stephens@windriver.com>

tipc: Fix premature broadcast advertisement by sending node

Prevent a TIPC node from sending out a LINK_STATE message
advertising a broadcast message that it is in the process
of sending, but has not yet actually sent. Previously, it was
possible for a link timeout to occur in between the time the
broadcast link updated its "last message sent" counter and the
time the broadcast message was passed to the broadcast bearer
for transmission. This ensures that the code which issues
the LINK_STATE message isn't informed of the new message until
the broadcast bearer has had a chance to send it.

Note: The "last message sent" value is stored in the "fsm_msg_count"
field of the link structure used by the broadcast link. Since the
broadcast link doesn't utilize the normal link FSM, this field can
be re-used rather than adding a new field to the broadcast link.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 7e3e5d09 17-Aug-2010 Allan Stephens <allan.stephens@windriver.com>

tipc: Prevent crash when broadcast link cannot send to all nodes

Allow TIPC's broadcast link to continue operation when it is unable
to send a message to all nodes in the cluster. Previously, the
broadcast link attempted to put the broadcast pseudo-bearer into a
blocked state; however, this caused a crash because the associated
bearer structure is only partially initialized. Further
investigation has revealed some conceptual problems with blocking
the pseudo-bearer; consequently, this functionality has been
disabled for the time being and the undelivered message is
eventually resent by the broadcast link's existing message
retransmission mechanism (if possible).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 43608edc 11-May-2010 Allan Stephens <allan.stephens@windriver.com>

tipc: Reduce footprint by un-inlining port list routines

Converts port list inline routines that are more than one line into
standard functions, thereby eliminating a significant amount of
repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3e22e62b 11-May-2010 Allan Stephens <allan.stephens@windriver.com>

tipc: Reduce footprint by un-inlining nmap routines

Converts nmap inline routines that are more than one line into standard
functions, thereby eliminating a significant amount of repeated code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c68ca7b7 11-May-2010 Allan Stephens <allan.stephens@windriver.com>

tipc: add tipc_ prefix to fcns targeted for un-inlining

These functions have enough code in them such that they
seem like sensible targets for un-inlining. Prior to doing
that, this adds the tipc_ prefix to the functions, so that
in the event of a panic dump or similar, the subsystem from
which the functions come from is immediately clear.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3aec9cc9 11-May-2010 Allan Stephens <allan.stephens@windriver.com>

tipc: Rename "multicast-link" to "broadcast-link"

Make a cosmetic change to the name displayed for the broadcast link,
to better reflect its true nature. Since TIPC utilizes this link to
distribute name table information, in addition to multicast messages
sent by user applications, the prior name "multicast-link" is
no longer appropriate.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1a624832 15-Mar-2010 Neil Horman <nhorman@tuxdriver.com>

tipc: Increase frequency of load distribution over broadcast link

Forward port commit 29eb572941501c40ac6e62dbc5043bf9ee76ee56
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git

Origional commit message:
Increase frequency of load distribution over broadcast link

This patch enhances the behavior of TIPC's broadcast link so that it
alternates between redundant bearers (if available) after every
message sent, rather than after every 10 messages. This change helps
to speed up delivery of retransmitted messages by ensuring that
they are not sent repeatedly over a bearer that is no longer working,
but not yet recognized as failed.

Tested by myself in the latest net-2.6 tree using the tipc sanity test suite

Origionally-signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

bcast.c | 35 ++++++++++++++---------------------
1 file changed, 14 insertions(+), 21 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4b704d59 18-Mar-2009 Stephen Hemminger <shemminger@vyatta.com>

tipc: fix non-const printf format arguments

Fix warnings from current gcc about using non-const strings as printf
args in TIPC. Compile tested only (not a TIPC user).

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 6c00055a 03-Sep-2008 David S. Miller <davem@davemloft.net>

tipc: Don't use structure names which easily globally conflict.

Andrew Morton reported a build failure on sparc32, because TIPC
uses names like "struct node" and there is a like named data
structure defined in linux/node.h

This just regexp replaces "struct node*" to "struct tipc_node*"
to avoid this and any future similar problems.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 0e35fd5e 14-Jul-2008 Allan Stephens <allan.stephens@windriver.com>

tipc: Eliminate improper use of TIPC_OK error code

This patch corrects many places where TIPC routines indicated
successful completion by returning TIPC_OK instead of 0.
(The TIPC_OK symbol has the value 0, but it should only be used
in contexts that deal with the error code field of a TIPC
message header.)

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 40aecb1b 04-Jun-2008 Allan Stephens <allan.stephens@windriver.com>

tipc: Message rejection rework preparatory changes

This patch defines a few new message header manipulation routines,
and generalizes the usefulness of another, in preparation for upcoming
rework of TIPC's message rejection code.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 75715217 04-Jun-2008 Allan Stephens <allan.stephens@windriver.com>

tipc: Optimize message initialization routine

This patch eliminates the rarely-used "error code" argument
when initializing a TIPC message header, since the default
value of zero is the desired result in most cases; the few
exceptional cases now set the error code explicitly.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c4307285 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NET] TIPC: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 2710b57f 20-Nov-2006 Arnaldo Carvalho de Melo <acme@mandriva.com>

[TIPC]: Use kzalloc where appropriate

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>


# 34af946a 27-Jun-2006 Ingo Molnar <mingo@elte.hu>

[PATCH] spin/rwlock init cleanups

locking init cleanups:

- convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
- convert rwlocks in a similar manner

this patch was generated automatically.

Motivation:

- cleanliness
- lockdep needs control of lock initialization, which the open-coded
variants do not give
- it's also useful for -rt and for lock debugging in general

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 65f51ef0 26-Jun-2006 Allan Stephens <allan.stephens@windriver.com>

[TIPC]: Get rid of dynamically allocated arrays in broadcast code.

This change improves an earlier change which replaced the large local
variable arrays used during broadcasting with dynamically allocated arrays.
The temporary arrays are now incoprorated into the multicast link data
structure.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a10bd924 26-Jun-2006 Allan Stephens <allan.stephens@windriver.com>

[TIPC]: Enhanced & cleaned up system messages; fixed 2 obscure memory leaks.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3ac90216 26-Jun-2006 Eric Sesterhenn <snakebyte@gmx.de>

[TIPC] Fix for NULL pointer dereference

This fixes a bug spotted by the coverity checker, bug id #366. If
(mod(seqno - prev) != 1) we set buf to NULL, dereference it in the for
case, and set it to whatever value happes to be at adress 0+next, if it
happens to be non-zero, we even stay in the loop. It seems that the author
intended to break there.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d356eeba 26-Jun-2006 Allan Stephens <allan.stephens@windriver.com>

[TIPC]: Multicast link failure now resets all links to "nacking" node.

This fix prevents node from crashing.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# de70c5ba 20-Mar-2006 Per Liden <per.liden@ericsson.com>

[TIPC]: Reduce stack usage

The node_map struct can be quite large (516 bytes) and allocating two of
them on the stack is not a good idea since we might only have a 4K stack
to start with.

Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 988f088a 20-Mar-2006 Adrian Bunk <bunk@stusta.de>

[TIPC]: Cleanups

This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 the following unused global functions:
- name_table.c: tipc_nametbl_print()
- name_table.c: tipc_nametbl_dump()
- net.c: tipc_net_next_node()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 05790c64 20-Mar-2006 Sam Ravnborg <sam@ravnborg.org>

[TIPC]: Remove inlines from *.c

With reference to latest discussions on linux-kernel with respect to
inline here is a patch for tipc to remove all inlines as used in
the .c files. See also chapter 14 in Documentation/CodingStyle.

Before:
text data bss dec hex filename
102990 5292 1752 110034 1add2 tipc.o

Now:
text data bss dec hex filename
101190 5292 1752 108234 1a6ca tipc.o

This is a nice text size reduction which will improve icache usage.
In some cases bigger (> 4 lines) functions where declared inline
and used in many places, they are most probarly no longer inlined by gcc
resulting in the size reduction.
There are several one liners that no longer are declared inline, but gcc
should inline these just fine without the inline hint.

With this patch applied one warning is added about an unused static
function - that was hidded by utilising inline before.
The function in question were kept so this patch is solely a
inline removal patch.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1fc54d8f 20-Mar-2006 Sam Ravnborg <sam@ravnborg.org>

[TIPC]: Fix simple sparse warnings

Tried to run the new tipc stack through sparse.
Following patch fixes all cases where 0 was used
as replacement of NULL.
Use NULL to document this is a pointer and to silence sparse.

This brough sparse warning count down with 127 to 24 warnings.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 4323add6 17-Jan-2006 Per Liden <per.liden@ericsson.com>

[TIPC] Avoid polluting the global namespace

This patch adds a tipc_ prefix to all externally visible symbols.

Signed-off-by: Per Liden <per.liden@ericsson.com>


# 16cb4b33 13-Jan-2006 Per Liden <per.liden@nospam.ericsson.com>

[TIPC] Updated link priority macros

Added macros for min/default/max link priority in tipc_config.h.
Also renamed TIPC_NUM_LINK_PRI to TIPC_MEDIA_LINK_PRI since that
is a more accurate description of what it is used for.

Signed-off-by: Per Liden <per.liden@ericsson.com>


# 880b005f 12-Jan-2006 David S. Miller <davem@sunset.davemloft.net>

[TIPC]: Fix 64-bit build warnings.

When storing u32 values in a pointer, need to do
some long casts to keep GCC happy.

Signed-off-by: David S. Miller <davem@davemloft.net>


# 593a5f22 11-Jan-2006 Per Liden <per.liden@nospam.ericsson.com>

[TIPC] More updates of file headers

Updated copyright notice to include the year the file was
actually created. Information about file creation dates
was extracted from the files in the old CVS repository
at tipc.sourceforge.net.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>


# 9da1c8b6 11-Jan-2006 Per Liden <per.liden@nospam.ericsson.com>

[TIPC] Update of file headers

The copyright statements from different parts of Ericsson
have been merged into one.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>


# 9ea1fd3c 11-Jan-2006 Per Liden <per.liden@nospam.ericsson.com>

[TIPC] License header update

The license header in each file now more clearly state that this
code is licensed under a dual BSD/GPL. Before this was only
evident if you looked at the MODULE_LICENSE line in core.c.

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>


# b97bf3fd 02-Jan-2006 Per Liden <per.liden@nospam.ericsson.com>

[TIPC] Initial merge

TIPC (Transparent Inter Process Communication) is a protocol designed for
intra cluster communication. For more information see
http://tipc.sourceforge.net

Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>