#
5e4eca5d |
|
26-Jan-2022 |
Jakub Kicinski <kuba@kernel.org> |
net: tipc: remove unused static inlines IIUC the TIPC msg helpers are not meant to provide and exhaustive API, so remove the unused ones. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0c6de0c9 |
|
28-Jun-2021 |
Menglong Dong <dong.menglong@zte.com.cn> |
net: tipc: fix FB_MTU eat two pages FB_MTU is used in 'tipc_msg_build()' to alloc smaller skb when memory allocation fails, which can avoid unnecessary sending failures. The value of FB_MTU now is 3744, and the data size will be: (3744 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + \ SKB_DATA_ALIGN(BUF_HEADROOM + BUF_TAILROOM + 3)) which is larger than one page(4096), and two pages will be allocated. To avoid it, replace '3744' with a calculation: (PAGE_SIZE - SKB_DATA_ALIGN(BUF_OVERHEAD) - \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) What's more, alloc_skb_fclone() will call SKB_DATA_ALIGN for data size, and it's not necessary to make alignment for buf_size in tipc_buf_acquire(). So, just remove it. Fixes: 4c94cc2d3d57 ("tipc: fall back to smaller MTU if allocation of local send skb fails") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ef6f7c9 |
|
17-Sep-2020 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: add automatic session key exchange With support from the master key option in the previous commit, it becomes easy to make frequent updates/exchanges of session keys between authenticated cluster nodes. Basically, there are two situations where the key exchange will take in place: - When a new node joins the cluster (with the master key), it will need to get its peer's TX key, so that be able to decrypt further messages from that peer. - When a new session key is generated (by either user manual setting or later automatic rekeying feature), the key will be distributed to all peer nodes in the cluster. A key to be exchanged is encapsulated in the data part of a 'MSG_CRYPTO /KEY_DISTR_MSG' TIPC v2 message, then xmit-ed as usual and encrypted by using the master key before sending out. Upon receipt of the message it will be decrypted in the same way as regular messages, then attached as the sender's RX key in the receiver node. In this way, the key exchange is reliable by the link layer, as well as security, integrity and authenticity by the crypto layer. Also, the forward security will be easily achieved by user changing the master key actively but this should not be required very frequently. The key exchange feature is independent on the presence of a master key Note however that the master key still is needed for new nodes to be able to join the cluster. It is also optional, and can be turned off/on via the sysfs: 'net/tipc/key_exchange_enabled' [default 1: enabled]. Backward compatibility is guaranteed because for nodes that do not have master key support, key exchange using master key ie. tx_key = 0 if any will be shortly discarded at the message validation step. In other words, the key exchange feature will be automatically disabled to those nodes. v2: fix the "implicit declaration of function 'tipc_crypto_key_flush'" error in node.c. The function only exists when built with the TIPC "CONFIG_TIPC_CRYPTO" option. v3: use 'info->extack' for a message emitted due to netlink operations instead (- David's comment). Reported-by: kernel test robot <lkp@intel.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
daef1ee3 |
|
17-Sep-2020 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: introduce encryption master key In addition to the supported cluster & per-node encryption keys for the en/decryption of TIPC messages, we now introduce one option for user to set a cluster key as 'master key', which is simply a symmetric key like the former but has a longer life cycle. It has two purposes: - Authentication of new member nodes in the cluster. New nodes, having no knowledge of current session keys in the cluster will still be able to join the cluster as long as they know the master key. This is because all neighbor discovery (LINK_CONFIG) messages must be encrypted with this key. - Encryption of session encryption keys during automatic exchange and update of those.This is a feature we will introduce in a later commit in this series. We insert the new key into the currently unused slot 0 in the key array and start using it immediately once the user has set it. After joining, a node only knowing the master key should be fully communicable to existing nodes in the cluster, although those nodes may have their own session keys activated (i.e. not the master one). To support this, we define a 'grace period', starting from the time a node itself reports having no RX keys, so the existing nodes will use the master key for encryption instead. The grace period can be extended but will automatically stop after e.g. 5 seconds without a new report. This is also the basis for later key exchanging feature as the new node will be impossible to decrypt anything without the support from master key. For user to set a master key, we define a new netlink flag - 'TIPC_NLA_NODE_KEY_MASTER', so it can be added to the current 'set key' netlink command to specify the setting key to be a master key. Above all, the traditional cluster/per-node key mechanism is guaranteed to work when user comes not to use this master key option. This is also compatible to legacy nodes without the feature supported. Even this master key can be updated without any interruption of cluster connectivity but is so is needed, this has to be coordinated and set by the user. Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e034c6d2 |
|
18-Jun-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
tipc: Use struct_size() helper Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cad2929d |
|
17-Jun-2020 |
Hoang Huu Le <hoang.h.le@dektech.com.au> |
tipc: update a binding service via broadcast Currently, updating binding table (add service binding to name table/withdraw a service binding) is being sent over replicast. However, if we are scaling up clusters to > 100 nodes/containers this method is less affection because of looping through nodes in a cluster one by one. It is worth to use broadcast to update a binding service. This way, the binding table can be updated on all peer nodes in one shot. Broadcast is used when all peer nodes, as indicated by a new capability flag TIPC_NAMED_BCAST, support reception of this message type. Four problems need to be considered when introducing this feature. 1) When establishing a link to a new peer node we still update this by a unicast 'bulk' update. This may lead to race conditions, where a later broadcast publication/withdrawal bypass the 'bulk', resulting in disordered publications, or even that a withdrawal may arrive before the corresponding publication. We solve this by adding an 'is_last_bulk' bit in the last bulk messages so that it can be distinguished from all other messages. Only when this message has arrived do we open up for reception of broadcast publications/withdrawals. 2) When a first legacy node is added to the cluster all distribution will switch over to use the legacy 'replicast' method, while the opposite happens when the last legacy node leaves the cluster. This entails another risk of message disordering that has to be handled. We solve this by adding a sequence number to the broadcast/replicast messages, so that disordering can be discovered and corrected. Note however that we don't need to consider potential message loss or duplication at this protocol level. 3) Bulk messages don't contain any sequence numbers, and will always arrive in order. Hence we must exempt those from the sequence number control and deliver them unconditionally. We solve this by adding a new 'is_bulk' bit in those messages so that they can be recognized. 4) Legacy messages, which don't contain any new bits or sequence numbers, but neither can arrive out of order, also need to be exempt from the initial synchronization and sequence number check, and delivered unconditionally. Therefore, we add another 'is_not_legacy' bit to all new messages so that those can be distinguished from legacy messages and the latter delivered directly. v1->v2: - fix warning issue reported by kbuild test robot <lkp@intel.com> - add santiy check to drop the publication message with a sequence number that is lower than the agreed synch point Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
0a3e060f |
|
26-May-2020 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: add test for Nagle algorithm effectiveness When streaming in Nagle mode, we try to bundle small messages from user as many as possible if there is one outstanding buffer, i.e. not ACK-ed by the receiving side, which helps boost up the overall throughput. So, the algorithm's effectiveness really depends on when Nagle ACK comes or what the specific network latency (RTT) is, compared to the user's message sending rate. In a bad case, the user's sending rate is low or the network latency is small, there will not be many bundles, so making a Nagle ACK or waiting for it is not meaningful. For example: a user sends its messages every 100ms and the RTT is 50ms, then for each messages, we require one Nagle ACK but then there is only one user message sent without any bundles. In a better case, even if we have a few bundles (e.g. the RTT = 300ms), but now the user sends messages in medium size, then there will not be any difference at all, that says 3 x 1000-byte data messages if bundled will still result in 3 bundles with MTU = 1500. When Nagle is ineffective, the delay in user message sending is clearly wasted instead of sending directly. Besides, adding Nagle ACKs will consume some processor load on both the sending and receiving sides. This commit adds a test on the effectiveness of the Nagle algorithm for an individual connection in the network on which it actually runs. Particularly, upon receipt of a Nagle ACK we will compare the number of bundles in the backlog queue to the number of user messages which would be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle mode will be kept for further message sending. Otherwise, we will leave Nagle and put a 'penalty' on the connection, so it will have to spend more 'one-way' messages before being able to re-enter Nagle. In addition, the 'ack-required' bit is only set when really needed that the number of Nagle ACKs will be reduced during Nagle mode. Testing with benchmark showed that with the patch, there was not much difference in throughput for small messages since the tool continuously sends messages without a break, so Nagle would still take in effect. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
03b6fefd |
|
26-May-2020 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: add support for broadcast rcv stats dumping This commit enables dumping the statistics of a broadcast-receiver link like the traditional 'broadcast-link' one (which is for broadcast- sender). The link dumping can be triggered via netlink (e.g. the iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the indicator. The name of a broadcast-receiver link of a specific peer will be in the format: 'broadcast-link:<peer-id>'. For example: Link <broadcast-link:1001002> Window:50 packets RX packets:7841 fragments:2408/440 bundles:0/0 TX packets:0 fragments:0/0 bundles:0/0 RX naks:0 defs:124 dups:0 TX naks:21 acks:0 retrans:0 Congestion link:0 Send queue max:0 avg:0 In addition, the broadcast-receiver link statistics can be reset in the usual way via netlink by specifying that link name in command. Note: the 'tipc_link_name_ext()' is removed because the link name can now be retrieved simply via the 'l->name'. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d7626b5a |
|
26-May-2020 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: introduce Gap ACK blocks for broadcast link As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput by Gap ACK blocks"), we apply the same mechanism for the broadcast link as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will consist of two parts built for both the broadcast and unicast types: 31 16 15 0 +-------------+-------------+-------------+-------------+ | bgack_cnt | ugack_cnt | len | +-------------+-------------+-------------+-------------+ - | gap | ack | | +-------------+-------------+-------------+-------------+ > bc gacks : : : | +-------------+-------------+-------------+-------------+ - | gap | ack | | +-------------+-------------+-------------+-------------+ > uc gacks : : : | +-------------+-------------+-------------+-------------+ - which is "automatically" backward-compatible. We also increase the max number of Gap ACK blocks to 128, allowing upto 64 blocks per type (total buffer size = 516 bytes). Besides, the 'tipc_link_advance_transmq()' function is refactored which is applicable for both the unicast and broadcast cases now, so some old functions can be removed and the code is optimized. With the patch, TIPC broadcast is more robust regardless of packet loss or disorder, latency, ... in the underlying network. Its performance is boost up significantly. For example, experiment with a 5% packet loss rate results: $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 real 0m 42.46s user 0m 1.16s sys 0m 17.67s Without the patch: $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 real 8m 27.94s user 0m 0.55s sys 0m 2.38s Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8b1e5b0a |
|
25-Mar-2020 |
Hoang Le <hoang.h.le@dektech.com.au> |
tipc: Add a missing case of TIPC_DIRECT_MSG type In the commit f73b12812a3d ("tipc: improve throughput between nodes in netns"), we're missing a check to handle TIPC_DIRECT_MSG type, it's still using old sending mechanism for this message type. So, throughput improvement is not significant as expected. Besides that, when sending a large message with that type, we're also handle wrong receiving queue, it should be enqueued in socket receiving instead of multicast messages. Fix this by adding the missing case for TIPC_DIRECT_MSG. Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns") Reported-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fc1b6d6d |
|
07-Nov-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: introduce TIPC encryption & authentication This commit offers an option to encrypt and authenticate all messaging, including the neighbor discovery messages. The currently most advanced algorithm supported is the AEAD AES-GCM (like IPSec or TLS). All encryption/decryption is done at the bearer layer, just before leaving or after entering TIPC. Supported features: - Encryption & authentication of all TIPC messages (header + data); - Two symmetric-key modes: Cluster and Per-node; - Automatic key switching; - Key-expired revoking (sequence number wrapped); - Lock-free encryption/decryption (RCU); - Asynchronous crypto, Intel AES-NI supported; - Multiple cipher transforms; - Logs & statistics; Two key modes: - Cluster key mode: One single key is used for both TX & RX in all nodes in the cluster. - Per-node key mode: Each nodes in the cluster has one specific TX key. For RX, a node requires its peers' TX key to be able to decrypt the messages from those peers. Key setting from user-space is performed via netlink by a user program (e.g. the iproute2 'tipc' tool). Internal key state machine: Attach Align(RX) +-+ +-+ | V | V +---------+ Attach +---------+ | IDLE |---------------->| PENDING |(user = 0) +---------+ +---------+ A A Switch| A | | | | | | Free(switch/revoked) | | (Free)| +----------------------+ | |Timeout | (TX) | | |(RX) | | | | | | v | +---------+ Switch +---------+ | PASSIVE |<----------------| ACTIVE | +---------+ (RX) +---------+ (user = 1) (user >= 1) The number of TFMs is 10 by default and can be changed via the procfs 'net/tipc/max_tfms'. At this moment, as for simplicity, this file is also used to print the crypto statistics at runtime: echo 0xfff1 > /proc/sys/net/tipc/max_tfms The patch defines a new TIPC version (v7) for the encryption message (- backward compatibility as well). The message is basically encapsulated as follows: +----------------------------------------------------------+ | TIPCv7 encryption | Original TIPCv2 | Authentication | | header | packet (encrypted) | Tag | +----------------------------------------------------------+ The throughput is about ~40% for small messages (compared with non- encryption) and ~9% for large messages. With the support from hardware crypto i.e. the Intel AES-NI CPU instructions, the throughput increases upto ~85% for small messages and ~55% for large messages. By default, the new feature is inactive (i.e. no encryption) until user sets a key for TIPC. There is however also a new option - "TIPC_CRYPTO" in the kernel configuration to enable/disable the new code when needed. MAINTAINERS | add two new files 'crypto.h' & 'crypto.c' in tipc Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06e7c70c |
|
31-Oct-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: improve message bundling algorithm As mentioned in commit e95584a889e1 ("tipc: fix unlimited bundling of small messages"), the current message bundling algorithm is inefficient that can generate bundles of only one payload message, that causes unnecessary overheads for both the sender and receiver. This commit re-designs the 'tipc_msg_make_bundle()' function (now named as 'tipc_msg_try_bundle()'), so that when a message comes at the first place, we will just check & keep a reference to it if the message is suitable for bundling. The message buffer will be put into the link backlog queue and processed as normal. Later on, when another one comes we will make a bundle with the first message if possible and so on... This way, a bundle if really needed will always consist of at least two payload messages. Otherwise, we let the first buffer go its way without any need of bundling, so reduce the overheads to zero. Moreover, since now we have both the messages in hand, we can even optimize the 'tipc_msg_bundle()' function, make bundle of a very large (size ~ MSS) and small messages which is not with the current algorithm e.g. [1400-byte message] + [10-byte message] (MTU = 1500). Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0bceb97 |
|
30-Oct-2019 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add smart nagle feature We introduce a feature that works like a combination of TCP_NAGLE and TCP_CORK, but without some of the weaknesses of those. In particular, we will not observe long delivery delays because of delayed acks, since the algorithm itself decides if and when acks are to be sent from the receiving peer. - The nagle property as such is determined by manipulating a new 'maxnagle' field in struct tipc_sock. If certain conditions are met, 'maxnagle' will define max size of the messages which can be bundled. If it is set to zero no messages are ever bundled, implying that the nagle property is disabled. - A socket with the nagle property enabled enters nagle mode when more than 4 messages have been sent out without receiving any data message from the peer. - A socket leaves nagle mode whenever it receives a data message from the peer. In nagle mode, messages smaller than 'maxnagle' are accumulated in the socket write queue. The last buffer in the queue is marked with a new 'ack_required' bit, which forces the receiving peer to send a CONN_ACK message back to the sender upon reception. The accumulated contents of the write queue is transmitted when one of the following events or conditions occur. - A CONN_ACK message is received from the peer. - A data message is received from the peer. - A SOCK_WAKEUP pseudo message is received from the link level. - The write queue contains more than 64 1k blocks of data. - The connection is being shut down. - There is no CONN_ACK message to expect. I.e., there is currently no outstanding message where the 'ack_required' bit was set. As a consequence, the first message added after we enter nagle mode is always sent directly with this bit set. This new feature gives a 50-100% improvement of throughput for small (i.e., less than MTU size) messages, while it might add up to one RTT to latency time when the socket is in nagle mode. Acked-by: Ying Xue <ying.xue@windreiver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f73b1281 |
|
28-Oct-2019 |
Hoang Le <hoang.h.le@dektech.com.au> |
tipc: improve throughput between nodes in netns Currently, TIPC transports intra-node user data messages directly socket to socket, hence shortcutting all the lower layers of the communication stack. This gives TIPC very good intra node performance, both regarding throughput and latency. We now introduce a similar mechanism for TIPC data traffic across network namespaces located in the same kernel. On the send path, the call chain is as always accompanied by the sending node's network name space pointer. However, once we have reliably established that the receiving node is represented by a namespace on the same host, we just replace the namespace pointer with the receiving node/namespace's ditto, and follow the regular socket receive patch though the receiving node. This technique gives us a throughput similar to the node internal throughput, several times larger than if we let the traffic go though the full network stacks. As a comparison, max throughput for 64k messages is four times larger than TCP throughput for the same type of traffic. To meet any security concerns, the following should be noted. - All nodes joining a cluster are supposed to have been be certified and authenticated by mechanisms outside TIPC. This is no different for nodes/namespaces on the same host; they have to auto discover each other using the attached interfaces, and establish links which are supervised via the regular link monitoring mechanism. Hence, a kernel local node has no other way to join a cluster than any other node, and have to obey to policies set in the IP or device layers of the stack. - Only when a sender has established with 100% certainty that the peer node is located in a kernel local namespace does it choose to let user data messages, and only those, take the crossover path to the receiving node/namespace. - If the receiving node/namespace is removed, its namespace pointer is invalidated at all peer nodes, and their neighbor link monitoring will eventually note that this node is gone. - To ensure the "100% certainty" criteria, and prevent any possible spoofing, received discovery messages must contain a proof that the sender knows a common secret. We use the hash mix of the sending node/namespace for this purpose, since it can be accessed directly by all other namespaces in the kernel. Upon reception of a discovery message, the receiver checks this proof against all the local namespaces'hash_mix:es. If it finds a match, that, along with a matching node id and cluster id, this is deemed sufficient proof that the peer node in question is in a local namespace, and a wormhole can be opened. - We should also consider that TIPC is intended to be a cluster local IPC mechanism (just like e.g. UNIX sockets) rather than a network protocol, and hence we think it can justified to allow it to shortcut the lower protocol layers. Regarding traceability, we should notice that since commit 6c9081a3915d ("tipc: add loopback device tracking") it is possible to follow the node internal packet flow by just activating tcpdump on the loopback interface. This will be true even for this mechanism; by activating tcpdump on the involved nodes' loopback interfaces their inter-name space messaging can easily be tracked. v2: - update 'net' pointer when node left/rejoined v3: - grab read/write lock when using node ref obj v4: - clone traffics between netns to loopback Suggested-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
71204231 |
|
14-Aug-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: fix false detection of retransmit failures This commit eliminates the use of the link 'stale_limit' & 'prev_from' (besides the already removed - 'stale_cnt') variables in the detection of repeated retransmit failures as there is no proper way to initialize them to avoid a false detection, i.e. it is not really a retransmission failure but due to a garbage values in the variables. Instead, a jiffies variable will be added to individual skbs (like the way we restrict the skb retransmissions) in order to mark the first skb retransmit time. Later on, at the next retransmissions, the timestamp will be checked to see if the skb in the link transmq is "too stale", that is, the link tolerance time has passed, so that a link reset will be ordered. Note, just checking on the first skb in the queue is fine enough since it must be the oldest one. A counter is also added to keep track the actual skb retransmissions' number for later checking when the failure happens. The downside of this approach is that the skb->cb[] buffer is about to be exhausted, however it is always able to allocate another memory area and keep a reference to it when needed. Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria") Reported-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2320bcda |
|
23-Jul-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: fix changeover issues due to large packet In conjunction with changing the interfaces' MTU (e.g. especially in the case of a bonding) where the TIPC links are brought up and down in a short time, a couple of issues were detected with the current link changeover mechanism: 1) When one link is up but immediately forced down again, the failover procedure will be carried out in order to failover all the messages in the link's transmq queue onto the other working link. The link and node state is also set to FAILINGOVER as part of the process. The message will be transmited in form of a FAILOVER_MSG, so its size is plus of 40 bytes (= the message header size). There is no problem if the original message size is not larger than the link's MTU - 40, and indeed this is the max size of a normal payload messages. However, in the situation above, because the link has just been up, the messages in the link's transmq are almost SYNCH_MSGs which had been generated by the link synching procedure, then their size might reach the max value already! When the FAILOVER_MSG is built on the top of such a SYNCH_MSG, its size will exceed the link's MTU. As a result, the messages are dropped silently and the failover procedure will never end up, the link will not be able to exit the FAILINGOVER state, so cannot be re-established. 2) The same scenario above can happen more easily in case the MTU of the links is set differently or when changing. In that case, as long as a large message in the failure link's transmq queue was built and fragmented with its link's MTU > the other link's one, the issue will happen (there is no need of a link synching in advance). 3) The link synching procedure also faces with the same issue but since the link synching is only started upon receipt of a SYNCH_MSG, dropping the message will not result in a state deadlock, but it is not expected as design. The 1) & 3) issues are resolved by the last commit that only a dummy SYNCH_MSG (i.e. without data) is generated at the link synching, so the size of a FAILOVER_MSG if any then will never exceed the link's MTU. For the 2) issue, the only solution is trying to fragment the messages in the failure link's transmq queue according to the working link's MTU so they can be failovered then. A new function is made to accomplish this, it will still be a TUNNEL PROTOCOL/FAILOVER MSG but if the original message size is too large, it will be fragmented & reassembled at the receiving side. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4929a932 |
|
23-Jul-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: optimize link synching mechanism This commit along with the next one are to resolve the issues with the link changeover mechanism. See that commit for details. Basically, for the link synching, from now on, we will send only one single ("dummy") SYNCH message to peer. The SYNCH message does not contain any data, just a header conveying the synch point to the peer. A new node capability flag ("TIPC_TUNNEL_ENHANCED") is introduced for backward compatible! Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Suggested-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a7dc51ad |
|
25-Jun-2019 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: rename function msg_get_wrapped() to msg_inner_hdr() We rename the inline function msg_get_wrapped() to the more comprehensible msg_inner_hdr(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
382f598f |
|
03-Apr-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: reduce duplicate packets for unicast traffic For unicast transmission, the current NACK sending althorithm is over- active that forces the sending side to retransmit a packet that is not really lost but just arrived at the receiving side with some delay, or even retransmit same packets that have already been retransmitted before. As a result, many duplicates are observed also under normal condition, ie. without packet loss. One example case is: node1 transmits 1 2 3 4 10 5 6 7 8 9, when node2 receives packet #10, it puts into the deferdq. When the packet #5 comes it sends NACK with gap [6 - 9]. However, shortly after that, when packet #6 arrives, it pulls out packet #10 from the deferfq, but it is still out of order, so it makes another NACK with gap [7 - 9] and so on ... Finally, node1 has to retransmit the packets 5 6 7 8 9 a number of times, but in fact all the packets are not lost at all, so duplicates! This commit reduces duplicates by changing the condition to send NACK, also restricting the retransmissions on individual packets via a timer of about 1ms. However, it also needs to say that too tricky condition for NACKs or too long timeout value for retransmissions will result in performance reducing! The criterias in this commit are found to be effective for both the requirements to reduce duplicates but not affect performance. The tipc_link_rcv() is also improved to only dequeue skb from the link deferdq if it is expected (ie. its seqno <= rcv_nxt). Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9195948f |
|
03-Apr-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: improve TIPC throughput by Gap ACK blocks During unicast link transmission, it's observed very often that because of one or a few lost/dis-ordered packets, the sending side will fastly reach the send window limit and must wait for the packets to be arrived at the receiving side or in the worst case, a retransmission must be done first. The sending side cannot release a lot of subsequent packets in its transmq even though all of them might have already been received by the receiving side. That is, one or two packets dis-ordered/lost and dozens of packets have to wait, this obviously reduces the overall throughput! This commit introduces an algorithm to overcome this by using "Gap ACK blocks". Basically, a Gap ACK block will consist of <ack, gap> numbers that describes the link deferdq where packets have been got by the receiving side but with gaps, for example: link deferdq: [1 2 3 4 10 11 13 14 15 20] --> Gap ACK blocks: <4, 5>, <11, 1>, <15, 4>, <20, 0> The Gap ACK blocks will be sent to the sending side along with the traditional ACK or NACK message. Immediately when receiving the message the sending side will now not only release from its transmq the packets ack-ed by the ACK but also by the Gap ACK blocks! So, more packets can be enqueued and transmitted. In addition, the sending side can now do "multi-retransmissions" according to the Gaps reported in the Gap ACK blocks. The new algorithm as verified helps greatly improve the TIPC throughput especially under packet loss condition. So far, a maximum of 32 blocks is quite enough without any "Too few Gap ACK blocks" reports with a 5.0% packet loss rate, however this number can be increased in the furture if needed. Also, the patch is backward compatible. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c55c8eda |
|
19-Mar-2019 |
Hoang Le <hoang.h.le@dektech.com.au> |
tipc: smooth change between replicast and broadcast Currently, a multicast stream may start out using replicast, because there are few destinations, and then it should ideally switch to L2/broadcast IGMP/multicast when the number of destinations grows beyond a certain limit. The opposite should happen when the number decreases below the limit. To eliminate the risk of message reordering caused by method change, a sending socket must stick to a previously selected method until it enters an idle period of 5 seconds. Means there is a 5 seconds pause in the traffic from the sender socket. If the sender never makes such a pause, the method will never change, and transmission may become very inefficient as the cluster grows. With this commit, we allow such a switch between replicast and broadcast without any need for a traffic pause. Solution is to send a dummy message with only the header, also with the SYN bit set, via broadcast or replicast. For the data message, the SYN bit is set and sending via replicast or broadcast (inverse method with dummy). Then, at receiving side any messages follow first SYN bit message (data or dummy message), they will be held in deferred queue until another pair (dummy or data message) arrived in other link. v2: reverse christmas tree declaration Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
91986ee1 |
|
10-Feb-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: fix link session and re-establish issues When a link endpoint is re-created (e.g. after a node reboot or interface reset), the link session number is varied by random, the peer endpoint will be synced with this new session number before the link is re-established. However, there is a shortcoming in this mechanism that can lead to the link never re-established or faced with a failure then. It happens when the peer endpoint is ready in ESTABLISHING state, the 'peer_session' as well as the 'in_session' flag have been set, but suddenly this link endpoint leaves. When it comes back with a random session number, there are two situations possible: 1/ If the random session number is larger than (or equal to) the previous one, the peer endpoint will be updated with this new session upon receipt of a RESET_MSG from this endpoint, and the link can be re- established as normal. Otherwise, all the RESET_MSGs from this endpoint will be rejected by the peer. In turn, when this link endpoint receives one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start to send STATE_MSGs, but again these messages will be dropped by the peer due to wrong session. The peer link endpoint can still become ESTABLISHED after receiving a traffic message from this endpoint (e.g. a BCAST_PROTOCOL or NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link will be forced down sooner or later! Even in case the random session number is larger than the previous one, it can be that the ACTIVATE_MSG from the peer arrives first, and this link endpoint moves quickly to ESTABLISHED without sending out any RESET_MSG yet. Consequently, the peer link will not be updated with the new session number, and the same link failure scenario as above will happen. 2/ Another situation can be that, the peer link endpoint was reset due to any reasons in the meantime, its link state was set to RESET from ESTABLISHING but still in session, i.e. the 'in_session' flag is not reset... Now, if the random session number from this endpoint is less than the previous one, all the RESET_MSGs from this endpoint will be rejected by the peer. In the other direction, when this link endpoint receives a RESET_MSG from the peer, it moves to ESTABLISHING and starts to send ACTIVATE_MSGs, but all these messages will be rejected by the peer too. As a result, the link cannot be re-established but gets stuck with this link endpoint in state ESTABLISHING and the peer in RESET! Solution: =========== This link endpoint should not go directly to ESTABLISHED when getting ACTIVATE_MSG from the peer which may belong to the old session if the link was re-created. To ensure the session to be correct before the link is re-established, the peer endpoint in ESTABLISHING state will send back the last session number in ACTIVATE_MSG for a verification at this endpoint. Then, if needed, a new and more appropriate session number will be regenerated to force a re-synch first. In addition, when a link in ESTABLISHING state is reset, its state will move to RESET according to the link FSM, along with resetting the 'in_session' flag (and the other data) as a normal link reset, it will also be deleted if requested. The solution is backward compatible. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31c4f4cc |
|
10-Nov-2018 |
LUU Duc Canh <canh.d.luu@dektech.com.au> |
tipc: improve broadcast retransmission algorithm Currently, the broadcast retransmission algorithm is using the 'prev_retr' field in struct tipc_link to time stamp the latest broadcast retransmission occasion. This helps to restrict retransmission of individual broadcast packets to max once per 10 milliseconds, even though all other criteria for retransmission are met. We now move this time stamp to the control block of each individual packet, and remove other limiting criteria. This simplifies the retransmission algorithm, and eliminates any risk of logical errors in selecting which packets can be retransmitted. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
67879274 |
|
28-Sep-2018 |
Tung Nguyen <tung.q.nguyen@dektech.com.au> |
tipc: buffer overflow handling in listener socket Default socket receive buffer size for a listener socket is 2Mb. For each arriving empty SYN, the linux kernel allocates a 768 bytes buffer. This means that a listener socket can serve maximum 2700 simultaneous empty connection setup requests before it hits a receive buffer overflow, and much fewer if the SYN is carrying any significant amount of data. When this happens the setup request is rejected, and the client receives an ECONNREFUSED error. This commit mitigates this problem by letting the client socket try to retransmit the SYN message multiple times when it sees it rejected with the code TIPC_ERR_OVERLOAD. Retransmission is done at random intervals in the range of [100 ms, setup_timeout / 4], as many times as there is room for within the setup timeout limit. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25b9221b |
|
28-Sep-2018 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add SYN bit to connection setup messages Messages intended for intitating a connection are currently indistinguishable from regular datagram messages. The TIPC protocol specification defines bit 17 in word 0 as a SYN bit to allow sanity check of such messages in the listening socket, but this has so far never been implemented. We do that in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25b0b9c4 |
|
22-Mar-2018 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: handle collisions of 32-bit node address hash values When a 32-bit node address is generated from a 128-bit identifier, there is a risk of collisions which must be discovered and handled. We do this as follows: - We don't apply the generated address immediately to the node, but do instead initiate a 1 sec trial period to allow other cluster members to discover and handle such collisions. - During the trial period the node periodically sends out a new type of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast, to all the other nodes in the cluster. - When a node is receiving such a message, it must check that the presented 32-bit identifier either is unused, or was used by the very same peer in a previous session. In both cases it accepts the request by not responding to it. - If it finds that the same node has been up before using a different address, it responds with a DSC_TRIAL_FAIL_MSG containing that address. - If it finds that the address has already been taken by some other node, it generates a new, unused address and returns it to the requester. - During the trial period the requesting node must always be prepared to accept a failure message, i.e., a message where a peer suggests a different (or equal) address to the one tried. In those cases it must apply the suggested value as trial address and restart the trial period. This algorithm ensures that in the vast majority of cases a node will have the same address before and after a reboot. If a legacy user configures the address explicitly, there will be no trial period and messages, so this protocol addition is completely backwards compatible. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4c94cc2d |
|
30-Nov-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: fall back to smaller MTU if allocation of local send skb fails When sending node local messages the code is using an 'mtu' of 66060 bytes to avoid unnecessary fragmentation. During situations of low memory tipc_msg_build() may sometimes fail to allocate such large buffers, resulting in unnecessary send failures. This can easily be remedied by falling back to a smaller MTU, and then reassemble the buffer chain as if the message were arriving from a remote node. At the same time, we change the initial MTU setting of the broadcast link to a lower value, so that large messages always are fragmented into smaller buffers even when we run in single node mode. Apart from obtaining the same advantage as for the 'fallback' solution above, this turns out to give a significant performance improvement. This can probably be explained with the __pskb_copy() operation performed on the buffer for each recipient during reception. We found the optimal value for this, considering the most relevant skb pool, to be 3744 bytes. Acked-by: Ying Xue <ying.xue@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d618d09a |
|
15-Nov-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: enforce valid ratio between skb truesize and contents The socket level flow control is based on the assumption that incoming buffers meet the condition (skb->truesize / roundup(skb->len) <= 4), where the latter value is rounded off upwards to the nearest 1k number. This does empirically hold true for the device drivers we know, but we cannot trust that it will always be so, e.g., in a system with jumbo frames and very small packets. We now introduce a check for this condition at packet arrival, and if we find it to be false, we copy the packet to a new, smaller buffer, where the condition will be true. We expect this to affect only a small fraction of all incoming packets, if at all. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8d6e79d3 |
|
08-Nov-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: improve link resiliency when rps is activated Currently, the TIPC RPS dissector is based only on the incoming packets' source node address, hence steering all traffic from a node to the same core. We have seen that this makes the links vulnerable to starvation and unnecessary resets when we turn down the link tolerance to very low values. To reduce the risk of this happening, we exempt probe and probe replies packets from the convergence to one core per source node. Instead, we do the opposite, - we try to diverge those packets across as many cores as possible, by randomizing the flow selector key. To make such packets identifiable to the dissector, we add a new 'is_keepalive' bit to word 0 of the LINK_PROTOCOL header. This bit is set both for PROBE and PROBE_REPLY messages, and only for those. It should be noted that these packets are not part of any flow anyway, and only constitute a minuscule fraction of all packets sent across a link. Hence, there is no risk that this will affect overall performance. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
04d7b574 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add multipoint-to-point flow control We already have point-to-multipoint flow control within a group. But we even need the opposite; -a scheme which can handle that potentially hundreds of sources may try to send messages to the same destination simultaneously without causing buffer overflow at the recipient. This commit adds such a mechanism. The algorithm works as follows: - When a member detects a new, joining member, it initially set its state to JOINED and advertises a minimum window to the new member. This window is chosen so that the new member can send exactly one maximum sized message, or several smaller ones, to the recipient before it must stop and wait for an additional advertisement. This minimum window ADV_IDLE is set to 65 1kB blocks. - When a member receives the first data message from a JOINED member, it changes the state of the latter to ACTIVE, and advertises a larger window ADV_ACTIVE = 12 x ADV_IDLE blocks to the sender, so it can continue sending with minimal disturbances to the data flow. - The active members are kept in a dedicated linked list. Each time a message is received from an active member, it will be moved to the tail of that list. This way, we keep a record of which members have been most (tail) and least (head) recently active. - There is a maximum number (16) of permitted simultaneous active senders per receiver. When this limit is reached, the receiver will not advertise anything immediately to a new sender, but instead put it in a PENDING state, and add it to a corresponding queue. At the same time, it will pick the least recently active member, send it an advertisement RECLAIM message, and set this member to state RECLAIMING. - The reclaimee member has to respond with a REMIT message, meaning that it goes back to a send window of ADV_IDLE, and returns its unused advertised blocks beyond that value to the reclaiming member. - When the reclaiming member receives the REMIT message, it unlinks the reclaimee from its active list, resets its state to JOINED, and notes that it is now back at ADV_IDLE advertised blocks to that member. If there are still unread data messages sent out by reclaimee before the REMIT, the member goes into an intermediate state REMITTED, where it stays until the said messages have been consumed. - The returned advertised blocks can now be re-advertised to the pending member, which is now set to state ACTIVE and added to the active member list. - To be proactive, i.e., to minimize the risk that any member will end up in the pending queue, we start reclaiming resources already when the number of active members exceeds 3/4 of the permitted maximum. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f487712 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: guarantee that group broadcast doesn't bypass group unicast We need a mechanism guaranteeing that group unicasts sent out from a socket are not bypassed by later sent broadcasts from the same socket. We do this as follows: - Each time a unicast is sent, we set a the broadcast method for the socket to "replicast" and "mandatory". This forces the first subsequent broadcast message to follow the same network and data path as the preceding unicast to a destination, hence preventing it from overtaking the latter. - In order to make the 'same data path' statement above true, we let group unicasts pass through the multicast link input queue, instead of as previously through the unicast link input queue. - In the first broadcast following a unicast, we set a new header flag, requiring all recipients to immediately acknowledge its reception. - During the period before all the expected acknowledges are received, the socket refuses to accept any more broadcast attempts, i.e., by blocking or returning EAGAIN. This period should typically not be longer than a few microseconds. - When all acknowledges have been received, the sending socket will open up for subsequent broadcasts, this time giving the link layer freedom to itself select the best transmission method. - The forced and/or abrupt transmission method changes described above may lead to broadcasts arriving out of order to the recipients. We remedy this by introducing code that checks and if necessary re-orders such messages at the receiving end. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b8dddb6 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: introduce group multicast messaging The previously introduced message transport to all group members is based on the tipc multicast service, but is logically a broadcast service within the group, and that is what we call it. We now add functionality for sending messages to all group members having a certain identity. Correspondingly, we call this feature 'group multicast'. The service is using unicast when only one destination is found, otherwise it will use the bearer broadcast service to transfer the messages. In the latter case, the receiving members filter arriving messages by looking at the intended destination instance. If there is no match, the message will be dropped, while still being considered received and read as seen by the flow control mechanism. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27bd9ec0 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: introduce group unicast messaging We now make it possible to send connectionless unicast messages within a communication group. To send a message, the sender can use either a direct port address, aka port identity, or an indirect port name to be looked up. This type of messages are subject to the same start synchronization and flow control mechanism as group broadcast messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b7d42635 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: introduce flow control for group broadcast messages We introduce an end-to-end flow control mechanism for group broadcast messages. This ensures that no messages are ever lost because of destination receive buffer overflow, with minimal impact on performance. For now, the algorithm is based on the assumption that there is only one active transmitter at any moment in time. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae236fb2 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: receive group membership events via member socket Like with any other service, group members' availability can be subscribed for by connecting to be topology server. However, because the events arrive via a different socket than the member socket, there is a real risk that membership events my arrive out of synch with the actual JOIN/LEAVE action. I.e., it is possible to receive the first messages from a new member before the corresponding JOIN event arrives, just as it is possible to receive the last messages from a leaving member after the LEAVE event has already been received. Since each member socket is internally also subscribing for membership events, we now fix this problem by passing those events on to the user via the member socket. We leverage the already present member synch- ronization protocol to guarantee correct message/event order. An event is delivered to the user as an empty message where the two source addresses identify the new/lost member. Furthermore, we set the MSG_OOB bit in the message flags to mark it as an event. If the event is an indication about a member loss we also set the MSG_EOR bit, so it can be distinguished from a member addition event. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31c82a2d |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add second source address to recvmsg()/recvfrom() With group communication, it becomes important for a message receiver to identify not only from which socket (identfied by a node:port tuple) the message was sent, but also the logical identity (type:instance) of the sending member. We fix this by adding a second instance of struct sockaddr_tipc to the source address area when a message is read. The extra address struct is filled in with data found in the received message header (type,) and in the local member representation struct (instance.) Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75da2163 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: introduce communication groups As a preparation for introducing flow control for multicast and datagram messaging we need a more strictly defined framework than we have now. A socket must be able keep track of exactly how many and which other sockets it is allowed to communicate with at any moment, and keep the necessary state for those. We therefore introduce a new concept we have named Communication Group. Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN. The call takes four parameters: 'type' serves as group identifier, 'instance' serves as an logical member identifier, and 'scope' indicates the visibility of the group (node/cluster/zone). Finally, 'flags' makes it possible to set certain properties for the member. For now, there is only one flag, indicating if the creator of the socket wants to receive a copy of broadcast or multicast messages it is sending via the socket, and if wants to be eligible as destination for its own anycasts. A group is closed, i.e., sockets which have not joined a group will not be able to send messages to or receive messages from members of the group, and vice versa. Any member of a group can send multicast ('group broadcast') messages to all group members, optionally including itself, using the primitive send(). The messages are received via the recvmsg() primitive. A socket can only be member of one group at a time. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
64ac5f59 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: refactor function filter_rcv() In the following commits we will need to handle multiple incoming and rejected/returned buffers in the function socket.c::filter_rcv(). As a preparation for this, we generalize the function by handling buffer queues instead of individual buffers. We also introduce a help function tipc_skb_reject(), and rename filter_rcv() to tipc_sk_filter_rcv() in line with other functions in socket.c. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
14c04493 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add ability to order and receive topology events in driver As preparation for introducing communication groups, we add the ability to issue topology subscriptions and receive topology events from kernel space. This will make it possible for group member sockets to keep track of other group members. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a853e4c6 |
|
18-Jan-2017 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce replicast as transport option for multicast TIPC multicast messages are currently carried over a reliable 'broadcast link', making use of the underlying media's ability to transport packets as L2 broadcast or IP multicast to all nodes in the cluster. When the used bearer is lacking that ability, we can instead emulate the broadcast service by replicating and sending the packets over as many unicast links as needed to reach all identified destinations. We now introduce a new TIPC link-level 'replicast' service that does this. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
57d5f64d |
|
13-Jan-2017 |
Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> |
tipc: allocate user memory with GFP_KERNEL flag Until now, we allocate memory always with GFP_ATOMIC flag. When the system is under memory pressure and a user tries to send, the send fails due to low memory. However, the user application can wait for free memory if we allocate it using GFP_KERNEL flag. In this commit, we use allocate memory with GFP_KERNEL for all user allocation. Reported-by: Rune Torgersen <runet@innovsys.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
365ad353 |
|
03-Jan-2017 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: reduce risk of user starvation during link congestion The socket code currently handles link congestion by either blocking and trying to send again when the congestion has abated, or just returning to the user with -EAGAIN and let him re-try later. This mechanism is prone to starvation, because the wakeup algorithm is non-atomic. During the time the link issues a wakeup signal, until the socket wakes up and re-attempts sending, other senders may have come in between and occupied the free buffer space in the link. This in turn may lead to a socket having to make many send attempts before it is successful. In extremely loaded systems we have observed latency times of several seconds before a low-priority socket is able to send out a message. In this commit, we simplify this mechanism and reduce the risk of the described scenario happening. When a message is attempted sent via a congested link, we now let it be added to the link's backlog queue anyway, thus permitting an oversubscription of one message per source socket. We still create a wakeup item and return an error code, hence instructing the sender to block or stop sending. Only when enough space has been freed up in the link's backlog queue do we issue a wakeup event that allows the sender to continue with the next message, if any. The fact that a socket now can consider a message sent even when the link returns a congestion code means that the sending socket code can be simplified. Also, since this is a good opportunity to get rid of the obsolete 'mtu change' condition in the three socket send functions, we now choose to refactor those functions completely. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ba8aebe9 |
|
01-Nov-2016 |
Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> |
tipc: rename struct tipc_skb_cb member handle to bytes_read In this commit, we rename handle to bytes_read indicating the purpose of the member. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06bd2b1e |
|
27-Oct-2016 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: fix broadcast link synchronization problem In commit 2d18ac4ba745 ("tipc: extend broadcast link initialization criteria") we tried to fix a problem with the initial synchronization of broadcast link acknowledge values. Unfortunately that solution is not sufficient to solve the issue. We have seen it happen that LINK_PROTOCOL/STATE packets with a valid non-zero unicast acknowledge number may bypass BCAST_PROTOCOL initialization, NAME_DISTRIBUTOR and other STATE packets with invalid broadcast acknowledge numbers, leading to premature opening of the broadcast link. When the bypassed packets finally arrive, they are inadvertently accepted, and the already correctly initialized acknowledge number in the broadcast receive link is overwritten by the invalid (zero) value of the said packets. After this the broadcast link goes stale. We now fix this by marking the packets where we know the acknowledge value is or may be invalid, and then ignoring the acks from those. To this purpose, we claim an unused bit in the header to indicate that the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR packets, plus those LINK_PROTOCOL packets sent out before the broadcast links are fully synchronized. This minor protocol update is fully backwards compatible. Reported-by: John Thompson <thompa.atl@gmail.com> Tested-by: John Thompson <thompa.atl@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02d11ca2 |
|
01-Sep-2016 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: transfer broadcast nacks in link state messages When we send broadcasts in clusters of more 70-80 nodes, we sometimes see the broadcast link resetting because of an excessive number of retransmissions. This is caused by a combination of two factors: 1) A 'NACK crunch", where loss of broadcast packets is discovered and NACK'ed by several nodes simultaneously, leading to multiple redundant broadcast retransmissions. 2) The fact that the NACKS as such also are sent as broadcast, leading to excessive load and packet loss on the transmitting switch/bridge. This commit deals with the latter problem, by moving sending of broadcast nacks from the dedicated BCAST_PROTOCOL/NACK message type to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused bits in word 8 of the said message for this purpose, and introduce a new capability bit, TIPC_BCAST_STATE_NACK in order to keep the change backwards compatible. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27777daa |
|
20-Jun-2016 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: unclone unbundled buffers before forwarding When extracting an individual message from a received "bundle" buffer, we just create a clone of the base buffer, and adjust it to point into the right position of the linearized data area of the latter. This works well for regular message reception, but during periods of extremely high load it may happen that an extracted buffer, e.g, a connection probe, is reversed and forwarded through an external interface while the preceding extracted message is still unhandled. When this happens, the header or data area of the preceding message will be partially overwritten by a MAC header, leading to unpredicatable consequences, such as a link reset. We now fix this by ensuring that the msg_reverse() function never returns a cloned buffer, and that the returned buffer always contains sufficient valid head and tail room to be forwarded. Reported-by: Erik Hugne <erik.hugne@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10724cc7 |
|
02-May-2016 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: redesign connection-level flow control There are two flow control mechanisms in TIPC; one at link level that handles network congestion, burst control, and retransmission, and one at connection level which' only remaining task is to prevent overflow in the receiving socket buffer. In TIPC, the latter task has to be solved end-to-end because messages can not be thrown away once they have been accepted and delivered upwards from the link layer, i.e, we can never permit the receive buffer to overflow. Currently, this algorithm is message based. A counter in the receiving socket keeps track of number of consumed messages, and sends a dedicated acknowledge message back to the sender for each 256 consumed message. A counter at the sending end keeps track of the sent, not yet acknowledged messages, and blocks the sender if this number ever reaches 512 unacknowledged messages. When the missing acknowledge arrives, the socket is then woken up for renewed transmission. This works well for keeping the message flow running, as it almost never happens that a sender socket is blocked this way. A problem with the current mechanism is that it potentially is very memory consuming. Since we don't distinguish between small and large messages, we have to dimension the socket receive buffer according to a worst-case of both. I.e., the window size must be chosen large enough to sustain a reasonable throughput even for the smallest messages, while we must still consider a scenario where all messages are of maximum size. Hence, the current fix window size of 512 messages and a maximum message size of 66k results in a receive buffer of 66 MB when truesize(66k) = 131k is taken into account. It is possible to do much better. This commit introduces an algorithm where we instead use 1024-byte blocks as base unit. This unit, always rounded upwards from the actual message size, is used when we advertise windows as well as when we count and acknowledge transmitted data. The advertised window is based on the configured receive buffer size in such a way that even the worst-case truesize/msgsize ratio always is covered. Since the smallest possible message size (from a flow control viewpoint) now is 1024 bytes, we can safely assume this ratio to be less than four, which is the value we are now using. This way, we have been able to reduce the default receive buffer size from 66 MB to 2 MB with maintained performance. In order to keep this solution backwards compatible, we introduce a new capability bit in the discovery protocol, and use this throughout the message sending/reception path to always select the right unit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
634696b1 |
|
15-Apr-2016 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: guarantee peer bearer id exchange after reboot When a link endpoint is going down locally, e.g., because its interface is being stopped, it will spontaneously send out a RESET message to its peer, informing it about this fact. This saves the peer from detecting the failure via probing, and hence gives both speedier and less resource consuming failure detection on the peer side. According to the link FSM, a receiver of a RESET message, ignoring the reason for it, must now consider the sender ready to come back up, and starts periodically sending out ACTIVATE messages to the peer in order to re-establish the link. Also, according to the FSM, the receiver of an ACTIVATE message can now go directly to state ESTABLISHED and start sending regular traffic packets. This is a well-proven and robust FSM. However, in the case of a reboot, there is a small possibilty that link endpoint on the rebooted node may have been re-created with a new bearer identity between the moment it sent its (pre-boot) RESET and the moment it receives the ACTIVATE from the peer. The new bearer identity cannot be known by the peer according to this scenario, since traffic headers don't convey such information. This is a problem, because both endpoints need to know the correct value of the peer's bearer id at any moment in time in order to be able to produce correct link events for their users. The only way to guarantee this is to enforce a full setup message exchange (RESET + ACTIVATE) even after the reboot, since those messages carry the bearer idientity in their header. In this commit we do this by introducing and setting a "stopping" bit in the header of the spontaneously generated RESET messages, informing the peer that the sender will not be immediately ready to re-establish the link. A receiver seeing this bit must act as if this were a locally detected connectivity failure, and hence has to go through a full two- way setup message exchange before any link can be re-established. Although never reported, this problem seems to have always been around. This protocol addition is fully backwards compatible. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b7066c3 |
|
07-Apr-2016 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: stricter filtering of packets in bearer layer Resetting a bearer/interface, with the consequence of resetting all its pertaining links, is not an atomic action. This becomes particularly evident in very large clusters, where a lot of traffic may happen on the remaining links while we are busy shutting them down. In extreme cases, we may even see links being re-created and re-established before we are finished with the job. To solve this, we now introduce a solution where we temporarily detach the bearer from the interface when the bearer is reset. This inhibits all packet reception, while sending still is possible. For the latter, we use the fact that the device's user pointer now is zero to filter out which packets can be sent during this situation; i.e., outgoing RESET messages only. This filtering serves to speed up the neighbors' detection of the loss event, and saves us from unnecessary probing. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52666986 |
|
22-Oct-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: let broadcast packet reception use new link receive function The code path for receiving broadcast packets is currently distinct from the unicast path. This leads to unnecessary code and data duplication, something that can be avoided with some effort. We now introduce separate per-peer tipc_link instances for handling broadcast packet reception. Each receive link keeps a pointer to the common, single, broadcast link instance, and can hence handle release and retransmission of send buffers as if they belonged to the own instance. Furthermore, we let each unicast link instance keep a reference to both the pertaining broadcast receive link, and to the common send link. This makes it possible for the unicast links to easily access data for broadcast link synchronization, as well as for carrying acknowledges for received broadcast packets. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f566124 |
|
22-Oct-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: let broadcast transmission use new link transmit function This commit simplifies the broadcast link transmission function, by leveraging previous changes to the link transmission function and the broadcast transmission link life cycle. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c1ab3f1d |
|
22-Oct-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: make struct tipc_link generic to support broadcast Realizing that unicast is just a special case of broadcast, we also see that we can go in the other direction, i.e., that modest changes to the current unicast link can make it generic enough to support broadcast. The following changes are introduced here: - A new counter ("ackers") in struct tipc_link, to indicate how many peers need to ack a packet before it can be released. - A corresponding counter in the skb user area, to keep track of how many peers a are left to ack before a buffer can be released. - A new counter ("acked"), to keep persistent track of how far a peer has acked at the moment, i.e., where in the transmission queue to start updating buffers when the next ack arrives. This is to avoid double acknowledgements from a peer, with inadvertent relase of packets as a result. - A more generic tipc_link_retrans() function, where retransmit starts from a given sequence number, instead of the first packet in the transmision queue. This is to minimize the number of retransmitted packets on the broadcast media. When the new functionality is taken into use in the next commits, we expect it to have minimal effect on unicast mode performance. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8306f99a |
|
15-Oct-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: disallow packet duplicates in link deferred queue After the previous commits, we are guaranteed that no packets of type LINK_PROTOCOL or with illegal sequence numbers will be attempted added to the link deferred queue. This makes it possible to make some simplifications to the sorting algorithm in the function tipc_skb_queue_sorted(). We also alter the function so that it will drop packets if one with the same seqeunce number is already present in the queue. This is necessary because we have identified weird packet sequences, involving duplicate packets, where a legitimate in-sequence packet may advance to the head of the queue without being detected and de-queued. Finally, we make this function outline, since it will now be called only in exceptional cases. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dde4b5ae |
|
14-Oct-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: move fragment importance field to new header position In commit e3eea1eb47a ("tipc: clean up handling of message priorities") we introduced a field in the packet header for keeping track of the priority of fragments, since this value is not present in the specified protocol header. Since the value so far only is used at the transmitting end of the link, we have not yet officially defined it as part of the protocol. Unfortunately, the field we use for keeping this value, bits 13-15 in in word 5, has turned out to be a poor choice; it is already used by the broadcast protocol for carrying the 'network id' field of the sending node. Since packet fragments also need to be transported across the broadcast protocol, the risk of conflict is obvious, and we see this happen when we use network identities larger than 2^13-1. This has escaped our testing because we have so far only been using small network id values. We now move this field to bits 0-2 in word 9, a field that is guaranteed to be unused by all involved protocols. Fixes: e3eea1eb47a ("tipc: clean up handling of message priorities") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
23d8335d |
|
30-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: remove implicit message delivery in node_unlock() After the most recent changes, all access calls to a link which may entail addition of messages to the link's input queue are postpended by an explicit call to tipc_sk_rcv(), using a reference to the correct queue. This means that the potentially hazardous implicit delivery, using tipc_node_unlock() in combination with a binary flag and a cached queue pointer, now has become redundant. This commit removes this implicit delivery mechanism both for regular data messages and for binding table update messages. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
598411d7 |
|
30-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: make resetting of links non-atomic In order to facilitate future improvements to the locking structure, we want to make resetting and establishing of links non-atomic. I.e., the functions tipc_node_link_up() and tipc_node_link_down() should be called from outside the node lock context, and grab/release the node lock themselves. This requires that we can freeze the link state from the moment it is set to RESETTING or PEER_RESET in one lock context until it is set to RESET or ESTABLISHING in a later context. The recently introduced link FSM makes this possible, so we are now ready to introduce the above change. This commit implements this. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6e498158 |
|
30-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: move link synch and failover to link aggregation level Link failover and synchronization have until now been handled by the links themselves, forcing them to have knowledge about and to access parallel links in order to make the two algorithms work correctly. In this commit, we move the control part of this functionality to the link aggregation level in node.c, which is the right location for this. As a result, the two algorithms become easier to follow, and the link implementation becomes simpler. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cda3696d |
|
22-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: clean up socket layer message reception When a message is received in a socket, one of the call chains tipc_sk_rcv()->tipc_sk_enqueue()->filter_rcv()(->tipc_sk_proto_rcv()) or tipc_sk_backlog_rcv()->filter_rcv()(->tipc_sk_proto_rcv()) are followed. At each of these levels we may encounter situations where the message may need to be rejected, or a new message produced for transfer back to the sender. Despite recent improvements, the current code for doing this is perceived as awkward and hard to follow. Leveraging the two previous commits in this series, we now introduce a more uniform handling of such situations. We let each of the functions in the chain itself produce/reverse the message to be returned to the sender, but also perform the actual forwarding. This simplifies the necessary logics within each function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bcd3ffd4 |
|
22-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce new tipc_sk_respond() function Currently, we use the code sequence if (msg_reverse()) tipc_link_xmit_skb() at numerous locations in socket.c. The preparation of arguments for these calls, as well as the sequence itself, makes the code unecessarily complex. In this commit, we introduce a new function, tipc_sk_respond(), that performs this call combination. We also replace some, but not yet all, of these explicit call sequences with calls to the new function. Notably, we let the function tipc_sk_proto_rcv() use the new function to directly send out PROBE_REPLY messages, instead of deferring this to the calling tipc_sk_rcv() function, as we do now. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
29042e19 |
|
22-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: let function tipc_msg_reverse() expand header when needed The shortest TIPC message header, for cluster local CONNECTED messages, is 24 bytes long. With this format, the fields "dest_node" and "orig_node" are optimized away, since they in reality are redundant in this particular case. However, the absence of these fields leads to code inconsistencies that are difficult to handle in some cases, especially when we need to reverse or reject messages at the socket layer. In this commit, we concentrate the handling of the absent fields to one place, by letting the function tipc_msg_reverse() reallocate the buffer and expand the header to 32 bytes when necessary. This means that the socket code now can assume that the two previously absent fields are present in the header when a message needs to be rejected. This opens up for some further simplifications of the socket code. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d999297c |
|
16-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: reduce locking scope during packet reception We convert packet/message reception according to the same principle we have been using for message sending and timeout handling: We move the function tipc_rcv() to node.c, hence handling the initial packet reception at the link aggregation level. The function grabs the node lock, selects the receiving link, and accesses it via a new call tipc_link_rcv(). This function appends buffers to the input queue for delivery upwards, but it may also append outgoing packets to the xmit queue, just as we do during regular message sending. The latter will happen when buffers are forwarded from the link backlog, or when retransmission is requested. Upon return of this function, and after having released the node lock, tipc_rcv() delivers/tranmsits the contents of those queues, but it may also perform actions such as link activation or reset, as indicated by the return flags from the link. This reduces the number of cpu cycles spent inside the node spinlock, and reduces contention on that lock. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a20cc25 |
|
16-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce node contact FSM The logics for determining when a node is permitted to establish and maintain contact with its peer node becomes non-trivial in the presence of multiple parallel links that may come and go independently. A known failure scenario is that one endpoint registers both its links to the peer lost, cleans up it binding table, and prepares for a table update once contact is re-establihed, while the other endpoint may see its links reset and re-established one by one, hence seeing no need to re-synchronize the binding table. To avoid this, a node must not allow re-establishing contact until it has confirmation that even the peer has lost both links. Currently, the mechanism for handling this consists of setting and resetting two state flags from different locations in the code. This solution is hard to understand and maintain. A closer analysis even reveals that it is not completely safe. In this commit we do instead introduce an FSM that keeps track of the conditions for when the node can establish and maintain links. It has six states and four events, and is strictly based on explicit knowledge about the own node's and the peer node's contact states. Only events leading to state change are shown as edges in the figure below. +--------------+ | SELF_UP/ | +---------------->| PEER_COMING |-----------------+ SELF_ | +--------------+ |PEER_ ESTBL_ | | |ESTBL_ CONTACT| SELF_LOST_CONTACT | |CONTACT | v | | +--------------+ | | PEER_ | SELF_DOWN/ | SELF_ | | LOST_ +--| PEER_LEAVING |<--+ LOST_ v +-------------+ CONTACT | +--------------+ | CONTACT +-----------+ | SELF_DOWN/ |<----------+ +----------| SELF_UP/ | | PEER_DOWN |<----------+ +----------| PEER_UP | +-------------+ SELF_ | +--------------+ | PEER_ +-----------+ | LOST_ +--| SELF_LEAVING/|<--+ LOST_ A | CONTACT | PEER_DOWN | CONTACT | | +--------------+ | | A | PEER_ | PEER_LOST_CONTACT | |SELF_ ESTBL_ | | |ESTBL_ CONTACT| +--------------+ |CONTACT +---------------->| PEER_UP/ |-----------------+ | SELF_COMING | +--------------+ Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dd3f9e70 |
|
14-May-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: add packet sequence number at instant of transmission Currently, the packet sequence number is updated and added to each packet at the moment a packet is added to the link backlog queue. This is wasteful, since it forces the code to traverse the send packet list packet by packet when adding them to the backlog queue. It would be better to just splice the whole packet list into the backlog queue when that is the right action to do. In this commit, we do this change. Also, since the sequence numbers cannot now be assigned to the packets at the moment they are added the backlog queue, we do instead calculate and add them at the moment of transmission, when the backlog queue has to be traversed anyway. We do this in the function tipc_link_push_packet(). Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f21e897e |
|
14-May-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: improve link congestion algorithm The link congestion algorithm used until now implies two problems. - It is too generous towards lower-level messages in situations of high load by giving "absolute" bandwidth guarantees to the different priority levels. LOW traffic is guaranteed 10%, MEDIUM is guaranted 20%, HIGH is guaranteed 30%, and CRITICAL is guaranteed 40% of the available bandwidth. But, in the absence of higher level traffic, the ratio between two distinct levels becomes unreasonable. E.g. if there is only LOW and MEDIUM traffic on a system, the former is guaranteed 1/3 of the bandwidth, and the latter 2/3. This again means that if there is e.g. one LOW user and 10 MEDIUM users, the former will have 33.3% of the bandwidth, and the others will have to compete for the remainder, i.e. each will end up with 6.7% of the capacity. - Packets of type MSG_BUNDLER are created at SYSTEM importance level, but only after the packets bundled into it have passed the congestion test for their own respective levels. Since bundled packets don't result in incrementing the level counter for their own importance, only occasionally for the SYSTEM level counter, they do in practice obtain SYSTEM level importance. Hence, the current implementation provides a gap in the congestion algorithm that in the worst case may lead to a link reset. We now refine the congestion algorithm as follows: - A message is accepted to the link backlog only if its own level counter, and all superior level counters, permit it. - The importance of a created bundle packet is set according to its contents. A bundle packet created from messges at levels LOW to CRITICAL is given importance level CRITICAL, while a bundle created from a SYSTEM level message is given importance SYSTEM. In the latter case only subsequent SYSTEM level messages are allowed to be bundled into it. This solves the first problem described above, by making the bandwidth guarantee relative to the total number of users at all levels; only the upper limit for each level remains absolute. In the example described above, the single LOW user would use 1/11th of the bandwidth, the same as each of the ten MEDIUM users, but he still has the same guarantee against starvation as the latter ones. The fix also solves the second problem. If the CRITICAL level is filled up by bundle packets of that level, no lower level packets will be accepted any more. Suggested-by: Gergely Kiss <gergely.kiss@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e4bf4f76 |
|
14-May-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: simplify packet sequence number handling Although the sequence number in the TIPC protocol is 16 bits, we have until now stored it internally as an unsigned 32 bits integer. We got around this by always doing explicit modulo-65535 operations whenever we need to access a sequence number. We now make the incoming and outgoing sequence numbers to unsigned 16-bit integers, and remove the modulo operations where applicable. We also move the arithmetic inline functions for 16 bit integers to core.h, and the function buf_seqno() to msg.h, so they can easily be accessed from anywhere in the code. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dff29b1a |
|
02-Apr-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: eliminate delayed link deletion at link failover When a bearer is disabled manually, all its links have to be reset and deleted. However, if there is a remaining, parallel link ready to take over a deleted link's traffic, we currently delay the delete of the removed link until the failover procedure is finished. This is because the remaining link needs to access state from the reset link, such as the last received packet number, and any partially reassembled buffer, in order to perform a successful failover. In this commit, we do instead move the state data over to the new link, so that it can fulfill the procedure autonomously, without accessing any data on the old link. This means that we can now proceed and delete all pertaining links immediately when a bearer is disabled. This saves us from some unnecessary complexity in such situations. We also choose to change the confusing definitions CHANGEOVER_PROTOCOL, ORIGINAL_MSG and DUPLICATE_MSG to the more descriptive TUNNEL_PROTOCOL, FAILOVER_MSG and SYNCH_MSG respectively. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8b4ed863 |
|
24-Mar-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: eliminate race condition at dual link establishment Despite recent improvements, the establishment of dual parallel links still has a small glitch where messages can bypass each other. When the second link in a dual-link configuration is established, part of the first link's traffic will be steered over to the new link. Although we do have a mechanism to ensure that packets sent before and after the establishment of the new link arrive in sequence to the destination node, this is not enough. The arriving messages will still be delivered upwards in different threads, something entailing a risk of message disordering during the transition phase. To fix this, we introduce a synchronization mechanism between the two parallel links, so that traffic arriving on the new link cannot be added to its input queue until we are guaranteed that all pre-establishment messages have been delivered on the old, parallel link. This problem seems to always have been around, but its occurrence is so rare that it has not been noticed until recent intensive testing. Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3127a020 |
|
24-Mar-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: clean up handling of link congestion After the recent changes in message importance handling it becomes possible to simplify handling of messages and sockets when we encounter link congestion. We merge the function tipc_link_cong() into link_schedule_user(), and simplify the code of the latter. The code should now be easier to follow, especially regarding return codes and handling of the message that caused the situation. In case the scheduling function is unable to pre-allocate a wakeup message buffer, it now returns -ENOBUFS, which is a more correct code than the previously used -EHOSTUNREACH. Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3eea1eb |
|
13-Mar-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: clean up handling of message priorities Messages transferred by TIPC are assigned an "importance priority", -an integer value indicating how to treat the message when there is link or destination socket congestion. There is no separate header field for this value. Instead, the message user values have been chosen in ascending order according to perceived importance, so that the message user field can be used for this. This is not a good solution. First, we have many more users than the needed priority levels, so we end up with treating more priority levels than necessary. Second, the user field cannot always accurately reflect the priority of the message. E.g., a message fragment packet should really have the priority of the enveloped user data message, and not the priority of the MSG_FRAGMENTER user. Until now, we have been working around this problem in different ways, but it is now time to implement a consistent way of handling such priorities, although still within the constraint that we cannot allocate any more bits in the regular data message header for this. In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE, that will be the only one used apart from the four (lower) user data levels. All non-data messages map down to this priority. Furthermore, we take some free bits from the MSG_FRAGMENTER header and allocate them to store the priority of the enveloped message. We then adjust the functions msg_importance()/msg_set_importance() so that they read/set the correct header fields depending on user type. This small protocol change is fully compatible, because the code at the receiving end of a link currently reads the importance level only from user data messages, where there is no change. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
05dcc5aa |
|
13-Mar-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: split link outqueue struct tipc_link contains one single queue for outgoing packets, where both transmitted and waiting packets are queued. This infrastructure is hard to maintain, because we need to keep a number of fields to keep track of which packets are sent or unsent, and the number of packets in each category. A lot of code becomes simpler if we split this queue into a transmission queue, where sent/unacknowledged packets are kept, and a backlog queue, where we keep the not yet sent packets. In this commit we do this separation. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf2157f8 |
|
13-Mar-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: move message validation function to msg.c The function link_buf_validate() is in reality re-entrant and context independent, and will in later commits be called from several locations. Therefore, we move it to msg.c, make it outline and rename the it to tipc_msg_validate(). We also redesign the function to make proper use of pskb_may_pull() Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7764d6e8 |
|
13-Mar-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: add framework for node capabilities exchange The TIPC protocol spec has defined a 13 bit capability bitmap in the neighbor discovery header, as a means to maintain compatibility between different code and protocol generations. Until now this field has been unused. We now introduce the basic framework for exchanging capabilities between nodes at first contact. After exchange, a peer node's capabilities are stored as a 16 bit bitmap in struct tipc_node. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d0f91938 |
|
05-Mar-2015 |
Erik Hugne <erik.hugne@ericsson.com> |
tipc: add ip/udp media type The ip/udp bearer can be configured in a point-to-point mode by specifying both local and remote ip/hostname, or it can be enabled in multicast mode, where links are established to all tipc nodes that have joined the same multicast group. The multicast IP address is generated based on the TIPC network ID, but can be overridden by using another multicast address as remote ip. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
91e2eb56 |
|
27-Feb-2015 |
Erik Hugne <erik.hugne@ericsson.com> |
tipc: rename media/msg related definitions The TIPC_MEDIA_ADDR_SIZE and TIPC_MEDIA_ADDR_OFFSET names are misleading, as they actually define the size and offset of the whole media info field and not the address part. This patch does not have any functional changes. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb1b7280 |
|
05-Feb-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: eliminate race condition at multicast reception In a previous commit in this series we resolved a race problem during unicast message reception. Here, we resolve the same problem at multicast reception. We apply the same technique: an input queue serializing the delivery of arriving buffers. The main difference is that here we do it in two steps. First, the broadcast link feeds arriving buffers into the tail of an arrival queue, which head is consumed at the socket level, and where destination lookup is performed. Second, if the lookup is successful, the resulting buffer clones are fed into a second queue, the input queue. This queue is consumed at reception in the socket just like in the unicast case. Both queues are protected by the same lock, -the one of the input queue. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c637c103 |
|
05-Feb-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: resolve race problem at unicast message reception TIPC handles message cardinality and sequencing at the link layer, before passing messages upwards to the destination sockets. During the upcall from link to socket no locks are held. It is therefore possible, and we see it happen occasionally, that messages arriving in different threads and delivered in sequence still bypass each other before they reach the destination socket. This must not happen, since it violates the sequentiality guarantee. We solve this by adding a new input buffer queue to the link structure. Arriving messages are added safely to the tail of that queue by the link, while the head of the queue is consumed, also safely, by the receiving socket. Sequentiality is secured per socket by only allowing buffers to be dequeued inside the socket lock. Since there may be multiple simultaneous readers of the queue, we use a 'filter' parameter to reduce the risk that they peek the same buffer from the queue, hence also reducing the risk of contention on the receiving socket locks. This solves the sequentiality problem, and seems to cause no measurable performance degradation. A nice side effect of this change is that lock handling in the functions tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that will enable future simplifications of those functions. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e3a77561 |
|
05-Feb-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: split up function tipc_msg_eval() The function tipc_msg_eval() is in reality doing two related, but different tasks. First it tries to find a new destination for named messages, in case there was no first lookup, or if the first lookup failed. Second, it does what its name suggests, evaluating the validity of the message and its destination, and returning an appropriate error code depending on the result. This is confusing, and in this commit we choose to break it up into two functions. A new function, tipc_msg_lookup_dest(), first attempts to find a new destination, if the message is of the right type. If this lookup fails, or if the message should not be subject to a second lookup, the already existing tipc_msg_reverse() is called. This function performs prepares the message for rejection, if applicable. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c5898636 |
|
05-Feb-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: reduce usage of context info in socket and link The most common usage of namespace information is when we fetch the own node addess from the net structure. This leads to a lot of passing around of a parameter of type 'struct net *' between functions just to make them able to obtain this address. However, in many cases this is unnecessary. The own node address is readily available as a member of both struct tipc_sock and tipc_link, and can be fetched from there instead. The fact that the vast majority of functions in socket.c and link.c anyway are maintaining a pointer to their respective base structures makes this option even more compelling. In this commit, we introduce the inline functions tsk_own_node() and link_own_node() to make it easy for functions to fetch the node address from those structs instead of having to pass along and dereference the namespace struct. In particular, we make calls to the msg_xx() functions in msg.{h,c} context independent by directly passing them the own node address as parameter when needed. Those functions should be regarded as leaves in the code dependency tree, and it is hence desirable to keep them namspace unaware. Apart from a potential positive effect on cache behavior, these changes make it easier to introduce the changes that will follow later in this series. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
34747539 |
|
09-Jan-2015 |
Ying Xue <ying.xue@windriver.com> |
tipc: make tipc node address support net namespace If net namespace is supported in tipc, each namespace will be treated as a separate tipc node. Therefore, every namespace must own its private tipc node address. This means the "tipc_own_addr" global variable of node address must be moved to tipc_net structure to satisfy the requirement. It's turned out that users also can assign node address for every namespace. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4ac1c8d0 |
|
09-Jan-2015 |
Ying Xue <ying.xue@windriver.com> |
tipc: name tipc name table support net namespace TIPC name table is used to store the mapping relationship between TIPC service name and socket port ID. When tipc supports namespace, it allows users to publish service names only owned by a certain namespace. Therefore, every namespace must have its private name table to prevent service names published to one namespace from being contaminated by other service names in another namespace. Therefore, The name table global variable (ie, nametbl) and its lock must be moved to tipc_net structure, and a parameter of namespace must be added for necessary functions so that they can obtain name table variable defined in tipc_net structure. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1da46568 |
|
09-Jan-2015 |
Ying Xue <ying.xue@windriver.com> |
tipc: make tipc broadcast link support net namespace TIPC broadcast link is statically established and its relevant states are maintained with the global variables: "bcbearer", "bclink" and "bcl". Allowing different namespace to own different broadcast link instances, these variables must be moved to tipc_net structure and broadcast link instances would be allocated and initialized when namespace is created. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
859fc7c0 |
|
09-Jan-2015 |
Ying Xue <ying.xue@windriver.com> |
tipc: cleanup core.c and core.h files Only the works of initializing and shutting down tipc module are done in core.h and core.c files, so all stuffs which are not closely associated with the two tasks should be moved to appropriate places. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2f55c437 |
|
09-Jan-2015 |
Ying Xue <ying.xue@windriver.com> |
tipc: remove unnecessary wrapper functions of kernel timer APIs Not only some wrapper function like k_term_timer() is empty, but also some others including k_start_timer() and k_cancel_timer() don't return back any value to its caller, what's more, there is no any component in the kernel world to do such thing. Therefore, these timer interfaces defined in tipc module should be purged. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a6ca1094 |
|
25-Nov-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: use generic SKB list APIs to manage TIPC outgoing packet chains Use standard SKB list APIs associated with struct sk_buff_head to manage socket outgoing packet chain and name table outgoing packet chain, having relevant code simpler and more readable. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
58dc55f2 |
|
25-Nov-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: use generic SKB list APIs to manage link transmission queue Use standard SKB list APIs associated with struct sk_buff_head to manage link transmission queue, having relevant code more clean. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
58311d16 |
|
25-Nov-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: eliminate two pseudo message types of BUNDLE_OPEN and BUNDLE_CLOSED The pseudo message types of BUNDLE_CLOSED as well as BUNDLE_OPEN are used to flag whether or not more messages can be bundled into a data packet in the outgoing transmission queue. Obviously, no more messages can be appended after the packet has been sent and is waiting to be acknowledged and deleted. These message types do in reality represent a send-side local implementation flag, and are not defined as part of the protocol. It is therefore safe to move it to to where it belongs, that is, the control area (TIPC_SKB_CB) of the buffer. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45dcc687 |
|
14-Nov-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
tipc_msg_build(): pass msghdr instead of its ->msg_iov Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
50100a5e |
|
22-Aug-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: use pseudo message to wake up sockets after link congestion The current link implementation keeps a linked list of blocked ports/ sockets that is populated when there is link congestion. The purpose of this is to let the link know which users to wake up when the congestion abates. This adds unnecessary complexity to the data structure and the code, since it forces us to involve the link each time we want to delete a socket. It also forces us to grab the spinlock port_lock within the scope of node_lock. We want to get rid of this direct dependence, as well as the deadlock hazard resulting from the usage of port_lock. In this commit, we instead let the link keep list of a "wakeup" pseudo messages for use in such situations. Those messages are sent to the pending sockets via the ordinary message reception path, and wake up the socket's owner when they are received. This enables us to get rid of the 'waiting_ports' linked lists in struct tipc_port that manifest this direct reference. As a consequence, we can eliminate another BH entry into the socket, and hence the need to grab port_lock. This is a further step in our effort to remove port_lock altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1dd0bd2b |
|
22-Aug-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce new function tipc_msg_create() The function tipc_msg_init() has turned out to be of limited value in many cases. It take too few parameters to be usable for creating a complete message, it makes too many assumptions about what the message should be used for, and it does not allocate any buffer to be returned to the caller. Therefore, we now introduce the new function tipc_msg_create(), which takes all the parameters needed to create a full message, and returns a buffer of the requested size. The new function will be very useful for the changes we will be doing in later commits in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9fbfb8b1 |
|
16-Jul-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: rename temporarily named functions After the previous commit, we can now give the functions with temporary names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper names. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4116e10 |
|
16-Jul-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: remove unreferenced functions We can now remove a number of functions which have become obsolete and unreferenced through this commit series. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
078bec82 |
|
16-Jul-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: add new functions for multicast and broadcast distribution We add a new broadcast link transmit function in bclink.c and a new receive function in socket.c. The purpose is to move the branching between external and internal destination down to the link layer, just as we have done with unicast in earlier commits. We also make use of the new link-independent fragmentation support that was introduced in an earlier commit series. This gives a shorter and simpler code path, and makes it possible to obtain copy-free buffer delivery to all node local destination sockets. The new transmission code is added in parallel with the existing one, and will be used by the socket multicast send function in the next commit in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a379074 |
|
25-Jun-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce message evaluation function When a message arrives in a node and finds no destination socket, we may need to drop it, reject it, or forward it after a secondary destination lookup. The latter two cases currently results in a code path that is perceived as complex, because it follows a deep call chain via obscure functions such as net_route_named_msg() and net_route_msg(). We now introduce a function, tipc_msg_eval(), that takes the decision about whether such a message should be rejected or forwarded, but leaves it to the caller to actually perform the indicated action. If the decision is 'reject', it is still the task of the recently introduced function tipc_msg_reverse() to take the final decision about whether the message is rejectable or not. In the latter case it drops the message. As a result of this change, we can finally eliminate the function net_route_named_msg(), and hence become independent of net_route_msg(). Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8db1bae3 |
|
25-Jun-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: separate building and sending of rejected messages The way we build and send rejected message is currenty perceived as hard to follow, partly because we let the transmission go via deep call chains through functions such as tipc_reject_msg() and net_route_msg(). We want to remove those functions, and make the call sequences shallower and simpler. For this purpose, we separate building and sending of rejected messages. We build the reject message using the new function tipc_msg_reverse(), and let the transmission go via the newly introduced tipc_link_xmit2() function, as all transmission eventually will do. We also ensure that all calls to tipc_link_xmit2() are made outside port_lock/bh_lock_sock. Finally, we replace all calls to tipc_reject_msg() with the two new calls at all locations in the code that we want to keep. The remaining calls are made from code that we are planning to remove, along with tipc_reject_msg() itself. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
067608e9 |
|
25-Jun-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce direct iovec to buffer chain fragmentation function Fragmentation at message sending is currently performed in two places in link.c, depending on whether data to be transmitted is delivered in the form of an iovec or as a big sk_buff. Those functions are also tightly entangled with the send functions that are using them. We now introduce a re-entrant, standalone function, tipc_msg_build2(), that builds a packet chain directly from an iovec. Each fragment is sized according to the MTU value given by the caller, and is prepended with a correctly built fragment header, when needed. The function is independent from who is calling and where the chain will be delivered, as long as the caller is able to indicate a correct MTU. The function is tested, but not called by anybody yet. Since it is incompatible with the existing tipc_msg_build(), and we cannot yet remove that function, we have given it a temporary name. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4f1688b2 |
|
25-Jun-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce send functions for chained buffers in link The current link implementation provides several different transmit functions, depending on the characteristics of the message to be sent: if it is an iovec or an sk_buff, if it needs fragmentation or not, if the caller holds the node_lock or not. The permutation of these options gives us an unwanted amount of unnecessarily complex code. As a first step towards simplifying the send path for all messages, we introduce two new send functions at link level, tipc_link_xmit2() and __tipc_link_xmit2(). The former looks up a link to the message destination, and if one is found, it grabs the node lock and calls the second function, which works exclusively inside the node lock protection. If no link is found, and the destination is on the same node, it delivers the message directly to the local destination socket. The new functions take a buffer chain where all packet headers are already prepared, and the correct MTU has been used. These two functions will later replace all other link-level transmit functions. The functions are not backwards compatible, so we have added them as new functions with temporary names. They are tested, but have no users yet. Those will be added later in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37e22164 |
|
14-May-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: rename and move message reassembly function The function tipc_link_frag_rcv() is in reality a re-entrant generic message reassemby function that has nothing in particular to do with the link, where it is defined now. This becomes obvious when we see the need to call the function from other places in the code. In this commit rename it to tipc_buf_append() and move it to the file msg.c. We also simplify its signature by moving the tail pointer to the control block of the head buffer, hence making the head buffer self-contained. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40ba3cdf |
|
06-Nov-2013 |
Erik Hugne <erik.hugne@ericsson.com> |
tipc: message reassembly using fragment chain When the first fragment of a long data data message is received on a link, a reassembly buffer large enough to hold the data from this and all subsequent fragments of the message is allocated. The payload of each new fragment is copied into this buffer upon arrival. When the last fragment is received, the reassembled message is delivered upwards to the port/socket layer. Not only is this an inefficient approach, but it may also cause bursts of reassembly failures in low memory situations. since we may fail to allocate the necessary large buffer in the first place. Furthermore, after 100 subsequent such failures the link will be reset, something that in reality aggravates the situation. To remedy this problem, this patch introduces a different approach. Instead of allocating a big reassembly buffer, we now append the arriving fragments to a reassembly chain on the link, and deliver the whole chain up to the socket layer once the last fragment has been received. This is safe because the retransmission layer of a TIPC link always delivers packets in strict uninterrupted order, to the reassembly layer as to all other upper layers. Hence there can never be more than one fragment chain pending reassembly at any given time in a link, and we can trust (but still verify) that the fragments will be chained up in the correct order. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9446b87a |
|
17-Oct-2013 |
Ying Xue <ying.xue@windriver.com> |
tipc: remove iovec length parameter from all sending functions tipc_msg_build() now copies message data from iovec to skb_buff using memcpy_fromiovecend(), which doesn't need to be passed the iovec length to perform the copying. So we remove the parameter indicating iovec length in all functions where TIPC messages are built and sent. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ae8509c4 |
|
17-Jun-2013 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
tipc: cosmetic realignment of function arguments No runtime code changes here. Just a realign of the function arguments to start where the 1st one was, and fit as many args as can be put in an 80 char line. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f1733d75 |
|
17-Jun-2013 |
Ying Xue <ying.xue@windriver.com> |
tipc: remove user_port instance from tipc_port structure After the native API has been completely removed, the 'user_port' field in struct tipc_port becomes unused, and can be removed. As a consequence, the "usrmem" argument in tipc_msg_build() is no longer needed, and so we remove that one too. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
617d3c7a |
|
30-Apr-2012 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
tipc: compress out gratuitous extra carriage returns Some of the comment blocks are floating in limbo between two functions, or between blocks of code. Delete the extra line feeds between any comment and its associated following block of code, to be consistent with the majority of the rest of the kernel. Also delete trailing newlines at EOF and fix a couple trivial typos in existing comments. This is a 100% cosmetic change with no runtime impact. We get rid of over 500 lines of non-code, and being blank line deletes, they won't even show up as noise in git blame. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
c74a4611 |
|
02-Nov-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Remove duplicate check of message destination node Eliminates a check in the processing of TIPC messages arriving from off node that ensures the message is destined for this node, since this check duplicates an earlier check. (The check would be necessary if TIPC needed to be able to route incoming messages to another node, but the elimination of multi-cluster support means that this never happens and all incoming messages are consumed by the receiving node.) Note: This change involves the elimination of a single "if" statement with a large "then" clause; consequently, a significant number of lines end up getting re-indented. In addition, a simple message header access routine that is no longer referenced is eliminated. However, the only functional change is the elimination of the single check described above. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
fc0eea69 |
|
28-Oct-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Introduce node signature field in neighbor discovery message Adds support for the new "node signature" in neighbor discovery messages, which is a 16 bit identifier chosen randomly when TIPC is initialized. This field makes it possible for nodes receiving a neighbor discovery message to detect if multiple neighboring nodes are using the same network address (i.e. <Z.C.N>), even when the messages are arriving on different interfaces. This first phase of node signature support creates the signature, incorporates it into outgoing neighbor discovery messages, and tracks the signature used by valid neighbors. An upcoming patch builds on this foundation to implement the improved duplicate neighbor detection checking. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
3d749a6a |
|
07-Oct-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Hide media-specific addressing details from generic bearer code Reworks TIPC's media address data structure and associated processing routines to transfer all media-specific details of address conversion to the associated TIPC media adaptation code. TIPC's generic bearer code now only needs to know which media type an address is associated with and whether or not it is a broadcast address, and totally ignores the "value" field that contains the actual media-specific addressing info. These changes eliminate the need for a number of endianness conversion operations and will make it easier for TIPC to support new media types in the future. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
741d9eb7 |
|
31-May-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Cleanup of message header size terminology Performs cosmetic cleanup of the symbolic names used to specify TIPC payload message header sizes. The revised names now more accurately reflect the payload messages in which they can appear. In addition, several places where these payload message symbol names were being used to create non-payload messages have been updated to use the proper internal message symbolic name. No functional changes are introduced by this rework. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
7eb878ed |
|
25-May-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Eliminate redundant masking in message header routines Gets rid of unnecessary masking in two routines that set TIPC message header fields. (The msg_set_bits() routine already takes care of masking the new value to the correct size.) Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
74d33b32 |
|
24-May-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Eliminate message header routines for caching destination node Gets rid of a pair of routines that provide support for temporarily caching the destination node for a message in the associated message buffer's application handle, since this capability is no longer used. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
26896904 |
|
21-Apr-2011 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Avoid recomputation of outgoing message length Rework TIPC's message sending routines to take advantage of the total amount of data value passed to it by the kernel socket infrastructure. This change eliminates the need for TIPC to compute the size of outgoing messages itself, as well as the check for an oversize message in tipc_msg_build(). In addition, this change warrants an explanation: - res = send_packet(NULL, sock, &my_msg, 0); + res = send_packet(NULL, sock, &my_msg, bytes_to_send); Previously, the final argument to send_packet() was ignored (since the amount of data being sent was recalculated by a lower-level routine) and we could just pass in a dummy value (0). Now that the recalculation is being eliminated, the argument value being passed to send_packet() is significant and we have to supply the actual amount of data we want to send. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
92138d1f |
|
08-Apr-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Cosmetic consolidation of internal message type definitions Half of the #define entries in msg.h were down at the bottom of the header, instead of up at the top before any of the static inlines etc. Relocate them up to the top, to be consistent with the other normal linux header file layout conventions. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
19f53d2c |
|
08-Apr-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Eliminate unused routing message definitions Gets rid of unused constants defining the types used in routing messages. These messages no longer exist in TIPC now that multicluster and multizone support has been eliminated. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
cc4c4353 |
|
08-Apr-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Update comments in message header include file Removes comments in TIPC's message header include file that are outdated and/or unnecessary. Also introduces short comments (or supplements existing ones) to better describe several set of existing symbolic constants. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
390bce42 |
|
11-Mar-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Eliminate obsolete routine for handling routed messages Eliminates a routine that is used in handling messages arriving from another cluster or zone. Such messages can no longer be received by TIPC now that multi-cluster and multi-zone network support has been eliminated. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
7945c1fb |
|
11-Mar-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Eliminate remaining support for routing table messages Gets rid of all remaining code relating to ROUTE_DISTRIBUTOR messages. These messages were only used in multi-cluster and multi-zone networks, which TIPC no longer supports. (For safety, TIPC now treats such messages the same way that it handles other unrecognized messages.) Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
d901a42b |
|
28-Feb-2011 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Eliminate unnecessary constant for neighbor discovery msg size Eliminates an unnecessary constant that defines the size of a LINK_CONFIG message, and uses one of the existing standard message size symbols in its place. (The defunct constant was located in the wrong place anyway, since it was grouped with other constants that define message users instead of message sizes.) Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
77f167fc |
|
28-Feb-2011 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: make msg_set_redundant_link() consistent with other set ops All the other boolean like msg_set_X(m) operations don't export both a msg_set_X(a) and a msg_clear_X(m), but instead just have the single msg_set_X(m, val) variant. Make the redundant_link one consistent by having the set take a value, and delete the msg_clear_redundant_link() anomoly. This is a cosmetic change and should not change behaviour. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
e7b3acb6 |
|
27-Feb-2011 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Eliminate timestamp from link protocol messages Removes support for the timestamp field of TIPC's link protocol messages. This field was previously used to hold an OS-dependent timestamp value that was used to assist in debugging early versions of TIPC. The field has now been deemed unnecessary and has been removed from the latest TIPC specification. This change has no impact on the operation of TIPC since the field was set by TIPC, but never referenced. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
2e07dda1 |
|
25-Jan-2011 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Remove unused message header field for requested number of links Eliminates support for the "number of requested links" field in a neighbor discovery message. This field was never used and has been removed from the TIPC 2.0 protocol specification. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
741de3e9 |
|
25-Jan-2011 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Remove support for per-connection message sequence numbering Eliminates TIPC's prototype support for message sequence numbering on routable connections (i.e. connections requiring more than one hop). This capability isn't currently used, and can be removed since TIPC only supports systems in which all inter-node communication can be achieved in a single hop. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
0e65967e |
|
31-Dec-2010 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: cleanup various cosmetic whitespace issues Cleans up TIPC's source code to eliminate deviations from generally accepted coding conventions relating to leading/trailing white space and white space around commas, braces, cases, and sizeof. These changes are purely cosmetic and do not alter the operation of TIPC in any way. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51a8e4de |
|
31-Dec-2010 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Remove prototype code for supporting inter-cluster routing Eliminates routines and data structures that were intended to allow TIPC to route messages to other clusters. Currently, TIPC supports only networks consisting of a single cluster within a single zone, so this code is unnecessary. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
08c80e9a |
|
31-Dec-2010 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Remove prototype code for supporting slave nodes Simplifies routines and data structures that were intended to allow TIPC to support slave nodes (i.e. nodes that did not have links to all of the other nodes in its cluster, forcing TIPC to route messages that it could not deliver directly through a non-slave node). Currently, TIPC supports only networks containing non-slave nodes, so this code is unnecessary. Note: The latest edition of the TIPC 2.0 Specification has eliminated the concept of slave nodes entirely. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d265fef6 |
|
29-Nov-2010 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Remove obsolete native API files and exports As part of the removal of TIPC's native API support it is no longer necessary for TIPC to export symbols for routines that can be called by kernel-based applications, nor for it to have header files that kernel-based applications can include to access the declarations for those routines. This commit eliminates the exporting of symbols by TIPC and migrates the contents of each obsolete native API include file into its corresponding non-native API equivalent. The code which was migrated in this commit was migrated intact, in that there are no technical changes combined with the relocation. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a02cec21 |
|
22-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: return operator cleanup Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
23461e83 |
|
11-May-2010 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Reduce footprint by un-inlining tipc_msg_* routines Convert tipc_msg_* inline routines that are more than one line into standard functions, thereby eliminating some repeated code. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c68ca7b7 |
|
11-May-2010 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: add tipc_ prefix to fcns targeted for un-inlining These functions have enough code in them such that they seem like sensible targets for un-inlining. Prior to doing that, this adds the tipc_ prefix to the functions, so that in the event of a panic dump or similar, the subsystem from which the functions come from is immediately clear. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40aecb1b |
|
04-Jun-2008 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Message rejection rework preparatory changes This patch defines a few new message header manipulation routines, and generalizes the usefulness of another, in preparation for upcoming rework of TIPC's message rejection code. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd784533 |
|
04-Jun-2008 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Expand link sequence gap field to 13 bits This patch increases the "sequence gap" field of the LINK_PROTOCOL message header from 8 bits to 13 bits (utilizing 5 previously unused 0 bits). This ensures that the field is big enough to indicate the loss of up to 8191 consecutive messages on the link, thereby accommodating the current worst-case scenario of 4000 lost messages. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75715217 |
|
04-Jun-2008 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Optimize message initialization routine This patch eliminates the rarely-used "error code" argument when initializing a TIPC message header, since the default value of zero is the desired result in most cases; the few exceptional cases now set the error code explicitly. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
becf3da2 |
|
26-Apr-2008 |
Al Viro <viro@zeniv.linux.org.uk> |
tipc: endianness annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0cb7ef0 |
|
06-Mar-2008 |
Allan Stephens <allan.stephens@windriver.com> |
[TIPC]: Enhancements to message header writing This patch makes two enhancements to the routine used to set bit fields within a TIPC message header: 1) It now ignores any bits of the new field value that are not covered by the mask being used. (Previously, if the new value exceeded the size of the mask the extra bits could corrupt other fields in the message header word being updated.) 2) The code has been optimized to minimize the number of run-time endianness conversion operations by leveraging the fact that the mask (and, in some cases, the value as well) is constant and the necessary conversion can be performed by the compiler. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37695420 |
|
06-Mar-2008 |
Allan Stephens <allan.stephens@windriver.com> |
[TIPC]: Use correct bitmask when setting version This patch ensures that the 3-bit version field of the TIPC message header is masked correctly when written into a message. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
06d82c91 |
|
06-Mar-2008 |
Allan Stephens <allan.stephens@windriver.com> |
[TIPC]: Minor cleanup of message header code This patch eliminates some unused or duplicate message header symbols, and fixes up the comments and/or location of a few other symbols. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8c869655 |
|
06-Mar-2008 |
Allan Stephens <allan.stephens@windriver.com> |
[TIPC]: Removal of message header option code This patch removes code associated with optional, user-specified fields of the TIPC message header. Such fields were never utilized by TIPC, and have now been removed from the protocol specification. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86121fe5 |
|
07-Feb-2008 |
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> |
[TIPC]: Kill unused static inline (x5) All these static inlines are unused: in_own_zone 1 (net/tipc/addr.h) msg_dataoctet 1 (net/tipc/msg.h) msg_direct 1 (include/net/tipc/tipc_msg.h) msg_options 1 (include/net/tipc/tipc_msg.h) tipc_nmap_get 1 (net/tipc/bcast.h) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
582ee43d |
|
26-Jul-2007 |
Al Viro <viro@ftp.linux.org.uk> |
net/* misc endianness annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1f9eda7e |
|
24-Apr-2007 |
Allan Stephens <allan.stephens@windriver.com> |
[TIPC]: Enhancements to msg_set_bits() routine This patch makes two enhancements to msg_set_bits(): 1) It now ignores any bits of the new field value that are not covered by the mask being used. (Previously, if the new value exceeded the size of the mask the extra bits could corrupt other fields in the message header word being updated.) 2) The code has been optimized to minimize the number of run-time endianness conversion operations by leveraging the fact that the mask (and, in some cases, the value as well) is constant and the necessary conversion can be performed by the compiler. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Jon Paul Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
27d7ff46 |
|
31-Mar-2007 |
Arnaldo Carvalho de Melo <acme@ghostprotocols.net> |
[SK_BUFF]: Introduce skb_copy_to_linear_data{_offset} To clearly state the intent of copying to linear sk_buffs, _offset being a overly long variant but interesting for the sake of saving some bytes. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
|
#
c4307285 |
|
09-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] TIPC: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4323add6 |
|
17-Jan-2006 |
Per Liden <per.liden@ericsson.com> |
[TIPC] Avoid polluting the global namespace This patch adds a tipc_ prefix to all externally visible symbols. Signed-off-by: Per Liden <per.liden@ericsson.com>
|
#
5f7c3ff6 |
|
13-Jan-2006 |
Jon Maloy <jon.maloy@ericsson.com> |
[TIPC] Minor changes to #includes Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
|
#
593a5f22 |
|
11-Jan-2006 |
Per Liden <per.liden@nospam.ericsson.com> |
[TIPC] More updates of file headers Updated copyright notice to include the year the file was actually created. Information about file creation dates was extracted from the files in the old CVS repository at tipc.sourceforge.net. Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
|
#
9da1c8b6 |
|
11-Jan-2006 |
Per Liden <per.liden@nospam.ericsson.com> |
[TIPC] Update of file headers The copyright statements from different parts of Ericsson have been merged into one. Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
|
#
9ea1fd3c |
|
11-Jan-2006 |
Per Liden <per.liden@nospam.ericsson.com> |
[TIPC] License header update The license header in each file now more clearly state that this code is licensed under a dual BSD/GPL. Before this was only evident if you looked at the MODULE_LICENSE line in core.c. Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
|
#
b97bf3fd |
|
02-Jan-2006 |
Per Liden <per.liden@nospam.ericsson.com> |
[TIPC] Initial merge TIPC (Transparent Inter Process Communication) is a protocol designed for intra cluster communication. For more information see http://tipc.sourceforge.net Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
|