#
1ef6f7c9 |
|
17-Sep-2020 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: add automatic session key exchange With support from the master key option in the previous commit, it becomes easy to make frequent updates/exchanges of session keys between authenticated cluster nodes. Basically, there are two situations where the key exchange will take in place: - When a new node joins the cluster (with the master key), it will need to get its peer's TX key, so that be able to decrypt further messages from that peer. - When a new session key is generated (by either user manual setting or later automatic rekeying feature), the key will be distributed to all peer nodes in the cluster. A key to be exchanged is encapsulated in the data part of a 'MSG_CRYPTO /KEY_DISTR_MSG' TIPC v2 message, then xmit-ed as usual and encrypted by using the master key before sending out. Upon receipt of the message it will be decrypted in the same way as regular messages, then attached as the sender's RX key in the receiver node. In this way, the key exchange is reliable by the link layer, as well as security, integrity and authenticity by the crypto layer. Also, the forward security will be easily achieved by user changing the master key actively but this should not be required very frequently. The key exchange feature is independent on the presence of a master key Note however that the master key still is needed for new nodes to be able to join the cluster. It is also optional, and can be turned off/on via the sysfs: 'net/tipc/key_exchange_enabled' [default 1: enabled]. Backward compatibility is guaranteed because for nodes that do not have master key support, key exchange using master key ie. tx_key = 0 if any will be shortly discarded at the message validation step. In other words, the key exchange feature will be automatically disabled to those nodes. v2: fix the "implicit declaration of function 'tipc_crypto_key_flush'" error in node.c. The function only exists when built with the TIPC "CONFIG_TIPC_CRYPTO" option. v3: use 'info->extack' for a message emitted due to netlink operations instead (- David's comment). Reported-by: kernel test robot <lkp@intel.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cad2929d |
|
17-Jun-2020 |
Hoang Huu Le <hoang.h.le@dektech.com.au> |
tipc: update a binding service via broadcast Currently, updating binding table (add service binding to name table/withdraw a service binding) is being sent over replicast. However, if we are scaling up clusters to > 100 nodes/containers this method is less affection because of looping through nodes in a cluster one by one. It is worth to use broadcast to update a binding service. This way, the binding table can be updated on all peer nodes in one shot. Broadcast is used when all peer nodes, as indicated by a new capability flag TIPC_NAMED_BCAST, support reception of this message type. Four problems need to be considered when introducing this feature. 1) When establishing a link to a new peer node we still update this by a unicast 'bulk' update. This may lead to race conditions, where a later broadcast publication/withdrawal bypass the 'bulk', resulting in disordered publications, or even that a withdrawal may arrive before the corresponding publication. We solve this by adding an 'is_last_bulk' bit in the last bulk messages so that it can be distinguished from all other messages. Only when this message has arrived do we open up for reception of broadcast publications/withdrawals. 2) When a first legacy node is added to the cluster all distribution will switch over to use the legacy 'replicast' method, while the opposite happens when the last legacy node leaves the cluster. This entails another risk of message disordering that has to be handled. We solve this by adding a sequence number to the broadcast/replicast messages, so that disordering can be discovered and corrected. Note however that we don't need to consider potential message loss or duplication at this protocol level. 3) Bulk messages don't contain any sequence numbers, and will always arrive in order. Hence we must exempt those from the sequence number control and deliver them unconditionally. We solve this by adding a new 'is_bulk' bit in those messages so that they can be recognized. 4) Legacy messages, which don't contain any new bits or sequence numbers, but neither can arrive out of order, also need to be exempt from the initial synchronization and sequence number check, and delivered unconditionally. Therefore, we add another 'is_not_legacy' bit to all new messages so that those can be distinguished from legacy messages and the latter delivered directly. v1->v2: - fix warning issue reported by kbuild test robot <lkp@intel.com> - add santiy check to drop the publication message with a sequence number that is lower than the agreed synch point Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e1f32190 |
|
07-Nov-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: add support for AEAD key setting via netlink This commit adds two netlink commands to TIPC in order for user to be able to set or remove AEAD keys: - TIPC_NL_KEY_SET - TIPC_NL_KEY_FLUSH When the 'KEY_SET' is given along with the key data, the key will be initiated and attached to TIPC crypto. On the other hand, the 'KEY_FLUSH' command will remove all existing keys if any. Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fc1b6d6d |
|
07-Nov-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: introduce TIPC encryption & authentication This commit offers an option to encrypt and authenticate all messaging, including the neighbor discovery messages. The currently most advanced algorithm supported is the AEAD AES-GCM (like IPSec or TLS). All encryption/decryption is done at the bearer layer, just before leaving or after entering TIPC. Supported features: - Encryption & authentication of all TIPC messages (header + data); - Two symmetric-key modes: Cluster and Per-node; - Automatic key switching; - Key-expired revoking (sequence number wrapped); - Lock-free encryption/decryption (RCU); - Asynchronous crypto, Intel AES-NI supported; - Multiple cipher transforms; - Logs & statistics; Two key modes: - Cluster key mode: One single key is used for both TX & RX in all nodes in the cluster. - Per-node key mode: Each nodes in the cluster has one specific TX key. For RX, a node requires its peers' TX key to be able to decrypt the messages from those peers. Key setting from user-space is performed via netlink by a user program (e.g. the iproute2 'tipc' tool). Internal key state machine: Attach Align(RX) +-+ +-+ | V | V +---------+ Attach +---------+ | IDLE |---------------->| PENDING |(user = 0) +---------+ +---------+ A A Switch| A | | | | | | Free(switch/revoked) | | (Free)| +----------------------+ | |Timeout | (TX) | | |(RX) | | | | | | v | +---------+ Switch +---------+ | PASSIVE |<----------------| ACTIVE | +---------+ (RX) +---------+ (user = 1) (user >= 1) The number of TFMs is 10 by default and can be changed via the procfs 'net/tipc/max_tfms'. At this moment, as for simplicity, this file is also used to print the crypto statistics at runtime: echo 0xfff1 > /proc/sys/net/tipc/max_tfms The patch defines a new TIPC version (v7) for the encryption message (- backward compatibility as well). The message is basically encapsulated as follows: +----------------------------------------------------------+ | TIPCv7 encryption | Original TIPCv2 | Authentication | | header | packet (encrypted) | Tag | +----------------------------------------------------------+ The throughput is about ~40% for small messages (compared with non- encryption) and ~9% for large messages. With the support from hardware crypto i.e. the Intel AES-NI CPU instructions, the throughput increases upto ~85% for small messages and ~55% for large messages. By default, the new feature is inactive (i.e. no encryption) until user sets a key for TIPC. There is however also a new option - "TIPC_CRYPTO" in the kernel configuration to enable/disable the new code when needed. MAINTAINERS | add two new files 'crypto.h' & 'crypto.c' in tipc Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4cbf8ac2 |
|
07-Nov-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: enable creating a "preliminary" node When user sets RX key for a peer not existing on the own node, a new node entry is needed to which the RX key will be attached. However, since the peer node address (& capabilities) is unknown at that moment, only the node-ID is provided, this commit allows the creation of a node with only the data that we call as “preliminary”. A preliminary node is not the object of the “tipc_node_find()” but the “tipc_node_find_by_id()”. Once the first message i.e. LINK_CONFIG comes from that peer, and is successfully decrypted by the own node, the actual peer node data will be properly updated and the node will function as usual. In addition, the node timer always starts when a node object is created so if a preliminary node is not used, it will be cleaned up. The later encryption functions will also use the node timer and be able to create a preliminary node automatically when needed. Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c0bceb97 |
|
30-Oct-2019 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add smart nagle feature We introduce a feature that works like a combination of TCP_NAGLE and TCP_CORK, but without some of the weaknesses of those. In particular, we will not observe long delivery delays because of delayed acks, since the algorithm itself decides if and when acks are to be sent from the receiving peer. - The nagle property as such is determined by manipulating a new 'maxnagle' field in struct tipc_sock. If certain conditions are met, 'maxnagle' will define max size of the messages which can be bundled. If it is set to zero no messages are ever bundled, implying that the nagle property is disabled. - A socket with the nagle property enabled enters nagle mode when more than 4 messages have been sent out without receiving any data message from the peer. - A socket leaves nagle mode whenever it receives a data message from the peer. In nagle mode, messages smaller than 'maxnagle' are accumulated in the socket write queue. The last buffer in the queue is marked with a new 'ack_required' bit, which forces the receiving peer to send a CONN_ACK message back to the sender upon reception. The accumulated contents of the write queue is transmitted when one of the following events or conditions occur. - A CONN_ACK message is received from the peer. - A data message is received from the peer. - A SOCK_WAKEUP pseudo message is received from the link level. - The write queue contains more than 64 1k blocks of data. - The connection is being shut down. - There is no CONN_ACK message to expect. I.e., there is currently no outstanding message where the 'ack_required' bit was set. As a consequence, the first message added after we enter nagle mode is always sent directly with this bit set. This new feature gives a 50-100% improvement of throughput for small (i.e., less than MTU size) messages, while it might add up to one RTT to latency time when the socket is in nagle mode. Acked-by: Ying Xue <ying.xue@windreiver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f73b1281 |
|
28-Oct-2019 |
Hoang Le <hoang.h.le@dektech.com.au> |
tipc: improve throughput between nodes in netns Currently, TIPC transports intra-node user data messages directly socket to socket, hence shortcutting all the lower layers of the communication stack. This gives TIPC very good intra node performance, both regarding throughput and latency. We now introduce a similar mechanism for TIPC data traffic across network namespaces located in the same kernel. On the send path, the call chain is as always accompanied by the sending node's network name space pointer. However, once we have reliably established that the receiving node is represented by a namespace on the same host, we just replace the namespace pointer with the receiving node/namespace's ditto, and follow the regular socket receive patch though the receiving node. This technique gives us a throughput similar to the node internal throughput, several times larger than if we let the traffic go though the full network stacks. As a comparison, max throughput for 64k messages is four times larger than TCP throughput for the same type of traffic. To meet any security concerns, the following should be noted. - All nodes joining a cluster are supposed to have been be certified and authenticated by mechanisms outside TIPC. This is no different for nodes/namespaces on the same host; they have to auto discover each other using the attached interfaces, and establish links which are supervised via the regular link monitoring mechanism. Hence, a kernel local node has no other way to join a cluster than any other node, and have to obey to policies set in the IP or device layers of the stack. - Only when a sender has established with 100% certainty that the peer node is located in a kernel local namespace does it choose to let user data messages, and only those, take the crossover path to the receiving node/namespace. - If the receiving node/namespace is removed, its namespace pointer is invalidated at all peer nodes, and their neighbor link monitoring will eventually note that this node is gone. - To ensure the "100% certainty" criteria, and prevent any possible spoofing, received discovery messages must contain a proof that the sender knows a common secret. We use the hash mix of the sending node/namespace for this purpose, since it can be accessed directly by all other namespaces in the kernel. Upon reception of a discovery message, the receiver checks this proof against all the local namespaces'hash_mix:es. If it finds a match, that, along with a matching node id and cluster id, this is deemed sufficient proof that the peer node in question is in a local namespace, and a wormhole can be opened. - We should also consider that TIPC is intended to be a cluster local IPC mechanism (just like e.g. UNIX sockets) rather than a network protocol, and hence we think it can justified to allow it to shortcut the lower protocol layers. Regarding traceability, we should notice that since commit 6c9081a3915d ("tipc: add loopback device tracking") it is possible to follow the node internal packet flow by just activating tcpdump on the loopback interface. This will be true even for this mechanism; by activating tcpdump on the involved nodes' loopback interfaces their inter-name space messaging can easily be tracked. v2: - update 'net' pointer when node left/rejoined v3: - grab read/write lock when using node ref obj v4: - clone traffics between netns to loopback Suggested-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4929a932 |
|
23-Jul-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: optimize link synching mechanism This commit along with the next one are to resolve the issues with the link changeover mechanism. See that commit for details. Basically, for the link synching, from now on, we will send only one single ("dummy") SYNCH message to peer. The SYNCH message does not contain any data, just a header conveying the synch point to the peer. A new node capability flag ("TIPC_TUNNEL_ENHANCED") is introduced for backward compatible! Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Suggested-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9195948f |
|
03-Apr-2019 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: improve TIPC throughput by Gap ACK blocks During unicast link transmission, it's observed very often that because of one or a few lost/dis-ordered packets, the sending side will fastly reach the send window limit and must wait for the packets to be arrived at the receiving side or in the worst case, a retransmission must be done first. The sending side cannot release a lot of subsequent packets in its transmq even though all of them might have already been received by the receiving side. That is, one or two packets dis-ordered/lost and dozens of packets have to wait, this obviously reduces the overall throughput! This commit introduces an algorithm to overcome this by using "Gap ACK blocks". Basically, a Gap ACK block will consist of <ack, gap> numbers that describes the link deferdq where packets have been got by the receiving side but with gaps, for example: link deferdq: [1 2 3 4 10 11 13 14 15 20] --> Gap ACK blocks: <4, 5>, <11, 1>, <15, 4>, <20, 0> The Gap ACK blocks will be sent to the sending side along with the traditional ACK or NACK message. Immediately when receiving the message the sending side will now not only release from its transmq the packets ack-ed by the ACK but also by the Gap ACK blocks! So, more packets can be enqueued and transmitted. In addition, the sending side can now do "multi-retransmissions" according to the Gaps reported in the Gap ACK blocks. The new algorithm as verified helps greatly improve the TIPC throughput especially under packet loss condition. So far, a maximum of 32 blocks is quite enough without any "Too few Gap ACK blocks" reports with a 5.0% packet loss rate, however this number can be increased in the furture if needed. Also, the patch is backward compatible. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ff2ebbfb |
|
19-Mar-2019 |
Hoang Le <hoang.h.le@dektech.com.au> |
tipc: introduce new capability flag for cluster As a preparation for introducing a smooth switching between replicast and broadcast method for multicast message, We have to introduce a new capability flag TIPC_MCAST_RBCTL to handle this new feature. During a cluster upgrade a node can come back with this new capabilities which also must be reflected in the cluster capabilities field. The new feature is only applicable if all node in the cluster supports this new capability. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b4b9771b |
|
18-Dec-2018 |
Tuong Lien <tuong.t.lien@dektech.com.au> |
tipc: enable tracepoints in tipc As for the sake of debugging/tracing, the commit enables tracepoints in TIPC along with some general trace_events as shown below. It also defines some 'tipc_*_dump()' functions that allow to dump TIPC object data whenever needed, that is, for general debug purposes, ie. not just for the trace_events. The following trace_events are now available: - trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data, e.g. message type, user, droppable, skb truesize, cloned skb, etc. - trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or queues, e.g. TIPC link transmq, socket receive queue, etc. - trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g. sk state, sk type, connection type, rmem_alloc, socket queues, etc. - trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g. link state, silent_intv_cnt, gap, bc_gap, link queues, etc. - trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g. node state, active links, capabilities, link entries, etc. How to use: Put the trace functions at any places where we want to dump TIPC data or events. Note: a) The dump functions will generate raw data only, that is, to offload the trace event's processing, it can require a tool or script to parse the data but this should be simple. b) The trace_tipc_*_dump() should be reserved for a failure cases only (e.g. the retransmission failure case) or where we do not expect to happen too often, then we can consider enabling these events by default since they will almost not take any effects under normal conditions, but once the rare condition or failure occurs, we get the dumped data fully for post-analysis. For other trace purposes, we can reuse these trace classes as template but different events. c) A trace_event is only effective when we enable it. To enable the TIPC trace_events, echo 1 to 'enable' files in the events/tipc/ directory in the 'debugfs' file system. Normally, they are located at: /sys/kernel/debug/tracing/events/tipc/ For example: To enable the tipc_link_dump event: echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable To enable all the TIPC trace_events: echo 1 > /sys/kernel/debug/tracing/events/tipc/enable To collect the trace data: cat trace or cat trace_pipe > /trace.out & To disable all the TIPC trace_events: echo 0 > /sys/kernel/debug/tracing/events/tipc/enable To clear the trace buffer: echo > trace d) Like the other trace_events, the feature like 'filter' or 'trigger' is also usable for the tipc trace_events. For more details, have a look at: Documentation/trace/ftrace.txt MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25b9221b |
|
28-Sep-2018 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add SYN bit to connection setup messages Messages intended for intitating a connection are currently indistinguishable from regular datagram messages. The TIPC protocol specification defines bit 17 in word 0 as a SYN bit to allow sanity check of such messages in the listening socket, but this has so far never been implemented. We do that in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9012de50 |
|
09-Jul-2018 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add sequence number check for link STATE messages Some switch infrastructures produce huge amounts of packet duplicates. This becomes a problem if those messages are STATE/NACK protocol messages, causing unnecessary retransmissions of already accepted packets. We now introduce a unique sequence number per STATE protocol message so that duplicates can be identified and ignored. This will also be useful when tracing such cases, and to avert replay attacks when TIPC is encrypted. For compatibility reasons we have to introduce a new capability flag TIPC_LINK_PROTO_SEQNO to handle this new feature. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e5cf362 |
|
25-Apr-2018 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: introduce ioctl for fetching node identity After the introduction of a 128-bit node identity it may be difficult for a user to correlate between this identity and the generated node hash address. We now try to make this easier by introducing a new ioctl() call for fetching a node identity by using the hash value as key. This will be particularly useful when we extend some of the commands in the 'tipc' tool, but we also expect regular user applications to need this feature. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
682cd3cf |
|
19-Apr-2018 |
GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> |
tipc: confgiure and apply UDP bearer MTU on running links Currently, we have option to configure MTU of UDP media. The configured MTU takes effect on the links going up after that moment. I.e, a user has to reset bearer to have new value applied across its links. This is confusing and disturbing on a running cluster. We now introduce the functionality to change the default UDP bearer MTU in struct tipc_bearer. Additionally, the links are updated dynamically, without any need for a reset, when bearer value is changed. We leverage the existing per-link functionality and the design being symetrical to the confguration of link tolerance. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25b0b9c4 |
|
22-Mar-2018 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: handle collisions of 32-bit node address hash values When a 32-bit node address is generated from a 128-bit identifier, there is a risk of collisions which must be discovered and handled. We do this as follows: - We don't apply the generated address immediately to the node, but do instead initiate a 1 sec trial period to allow other cluster members to discover and handle such collisions. - During the trial period the node periodically sends out a new type of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast, to all the other nodes in the cluster. - When a node is receiving such a message, it must check that the presented 32-bit identifier either is unused, or was used by the very same peer in a previous session. In both cases it accepts the request by not responding to it. - If it finds that the same node has been up before using a different address, it responds with a DSC_TRIAL_FAIL_MSG containing that address. - If it finds that the address has already been taken by some other node, it generates a new, unused address and returns it to the requester. - During the trial period the requesting node must always be prepared to accept a failure message, i.e., a message where a peer suggests a different (or equal) address to the one tried. In those cases it must apply the suggested value as trial address and restart the trial period. This algorithm ensures that in the vast majority of cases a node will have the same address before and after a reboot. If a legacy user configures the address explicitly, there will be no trial period and messages, so this protocol addition is completely backwards compatible. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d50ccc2d |
|
22-Mar-2018 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add 128-bit node identifier We add a 128-bit node identity, as an alternative to the currently used 32-bit node address. For the sake of compatibility and to minimize message header changes we retain the existing 32-bit address field. When not set explicitly by the user, this field will be filled with a hash value generated from the much longer node identity, and be used as a shorthand value for the latter. We permit either the address or the identity to be set by configuration, but not both, so when the address value is set by a legacy user the corresponding 128-bit node identity is generated based on the that value. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
20263641 |
|
22-Mar-2018 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: remove restrictions on node address values Nominally, TIPC organizes network nodes into a three-level network hierarchy consisting of the levels 'zone', 'cluster' and 'node'. This hierarchy is reflected in the node address format, - it is sub-divided into an 8-bit zone id, and 12 bit cluster id, and a 12-bit node id. However, the 'zone' and 'cluster' levels have in reality never been fully implemented,and never will be. The result of this has been that the first 20 bits the node identity structure have been wasted, and the usable node identity range within a cluster has been limited to 12 bits. This is starting to become a problem. In the following commits, we will need to be able to connect between nodes which are using the whole 32-bit value space of the node address. We therefore remove the restrictions on which values can be assigned to node identity, -it is from now on only a 32-bit integer with no assumed internal structure. Isolation between clusters is now achieved only by setting different values for the 'network id' field used during neighbor discovery, in practice leading to the latter becoming the new cluster identity. The rules for accepting discovery requests/responses from neighboring nodes now become: - If the user is using legacy address format on both peers, reception of discovery messages is subject to the legacy lookup domain check in addition to the cluster id check. - Otherwise, the discovery request/response is always accepted, provided both peers have the same network id. This secures backwards compatibility for users who have been using zone or cluster identities as cluster separators, instead of the intended 'network id'. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37c64cf6 |
|
14-Feb-2018 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: apply bearer link tolerance on running links Currently, the default link tolerance set in struct tipc_bearer only has effect on links going up after that moment. I.e., a user has to reset all the node's links across that bearer to have the new value applied. This is too limiting and disturbing on a running cluster to be useful. We now change this so that also already existing links are updated dynamically, without any need for a reset, when the bearer value is changed. We leverage the already existing per-link functionality for this to achieve the wanted effect. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
75da2163 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: introduce communication groups As a preparation for introducing flow control for multicast and datagram messaging we need a more strictly defined framework than we have now. A socket must be able keep track of exactly how many and which other sockets it is allowed to communicate with at any moment, and keep the necessary state for those. We therefore introduce a new concept we have named Communication Group. Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN. The call takes four parameters: 'type' serves as group identifier, 'instance' serves as an logical member identifier, and 'scope' indicates the visibility of the group (node/cluster/zone). Finally, 'flags' makes it possible to set certain properties for the member. For now, there is only one flag, indicating if the creator of the socket wants to receive a copy of broadcast or multicast messages it is sending via the socket, and if wants to be eligible as destination for its own anycasts. A group is closed, i.e., sockets which have not joined a group will not be able to send messages to or receive messages from members of the group, and vice versa. Any member of a group can send multicast ('group broadcast') messages to all group members, optionally including itself, using the primitive send(). The messages are received via the recvmsg() primitive. A socket can only be member of one group at a time. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f70d37b7 |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add new function for sending multiple small messages We see an increasing need to send multiple single-buffer messages of TIPC_SYSTEM_IMPORTANCE to different individual destination nodes. Instead of looping over the send queue and sending each buffer individually, as we do now, we add a new help function tipc_node_distr_xmit() to do this. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38077b8e |
|
13-Oct-2017 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: add ability to obtain node availability status from other files In the coming commits, functions at the socket level will need the ability to read the availability status of a given node. We therefore introduce a new function for this purpose, while renaming the existing static function currently having the wanted name. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
01fd12bb |
|
18-Jan-2017 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: make replicast a user selectable option If the bearer carrying multicast messages supports broadcast, those messages will be sent to all cluster nodes, irrespective of whether these nodes host any actual destinations socket or not. This is clearly wasteful if the cluster is large and there are only a few real destinations for the message being sent. In this commit we extend the eligibility of the newly introduced "replicast" transmit option. We now make it possible for a user to select which method he wants to be used, either as a mandatory setting via setsockopt(), or as a relative setting where we let the broadcast layer decide which method to use based on the ratio between cluster size and the message's actual number of destination nodes. In the latter case, a sending socket must stick to a previously selected method until it enters an idle period of at least 5 seconds. This eliminates the risk of message reordering caused by method change, i.e., when changes to cluster size or number of destinations would otherwise mandate a new method to be used. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02d11ca2 |
|
01-Sep-2016 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: transfer broadcast nacks in link state messages When we send broadcasts in clusters of more 70-80 nodes, we sometimes see the broadcast link resetting because of an excessive number of retransmissions. This is caused by a combination of two factors: 1) A 'NACK crunch", where loss of broadcast packets is discovered and NACK'ed by several nodes simultaneously, leading to multiple redundant broadcast retransmissions. 2) The fact that the NACKS as such also are sent as broadcast, leading to excessive load and packet loss on the transmitting switch/bridge. This commit deals with the latter problem, by moving sending of broadcast nacks from the dedicated BCAST_PROTOCOL/NACK message type to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused bits in word 8 of the said message for this purpose, and introduce a new capability bit, TIPC_BCAST_STATE_NACK in order to keep the change backwards compatible. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b3404022 |
|
18-Aug-2016 |
Richard Alpe <richard.alpe@ericsson.com> |
tipc: add peer removal functionality Add TIPC_NL_PEER_REMOVE netlink command. This command can remove an offline peer node from the internal data structures. This will be supported by the tipc user space tool in iproute2. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf6f7e1d |
|
26-Jul-2016 |
Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> |
tipc: dump monitor attributes In this commit, we dump the monitor attributes when queried. The link monitor attributes are separated into two kinds: 1. general attributes per bearer 2. specific attributes per node/peer This style resembles the socket attributes and the nametable publications per socket. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bf1035b2 |
|
26-Jul-2016 |
Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> |
tipc: get monitor threshold for the cluster In this commit, we add support to fetch the configured cluster monitoring threshold. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7b3f5229 |
|
26-Jul-2016 |
Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> |
tipc: make cluster size threshold for monitoring configurable In this commit, we introduce support to configure the minimum threshold to activate the new link monitoring algorithm. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10724cc7 |
|
02-May-2016 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: redesign connection-level flow control There are two flow control mechanisms in TIPC; one at link level that handles network congestion, burst control, and retransmission, and one at connection level which' only remaining task is to prevent overflow in the receiving socket buffer. In TIPC, the latter task has to be solved end-to-end because messages can not be thrown away once they have been accepted and delivered upwards from the link layer, i.e, we can never permit the receive buffer to overflow. Currently, this algorithm is message based. A counter in the receiving socket keeps track of number of consumed messages, and sends a dedicated acknowledge message back to the sender for each 256 consumed message. A counter at the sending end keeps track of the sent, not yet acknowledged messages, and blocks the sender if this number ever reaches 512 unacknowledged messages. When the missing acknowledge arrives, the socket is then woken up for renewed transmission. This works well for keeping the message flow running, as it almost never happens that a sender socket is blocked this way. A problem with the current mechanism is that it potentially is very memory consuming. Since we don't distinguish between small and large messages, we have to dimension the socket receive buffer according to a worst-case of both. I.e., the window size must be chosen large enough to sustain a reasonable throughput even for the smallest messages, while we must still consider a scenario where all messages are of maximum size. Hence, the current fix window size of 512 messages and a maximum message size of 66k results in a receive buffer of 66 MB when truesize(66k) = 131k is taken into account. It is possible to do much better. This commit introduces an algorithm where we instead use 1024-byte blocks as base unit. This unit, always rounded upwards from the actual message size, is used when we advertise windows as well as when we count and acknowledge transmitted data. The advertised window is based on the configured receive buffer size in such a way that even the worst-case truesize/msgsize ratio always is covered. Since the smallest possible message size (from a flow control viewpoint) now is 1024 bytes, we can safely assume this ratio to be less than four, which is the value we are now using. This way, we have been able to reduce the default receive buffer size from 66 MB to 2 MB with maintained performance. In order to keep this solution backwards compatible, we introduce a new capability bit in the discovery protocol, and use this throughout the message sending/reception path to always select the right unit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
60020e18 |
|
02-May-2016 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: propagate peer node capabilities to socket layer During neighbor discovery, nodes advertise their capabilities as a bit map in a dedicated 16-bit field in the discovery message header. This bit map has so far only be stored in the node structure on the peer nodes, but we now see the need to keep a copy even in the socket structure. This commit adds this functionality. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
38206d59 |
|
19-Nov-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: narrow down interface towards struct tipc_link We move the definition of struct tipc_link from link.h to link.c in order to minimize its exposure to the rest of the code. When needed, we define new functions to make it possible for external entities to access and set data in the link. Apart from the above, there are no functional changes. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5be9c086 |
|
19-Nov-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: narrow down exposure of struct tipc_node In our effort to have less code and include dependencies between entities such as node, link and bearer, we try to narrow down the exposed interface towards the node as much as possible. In this commit, we move the definition of struct tipc_node, along with many of its associated function declarations, from node.h to node.c. We also move some function definitions from link.c and name_distr.c to node.c, since they access fields in struct tipc_node that should not be externally visible. The moved functions are renamed according to new location, and made static whenever possible. There are no functional changes in this commit. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5405ff6e |
|
19-Nov-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: convert node lock to rwlock According to the node FSM a node in state SELF_UP_PEER_UP cannot change state inside a lock context, except when a TUNNEL_PROTOCOL (SYNCH or FAILOVER) packet arrives. However, the node's individual links may still change state. Since each link now is protected by its own spinlock, we finally have the conditions in place to convert the node spinlock to an rwlock_t. If the node state and arriving packet type are rigth, we can let the link directly receive the packet under protection of its own spinlock and the node lock in read mode. In all other cases we use the node lock in write mode. This enables full concurrent execution between parallel links during steady-state traffic situations, i.e., 99+ % of the time. This commit implements this change. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2312bf61 |
|
19-Nov-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce per-link spinlock As a preparation to allow parallel links to work more independently from each other we introduce a per-link spinlock, to be stored in the struct nodes's link entry area. Since the node lock still is a regular spinlock there is no increase in parallellism at this stage. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1d7e1c25 |
|
19-Nov-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: reduce code dependency between binding table and node layer The file name_distr.c currently contains three functions, named_cluster_distribute(), tipc_publ_subcscribe() and tipc_publ_unsubscribe() that all directly access fields in struct tipc_node. We want to eliminate such dependencies, so we move those functions to the file node.c and rename them to tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe() respectively. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
2af5ae37 |
|
22-Oct-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: clean up unused code and structures After the previous changes in this series, we can now remove some unused code and structures, both in the broadcast, link aggregation and link code. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
52666986 |
|
22-Oct-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: let broadcast packet reception use new link receive function The code path for receiving broadcast packets is currently distinct from the unicast path. This leads to unnecessary code and data duplication, something that can be avoided with some effort. We now introduce separate per-peer tipc_link instances for handling broadcast packet reception. Each receive link keeps a pointer to the common, single, broadcast link instance, and can hence handle release and retransmission of send buffers as if they belonged to the own instance. Furthermore, we let each unicast link instance keep a reference to both the pertaining broadcast receive link, and to the common send link. This makes it possible for the unicast links to easily access data for broadcast link synchronization, as well as for carrying acknowledges for received broadcast packets. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fd556f20 |
|
22-Oct-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce capability bit for broadcast synchronization Until now, we have tried to support both the newer, dedicated broadcast synchronization mechanism along with the older, less safe, RESET_MSG/ ACTIVATE_MSG based one. The latter method has turned out to be a hazard in a highly dynamic cluster, so we find it safer to disable it completely when we find that the former mechanism is supported by the peer node. For this purpose, we now introduce a new capabability bit, TIPC_BCAST_SYNCH, to inform any peer nodes that dedicated broadcast syncronization is supported by the present node. The new bit is conveyed between peers in the 'capabilities' field of neighbor discovery messages. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
23d8335d |
|
30-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: remove implicit message delivery in node_unlock() After the most recent changes, all access calls to a link which may entail addition of messages to the link's input queue are postpended by an explicit call to tipc_sk_rcv(), using a reference to the correct queue. This means that the potentially hazardous implicit delivery, using tipc_node_unlock() in combination with a binary flag and a cached queue pointer, now has become redundant. This commit removes this implicit delivery mechanism both for regular data messages and for binding table update messages. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cf148816 |
|
30-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: move received discovery data evaluation inside node.c The node lock is currently grabbed and and released in the function tipc_disc_rcv() in the file discover.c. As a preparation for the next commits, we need to move this node lock handling, along with the code area it is covering, to node.c. This commit introduces this change. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6e498158 |
|
30-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: move link synch and failover to link aggregation level Link failover and synchronization have until now been handled by the links themselves, forcing them to have knowledge about and to access parallel links in order to make the two algorithms work correctly. In this commit, we move the control part of this functionality to the link aggregation level in node.c, which is the right location for this. As a result, the two algorithms become easier to follow, and the link implementation becomes simpler. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
66996b6c |
|
30-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: extend node FSM In the next commit, we will move link synch/failover orchestration to the link aggregation level. In order to do this, we first need to extend the node FSM with two more states, NODE_SYNCHING and NODE_FAILINGOVER, plus four new events to enter and leave those states. This commit introduces this change, without yet making use of it. The node FSM now looks as follows: +-----------------------------------------+ | PEER_DOWN_EVT| | | +------------------------+----------------+ | |SELF_DOWN_EVT | | | | | | | | +-----------+ +-----------+ | | |NODE_ | |NODE_ | | | +----------|FAILINGOVER|<---------|SYNCHING |------------+ | | |SELF_ +-----------+ FAILOVER_+-----------+ PEER_ | | | |DOWN_EVT | A BEGIN_EVT A | DOWN_EVT| | | | | | | | | | | | | | | | | | | | |FAILOVER_|FAILOVER_ |SYNCH_ |SYNCH_ | | | | |END_EVT |BEGIN_EVT |BEGIN_EVT|END_EVT | | | | | | | | | | | | | | | | | | | | | +--------------+ | | | | | +------->| SELF_UP_ |<-------+ | | | | +----------------| PEER_UP |------------------+ | | | | |SELF_DOWN_EVT +--------------+ PEER_DOWN_EVT| | | | | | A A | | | | | | | | | | | | | | PEER_UP_EVT| |SELF_UP_EVT | | | | | | | | | | | V V V | | V V V +------------+ +-----------+ +-----------+ +------------+ |SELF_DOWN_ | |SELF_UP_ | |PEER_UP_ | |PEER_DOWN | |PEER_LEAVING|<------|PEER_COMING| |SELF_COMING|------>|SELF_LEAVING| +------------+ SELF_ +-----------+ +-----------+ PEER_ +------------+ | DOWN_EVT A A DOWN_EVT | | | | | | | | | | SELF_UP_EVT| |PEER_UP_EVT | | | | | | | | | |PEER_DOWN_EVT +--------------+ SELF_DOWN_EVT| +------------------->| SELF_DOWN_ |<--------------------+ | PEER_DOWN | +--------------+ Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6144a996 |
|
30-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: move all link_reset() calls to link aggregation level In line with our effort to let the node level have full control over its links, we want to move all link reset calls from link.c to node.c. Some of the calls can be moved by simply moving the calling function, when this is the right thing to do. For the remaining calls we use the now established technique of returning a TIPC_LINK_DOWN_EVT flag from tipc_link_rcv(), whereafter we perform the reset call when the call returns. This change serves as a preparation for the coming commits. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d999297c |
|
16-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: reduce locking scope during packet reception We convert packet/message reception according to the same principle we have been using for message sending and timeout handling: We move the function tipc_rcv() to node.c, hence handling the initial packet reception at the link aggregation level. The function grabs the node lock, selects the receiving link, and accesses it via a new call tipc_link_rcv(). This function appends buffers to the input queue for delivery upwards, but it may also append outgoing packets to the xmit queue, just as we do during regular message sending. The latter will happen when buffers are forwarded from the link backlog, or when retransmission is requested. Upon return of this function, and after having released the node lock, tipc_rcv() delivers/tranmsits the contents of those queues, but it may also perform actions such as link activation or reset, as indicated by the return flags from the link. This reduces the number of cpu cycles spent inside the node spinlock, and reduces contention on that lock. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1a20cc25 |
|
16-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce node contact FSM The logics for determining when a node is permitted to establish and maintain contact with its peer node becomes non-trivial in the presence of multiple parallel links that may come and go independently. A known failure scenario is that one endpoint registers both its links to the peer lost, cleans up it binding table, and prepares for a table update once contact is re-establihed, while the other endpoint may see its links reset and re-established one by one, hence seeing no need to re-synchronize the binding table. To avoid this, a node must not allow re-establishing contact until it has confirmation that even the peer has lost both links. Currently, the mechanism for handling this consists of setting and resetting two state flags from different locations in the code. This solution is hard to understand and maintain. A closer analysis even reveals that it is not completely safe. In this commit we do instead introduce an FSM that keeps track of the conditions for when the node can establish and maintain links. It has six states and four events, and is strictly based on explicit knowledge about the own node's and the peer node's contact states. Only events leading to state change are shown as edges in the figure below. +--------------+ | SELF_UP/ | +---------------->| PEER_COMING |-----------------+ SELF_ | +--------------+ |PEER_ ESTBL_ | | |ESTBL_ CONTACT| SELF_LOST_CONTACT | |CONTACT | v | | +--------------+ | | PEER_ | SELF_DOWN/ | SELF_ | | LOST_ +--| PEER_LEAVING |<--+ LOST_ v +-------------+ CONTACT | +--------------+ | CONTACT +-----------+ | SELF_DOWN/ |<----------+ +----------| SELF_UP/ | | PEER_DOWN |<----------+ +----------| PEER_UP | +-------------+ SELF_ | +--------------+ | PEER_ +-----------+ | LOST_ +--| SELF_LEAVING/|<--+ LOST_ A | CONTACT | PEER_DOWN | CONTACT | | +--------------+ | | A | PEER_ | PEER_LOST_CONTACT | |SELF_ ESTBL_ | | |ESTBL_ CONTACT| +--------------+ |CONTACT +---------------->| PEER_UP/ |-----------------+ | SELF_COMING | +--------------+ Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a1577c9 |
|
16-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: move link supervision timer to node level In our effort to move control of the links to the link aggregation layer, we move the perodic link supervision timer to struct tipc_node. The new timer is shared between all links belonging to the node, thus saving resources, while still kicking the FSM on both its pertaining links at each expiration. The current link timer and corresponding functions are removed. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
af9b028e |
|
16-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: make media xmit call outside node spinlock context Currently, message sending is performed through a deep call chain, where the node spinlock is grabbed and held during a significant part of the transmission time. This is clearly detrimental to overall throughput performance; it would be better if we could send the message after the spinlock has been released. In this commit, we do instead let the call revert on the stack after the buffer chain has been added to the transmission queue, whereafter clones of the buffers are transmitted to the device layer outside the spinlock scope. As a further step in our effort to separate the roles of the node and link entities we also move the function tipc_link_xmit() to node.c, and rename it to tipc_node_xmit(). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
36e78a46 |
|
16-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: use bearer index when looking up active links struct tipc_node currently holds two arrays of link pointers; one, indexed by bearer identity, which contains all links irrespective of current state, and one two-slot array for the currently active link or links. The latter array contains direct pointers into the elements of the former. This has the effect that we cannot know the bearer id of a link when accessing it via the "active_links[]" array without actually dereferencing the pointer, something we want to avoid in some cases. In this commit, we do instead store the bearer identity in the "active_links" array, and use this as an index to find the right element in the overall link entry array. This change should be seen as a preparation for the later commits in this series. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d39bbd44 |
|
16-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: move link input queue to tipc_node At present, the link input queue and the name distributor receive queues are fields aggregated in struct tipc_link. This is a hazard, because a link might be deleted while a receiving socket still keeps reference to one of the queues. This commit fixes this bug. However, rather than adding yet another reference counter to the critical data path, we move the two queues to safe ground inside struct tipc_node, which is already protected, and let the link code only handle references to the queues. This is also in line with planned later changes in this area. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d3a43b90 |
|
16-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: move link creation from neighbor discoverer to node As a step towards turning links into node internal entities, we move the creation of links from the neighbor discovery logics to the node's link control logics. We also create an additional entry for the link's media address in the newly introduced struct tipc_link_entry, since this is where it is needed in the upcoming commits. The current copy in struct tipc_link is kept for now, but will be removed later. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d13ec65 |
|
16-Jul-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: introduce link entry structure to struct tipc_node struct 'tipc_node' currently contains two arrays for link attributes, one for the link pointers, and one for the usable link MTUs. We now group those into a new struct 'tipc_link_entry', and intoduce one single array consisting of such enties. Apart from being a cosmetic improvement, this is a starting point for the strict master-slave relation between node and link that we will introduce in the following commits. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a6bf70f7 |
|
14-May-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: simplify include dependencies When we try to add new inline functions in the code, we sometimes run into circular include dependencies. The main problem is that the file core.h, which really should be at the root of the dependency chain, instead is a leaf. I.e., core.h includes a number of header files that themselves should be allowed to include core.h. In reality this is unnecessary, because core.h does not need to know the full signature of any of the structs it refers to, only their type declaration. In this commit, we remove all dependencies from core.h towards any other tipc header file. As a consequence of this change, we can now move the function tipc_own_addr(net) from addr.c to addr.h, and make it inline. There are no functional changes in this commit. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
8a0f6ebe |
|
26-Mar-2015 |
Ying Xue <ying.xue@windriver.com> |
tipc: involve reference counter for node structure TIPC node hash node table is protected with rcu lock on read side. tipc_node_find() is used to look for a node object with node address through iterating the hash node table. As the entire process of what tipc_node_find() traverses the table is guarded with rcu read lock, it's safe for us. However, when callers use the node object returned by tipc_node_find(), there is no rcu read lock applied. Therefore, this is absolutely unsafe for callers of tipc_node_find(). Now we introduce a reference counter for node structure. Before tipc_node_find() returns node object to its caller, it first increases the reference counter. Accordingly, after its caller used it up, it decreases the counter again. This can prevent a node being used by one thread from being freed by another thread. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b952b2be |
|
26-Mar-2015 |
Ying Xue <ying.xue@windriver.com> |
tipc: fix potential deadlock when all links are reset [ 60.988363] ====================================================== [ 60.988754] [ INFO: possible circular locking dependency detected ] [ 60.989152] 3.19.0+ #194 Not tainted [ 60.989377] ------------------------------------------------------- [ 60.989781] swapper/3/0 is trying to acquire lock: [ 60.990079] (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc] [ 60.990743] [ 60.990743] but task is already holding lock: [ 60.991106] (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.991738] [ 60.991738] which lock already depends on the new lock. [ 60.991738] [ 60.992174] [ 60.992174] the existing dependency chain (in reverse order) is: [ 60.992174] -> #1 (&(&bclink->lock)->rlock){+.-...}: [ 60.992174] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140 [ 60.992174] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50 [ 60.992174] [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.992174] [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc] [ 60.992174] [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc] [ 60.992174] [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc] [ 60.992174] [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc] [ 60.992174] [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc] [ 60.992174] [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc] [ 60.992174] [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980 [ 60.992174] [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70 [ 60.992174] [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130 [ 60.992174] [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0 [ 60.992174] [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490 [ 60.992174] [<ffffffff8155c1b7>] e1000_clean+0x267/0x990 [ 60.992174] [<ffffffff81647b60>] net_rx_action+0x150/0x360 [ 60.992174] [<ffffffff8105ec43>] __do_softirq+0x123/0x360 [ 60.992174] [<ffffffff8105f12e>] irq_exit+0x8e/0xb0 [ 60.992174] [<ffffffff8179f9f5>] do_IRQ+0x65/0x110 [ 60.992174] [<ffffffff8179da6f>] ret_from_intr+0x0/0x13 [ 60.992174] [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20 [ 60.992174] [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0 [ 60.992174] [<ffffffff81033cda>] start_secondary+0x13a/0x150 [ 60.992174] -> #0 (&(&n_ptr->lock)->rlock){+.-...}: [ 60.992174] [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0 [ 60.992174] [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140 [ 60.992174] [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50 [ 60.992174] [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc] [ 60.992174] [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc] [ 60.992174] [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc] [ 60.992174] [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc] [ 60.992174] [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980 [ 60.992174] [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70 [ 60.992174] [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130 [ 60.992174] [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0 [ 60.992174] [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490 [ 60.992174] [<ffffffff8155c1b7>] e1000_clean+0x267/0x990 [ 60.992174] [<ffffffff81647b60>] net_rx_action+0x150/0x360 [ 60.992174] [<ffffffff8105ec43>] __do_softirq+0x123/0x360 [ 60.992174] [<ffffffff8105f12e>] irq_exit+0x8e/0xb0 [ 60.992174] [<ffffffff8179f9f5>] do_IRQ+0x65/0x110 [ 60.992174] [<ffffffff8179da6f>] ret_from_intr+0x0/0x13 [ 60.992174] [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20 [ 60.992174] [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0 [ 60.992174] [<ffffffff81033cda>] start_secondary+0x13a/0x150 [ 60.992174] [ 60.992174] other info that might help us debug this: [ 60.992174] [ 60.992174] Possible unsafe locking scenario: [ 60.992174] [ 60.992174] CPU0 CPU1 [ 60.992174] ---- ---- [ 60.992174] lock(&(&bclink->lock)->rlock); [ 60.992174] lock(&(&n_ptr->lock)->rlock); [ 60.992174] lock(&(&bclink->lock)->rlock); [ 60.992174] lock(&(&n_ptr->lock)->rlock); [ 60.992174] [ 60.992174] *** DEADLOCK *** [ 60.992174] [ 60.992174] 3 locks held by swapper/3/0: [ 60.992174] #0: (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980 [ 60.992174] #1: (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc] [ 60.992174] #2: (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc] [ 60.992174] The correct the sequence of grabbing n_ptr->lock and bclink->lock should be that the former is first held and the latter is then taken, which exactly happened on CPU1. But especially when the retransmission of broadcast link is failed, bclink->lock is first held in tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure() called by tipc_link_retransmit() subsequently, which is demonstrated on CPU0. As a result, deadlock occurs. If the order of holding the two locks happening on CPU0 is reversed, the deadlock risk will be relieved. Therefore, the node lock taken in link_retransmit_failure() originally is moved to tipc_bclink_rcv() so that it's obtained before bclink lock. But the precondition of the adjustment of node lock is that responding to bclink reset event must be moved from tipc_bclink_unlock() to tipc_node_unlock(). Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
05dcc5aa |
|
13-Mar-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: split link outqueue struct tipc_link contains one single queue for outgoing packets, where both transmitted and waiting packets are queued. This infrastructure is hard to maintain, because we need to keep a number of fields to keep track of which packets are sent or unsent, and the number of packets in each category. A lot of code becomes simpler if we split this queue into a transmission queue, where sent/unacknowledged packets are kept, and a backlog queue, where we keep the not yet sent packets. In this commit we do this separation. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7764d6e8 |
|
13-Mar-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: add framework for node capabilities exchange The TIPC protocol spec has defined a 13 bit capability bitmap in the neighbor discovery header, as a means to maintain compatibility between different code and protocol generations. Until now this field has been unused. We now introduce the basic framework for exchanging capabilities between nodes at first contact. After exchange, a peer node's capabilities are stored as a 16 bit bitmap in struct tipc_node. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4b28cb58 |
|
09-Feb-2015 |
Richard Alpe <richard.alpe@ericsson.com> |
tipc: convert legacy nl node dump to nl compat Convert TIPC_CMD_GET_NODES to compat dumpit and remove global node counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
357ebdbf |
|
09-Feb-2015 |
Richard Alpe <richard.alpe@ericsson.com> |
tipc: convert legacy nl link dump to nl compat Convert TIPC_CMD_GET_LINKS to compat dumpit and remove global link counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb1b7280 |
|
05-Feb-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: eliminate race condition at multicast reception In a previous commit in this series we resolved a race problem during unicast message reception. Here, we resolve the same problem at multicast reception. We apply the same technique: an input queue serializing the delivery of arriving buffers. The main difference is that here we do it in two steps. First, the broadcast link feeds arriving buffers into the tail of an arrival queue, which head is consumed at the socket level, and where destination lookup is performed. Second, if the lookup is successful, the resulting buffer clones are fed into a second queue, the input queue. This queue is consumed at reception in the socket just like in the unicast case. Both queues are protected by the same lock, -the one of the input queue. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c637c103 |
|
05-Feb-2015 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: resolve race problem at unicast message reception TIPC handles message cardinality and sequencing at the link layer, before passing messages upwards to the destination sockets. During the upcall from link to socket no locks are held. It is therefore possible, and we see it happen occasionally, that messages arriving in different threads and delivered in sequence still bypass each other before they reach the destination socket. This must not happen, since it violates the sequentiality guarantee. We solve this by adding a new input buffer queue to the link structure. Arriving messages are added safely to the tail of that queue by the link, while the head of the queue is consumed, also safely, by the receiving socket. Sequentiality is secured per socket by only allowing buffers to be dequeued inside the socket lock. Since there may be multiple simultaneous readers of the queue, we use a 'filter' parameter to reduce the risk that they peek the same buffer from the queue, hence also reducing the risk of contention on the receiving socket locks. This solves the sequentiality problem, and seems to cause no measurable performance degradation. A nice side effect of this change is that lock handling in the functions tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that will enable future simplifications of those functions. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f2f9800d |
|
09-Jan-2015 |
Ying Xue <ying.xue@windriver.com> |
tipc: make tipc node table aware of net namespace Global variables associated with node table are below: - node table list (node_htable) - node hash table list (tipc_node_list) - node table lock (node_list_lock) - node number counter (tipc_num_nodes) - node link number counter (tipc_num_links) To make node table support namespace, above global variables must be moved to tipc_net structure in order to keep secret for different namespaces. As a consequence, these variables are allocated and initialized when namespace is created, and deallocated when namespace is destroyed. After the change, functions associated with these variables have to utilize a namespace pointer to access them. So adding namespace pointer as a parameter of these functions is the major change made in the commit. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bc6fecd4 |
|
25-Nov-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: use generic SKB list APIs to manage deferred queue of link Use standard SKB list APIs associated with struct sk_buff_head to manage link's deferred queue, simplifying relevant code. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a8f48af5 |
|
25-Nov-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: remove node subscription infrastructure The node subscribe infrastructure represents a virtual base class, so its users, such as struct tipc_port and struct publication, can derive its implemented functionalities. However, after the removal of struct tipc_port, struct publication is left as its only single user now. So defining an abstract infrastructure for one user becomes no longer reasonable. If corresponding new functions associated with the infrastructure are moved to name_table.c file, the node subscription infrastructure can be removed as well. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3e4b6ab5 |
|
20-Nov-2014 |
Richard Alpe <richard.alpe@ericsson.com> |
tipc: add node get/dump to new netlink api Add TIPC_NL_NODE_GET to the new tipc netlink API. This command can dump the address and node status of all nodes in the tipc cluster. Netlink logical layout of returned node/address data: -> node -> address -> up flag Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7b8613e0 |
|
20-Oct-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: fix a potential deadlock Locking dependency detected below possible unsafe locking scenario: CPU0 CPU1 T0: tipc_named_rcv() tipc_rcv() T1: [grab nametble write lock]* [grab node lock]* T2: tipc_update_nametbl() tipc_node_link_up() T3: tipc_nodesub_subscribe() tipc_nametbl_publish() T4: [grab node lock]* [grab nametble write lock]* The opposite order of holding nametbl write lock and node lock on above two different paths may result in a deadlock. If we move the the updating of the name table after link state named out of node lock, the reverse order of holding locks will be eliminated, and as a result, the deadlock risk. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
908344cd |
|
07-Oct-2014 |
Jon Maloy <jon.maloy@ericsson.com> |
tipc: fix bug in multicast congestion handling One aim of commit 50100a5e39461b2a61d6040e73c384766c29975d ("tipc: use pseudo message to wake up sockets after link congestion") was to handle link congestion abatement in a uniform way for both unicast and multicast transmit. However, the latter doesn't work correctly, and has been broken since the referenced commit was applied. If a user now sends a burst of multicast messages that is big enough to cause broadcast link congestion, it will be put to sleep, and not be waked up when the congestion abates as it should be. This has two reasons. First, the flag that is used, TIPC_WAKEUP_USERS, is set correctly, but in the wrong field. Instead of setting it in the 'action_flags' field of the arrival node struct, it is by mistake set in the dummy node struct that is owned by the broadcast link, where it will never tested for. Second, we cannot use the same flag for waking up unicast and multicast users, since the function tipc_node_unlock() needs to pick the wakeup pseudo messages to deliver from different queues. It must hence be able to distinguish between the two cases. This commit solves this problem by adding a new flag TIPC_WAKEUP_BCAST_USERS, and a new function tipc_bclink_wakeup_user(). The latter is to be called by tipc_node_unlock() when the named flag, now set in the correct field, is encountered. v2: using explicit 'unsigned int' declaration instead of 'uint', as per comment from David Miller. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
02be61a9 |
|
22-Aug-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: use message to abort connections when losing contact to node In the current implementation, each 'struct tipc_node' instance keeps a linked list of those ports/sockets that are connected to the node represented by that struct. The purpose of this is to let the node object know which sockets to alert when it loses contact with its peer node, i.e., which sockets need to have their connections aborted. This entails an unwanted direct reference from the node structure back to the port/socket structure, and a need to grab port_lock when we have to make an upcall to the port. We want to get rid of this unecessary BH entry point into the socket, and also eliminate its use of port_lock. In this commit, we instead let the node struct keep list of "connected socket" structs, which each represents a connected socket, but is allocated independently by the node at the moment of connection. If the node loses contact with its peer node, the list is traversed, and a "connection abort" message is created for each entry in the list. The message is sent to it respective connected socket using the ordinary data path, and the receiving socket aborts its connections upon reception of the message. This enables us to get rid of the direct reference from 'struct node' to ´struct port', and another unwanted BH access point to the latter. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
50100a5e |
|
22-Aug-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: use pseudo message to wake up sockets after link congestion The current link implementation keeps a linked list of blocked ports/ sockets that is populated when there is link congestion. The purpose of this is to let the link know which users to wake up when the congestion abates. This adds unnecessary complexity to the data structure and the code, since it forces us to involve the link each time we want to delete a socket. It also forces us to grab the spinlock port_lock within the scope of node_lock. We want to get rid of this direct dependence, as well as the deadlock hazard resulting from the usage of port_lock. In this commit, we instead let the link keep list of a "wakeup" pseudo messages for use in such situations. Those messages are sent to the pending sockets via the ordinary message reception path, and wake up the socket's owner when they are received. This enables us to get rid of the 'waiting_ports' linked lists in struct tipc_port that manifest this direct reference. As a consequence, we can eliminate another BH entry into the socket, and hence the need to grab port_lock. This is a further step in our effort to remove port_lock altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
16e166b8 |
|
25-Jun-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: make link mtu easily accessible from socket Message fragmentation is currently performed at link level, inside the protection of node_lock. This potentially binds up the sending link structure for a long time, instead of letting it do other tasks, such as handle reception of new packets. In this commit, we make the MTUs of each active link become easily accessible from the socket level, i.e., without taking any spinlock or dereferencing the target link pointer. This way, we make it possible to perform fragmentation in the sending socket, before sending the whole fragment chain to the link for transport. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
37e22164 |
|
14-May-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: rename and move message reassembly function The function tipc_link_frag_rcv() is in reality a re-entrant generic message reassemby function that has nothing in particular to do with the link, where it is defined now. This becomes obvious when we see the need to call the function from other places in the code. In this commit rename it to tipc_buf_append() and move it to the file msg.c. We also simplify its signature by moving the tail pointer to the control block of the head buffer, hence making the head buffer self-contained. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
aecb9bb8 |
|
07-May-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: rename enum names of node flags Rename node flags to action_flags as well as its enum names so that they can reflect its real meanings. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ca0c4273 |
|
04-May-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: avoid to asynchronously deliver name tables to peer node Postpone the actions of delivering name tables until after node lock is released, avoiding to do it under asynchronous context. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9d561949 |
|
04-May-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: remove TIPC_NAMES_GONE node flag Since previously what all publications pertaining to the lost node were removed from name table was finished in tasklet context asynchronously, we need to TIPC_NAMES_GONE flag indicating whether the node cleanup work is finished or not. But now as the cleanup work has been finished when node lock is released, the flag becomes meaningless for us. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9db9fdd1 |
|
04-May-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: avoid to asynchronously notify subscriptions Postpone the actions of notifying subscriptions until after node lock is released, avoiding to asynchronously execute registered handlers when node is lost. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
10f465c4 |
|
04-May-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: rename setup_blocked variable of node struct to flags Rename setup_blocked variable of node struct to a more common name called "flags", which will be used to represent kinds of node states. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
486f930a |
|
04-May-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: adjust order of variables in tipc_node structure Move more frequently used variables up to the head of tipc_node structure, hopefully improving a bit performance. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
78acb1f9 |
|
24-Apr-2014 |
Erik Hugne <erik.hugne@ericsson.com> |
tipc: add ioctl to fetch link names We add a new ioctl for AF_TIPC that can be used to fetch the logical name for a link to a remote node on a given bearer. This should be used in combination with link state subscriptions. The logical name size limit definitions are moved to tipc.h, as they are now also needed by the new ioctl. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c7a762e |
|
26-Mar-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: tipc: convert node list and node hlist to RCU lists Convert tipc_node_list list and node_htable hash list to RCU lists. On read side, the two lists are protected with RCU read lock, and on update side, node_list_lock is applied to them. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
46651c59 |
|
26-Mar-2014 |
Ying Xue <ying.xue@windriver.com> |
tipc: rename node create lock to protect node list and hlist When a node is created, tipc_net_lock read lock is first held and then node_create_lock is grabbed in order to prevent the same node from being created and inserted into both node list and hlist twice. But when we query node from the two node lists, we only hold tipc_net_lock read lock without grabbing node_create_lock. Obviously this locking policy is unable to guarantee that the two node lists are always synchronized especially when the operation of changing and accessing them occurs in different contexts like currently doing. Therefore, rename node_create_lock to node_list_lock to protect the two node lists, that is, whenever node is inserted into them or node is queried from them, the node_list_lock should be always held. As a result, tipc_net_lock read lock becomes redundant and then can be removed from the node query functions. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
b9d4c339 |
|
07-Jan-2014 |
Jon Paul Maloy <jon.maloy@ericsson.com> |
tipc: remove 'has_redundant_link' flag from STATE link protocol messages The flag 'has_redundant_link' is defined only in RESET and ACTIVATE protocol messages. Due to an ambiguity in the protocol specification it is currently also transferred in STATE messages. Its value is used to initialize a link state variable, 'permit_changeover', which is used to inhibit futile link failover attempts when it is known that the peer node has no working links at the moment, although the local node may still think it has one. The fact that 'has_redundant_link' incorrectly is read from STATE messages has the effect that 'permit_changeover' sometimes gets a wrong value, and permanently blocks any links from being re-established. Such failures can only occur in in dual-link systems, and are extremely rare. This bug seems to have always been present in the code. Furthermore, since commit b4b5610223f17790419b03eaa962b0e3ecf930d7 ("tipc: Ensure both nodes recognize loss of contact between them"), the 'permit_changeover' field serves no purpose any more. The task of enforcing 'lost contact' cycles at both peer endpoints is now taken by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN in struct tipc_node to abort unnecessary failover attempts. We therefore remove the 'has_redundant_link' flag from STATE messages, as well as the now redundant 'permit_changeover' variable. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
eec73f1c |
|
04-Jan-2014 |
stephen hemminger <stephen@networkplumber.org> |
tipc: remove unused code Remove dead code; tipc_bearer_find_interface tipc_node_redundant_links This may break out of tree version of TIPC if there still is one. But that maybe a good thing :-) Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
40ba3cdf |
|
06-Nov-2013 |
Erik Hugne <erik.hugne@ericsson.com> |
tipc: message reassembly using fragment chain When the first fragment of a long data data message is received on a link, a reassembly buffer large enough to hold the data from this and all subsequent fragments of the message is allocated. The payload of each new fragment is copied into this buffer upon arrival. When the last fragment is received, the reassembled message is delivered upwards to the port/socket layer. Not only is this an inefficient approach, but it may also cause bursts of reassembly failures in low memory situations. since we may fail to allocate the necessary large buffer in the first place. Furthermore, after 100 subsequent such failures the link will be reset, something that in reality aggravates the situation. To remedy this problem, this patch introduces a different approach. Instead of allocating a big reassembly buffer, we now append the arriving fragments to a reassembly chain on the link, and deliver the whole chain up to the socket layer once the last fragment has been received. This is safe because the retransmission layer of a TIPC link always delivers packets in strict uninterrupted order, to the reassembly layer as to all other upper layers. Hence there can never be more than one fragment chain pending reassembly at any given time in a link, and we can trust (but still verify) that the fragments will be chained up in the correct order. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
389dd9bc |
|
15-Nov-2012 |
Ying Xue <ying.xue@windriver.com> |
tipc: rename supported flag to recv_permitted Rename the "supported" flag in bclink structure to "recv_permitted" to better reflect what it is used for. When this flag is set for a given node, we are permitted to receive and acknowledge broadcast messages from that node. Convert it to a bool at the same time, since it is not used to store any numerical values. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
818f4da5 |
|
15-Nov-2012 |
Ying Xue <ying.xue@windriver.com> |
tipc: remove supportable flag from bclink structure The "supportable" flag in bclink structure is a compatibility flag indicating whether a peer node is capable of receiving TIPC broadcast messages. However, all TIPC versions since tipc-1.5, and after the inclusion in the upstream Linux kernel in 2006, support this capability. It is highly unlikely that anybody is still using such an old version of TIPC, let alone that they want to mix it with TIPC-2.0 nodes. Therefore, we now remove the "supportable" flag. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
617d3c7a |
|
30-Apr-2012 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
tipc: compress out gratuitous extra carriage returns Some of the comment blocks are floating in limbo between two functions, or between blocks of code. Delete the extra line feeds between any comment and its associated following block of code, to be consistent with the majority of the rest of the kernel. Also delete trailing newlines at EOF and fix a couple trivial typos in existing comments. This is a 100% cosmetic change with no runtime impact. We get rid of over 500 lines of non-code, and being blank line deletes, they won't even show up as noise in git blame. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
a635b46b |
|
04-Nov-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Hide internal details of node table implementation Relocates information about the size of TIPC's node table index and its associated hash function, since only node subsystem routines need to have access to this information. Note that these changes are essentially cosmetic in nature, and have no impact on the actual operation of TIPC. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
fc0eea69 |
|
28-Oct-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Introduce node signature field in neighbor discovery message Adds support for the new "node signature" in neighbor discovery messages, which is a 16 bit identifier chosen randomly when TIPC is initialized. This field makes it possible for nodes receiving a neighbor discovery message to detect if multiple neighboring nodes are using the same network address (i.e. <Z.C.N>), even when the messages are arriving on different interfaces. This first phase of node signature support creates the signature, incorporates it into outgoing neighbor discovery messages, and tracks the signature used by valid neighbors. An upcoming patch builds on this foundation to implement the improved duplicate neighbor detection checking. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
1ec2bb08 |
|
27-Oct-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Remove obsolete broadcast tag capability Eliminates support for the broadcast tag field, which is no longer used by broadcast link NACK messages. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
7a54d4a9 |
|
27-Oct-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Major redesign of broadcast link ACK/NACK algorithms Completely redesigns broadcast link ACK and NACK mechanisms to prevent spurious retransmit requests in dual LAN networks, and to prevent the broadcast link from stalling due to the failure of a receiving node to acknowledge receiving a broadcast message or request its retransmission. Note: These changes only impact the timing of when ACK and NACK messages are sent, and not the basic broadcast link protocol itself, so inter- operability with nodes using the "classic" algorithms is maintained. The revised algorithms are as follows: 1) An explicit ACK message is still sent after receiving 16 in-sequence messages, and implicit ACK information continues to be carried in other unicast link message headers (including link state messages). However, the timing of explicit ACKs is now based on the receiving node's absolute network address rather than its relative network address to ensure that the failure of another node does not delay the ACK beyond its 16 message target. 2) A NACK message is now typically sent only when a message gap persists for two consecutive incoming link state messages; this ensures that a suspected gap is not confirmed until both LANs in a dual LAN network have had an opportunity to deliver the message, thereby preventing spurious NACKs. A NACK message can also be generated by the arrival of a single link state message, if the deferred queue is so big that the current message gap cannot be the result of "normal" mis-ordering due to the use of dual LANs (or one LAN using a bonded interface). Since link state messages typically arrive at different nodes at different times the problem of multiple nodes issuing identical NACKs simultaneously is inherently avoided. 3) Nodes continue to "peek" at NACK messages sent by other nodes. If another node requests retransmission of a message gap suspected (but not yet confirmed) by the peeking node, the peeking node forgets about the gap and does not generate a duplicate retransmit request. (If the peeking node subsequently fails to receive the lost message, later link state messages will cause it to rediscover and confirm the gap and send another NACK.) 4) Message gap "equality" is now determined by the start of the gap only. This is sufficient to deal with the most common cases of message loss, and eliminates the need for complex end of gap computations. 5) A peeking node no longer tries to determine whether it should send a complementary NACK, since the most common cases of message loss don't require it to be sent. Consequently, the node no longer examines the "broadcast tag" field of a NACK message when peeking. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
93499313 |
|
25-Oct-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Ensure broadcast link re-acquires node after link failure Fix a bug that can prevent TIPC from sending broadcast messages to a node if contact with the node is lost and then regained. The problem occurs if the broadcast link first clears the flag indicating the node is part of the link's distribution set (when it loses contact with the node), and later fails to restore the flag (when contact is regained); restoration fails if contact with the node is regained by implicit unicast link activation triggered by the arrival of a data message, rather than explicitly by the arrival of a link activation message. The broadcast link now uses separate fields to track whether a node is theoretically capable of receiving broadcast messages versus whether it is actually part of the link's distribution set. The former member is updated by the receipt of link protocol messages, which can occur at any time; the latter member is updated only when contact with the node is gained or lost. This change also permits the simplification of several conditional expressions since the broadcast link's "supported" field can now only be set if there are working links to the associated node. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
a18c4bc3 |
|
29-Dec-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
tipc: rename struct link* to struct tipc_link* This converts the following: struct link -> struct tipc_link struct link_req -> struct tipc_link_req struct link_name -> struct tipc_link_name Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
b4b56102 |
|
27-May-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Ensure both nodes recognize loss of contact between them Enhances TIPC to ensure that a node that loses contact with a neighboring node does not allow contact to be re-established until it sees that its peer has also recognized the loss of contact. Previously, nodes that were connected by two or more links could encounter a situation in which node A would lose contact with node B on all of its links, purge its name table of names published by B, and then fail to repopulate those names once contact with B was restored. This would happen because B was able to re-establish one or more links so quickly that it never reached a point where it had no links to A -- meaning that B never saw a loss of contact with A, and consequently didn't re-publish its names to A. This problem is now prevented by enhancing the cleanup done by TIPC following a loss of contact with a neighboring node to ensure that node A ignores all messages sent by B until it receives a LINK_PROTOCOL message that indicates B has lost contact with A, thereby preventing the (re)establishment of links between the nodes. The loss of contact is recognized when a RESET or ACTIVATE message is received that has a "redundant link exists" field of 0, indicating that B's sending link endpoint is in a reset state and that B has no other working links. Additionally, TIPC now suppresses the sending of (most) link protocol messages to a neighboring node while it is cleaning up after an earlier loss of contact with that node. This stops the peer node from prematurely activating its link endpoint, which would prevent TIPC from later activating its own end. TIPC still allows outgoing RESET messages to occur during cleanup, to avoid problems if its own node recognizes the loss of contact first and tries to notify the peer of the situation. Finally, TIPC now recognizes an impending loss of contact with a peer node as soon as it receives a RESET message on a working link that is the peer's only link to the node, and ensures that the link protocol suppression mentioned above goes into effect right away -- that is, even before its own link endpoints have failed. This is necessary to ensure correct operation when there are redundant links between the nodes, since otherwise TIPC would send an ACTIVATE message upon receiving a RESET on its first link and only begin suppressing when a RESET on its second link was received, instead of initiating suppression with the first RESET message as it needs to. Note: The reworked cleanup code also eliminates a check that prevented a link endpoint's discovery object from responding to incoming messages while stale name table entries are being purged. This check is now unnecessary and would have slowed down re-establishment of communication between the nodes in some situations. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
37b9c08a |
|
28-Feb-2011 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Optimizations to link creation code Enhances link creation code as follows: 1) Detects illegal attempts to add a requested link earlier in the link creation process. This prevents TIPC from wasting time initializing a link object it then throws away, and also eliminates the code needed to do the throwing away. 2) Passes in the node object associated with the requested link. This allows TIPC to eliminate a search to locate the node object, as well as code that attempted to create the node if it doesn't exist. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
8f19afb2 |
|
28-Feb-2011 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
tipc: cosmetic - function names are not to be full sentences Function names like "tipc_node_has_redundant_links" are unweildy and result in long lines even for simple lines. The "has" doesn't contribute any value add, so dropping that is a slight step in the right direction. This is a cosmetic change, basic result of: for i in `grep -l tipc_node_has_ *` ; do sed -i s/tipc_node_has_/tipc_node_/ $i ; done Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
672d99e1 |
|
25-Feb-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Convert node object array to a hash table Replaces the dynamically allocated array of pointers to the cluster's node objects with a static hash table. Hash collisions are resolved using chaining, with a typical hash chain having only a single node, to avoid degrading performance during processing of incoming packets. The conversion to a hash table reduces the memory requirements for TIPC's node table to approximately the same size it had prior to the previous commit. In addition to the hash table itself, TIPC now also maintains a linked list for the node objects, sorted by ascending network address. This list allows TIPC to continue sending responses to user space applications that request node and link information in sorted order. The list also improves performance when name table update messages are sent by making it easier to identify the nodes that must be notified. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
d1bcb115 |
|
25-Feb-2011 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Split up unified structure of network-related variables Converts the fields of the global "tipc_net" structure into individual variables. Since the struct was never referenced as a complete unit, its existence was pointless. This will facilitate upcoming changes to TIPC's node table and simpify upcoming relocation of the variables so they are only visible to the files that actually use them. This change is essentially cosmetic in nature, and doesn't affect the operation of TIPC. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
|
#
8f92df6a |
|
31-Dec-2010 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Remove prototype code for supporting multiple clusters Eliminates routines, data structures, and files that were intended to allow TIPC to support a network containing multiple clusters. Currently, TIPC supports only networks consisting of a single cluster within a single zone, so this code is unnecessary. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51a8e4de |
|
31-Dec-2010 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Remove prototype code for supporting inter-cluster routing Eliminates routines and data structures that were intended to allow TIPC to route messages to other clusters. Currently, TIPC supports only networks consisting of a single cluster within a single zone, so this code is unnecessary. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
51f98a8d |
|
31-Dec-2010 |
Allan Stephens <Allan.Stephens@windriver.com> |
tipc: Remove prototype code for supporting multiple zones Eliminates routines, data structures, and files that were intended to allows TIPC to support a network containing multiple zones. Currently, TIPC supports only networks consisting of a single cluster within a single zone, so this code is unnecessary. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
31e3c3f6 |
|
13-Oct-2010 |
stephen hemminger <shemminger@vyatta.com> |
tipc: cleanup function namespace Do some cleanups of TIPC based on make namespacecheck 1. Don't export unused symbols 2. Eliminate dead code 3. Make functions and variables local 4. Rename buf_acquire to tipc_buf_acquire since it is used in several files Compile tested only. This make break out of tree kernel modules that depend on TIPC routines. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5a68d5ee |
|
17-Aug-2010 |
Allan Stephens <allan.stephens@windriver.com> |
tipc: Prevent missing name table entries when link flip-flops rapidly Ensure that TIPC does not re-establish communication with a neighboring node until it has finished updating all data structures containing information about that node to reflect the earlier loss of contact. Previously, it was possible for TIPC to perform its purge of name table entries relating to the node once contact had already been re-established, resulting in the unwanted removal of valid name table entries. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6c00055a |
|
03-Sep-2008 |
David S. Miller <davem@davemloft.net> |
tipc: Don't use structure names which easily globally conflict. Andrew Morton reported a build failure on sparc32, because TIPC uses names like "struct node" and there is a like named data structure defined in linux/node.h This just regexp replaces "struct node*" to "struct tipc_node*" to avoid this and any future similar problems. Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c4307285 |
|
09-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] TIPC: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5392d646 |
|
26-Jun-2006 |
Allan Stephens <allan.stephens@windriver.com> |
[TIPC]: Fixed link switchover bugs Incorporates several related fixes: - switchover now occurs when switching from an active link to a standby link - failure of a standby link no longer initiates switchover - links now display correct # of received packtes following reactivation Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Per Liden <per.liden@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1fc54d8f |
|
20-Mar-2006 |
Sam Ravnborg <sam@ravnborg.org> |
[TIPC]: Fix simple sparse warnings Tried to run the new tipc stack through sparse. Following patch fixes all cases where 0 was used as replacement of NULL. Use NULL to document this is a pointer and to silence sparse. This brough sparse warning count down with 127 to 24 warnings. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Per Liden <per.liden@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
4323add6 |
|
17-Jan-2006 |
Per Liden <per.liden@ericsson.com> |
[TIPC] Avoid polluting the global namespace This patch adds a tipc_ prefix to all externally visible symbols. Signed-off-by: Per Liden <per.liden@ericsson.com>
|
#
593a5f22 |
|
11-Jan-2006 |
Per Liden <per.liden@nospam.ericsson.com> |
[TIPC] More updates of file headers Updated copyright notice to include the year the file was actually created. Information about file creation dates was extracted from the files in the old CVS repository at tipc.sourceforge.net. Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
|
#
9da1c8b6 |
|
11-Jan-2006 |
Per Liden <per.liden@nospam.ericsson.com> |
[TIPC] Update of file headers The copyright statements from different parts of Ericsson have been merged into one. Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
|
#
9ea1fd3c |
|
11-Jan-2006 |
Per Liden <per.liden@nospam.ericsson.com> |
[TIPC] License header update The license header in each file now more clearly state that this code is licensed under a dual BSD/GPL. Before this was only evident if you looked at the MODULE_LICENSE line in core.c. Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
|
#
b97bf3fd |
|
02-Jan-2006 |
Per Liden <per.liden@nospam.ericsson.com> |
[TIPC] Initial merge TIPC (Transparent Inter Process Communication) is a protocol designed for intra cluster communication. For more information see http://tipc.sourceforge.net Signed-off-by: Per Liden <per.liden@nospam.ericsson.com>
|