<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Chapter&nbsp;30.&nbsp;High Availability</title><link rel="stylesheet" href="samba.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.66.1"><link rel="start" href="index.html" title="The Official Samba-3 HOWTO and Reference Guide"><link rel="up" href="optional.html" title="Part&nbsp;III.&nbsp;Advanced Configuration"><link rel="prev" href="Backup.html" title="Chapter&nbsp;29.&nbsp;Backup Techniques"><link rel="next" href="largefile.html" title="Chapter&nbsp;31.&nbsp;Handling Large Directories"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">Chapter&nbsp;30.&nbsp;High Availability</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="Backup.html">Prev</a>&nbsp;</td><th width="60%" align="center">Part&nbsp;III.&nbsp;Advanced Configuration</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="largefile.html">Next</a></td></tr></table><hr></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="SambaHA"></a>Chapter&nbsp;30.&nbsp;High Availability</h2></div><div><div class="author"><h3 class="author"><span class="firstname">John</span> <span class="othername">H.</span> <span class="surname">Terpstra</span></h3><div class="affiliation"><span class="orgname">Samba Team<br></span><div class="address"><p><tt class="email"><<a href="mailto:jht@samba.org">jht@samba.org</a>></tt></p></div></div></div></div><div><div class="author"><h3 class="author"><span class="firstname">Jeremy</span> <span class="surname">Allison</span></h3><div class="affiliation"><span class="orgname">Samba Team<br></span><div class="address"><p><tt class="email"><<a href="mailto:jra@samba.org">jra@samba.org</a>></tt></p></div></div></div></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a
href="SambaHA.html#id2609371">Features and Benefits</a></span></dt><dt><span class="sect1"><a href="SambaHA.html#id2609423">Technical Discussion</a></span></dt><dd><dl><dt><span class="sect2"><a href="SambaHA.html#id2609436">The Ultimate Goal</a></span></dt><dt><span class="sect2"><a href="SambaHA.html#id2609517">Why Is This So Hard?</a></span></dt><dt><span class="sect2"><a href="SambaHA.html#id2609891">A Simple Solution</a></span></dt><dt><span class="sect2"><a href="SambaHA.html#id2609930">High Availability Server Products</a></span></dt><dt><span class="sect2"><a href="SambaHA.html#id2609986">MS-DFS: The Poor Man's Cluster</a></span></dt><dt><span class="sect2"><a href="SambaHA.html#id2610024">Conclusions</a></span></dt></dl></dd></dl></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2609371"></a>Features and Benefits</h2></div></div></div><p>
Network administrators are often concerned about the availability of file and print
services. Network users are understandably intolerant of interruptions to the services
they depend on to perform vital tasks.
</p><p>
A sign in a computer room served to remind staff of their responsibilities. It read:
</p><div class="blockquote"><blockquote class="blockquote"><p>
All humans fail; in both great and small ways we fail continually. Machines fail too.
Computers are machines that are managed by humans, and the fallout from failure
can be spectacular. Your responsibility is to deal with failure, to anticipate it,
and to eliminate it as far as is humanly and economically wise to achieve.
Are your actions part of the problem or part of the solution?
</p></blockquote></div><p>
If we are to deal with failure in a planned and productive manner, then first we must
understand the problem. That is the purpose of this chapter.
</p><p>
Parenthetically, the following discussion contains seeds of information on how to
provision a network infrastructure against failure. Our purpose here is not to provide
a lengthy dissertation on the subject of high availability. We have also made
a conscious decision not to provide detailed working examples of high availability
solutions; instead, we present an overview of the issues in the hope that someone will
rise to the challenge of providing a detailed document focused purely on
the current state of knowledge and practice in high availability as it
applies to the deployment of Samba and other CIFS/SMB technologies.
</p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="id2609423"></a>Technical Discussion</h2></div></div></div><p>
The following summary was part of a presentation by Jeremy Allison at the SambaXP 2003
conference held in Goettingen, Germany, in April 2003. Material has been added
from other sources, but it was Jeremy who inspired the structure that follows.
</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2609436"></a>The Ultimate Goal</h3></div></div></div><p>
All clustering technologies aim to achieve one or more of the following:
</p><div class="itemizedlist"><ul type="disc"><li><p>Obtain the maximum affordable computational power.</p></li><li><p>Obtain faster program execution.</p></li><li><p>Deliver unstoppable services.</p></li><li><p>Avoid single points of failure.</p></li><li><p>Make the most effective use of resources.</p></li></ul></div><p>
A clustered file server ideally has the following properties:
</p><div class="itemizedlist"><ul type="disc"><li><p>All clients can connect transparently to any server.</p></li><li><p>A server can fail and clients are transparently reconnected to another server.</p></li><li><p>All servers serve out the same set of files.</p></li><li><p>All file changes are immediately seen on all servers.</p><div class="itemizedlist"><ul type="circle"><li><p>This requires a distributed file system.</p></li></ul></div></li><li><p>Infinite ability to scale by adding more servers or disks.</p></li></ul></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2609517"></a>Why Is This So Hard?</h3></div></div></div><p>
In short, the problem is one of <span class="emphasis"><em>state</em></span>.
</p><div class="itemizedlist"><ul type="disc"><li><p>
All TCP/IP connections are dependent on state information.
</p><p>
The TCP connection involves a packet sequence number. This
sequence number would need to be dynamically updated on all
machines in the cluster to effect seamless TCP fail-over.
</p></li><li><p>
CIFS/SMB (the Windows networking protocol) uses TCP connections.
</p><p>
This means that from a basic design perspective, fail-over is not
seriously considered.
</p><div class="itemizedlist"><ul type="circle"><li><p>
All current SMB clusters are fail-over solutions:
they rely on the clients to reconnect. They provide server
fail-over, but clients can lose information due to a server failure.
</p></li></ul></div><p>
</p></li><li><p>
Servers keep state information about client connections.
</p><div class="itemizedlist"><ul type="circle"><li><p>CIFS/SMB involves a lot of state.</p></li><li><p>Every file open must be compared with other file opens
to check share modes.</p></li></ul></div><p>
</p></li></ul></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2609595"></a>The Front-End Challenge</h4></div></div></div><p>
To make it possible for a cluster of file servers to appear as a single server that has one
name and one IP address, the incoming TCP data streams from clients must be processed by the
front-end virtual server. This server must de-multiplex the incoming packets at the SMB protocol
level and then feed each SMB packet to the appropriate server in the cluster.
</p><p>
One could direct all IPC$ connections and RPC calls to a single server that handles printing and
user lookup requirements. However, RPC printing handles are shared between different IPC$ sessions,
so it is hard to split this function across clustered servers!
</p><p>
Conceptually speaking, all other servers would then provide only file services. This is a simpler
problem to concentrate on.
</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2609627"></a>De-multiplexing SMB Requests</h4></div></div></div><p>
De-multiplexing of SMB requests requires knowledge of SMB state information,
all of which must be held by the front-end <span class="emphasis"><em>virtual</em></span> server.
This is a perplexing and complicated problem to solve.
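To make the state problem concrete, the share-mode checking mentioned above can be modelled in a few lines. This is an illustrative sketch, not Samba code: the flag names and the simplified deny-mode rules are invented for illustration, and a real implementation must also handle DENY_DOS semantics, full access masks, and much more.

```python
# Hypothetical model of SMB share-mode checking: every new open must be
# compared against all existing opens on the same file. Not Samba code;
# flags and rules are simplified for illustration.

READ, WRITE = 1, 2                                  # requested access bits
DENY_NONE, DENY_READ, DENY_WRITE, DENY_ALL = 0, 1, 2, 3

def compatible(existing, new):
    """Return True if a new open (access, deny) may coexist with an existing one."""
    e_access, e_deny = existing
    n_access, n_deny = new
    if e_deny == DENY_ALL or n_deny == DENY_ALL:
        return False
    # The existing open's deny mode restricts the new open's access ...
    if e_deny == DENY_READ and (n_access & READ):
        return False
    if e_deny == DENY_WRITE and (n_access & WRITE):
        return False
    # ... and the new open's deny mode must not exclude access already granted.
    if n_deny == DENY_READ and (e_access & READ):
        return False
    if n_deny == DENY_WRITE and (e_access & WRITE):
        return False
    return True

def try_open(open_table, path, access, deny):
    """Grant the open only if it is compatible with every current open."""
    for other in open_table.get(path, []):
        if not compatible(other, (access, deny)):
            return False
    open_table.setdefault(path, []).append((access, deny))
    return True

opens = {}
print(try_open(opens, "/data/report.doc", READ, DENY_WRITE))   # True
print(try_open(opens, "/data/report.doc", READ, DENY_NONE))    # True
print(try_open(opens, "/data/report.doc", WRITE, DENY_NONE))   # False: first open denies writers
```

In a cluster, every node would need a consistent, up-to-date view of this open table, which is exactly the shared-state problem described in this section.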
</p><p>
Windows XP and later have changed semantics so that state information (vuid, tid, fid)
must match for a successful operation. This makes things simpler than before and is a
positive step forward.
</p><p>
SMB requests are sent by vuid to their associated server. No code exists today to
effect this solution. This problem is conceptually similar to the problem of
correctly handling requests from multiple users of Windows 2000
Terminal Server in Samba.
</p><p>
One possibility is to start by exposing the server pool to clients directly.
This could eliminate the de-multiplexing step.
</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2609664"></a>The Distributed File System Challenge</h4></div></div></div><p>
<a class="indexterm" name="id2609672"></a>
Many distributed file systems exist for UNIX and Linux.
</p><p>
Many could be adapted to provide the backend for our cluster, so long as awareness of SMB
semantics (share modes, locking, and oplock issues in particular) is kept in mind.
Common free distributed file systems include:
<a class="indexterm" name="id2609687"></a>
<a class="indexterm" name="id2609694"></a>
<a class="indexterm" name="id2609701"></a>
<a class="indexterm" name="id2609707"></a>
</p><div class="itemizedlist"><ul type="disc"><li><p>NFS</p></li><li><p>AFS</p></li><li><p>OpenGFS</p></li><li><p>Lustre</p></li></ul></div><p>
The server pool (cluster) can use any distributed file system backend if all SMB
semantics are performed within this pool.
</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2609743"></a>Restrictive Constraints on Distributed File Systems</h4></div></div></div><p>
Where a clustered server provides purely SMB services, oplock handling
may be done within the server pool without imposing a need for this to
be passed to the backend file system pool.
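The per-vuid routing described earlier in this section can be sketched as follows. This is a hypothetical model of a front-end virtual server, not working code: it shows only the bookkeeping needed to pin each authenticated session (vuid) to one backend server so that later requests carrying that vuid always reach the same place.

```python
# Hypothetical sketch of a front-end that de-multiplexes SMB requests by
# vuid: each authenticated session is pinned to one backend server, and
# every later request carrying that vuid is forwarded to the same backend.

import itertools

class FrontEnd:
    def __init__(self, backends):
        self.backends = backends                 # names of backend servers
        self.placement = itertools.cycle(backends)  # simple round-robin placement
        self.vuid_map = {}                       # vuid -> backend server
        self.next_vuid = itertools.count(100)    # vuid allocator (arbitrary start)

    def session_setup(self):
        """Authenticate a new session: allocate a vuid and pin it to a backend."""
        vuid = next(self.next_vuid)
        self.vuid_map[vuid] = next(self.placement)
        return vuid

    def route(self, vuid):
        """Forward a request to the backend that owns this vuid."""
        return self.vuid_map[vuid]

fe = FrontEnd(["serverA", "serverB"])
v1 = fe.session_setup()
v2 = fe.session_setup()
print(fe.route(v1), fe.route(v2))    # sessions spread across the pool: serverA serverB
print(fe.route(v1) == fe.route(v1))  # True: the same vuid always reaches the same backend
```

The hard part omitted here is everything else: parsing the SMB stream, and replicating the vuid/tid/fid tables so a surviving front end can take over, which is precisely the state problem described above.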
</p><p>
On the other hand, where the server pool also provides NFS or other file services,
it is essential that the implementation be oplock-aware so it can
interoperate with SMB services. This is a significant challenge today. A failure
to provide this will result in a significant loss of performance that will be
sorely noted by users of Microsoft Windows clients.
</p><p>
Last, all state information must be shared across the server pool.
</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2609771"></a>Server Pool Communications</h4></div></div></div><p>
Most backend file systems support POSIX file semantics. This makes it difficult
to push SMB semantics back into the file system, because POSIX locks have different properties
and semantics from SMB locks.
</p><p>
All <span><b class="command">smbd</b></span> processes in the server pool must of necessity communicate
very quickly. For this, the current <i class="parameter"><tt>tdb</tt></i> file structure that Samba
uses is not suitable for use across a network. Clustered <span><b class="command">smbd</b></span> processes must use something else.
</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2609809"></a>Server Pool Communications Demands</h4></div></div></div><p>
High-speed inter-server communication in the server pool is a design prerequisite
for a fully functional system. Possibilities for this include:
</p><div class="itemizedlist"><ul type="disc"><li><p>
A proprietary shared-memory bus (for example, Myrinet or SCI [Scalable Coherent Interface]).
These are high-cost items.
</p></li><li><p>
Gigabit Ethernet (now quite affordable).
</p></li><li><p>
Raw Ethernet framing (to bypass TCP and UDP overheads).
</p></li></ul></div><p>
We have yet to identify metrics for the performance demands that must be met
for this to happen effectively.
</p></div><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="id2609848"></a>Required Modifications to Samba</h4></div></div></div><p>
Samba needs to be significantly modified to work with a high-speed server interconnect
system to permit transparent fail-over clustering.
</p><p>
Particular functions inside Samba that will be affected include:
</p><div class="itemizedlist"><ul type="disc"><li><p>
The locking database, oplock notifications,
and the share mode database.
</p></li><li><p>
Failure semantics need to be defined. Samba behaves the same way as Windows:
when oplock messages fail, a file open request is allowed, but this is
potentially dangerous in a clustered environment. So how should inter-server
pool failure semantics function, and how should this be implemented?
</p></li><li><p>
Should this be implemented using a point-to-point lock manager, or can it
be done using multicast techniques?
</p></li></ul></div></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2609891"></a>A Simple Solution</h3></div></div></div><p>
Allowing fail-over servers to handle different functions within the exported file system
removes the need for a distributed locking protocol.
</p><p>
If only one server in a pair is active, the need for a high-speed server interconnect is avoided.
This allows the use of existing high availability solutions instead of inventing a new one.
This simpler solution comes at a price: the need to manage a more
complex file name space. Since there is no longer a single file system, administrators
must remember where all services are located, a complexity not easily dealt with.
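This simple solution can be illustrated with a small sketch. The server and service names below are invented; the point is that a static table maintained by the administrator, rather than a distributed lock manager, decides which server answers for each part of the name space.

```python
# Hypothetical sketch of the "simple solution": each exported service is
# statically assigned to an active/standby server pair, so no distributed
# locking protocol is needed, at the price of a fragmented name space the
# administrator must track by hand. All names are invented examples.

service_map = {
    # service -> (active server, standby server)
    "accounts": ("fs1", "fs2"),
    "engineering": ("fs3", "fs4"),
}

failed = set()   # servers currently known to be down

def locate(service):
    """Return the server that should answer for a service right now."""
    active, standby = service_map[service]
    return standby if active in failed else active

print(locate("accounts"))       # fs1
failed.add("fs1")               # fs1 dies; its partner takes over its services
print(locate("accounts"))       # fs2
print(locate("engineering"))    # fs3: the other pair is unaffected
```

The table itself is the "more complex file name space" referred to above: it is the administrator, not the cluster, who must keep it correct.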
</p><p>
The <span class="emphasis"><em>virtual server</em></span> is still needed to redirect requests to backend
servers. Backend file space integrity is the responsibility of the administrator.
</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2609930"></a>High Availability Server Products</h3></div></div></div><p>
Fail-over servers must communicate in order to handle resource fail-over. This is essential
for high availability services. The use of a dedicated heartbeat is a common technique to
introduce some intelligence into the fail-over process. This is often done over a dedicated
link (LAN or serial).
</p><p>
<a class="indexterm" name="id2609946"></a>
Many fail-over solutions (like Red Hat Cluster Manager, as well as Microsoft Wolfpack)
can use a shared SCSI or Fibre Channel disk storage array for fail-over communication.
Information regarding Red Hat high availability solutions for Samba may be obtained from:
<a href="http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/s1-service-samba.html" target="_top">www.redhat.com.</a>
</p><p>
The Linux High Availability project is a resource worthy of consultation if you wish
to build a highly available Samba file server solution. Please consult the home page at
<a href="http://www.linux-ha.org/" target="_top">www.linux-ha.org/.</a>
</p><p>
Front-end server complexity remains a challenge for high availability, because the front end must deal
gracefully with backend failures while providing continuity of service
to all network clients.
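The heartbeat technique mentioned above can be sketched as a toy model. This is not Linux-HA or Red Hat Cluster Manager code; it merely shows the common pattern of counting missed heartbeats over a dedicated link and claiming the partner's resources once a threshold is crossed.

```python
# Toy model of heartbeat-driven fail-over, not real cluster-manager code:
# a node counts consecutive missed heartbeats from its partner and takes
# over the shared resources once a threshold is crossed.

class HeartbeatMonitor:
    def __init__(self, threshold=3):
        self.threshold = threshold   # missed beats before declaring failure
        self.missed = 0
        self.owns_resources = False

    def beat_received(self):
        """Partner is alive: reset the failure counter."""
        self.missed = 0

    def beat_missed(self):
        """One heartbeat interval passed with no message from the partner."""
        self.missed += 1
        if self.missed >= self.threshold and not self.owns_resources:
            self.take_over()

    def take_over(self):
        # In a real cluster this would claim the shared storage and the
        # service IP address; here we only record the state change.
        self.owns_resources = True

mon = HeartbeatMonitor(threshold=3)
mon.beat_received()
mon.beat_missed(); mon.beat_missed()
print(mon.owns_resources)   # False: still below the failure threshold
mon.beat_missed()
print(mon.owns_resources)   # True: partner declared dead, resources claimed
```

The threshold trades fail-over speed against the risk of a false takeover (a "split brain"), which is why production systems prefer a dedicated link, and often shared storage, for this traffic.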
</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2609986"></a>MS-DFS: The Poor Man's Cluster</h3></div></div></div><p>
<a class="indexterm" name="id2609994"></a>
<a class="indexterm" name="id2610001"></a>
MS-DFS links can be used to redirect clients to disparate backend servers. This pushes
complexity back to the network client, something Microsoft clients already support.
MS-DFS creates the illusion of a simple, continuous file system name space that even
works at the file level.
</p><p>
Above all, at the cost of management complexity, a distributed system (pseudo-cluster) can
be created using existing Samba functionality.
</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="id2610024"></a>Conclusions</h3></div></div></div><div class="itemizedlist"><ul type="disc"><li><p>Transparent SMB clustering is hard to do!</p></li><li><p>Client fail-over is the best we can do today.</p></li><li><p>Much more work is needed before a practical and manageable high
availability transparent cluster solution will be possible.</p></li><li><p>MS-DFS can be used to create the illusion of a single transparent cluster.</p></li></ul></div></div></div></div><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="Backup.html">Prev</a>&nbsp;</td><td width="20%" align="center"><a accesskey="u" href="optional.html">Up</a></td><td width="40%" align="right">&nbsp;<a accesskey="n" href="largefile.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">Chapter&nbsp;29.&nbsp;Backup Techniques&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;Chapter&nbsp;31.&nbsp;Handling Large Directories</td></tr></table></div></body></html>