                                  _   _ ____  _
                              ___| | | |  _ \| |
                             / __| | | | |_) | |
                            | (__| |_| |  _ <| |___
                             \___|\___/|_| \_\_____|

Structs in libcurl

This document should cover 7.32.0 pretty accurately, but will make sense even
for older and later versions as things don't change drastically that often.

 1. The main structs in libcurl
  1.1 SessionHandle
  1.2 connectdata
  1.3 Curl_multi
  1.4 Curl_handler
  1.5 conncache
  1.6 Curl_share
  1.7 CookieInfo

==============================================================================

1. The main structs in libcurl

  1.1 SessionHandle

  The SessionHandle struct is the one returned to the outside in the external
  API as a "CURL *". This is usually known as an easy handle in API
  documentation and examples.

  Information and state that is related to the actual connection is in the
  'connectdata' struct. When a transfer is about to be made, libcurl will
  either create a new connection or re-use an existing one. The particular
  connectdata that is used by this handle is pointed to by
  SessionHandle->easy_conn.

  Data and information regarding this particular single transfer are kept in
  the SingleRequest sub-struct.

  When the SessionHandle struct is added to a multi handle, as it must be in
  order to do any transfer, the ->multi member will point to the Curl_multi
  struct it belongs to. The ->prev and ->next members will then be used by the
  multi code to keep a linked list of SessionHandle structs that are added to
  that same multi handle. libcurl always uses multi so ->multi *will* point to
  a Curl_multi when a transfer is in progress.

  ->mstate is the multi state of this particular SessionHandle. When
  multi_runsingle() is called, it will act on this handle according to which
  state it is in. The mstate also determines which sockets to return for a
  specific SessionHandle when, for example, curl_multi_fdset() is called.
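
  The ->multi, ->prev and ->next bookkeeping described above can be sketched
  roughly as follows. This is a minimal illustration and not the real libcurl
  code: the sketch_* types and functions, the 'head' member and the four
  state names are all assumptions made for the example, and the real structs
  carry many more members.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified state set; the real mstate has more values. */
enum sketch_mstate { SKETCH_INIT, SKETCH_CONNECT, SKETCH_PERFORM, SKETCH_DONE };

struct sketch_multi;

/* Stand-in for SessionHandle, reduced to the members discussed above. */
struct sketch_easy {
  struct sketch_easy *prev;
  struct sketch_easy *next;
  struct sketch_multi *multi;   /* non-NULL while added to a multi handle */
  enum sketch_mstate mstate;
};

/* Stand-in for Curl_multi; 'head' is an assumed name for the list start. */
struct sketch_multi {
  struct sketch_easy *head;
};

/* Link 'data' in at the front of the multi's doubly linked list. */
static void sketch_add(struct sketch_multi *multi, struct sketch_easy *data)
{
  assert(data->multi == NULL);  /* a handle belongs to at most one multi */
  data->multi = multi;
  data->mstate = SKETCH_INIT;
  data->prev = NULL;
  data->next = multi->head;
  if(multi->head)
    multi->head->prev = data;
  multi->head = data;
}

/* Unlink 'data' from the multi handle it is currently part of. */
static void sketch_remove(struct sketch_easy *data)
{
  struct sketch_multi *multi = data->multi;
  assert(multi != NULL);
  if(data->prev)
    data->prev->next = data->next;
  else
    multi->head = data->next;
  if(data->next)
    data->next->prev = data->prev;
  data->prev = data->next = NULL;
  data->multi = NULL;
}
```

  Inserting at the front of the list is a choice made only to keep the
  sketch short; it says nothing about the order libcurl itself uses.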

  The libcurl source code generally uses the name 'data' for the variable that
  points to the SessionHandle.


  1.2 connectdata

  A general idea in libcurl is to keep connections around in a connection
  "cache" after they have been used, in case they will be used again.
  Re-using an existing connection instead of creating a new one gives a
  significant performance boost.

  Each 'connectdata' identifies a single physical connection to a server. If
  the connection can't be kept alive, the connection will be closed after use
  and then this struct can be removed from the cache and freed.

  Thus, the same SessionHandle can be used multiple times and each time select
  another connectdata struct to use for the connection. Keep this in mind, as
  it is then important to consider if options or choices are based on the
  connection or the SessionHandle.

  Functions in libcurl will assume that connectdata->data points to the
  SessionHandle that uses this connection.

  As a special complexity, some protocols supported by libcurl require a
  special disconnect procedure that is more than just shutting down the
  socket. It can involve sending one or more commands to the server before
  doing so. Since connections are kept in the connection cache after use, the
  original SessionHandle may no longer be around when the time comes to shut
  down a particular connection. For this purpose, libcurl holds a special
  dummy 'closure_handle' SessionHandle in the Curl_multi struct to use when
  needed.

  FTP uses two TCP connections for a typical transfer but it keeps both in
  this single struct and thus can be considered a single connection for most
  internal concerns.

  The libcurl source code generally uses the name 'conn' for the variable that
  points to the connectdata.


  1.3 Curl_multi

  Internally, the easy interface is implemented as a wrapper around multi
  interface functions. This makes everything multi interface.
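
  The wrapper idea can be sketched as below: an "easy" entry point puts the
  handle into a private multi and keeps driving the multi engine until the
  transfer is done. Everything here is a simplified assumption made for
  illustration (the sketch_* names, the four states, one state step per
  pass); it is not how the real functions are written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified states; the real mstate has more values. */
enum sketch_state { SKETCH_CONNECT, SKETCH_DO, SKETCH_PERFORM, SKETCH_DONE };

/* Stand-in for an easy handle. */
struct sketch_easy {
  enum sketch_state mstate;
};

/* Stand-in for a multi handle; one handle is enough for this sketch. */
struct sketch_multi {
  struct sketch_easy *easy;
};

/* One pass of the multi engine: advance the handle by a single state and
   return the number of transfers that are still running (0 or 1 here). */
static int sketch_multi_perform(struct sketch_multi *multi)
{
  struct sketch_easy *data = multi->easy;
  assert(data != NULL);
  if(data->mstate != SKETCH_DONE)
    data->mstate = (enum sketch_state)(data->mstate + 1);
  return data->mstate != SKETCH_DONE;
}

/* The "easy" entry point: wrap the handle in a private multi and drive it
   to completion. Returns how many engine passes were needed. */
static int sketch_easy_perform(struct sketch_easy *data)
{
  struct sketch_multi multi;
  int passes = 0;
  int running;

  multi.easy = data;
  data->mstate = SKETCH_CONNECT;
  do {
    running = sketch_multi_perform(&multi);
    passes++;
  } while(running);
  return passes;
}
```

  A real loop would wait for socket activity between passes instead of
  spinning; that part is left out to keep the sketch small.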

  Curl_multi is the multi handle struct exposed as "CURLM *" in external APIs.

  This struct holds a list of SessionHandle structs that have been added to
  this handle with curl_multi_add_handle(). The start of the list is ->easyp
  and ->num_easy is a counter of added SessionHandles.

  ->msglist is a linked list of messages to send back when
  curl_multi_info_read() is called. Basically a node is added to that list
  when an individual SessionHandle's transfer has completed.

  ->hostcache points to the name cache. It is a hash table that maps host
  names to IP addresses. Entries have a limited lifetime, and the cache is
  meant to save lookup time when the same name is wanted again within a short
  period.

  ->timetree points to a tree of SessionHandles, sorted by the remaining time
  until each one should be checked next, normally because of some sort of
  timeout. Each SessionHandle has one node in the tree.

  ->sockhash is a hash table that allows fast lookups from a socket
  descriptor to the SessionHandle that uses that descriptor. This is
  necessary for the multi_socket API.

  ->conn_cache points to the connection cache. It keeps track of all
  connections that are kept after use. The cache has a maximum size.

  ->closure_handle is described in the 'connectdata' section.

  The libcurl source code generally uses the name 'multi' for the variable
  that points to the Curl_multi struct.


  1.4 Curl_handler

  Each unique protocol that is supported by libcurl needs to provide at least
  one Curl_handler struct. It defines what the protocol is called and what
  functions the main code should call to deal with protocol specific issues.
  In general, there's a source file named [protocol].c in which there's a
  "struct Curl_handler Curl_handler_[protocol]" declared.
  In url.c, a single array then points to all the individual Curl_handler
  structs, and that array is scanned through when a URL is given to libcurl
  to work with.

  ->scheme is the URL scheme name, usually spelled out in uppercase. That's
  "HTTP" or "FTP" etc. SSL versions of a protocol need their own Curl_handler
  setup, so HTTPS is separate from HTTP.

  ->setup_connection is called to allow the protocol code to allocate protocol
  specific data that then gets associated with that SessionHandle for the rest
  of this transfer. It gets freed again at the end of the transfer. It will be
  called before the 'connectdata' for the transfer has been selected/created.
  Most protocols will allocate their private 'struct [PROTOCOL]' here and
  assign SessionHandle->req.protop to point to it.

  ->connect_it allows a protocol to do some specific actions after the TCP
  connect is done, that can still be considered part of the connection phase.

  Some protocols will alter the connectdata->recv[] and connectdata->send[]
  function pointers in this function.

  ->connecting is similarly a function that keeps getting called as long as the
  protocol considers itself still in the connecting phase.

  ->do_it is the function called to issue the transfer request. This is what
  we call the DO action internally. If the DO is not enough and more work is
  needed for the entire DO sequence to complete, ->doing is then usually also
  provided. Each protocol that needs to do multiple commands or similar for
  do/doing needs to implement its own state machine (see SCP, SFTP, FTP).
  Some protocols (only FTP and only due to historical reasons) have a
  separate piece of the DO state called DO_MORE.

  ->doing keeps getting called while issuing the transfer request command(s).

  ->done gets called when the transfer is complete and DONE.
  That's after the main data has been transferred.

  ->do_more gets called during the DO_MORE state. The FTP protocol uses this
  state when setting up the second connection.

  ->proto_getsock
  ->doing_getsock
  ->domore_getsock
  ->perform_getsock
  Functions that return socket information: which socket(s) to wait for, and
  for which action(s), during the particular multi state.

  ->disconnect is called immediately before the TCP connection is shut down.

  ->readwrite gets called during transfer to allow the protocol to do extra
  reads/writes.

  ->defport is the default TCP or UDP port this protocol uses.

  ->protocol is one or more bits in the CURLPROTO_* set. The SSL versions have
  their "base" protocol set and then the SSL variation. Like "HTTP|HTTPS".

  ->flags is a bitmask with additional information about the protocol that will
  make it get treated differently by the generic engine:

    PROTOPT_SSL - will make it connect and negotiate SSL

    PROTOPT_DUAL - this protocol uses two connections

    PROTOPT_CLOSEACTION - this protocol has actions to do before closing the
    connection. This flag is no longer used by code, yet still set for a bunch
    of protocol handlers.

    PROTOPT_DIRLOCK - "direction lock". The SSH protocols set this bit to
    limit which "direction" of socket actions the main engine will concern
    itself with.

    PROTOPT_NONETWORK - a protocol that doesn't use the network (read file:)

    PROTOPT_NEEDSPWD - this protocol needs a password and will use a default
    one unless one is provided

    PROTOPT_NOURLQUERY - this protocol can't handle a query part on the URL
    (?foo=bar)


  1.5 conncache

  This is a hash table with connections kept for later re-use. Each
  SessionHandle has a pointer to its connection cache. Each multi handle sets
  up a connection cache that all added SessionHandles share by default.
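
  The re-use idea behind the connection cache can be sketched as follows.
  To stay short, this sketch replaces the hash table with a small fixed-size
  array and matches on host name and port only; the sketch_* names, the
  'in_use' flag and the matching rule are assumptions made for illustration,
  not the real conncache logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_CACHE_MAX 4      /* the cache has a maximum size */

/* Stand-in for connectdata: just enough to identify a connection. */
struct sketch_conn {
  char host[64];
  int port;
  int in_use;                   /* set while a transfer uses the connection */
};

/* Stand-in for conncache: an array instead of a hash table. */
struct sketch_conncache {
  struct sketch_conn slots[SKETCH_CACHE_MAX];
  size_t num;
};

/* Look for an idle cached connection to the same host and port. */
static struct sketch_conn *sketch_find(struct sketch_conncache *cache,
                                       const char *host, int port)
{
  size_t i;
  for(i = 0; i < cache->num; i++) {
    struct sketch_conn *conn = &cache->slots[i];
    if(!conn->in_use && conn->port == port && !strcmp(conn->host, host))
      return conn;
  }
  return NULL;
}

/* Re-use a cached connection if possible, otherwise create a new entry.
   Sets *reused accordingly. Returns NULL if the cache is full or the host
   name does not fit. */
static struct sketch_conn *sketch_get(struct sketch_conncache *cache,
                                      const char *host, int port,
                                      int *reused)
{
  struct sketch_conn *conn = sketch_find(cache, host, port);
  *reused = (conn != NULL);
  if(!conn) {
    if(cache->num == SKETCH_CACHE_MAX ||
       strlen(host) >= sizeof(cache->slots[0].host))
      return NULL;
    conn = &cache->slots[cache->num++];
    strcpy(conn->host, host);
    conn->port = port;
  }
  conn->in_use = 1;
  return conn;
}

/* Hand the connection back to the cache after the transfer. */
static void sketch_release(struct sketch_conn *conn)
{
  conn->in_use = 0;
}
```

  A second request for the same host and port after sketch_release() gets
  the same entry back with *reused set, which is the behaviour the cache
  exists to provide.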


  1.6 Curl_share

  The libcurl share API allocates a Curl_share struct, exposed to the external
  API as "CURLSH *".

  The idea is that the struct can hold its own set of caches and pools. By
  providing this struct in the CURLOPT_SHARE option, those specific
  SessionHandles will use the caches/pools that this share handle holds.

  Then individual SessionHandle structs can be made to share specific things
  that they otherwise wouldn't, such as cookies.

  The Curl_share struct can currently hold cookies, DNS cache and the SSL
  session cache.


  1.7 CookieInfo

  This is the main cookie struct. It holds all known cookies and related
  information. Each SessionHandle has its own private CookieInfo even when
  they are added to a multi handle. They can be made to share cookies by using
  the share API.
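
  The difference between private and shared cookies can be modelled as
  below. This is only a sketch: the sketch_* types stand in for
  SessionHandle, Curl_share and CookieInfo, a cookie store is reduced to a
  counter, and the "use the share if one is set" rule is an assumption made
  to illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for CookieInfo: only counts the cookies it holds. */
struct sketch_cookies {
  int count;
};

/* Stand-in for Curl_share holding a shared cookie store. */
struct sketch_share {
  struct sketch_cookies cookies;
};

/* Stand-in for SessionHandle with a private cookie store and an optional
   share, as would be set with the CURLOPT_SHARE option. */
struct sketch_easy {
  struct sketch_cookies own;
  struct sketch_share *share;   /* NULL means nothing is shared */
};

/* Pick the cookie store this handle should use. */
static struct sketch_cookies *sketch_cookiejar(struct sketch_easy *data)
{
  assert(data != NULL);
  return data->share ? &data->share->cookies : &data->own;
}

/* Record one more cookie for this handle. */
static void sketch_add_cookie(struct sketch_easy *data)
{
  sketch_cookiejar(data)->count++;
}
```

  With no share set, a cookie added through one handle is invisible to the
  other; once both point to the same share, they see the same store.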