From 1514f3506b08b6f950ac2a97cf386b3d70876479 Mon Sep 17 00:00:00 2001 From: Daniel Stenberg Date: Wed, 10 Jun 2015 00:11:54 +0200 Subject: [PATCH] INTERNALS: absorbed docs/LIBCURL-STRUCTS --- docs/INTERNALS | 234 +++++++++++++++++++++++++++++++++++++++- docs/LIBCURL-STRUCTS | 248 ------------------------------------------- docs/Makefile.am | 2 +- 3 files changed, 234 insertions(+), 250 deletions(-) delete mode 100644 docs/LIBCURL-STRUCTS diff --git a/docs/INTERNALS b/docs/INTERNALS index b134260c3..4cd63b4e2 100644 --- a/docs/INTERNALS +++ b/docs/INTERNALS @@ -37,6 +37,7 @@ Table of Contents - [hostip.c explained](#hostip) - [Track Down Memory Leaks](#memoryleak) - [`multi_socket`](#multi_socket) + - [Structs in libcurl](#structs) curl internals @@ -848,6 +849,235 @@ Track Down Memory Leaks only `fd_sets` and that is no longer good enough. The changes done to c-ares are available in c-ares 1.3.1 and later. + +Structs in libcurl +================== + +This section should cover 7.32.0 pretty accurately, but will make sense even +for older and later versions as things don't change drastically that often. + +## SessionHandle + + The SessionHandle handle struct is the one returned to the outside in the + external API as a "CURL *". This is usually known as an easy handle in API + documentations and examples. + + Information and state that is related to the actual connection is in the + 'connectdata' struct. When a transfer is about to be made, libcurl will + either create a new connection or re-use an existing one. The particular + connectdata that is used by this handle is pointed out by + SessionHandle->easy_conn. + + Data and information that regard this particular single transfer is put in + the SingleRequest sub-struct. + + When the SessionHandle struct is added to a multi handle, as it must be in + order to do any transfer, the ->multi member will point to the `Curl_multi` + struct it belongs to. The ->prev and ->next members will then be used by the + multi code to keep a linked list of SessionHandle structs that are added to + that same multi handle. libcurl always uses multi so ->multi *will* point to + a `Curl_multi` when a transfer is in progress. + + ->mstate is the multi state of this particular SessionHandle. When + `multi_runsingle()` is called, it will act on this handle according to which + state it is in. The mstate is also what tells which sockets to return for a + specific SessionHandle when [`curl_multi_fdset()`][12] is called etc. + + The libcurl source code generally use the name 'data' for the variable that + points to the SessionHandle. + + When doing multiplexed HTTP/2 transfers, each SessionHandle is associated + with an individual stream, sharing the same connectdata struct. Multiplexing + makes it even more important to keep things associated with the right thing! + +## connectdata + + A general idea in libcurl is to keep connections around in a connection + "cache" after they have been used in case they will be used again and then + re-use an existing one instead of creating a new as it creates a significant + performance boost. + + Each 'connectdata' identifies a single physical connection to a server. If + the connection can't be kept alive, the connection will be closed after use + and then this struct can be removed from the cache and freed. + + Thus, the same SessionHandle can be used multiple times and each time select + another connectdata struct to use for the connection. Keep this in mind, as + it is then important to consider if options or choices are based on the + connection or the SessionHandle. + + Functions in libcurl will assume that connectdata->data points to the + SessionHandle that uses this connection (for the moment). + + As a special complexity, some protocols supported by libcurl require a + special disconnect procedure that is more than just shutting down the + socket. It can involve sending one or more commands to the server before + doing so. Since connections are kept in the connection cache after use, the + original SessionHandle may no longer be around when the time comes to shut + down a particular connection. For this purpose, libcurl holds a special + dummy `closure_handle` SessionHandle in the `Curl_multi` struct to use when + needed. + + FTP uses two TCP connections for a typical transfer but it keeps both in + this single struct and thus can be considered a single connection for most + internal concerns. + + The libcurl source code generally use the name 'conn' for the variable that + points to the connectdata. + +## Curl_multi + + Internally, the easy interface is implemented as a wrapper around multi + interface functions. This makes everything multi interface. + + `Curl_multi` is the multi handle struct exposed as "CURLM *" in external APIs. + + This struct holds a list of SessionHandle structs that have been added to + this handle with [`curl_multi_add_handle()`][13]. The start of the list is + ->easyp and ->num_easy is a counter of added SessionHandles. + + ->msglist is a linked list of messages to send back when + [`curl_multi_info_read()`][14] is called. Basically a node is added to that + list when an individual SessionHandle's transfer has completed. + + ->hostcache points to the name cache. It is a hash table for looking up name + to IP. The nodes have a limited life time in there and this cache is meant + to reduce the time for when the same name is wanted within a short period of + time. + + ->timetree points to a tree of SessionHandles, sorted by the remaining time + until it should be checked - normally some sort of timeout. Each + SessionHandle has one node in the tree. + + ->sockhash is a hash table to allow fast lookups of socket descriptor to + which SessionHandle that uses that descriptor. This is necessary for the + `multi_socket` API. + + ->conn_cache points to the connection cache. It keeps track of all + connections that are kept after use. The cache has a maximum size. + + ->closure_handle is described in the 'connectdata' section. + + The libcurl source code generally use the name 'multi' for the variable that + points to the Curl_multi struct. + +## Curl_handler + + Each unique protocol that is supported by libcurl needs to provide at least + one `Curl_handler` struct. It defines what the protocol is called and what + functions the main code should call to deal with protocol specific issues. + In general, there's a source file named [protocol].c in which there's a + "struct `Curl_handler` `Curl_handler_[protocol]`" declared. In url.c there's + then the main array with all individual `Curl_handler` structs pointed to + from a single array which is scanned through when a URL is given to libcurl + to work with. + + ->scheme is the URL scheme name, usually spelled out in uppercase. That's + "HTTP" or "FTP" etc. SSL versions of the protcol need its own `Curl_handler` + setup so HTTPS separate from HTTP. + + ->setup_connection is called to allow the protocol code to allocate protocol + specific data that then gets associated with that SessionHandle for the rest + of this transfer. It gets freed again at the end of the transfer. It will be + called before the 'connectdata' for the transfer has been selected/created. + Most protocols will allocate its private 'struct [PROTOCOL]' here and assign + SessionHandle->req.protop to point to it. + + ->connect_it allows a protocol to do some specific actions after the TCP + connect is done, that can still be considered part of the connection phase. + + Some protocols will alter the connectdata->recv[] and connectdata->send[] + function pointers in this function. + + ->connecting is similarly a function that keeps getting called as long as the + protocol considers itself still in the connecting phase. + + ->do_it is the function called to issue the transfer request. What we call + the DO action internally. If the DO is not enough and things need to be kept + getting done for the entire DO sequence to complete, ->doing is then usually + also provided. Each protocol that needs to do multiple commands or similar + for do/doing need to implement their own state machines (see SCP, SFTP, + FTP). Some protocols (only FTP and only due to historical reasons) has a + separate piece of the DO state called `DO_MORE`. + + ->doing keeps getting called while issuing the transfer request command(s) + + ->done gets called when the transfer is complete and DONE. That's after the + main data has been transferred. + + ->do_more gets called during the `DO_MORE` state. The FTP protocol uses this + state when setting up the second connection. + + ->`proto_getsock` + ->`doing_getsock` + ->`domore_getsock` + ->`perform_getsock` + Functions that return socket information. Which socket(s) to wait for which + action(s) during the particular multi state. + + ->disconnect is called immediately before the TCP connection is shutdown. + + ->readwrite gets called during transfer to allow the protocol to do extra + reads/writes + + ->defport is the default report TCP or UDP port this protocol uses + + ->protocol is one or more bits in the `CURLPROTO_*` set. The SSL versions + have their "base" protocol set and then the SSL variation. Like + "HTTP|HTTPS". + + ->flags is a bitmask with additional information about the protocol that will + make it get treated differently by the generic engine: + + - `PROTOPT_SSL` - will make it connect and negotiate SSL + + - `PROTOPT_DUAL` - this protocol uses two connections + + - `PROTOPT_CLOSEACTION` - this protocol has actions to do before closing the + connection. This flag is no longer used by code, yet still set for a bunch + protocol handlers. + + - `PROTOPT_DIRLOCK` - "direction lock". The SSH protocols set this bit to + limit which "direction" of socket actions that the main engine will + concern itself about. + + - `PROTOPT_NONETWORK` - a protocol that doesn't use network (read file:) + + - `PROTOPT_NEEDSPWD` - this protocol needs a password and will use a default + one unless one is provided + + - `PROTOPT_NOURLQUERY` - this protocol can't handle a query part on the URL + (?foo=bar) + +## conncache + + Is a hash table with connections for later re-use. Each SessionHandle has + a pointer to its connection cache. Each multi handle sets up a connection + cache that all added SessionHandles share by default. + +## Curl_share + + The libcurl share API allocates a `Curl_share` struct, exposed to the + external API as "CURLSH *". + + The idea is that the struct can have a set of own versions of caches and + pools and then by providing this struct in the `CURLOPT_SHARE` option, those + specific SessionHandles will use the caches/pools that this share handle + holds. + + Then individual SessionHandle structs can be made to share specific things + that they otherwise wouldn't, such as cookies. + + The `Curl_share` struct can currently hold cookies, DNS cache and the SSL + session cache. + +## CookieInfo + + This is the main cookie struct. It holds all known cookies and related + information. Each SessionHandle has its own private CookieInfo even when + they are added to a multi handle. They can be made to share cookies by using + the share API. + [1]: http://curl.haxx.se/libcurl/c/curl_easy_setopt.html [2]: http://curl.haxx.se/libcurl/c/curl_easy_init.html @@ -860,4 +1090,6 @@ Track Down Memory Leaks [9]: http://curl.haxx.se/libcurl/c/curl_multi_setopt.html [10]: http://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html [11]: http://curl.haxx.se/libcurl/c/curl_multi_perform.html -[12]: http://curl.haxx.se/libcurl/c/curl_multi_fdset.html \ No newline at end of file +[12]: http://curl.haxx.se/libcurl/c/curl_multi_fdset.html +[13]: http://curl.haxx.se/libcurl/c/curl_multi_add_handle.html +[14]: http://curl.haxx.se/libcurl/c/curl_multi_info_read.html diff --git a/docs/LIBCURL-STRUCTS b/docs/LIBCURL-STRUCTS deleted file mode 100644 index 11dee8539..000000000 --- a/docs/LIBCURL-STRUCTS +++ /dev/null @@ -1,248 +0,0 @@ - _ _ ____ _ - ___| | | | _ \| | - / __| | | | |_) | | - | (__| |_| | _ <| |___ - \___|\___/|_| \_\_____| - -Structs in libcurl - -This document should cover 7.32.0 pretty accurately, but will make sense even -for older and later versions as things don't change drastically that often. - - 1. The main structs in libcurl - 1.1 SessionHandle - 1.2 connectdata - 1.3 Curl_multi - 1.4 Curl_handler - 1.5 conncache - 1.6 Curl_share - 1.7 CookieInfo - -============================================================================== - -1. The main structs in libcurl - - 1.1 SessionHandle - - The SessionHandle handle struct is the one returned to the outside in the - external API as a "CURL *". This is usually known as an easy handle in API - documentations and examples. - - Information and state that is related to the actual connection is in the - 'connectdata' struct. When a transfer is about to be made, libcurl will - either create a new connection or re-use an existing one. The particular - connectdata that is used by this handle is pointed out by - SessionHandle->easy_conn. - - Data and information that regard this particular single transfer is put in - the SingleRequest sub-struct. - - When the SessionHandle struct is added to a multi handle, as it must be in - order to do any transfer, the ->multi member will point to the Curl_multi - struct it belongs to. The ->prev and ->next members will then be used by the - multi code to keep a linked list of SessionHandle structs that are added to - that same multi handle. libcurl always uses multi so ->multi *will* point to - a Curl_multi when a transfer is in progress. - - ->mstate is the multi state of this particular SessionHandle. When - multi_runsingle() is called, it will act on this handle according to which - state it is in. The mstate is also what tells which sockets to return for a - specific SessionHandle when curl_multi_fdset() is called etc. - - The libcurl source code generally use the name 'data' for the variable that - points to the SessionHandle. - - When doing multiplexed HTTP/2 transfers, each SessionHandle is associated - with an individual stream, sharing the same connectdata struct. Multiplexing - makes it even more important to keep things associated with the right thing! - - 1.2 connectdata - - A general idea in libcurl is to keep connections around in a connection - "cache" after they have been used in case they will be used again and then - re-use an existing one instead of creating a new as it creates a significant - performance boost. - - Each 'connectdata' identifies a single physical connection to a server. If - the connection can't be kept alive, the connection will be closed after use - and then this struct can be removed from the cache and freed. - - Thus, the same SessionHandle can be used multiple times and each time select - another connectdata struct to use for the connection. Keep this in mind, as - it is then important to consider if options or choices are based on the - connection or the SessionHandle. - - Functions in libcurl will assume that connectdata->data points to the - SessionHandle that uses this connection (for the moment). - - As a special complexity, some protocols supported by libcurl require a - special disconnect procedure that is more than just shutting down the - socket. It can involve sending one or more commands to the server before - doing so. Since connections are kept in the connection cache after use, the - original SessionHandle may no longer be around when the time comes to shut - down a particular connection. For this purpose, libcurl holds a special - dummy 'closure_handle' SessionHandle in the Curl_multi struct to - - FTP uses two TCP connections for a typical transfer but it keeps both in - this single struct and thus can be considered a single connection for most - internal concerns. - - The libcurl source code generally use the name 'conn' for the variable that - points to the connectdata. - - - 1.3 Curl_multi - - Internally, the easy interface is implemented as a wrapper around multi - interface functions. This makes everything multi interface. - - Curl_multi is the multi handle struct exposed as "CURLM *" in external APIs. - - This struct holds a list of SessionHandle structs that have been added to - this handle with curl_multi_add_handle(). The start of the list is ->easyp - and ->num_easy is a counter of added SessionHandles. - - ->msglist is a linked list of messages to send back when - curl_multi_info_read() is called. Basically a node is added to that list - when an individual SessionHandle's transfer has completed. - - ->hostcache points to the name cache. It is a hash table for looking up name - to IP. The nodes have a limited life time in there and this cache is meant - to reduce the time for when the same name is wanted within a short period of - time. - - ->timetree points to a tree of SessionHandles, sorted by the remaining time - until it should be checked - normally some sort of timeout. Each - SessionHandle has one node in the tree. - - ->sockhash is a hash table to allow fast lookups of socket descriptor to - which SessionHandle that uses that descriptor. This is necessary for the - multi_socket API. - - ->conn_cache points to the connection cache. It keeps track of all - connections that are kept after use. The cache has a maximum size. - - ->closure_handle is described in the 'connectdata' section. - - The libcurl source code generally use the name 'multi' for the variable that - points to the Curl_multi struct. - - - 1.4 Curl_handler - - Each unique protocol that is supported by libcurl needs to provide at least - one Curl_handler struct. It defines what the protocol is called and what - functions the main code should call to deal with protocol specific issues. - In general, there's a source file named [protocol].c in which there's a - "struct Curl_handler Curl_handler_[protocol]" declared. In url.c there's - then the main array with all individual Curl_handler structs pointed to from - a single array which is scanned through when a URL is given to libcurl to - work with. - - ->scheme is the URL scheme name, usually spelled out in uppercase. That's - "HTTP" or "FTP" etc. SSL versions of the protcol need its own Curl_handler - setup so HTTPS separate from HTTP. - - ->setup_connection is called to allow the protocol code to allocate protocol - specific data that then gets associated with that SessionHandle for the rest - of this transfer. It gets freed again at the end of the transfer. It will be - called before the 'connectdata' for the transfer has been selected/created. - Most protocols will allocate its private 'struct [PROTOCOL]' here and assign - SessionHandle->req.protop to point to it. - - ->connect_it allows a protocol to do some specific actions after the TCP - connect is done, that can still be considered part of the connection phase. - - Some protocols will alter the connectdata->recv[] and connectdata->send[] - function pointers in this function. - - ->connecting is similarly a function that keeps getting called as long as the - protocol considers itself still in the connecting phase. - - ->do_it is the function called to issue the transfer request. What we call - the DO action internally. If the DO is not enough and things need to be kept - getting done for the entire DO sequence to complete, ->doing is then usually - also provided. Each protocol that needs to do multiple commands or similar - for do/doing need to implement their own state machines (see SCP, SFTP, - FTP). Some protocols (only FTP and only due to historical reasons) has a - separate piece of the DO state called DO_MORE. - - ->doing keeps getting called while issuing the transfer request command(s) - - ->done gets called when the transfer is complete and DONE. That's after the - main data has been transferred. - - ->do_more gets called during the DO_MORE state. The FTP protocol uses this - state when setting up the second connection. - - ->proto_getsock - ->doing_getsock - ->domore_getsock - ->perform_getsock - Functions that return socket information. Which socket(s) to wait for which - action(s) during the particular multi state. - - ->disconnect is called immediately before the TCP connection is shutdown. - - ->readwrite gets called during transfer to allow the protocol to do extra - reads/writes - - ->defport is the default report TCP or UDP port this protocol uses - - ->protocol is one or more bits in the CURLPROTO_* set. The SSL versions have - their "base" protocol set and then the SSL variation. Like "HTTP|HTTPS". - - ->flags is a bitmask with additional information about the protocol that will - make it get treated differently by the generic engine: - - PROTOPT_SSL - will make it connect and negotiate SSL - - PROTOPT_DUAL - this protocol uses two connections - - PROTOPT_CLOSEACTION - this protocol has actions to do before closing the - connection. This flag is no longer used by code, yet still set for a bunch - protocol handlers. - - PROTOPT_DIRLOCK - "direction lock". The SSH protocols set this bit to - limit which "direction" of socket actions that the main engine will - concern itself about. - - PROTOPT_NONETWORK - a protocol that doesn't use network (read file:) - - PROTOPT_NEEDSPWD - this protocol needs a password and will use a default - one unless one is provided - - PROTOPT_NOURLQUERY - this protocol can't handle a query part on the URL - (?foo=bar) - - - 1.5 conncache - - Is a hash table with connections for later re-use. Each SessionHandle has - a pointer to its connection cache. Each multi handle sets up a connection - cache that all added SessionHandles share by default. - - - 1.6 Curl_share - - The libcurl share API allocates a Curl_share struct, exposed to the external - API as "CURLSH *". - - The idea is that the struct can have a set of own versions of caches and - pools and then by providing this struct in the CURLOPT_SHARE option, those - specific SessionHandles will use the caches/pools that this share handle - holds. - - Then individual SessionHandle structs can be made to share specific things - that they otherwise wouldn't, such as cookies. - - The Curl_share struct can currently hold cookies, DNS cache and the SSL - session cache. - - - 1.7 CookieInfo - - This is the main cookie struct. It holds all known cookies and related - information. Each SessionHandle has its own private CookieInfo even when - they are added to a multi handle. They can be made to share cookies by using - the share API. diff --git a/docs/Makefile.am b/docs/Makefile.am index d4a083274..e3e27d333 100644 --- a/docs/Makefile.am +++ b/docs/Makefile.am @@ -37,7 +37,7 @@ EXTRA_DIST = MANUAL BUGS CONTRIBUTE FAQ FEATURES INTERNALS SSLCERTS \ README.win32 RESOURCES TODO TheArtOfHttpScripting THANKS VERSIONS \ KNOWN_BUGS BINDINGS $(man_MANS) $(HTMLPAGES) HISTORY INSTALL \ $(PDFPAGES) LICENSE-MIXING README.netware DISTRO-DILEMMA INSTALL.devcpp \ - MAIL-ETIQUETTE HTTP-COOKIES LIBCURL-STRUCTS SECURITY RELEASE-PROCEDURE \ + MAIL-ETIQUETTE HTTP-COOKIES SECURITY RELEASE-PROCEDURE \ SSL-PROBLEMS HTTP2.md ROADMAP.md MAN2HTML= roffit < $< >$@