1
0
mirror of https://github.com/moparisthebest/curl synced 2024-12-21 23:58:49 -05:00

INTERNALS: cat lib/README* >> INTERNALS

and a conversion to markdown. Removed the lib/README.* files. The idea
being to move toward having INTERNALS as the one and only "book" of
internals documentation.

Added a TOC to top of the document.
This commit is contained in:
Daniel Stenberg 2015-06-09 23:57:22 +02:00
parent cbf2920d02
commit 55f3eb588d
10 changed files with 528 additions and 627 deletions

View File

@ -1,18 +1,56 @@
_ _ ____ _
___| | | | _ \| |
/ __| | | | |_) | |
| (__| |_| | _ <| |___
\___|\___/|_| \_\_____|
Table of Contents
=================
INTERNALS
- [Intro](#intro)
- [git](#git)
- [Portability](#Portability)
- [Windows vs Unix](#winvsunix)
- [Library](#Library)
- [`Curl_connect`](#Curl_connect)
- [`Curl_do`](#Curl_do)
- [`Curl_readwrite`](#Curl_readwrite)
- [`Curl_done`](#Curl_done)
- [`Curl_disconnect`](#Curl_disconnect)
- [HTTP(S)](#http)
- [FTP](#ftp)
- [Kerberos](#kerberos)
- [TELNET](#telnet)
- [FILE](#file)
- [SMB](#smb)
- [LDAP](#ldap)
- [E-mail](#email)
- [General](#general)
- [Persistent Connections](#persistent)
- [multi interface/non-blocking](#multi)
- [SSL libraries](#ssl)
- [Library Symbols](#symbols)
- [Return Codes and Informationals](#returncodes)
- [AP/ABI](#abi)
- [Client](#client)
- [Memory Debugging](#memorydebug)
- [Test Suite](#test)
- [Asynchronous name resolves](#asyncdns)
- [c-ares](#cares)
- [`curl_off_t`](#curl_off_t)
- [curlx](#curlx)
- [Content Encoding](#contentencoding)
- [hostip.c explained](#hostip)
- [Track Down Memory Leaks](#memoryleak)
- [`multi_socket`](#multi_socket)
The project is split in two. The library and the client. The client part uses
the library, but the library is designed to allow other applications to use
it.
<a name="intro"></a>
curl internals
==============
This project is split in two. The library and the client. The client part
uses the library, but the library is designed to allow other applications to
use it.
The largest amount of code and complexity is in the library part.
GIT
<a name="git"></a>
git
===
All changes to the sources are committed to the git repository as soon as
@ -23,6 +61,7 @@ GIT
Tagging shall be used extensively, and by the time we release new archives we
should tag the sources with a name similar to the released version number.
<a name="Portability"></a>
Portability
===========
@ -34,45 +73,55 @@ Portability
want it to remain functional and buildable with these and later versions
(older versions may still work but is not what we work hard to maintain):
OpenSSL 0.9.7
GnuTLS 1.2
zlib 1.1.4
libssh2 0.16
c-ares 1.6.0
libidn 0.4.1
cyassl 2.0.0
openldap 2.0
MIT Kerberos 1.2.4
GSKit V5R3M0
NSS 3.14.x
axTLS 1.2.7
PolarSSL 1.3.0
Heimdal ?
nghttp2 1.0.0
Dependencies
------------
- OpenSSL 0.9.7
- GnuTLS 1.2
- zlib 1.1.4
- libssh2 0.16
- c-ares 1.6.0
- libidn 0.4.1
- cyassl 2.0.0
- openldap 2.0
- MIT Kerberos 1.2.4
- GSKit V5R3M0
- NSS 3.14.x
- axTLS 1.2.7
- PolarSSL 1.3.0
- Heimdal ?
- nghttp2 1.0.0
Operating Systems
-----------------
On systems where configure runs, we aim at working on them all - if they have
a suitable C compiler. On systems that don't run configure, we strive to keep
curl running fine on:
Windows 98
AS/400 V5R3M0
Symbian 9.1
Windows CE ?
TPF ?
- Windows 98
- AS/400 V5R3M0
- Symbian 9.1
- Windows CE ?
- TPF ?
Build tools
-----------
When writing code (mostly for generating stuff included in release tarballs)
we use a few "build tools" and we make sure that we remain functional with
these versions:
GNU Libtool 1.4.2
GNU Autoconf 2.57
GNU Automake 1.7 (we currently avoid 1.10 due to Solaris-related bugs)
GNU M4 1.4
perl 5.004
roffit 0.5
groff ? (any version that supports "groff -Tps -man [in] [out]")
ps2pdf (gs) ?
- GNU Libtool 1.4.2
- GNU Autoconf 2.57
- GNU Automake 1.7
- GNU M4 1.4
- perl 5.004
- roffit 0.5
- groff ? (any version that supports "groff -Tps -man [in] [out]")
- ps2pdf (gs) ?
<a name="winvsunix"></a>
Windows vs Unix
===============
@ -87,8 +136,9 @@ Windows vs Unix
2. Windows requires a couple of init calls for the socket stuff.
That's taken care of by the curl_global_init() call, but if other libs also
do it etc there might be reasons for applications to alter that behaviour.
That's taken care of by the `curl_global_init()` call, but if other libs
also do it etc there might be reasons for applications to alter that
behaviour.
3. The file descriptors for network communication and file operations are
not easily interchangeable as in unix.
@ -101,28 +151,29 @@ Windows vs Unix
We set stdout to binary under windows
Inside the source code, We make an effort to avoid '#ifdef [Your OS]'. All
Inside the source code, We make an effort to avoid `#ifdef [Your OS]`. All
conditionals that deal with features *should* instead be in the format
'#ifdef HAVE_THAT_WEIRD_FUNCTION'. Since Windows can't run configure scripts,
we maintain a curl_config-win32.h file in lib directory that is supposed to
look exactly as a curl_config.h file would have looked like on a Windows
`#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
we maintain a `curl_config-win32.h` file in lib directory that is supposed to
look exactly as a `curl_config.h` file would have looked like on a Windows
machine!
Generally speaking: always remember that this will be compiled on dozens of
operating systems. Don't walk on the edge.
<a name="Library"></a>
Library
=======
(See LIBCURL-STRUCTS for a separate document describing all major internal
(See `LIBCURL-STRUCTS` for a separate document describing all major internal
structs and their purposes.)
There are plenty of entry points to the library, namely each publicly defined
function that libcurl offers to applications. All of those functions are
rather small and easy-to-follow. All the ones prefixed with 'curl_easy' are
rather small and easy-to-follow. All the ones prefixed with `curl_easy` are
put in the lib/easy.c file.
curl_global_init_() and curl_global_cleanup() should be called by the
`curl_global_init_()` and `curl_global_cleanup()` should be called by the
application to initialize and clean up global stuff in the library. As of
today, it can handle the global SSL initing if SSL is enabled and it can init
the socket layer on windows machines. libcurl itself has no "global" scope.
@ -130,51 +181,56 @@ Library
All printf()-style functions use the supplied clones in lib/mprintf.c. This
makes sure we stay absolutely platform independent.
curl_easy_init() allocates an internal struct and makes some initializations.
The returned handle does not reveal internals. This is the 'SessionHandle'
struct which works as an "anchor" struct for all curl_easy functions. All
connections performed will get connect-specific data allocated that should be
used for things related to particular connections/requests.
[ `curl_easy_init()`][2] allocates an internal struct and makes some
initializations. The returned handle does not reveal internals. This is the
'SessionHandle' struct which works as an "anchor" struct for all `curl_easy`
functions. All connections performed will get connect-specific data allocated
that should be used for things related to particular connections/requests.
curl_easy_setopt() takes three arguments, where the option stuff must be
passed in pairs: the parameter-ID and the parameter-value. The list of
[`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
be passed in pairs: the parameter-ID and the parameter-value. The list of
options is documented in the man page. This function mainly sets things in
the 'SessionHandle' struct.
curl_easy_perform() is just a wrapper function that makes use of the multi
API. It basically curl_multi_init(), curl_multi_add_handle(),
curl_multi_wait(), and curl_multi_perform() until the transfer is done and
then returns.
`curl_easy_perform()` is just a wrapper function that makes use of the multi
API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
`curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
and then returns.
Some of the most important key functions in url.c are called from multi.c
when certain key steps are to be made in the transfer operation.
o Curl_connect()
<a name="Curl_connect"></a>
Curl_connect()
--------------
Analyzes the URL, it separates the different components and connects to the
remote host. This may involve using a proxy and/or using SSL. The
Curl_resolv() function in lib/hostip.c is used for looking up host names
`Curl_resolv()` function in lib/hostip.c is used for looking up host names
(it does then use the proper underlying method, which may vary between
platforms and builds).
When Curl_connect is done, we are connected to the remote site. Then it is
time to tell the server to get a document/file. Curl_do() arranges this.
When `Curl_connect` is done, we are connected to the remote site. Then it
is time to tell the server to get a document/file. `Curl_do()` arranges
this.
This function makes sure there's an allocated and initiated 'connectdata'
struct that is used for this particular connection only (although there may
be several requests performed on the same connect). A bunch of things are
inited/inherited from the SessionHandle struct.
o Curl_do()
<a name="Curl_do"></a>
Curl_do()
---------
Curl_do() makes sure the proper protocol-specific function is called. The
`Curl_do()` makes sure the proper protocol-specific function is called. The
functions are named after the protocols they handle.
The protocol-specific functions of course deal with protocol-specific
negotiations and setup. They have access to the Curl_sendf() (from
negotiations and setup. They have access to the `Curl_sendf()` (from
lib/sendf.c) function to send printf-style formatted data to the remote
host and when they're ready to make the actual file transfer they call the
Curl_Transfer() function (in lib/transfer.c) to setup the transfer and
`Curl_Transfer()` function (in lib/transfer.c) to setup the transfer and
returns.
If this DO function fails and the connection is being re-used, libcurl will
@ -183,11 +239,13 @@ Library
we have discovered a dead connection before the DO function and thus we
might wrongly be re-using a connection that was closed by the remote peer.
Some time during the DO function, the Curl_setup_transfer() function must
Some time during the DO function, the `Curl_setup_transfer()` function must
be called with some basic info about the upcoming transfer: what socket(s)
to read/write and the expected file transfer sizes (if known).
o Curl_readwrite()
<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------
Called during the transfer of the actual protocol payload.
@ -196,18 +254,22 @@ Library
called). The speedcheck functions in lib/speedcheck.c are also used to
verify that the transfer is as fast as required.
o Curl_done()
<a name="Curl_done"></a>
Curl_done()
-----------
Called after a transfer is done. This function takes care of everything
that has to be done after a transfer. This function attempts to leave
matters in a state so that Curl_do() should be possible to call again on
matters in a state so that `Curl_do()` should be possible to call again on
the same connection (in a persistent connection case). It might also soon
be closed with Curl_disconnect().
be closed with `Curl_disconnect()`.
o Curl_disconnect()
<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------
When doing normal connections and transfers, no one ever tries to close any
connections so this is not normally called when curl_easy_perform() is
connections so this is not normally called when `curl_easy_perform()` is
used. This function is only used when we are certain that no more transfers
is going to be made on the connection. It can be also closed by force, or
it can be called to make sure that libcurl doesn't keep too many
@ -216,8 +278,9 @@ Library
This function cleans up all resources that are associated with a single
connection.
HTTP(S)
<a name="http"></a>
HTTP(S)
=======
HTTP offers a lot and is the protocol in curl that uses the most lines of
code. There is a special file (lib/formdata.c) that offers all the multipart
@ -229,100 +292,123 @@ Library
HTTPS uses in almost every means the same procedure as HTTP, with only two
exceptions: the connect procedure is different and the function used to read
or write from the socket is different, although the latter fact is hidden in
the source by the use of Curl_read() for reading and Curl_write() for writing
data to the remote server.
the source by the use of `Curl_read()` for reading and `Curl_write()` for
writing data to the remote server.
http_chunks.c contains functions that understands HTTP 1.1 chunked transfer
`http_chunks.c` contains functions that understands HTTP 1.1 chunked transfer
encoding.
An interesting detail with the HTTP(S) request, is the Curl_add_buffer()
An interesting detail with the HTTP(S) request, is the `Curl_add_buffer()`
series of functions we use. They append data to one single buffer, and when
the building is done the entire request is sent off in one single write. This
is done this way to overcome problems with flawed firewalls and lame servers.
FTP
<a name="ftp"></a>
FTP
===
The Curl_if2ip() function can be used for getting the IP number of a
The `Curl_if2ip()` function can be used for getting the IP number of a
specified network interface, and it resides in lib/if2ip.c.
Curl_ftpsendf() is used for sending FTP commands to the remote server. It was
made a separate function to prevent us programmers from forgetting that they
must be CRLF terminated. They must also be sent in one single write() to make
firewalls and similar happy.
`Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
was made a separate function to prevent us programmers from forgetting that
they must be CRLF terminated. They must also be sent in one single write() to
make firewalls and similar happy.
Kerberos
<a name="kerberos"></a>
Kerberos
--------
Kerberos support is mainly in lib/krb5.c and lib/security.c but also
curl_sasl_sspi.c and curl_sasl_gssapi.c for the email protocols and
socks_gssapi.c & socks_sspi.c for SOCKS5 proxy specifics.
`curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
`socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.
TELNET
<a name="telnet"></a>
TELNET
======
Telnet is implemented in lib/telnet.c.
FILE
<a name="file"></a>
FILE
====
The file:// protocol is dealt with in lib/file.c.
SMB
<a name="smb"></a>
SMB
===
The smb:// protocol is dealt with in lib/smb.c.
LDAP
<a name="ldap"></a>
LDAP
====
Everything LDAP is in lib/ldap.c and lib/openldap.c
E-mail
<a name="email"></a>
E-mail
======
The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.
GENERAL
<a name="general"></a>
General
=======
URL encoding and decoding, called escaping and unescaping in the source code,
is found in lib/escape.c.
While transferring data in Transfer() a few functions might get used.
curl_getdate() in lib/parsedate.c is for HTTP date comparisons (and more).
`curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).
lib/getenv.c offers curl_getenv() which is for reading environment variables
in a neat platform independent way. That's used in the client, but also in
lib/url.c when checking the proxy environment variables. Note that contrary
to the normal unix getenv(), this returns an allocated buffer that must be
free()ed after use.
lib/getenv.c offers `curl_getenv()` which is for reading environment
variables in a neat platform independent way. That's used in the client, but
also in lib/url.c when checking the proxy environment variables. Note that
contrary to the normal unix getenv(), this returns an allocated buffer that
must be free()ed after use.
lib/netrc.c holds the .netrc parser
lib/timeval.c features replacement functions for systems that don't have
gettimeofday() and a few support functions for timeval conversions.
A function named curl_version() that returns the full curl version string is
found in lib/version.c.
A function named `curl_version()` that returns the full curl version string
is found in lib/version.c.
<a name="persistent"></a>
Persistent Connections
======================
The persistent connection support in libcurl requires some considerations on
how to do things inside of the library.
o The 'SessionHandle' struct returned in the curl_easy_init() call must never
hold connection-oriented data. It is meant to hold the root data as well as
all the options etc that the library-user may choose.
o The 'SessionHandle' struct holds the "connection cache" (an array of
- The 'SessionHandle' struct returned in the [`curl_easy_init()`][2] call
must never hold connection-oriented data. It is meant to hold the root data
as well as all the options etc that the library-user may choose.
- The 'SessionHandle' struct holds the "connection cache" (an array of
pointers to 'connectdata' structs).
o This enables the 'curl handle' to be reused on subsequent transfers.
o When libcurl is told to perform a transfer, it first checks for an already
- This enables the 'curl handle' to be reused on subsequent transfers.
- When libcurl is told to perform a transfer, it first checks for an already
existing connection in the cache that we can use. Otherwise it creates a
new one and adds that the cache. If the cache is full already when a new
connection is added added, it will first close the oldest unused one.
o When the transfer operation is complete, the connection is left
- When the transfer operation is complete, the connection is left
open. Particular options may tell libcurl not to, and protocols may signal
closure on connections and then they won't be kept open of course.
o When curl_easy_cleanup() is called, we close all still opened connections,
- When `curl_easy_cleanup()` is called, we close all still opened connections,
unless of course the multi interface "owns" the connections.
The curl handle must be re-used in order for the persistent connections to
work.
<a name="multi"></a>
multi interface/non-blocking
============================
@ -341,6 +427,7 @@ multi interface/non-blocking
protocols are crappy examples and they are subject for rewrite in the future
to better fit the libcurl protocol family.
<a name="ssl"></a>
SSL libraries
=============
@ -350,36 +437,39 @@ SSL libraries
in future libcurl versions.
To deal with this internally in the best way possible, we have a generic SSL
function API as provided by the vtls.[ch] system, and they are the only SSL
functions we must use from within libcurl. vtls is then crafted to use the
appropriate lower-level function calls to whatever SSL library that is in
function API as provided by the vtls/vtls.[ch] system, and they are the only
SSL functions we must use from within libcurl. vtls is then crafted to use
the appropriate lower-level function calls to whatever SSL library that is in
use. For example vtls/openssl.[ch] for the OpenSSL library.
<a name="symbols"></a>
Library Symbols
===============
All symbols used internally in libcurl must use a 'Curl_' prefix if they're
All symbols used internally in libcurl must use a `Curl_` prefix if they're
used in more than a single file. Single-file symbols must be made static.
Public ("exported") symbols must use a 'curl_' prefix. (There are exceptions,
Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
but they are to be changed to follow this pattern in future versions.) Public
API functions are marked with CURL_EXTERN in the public header files so that
all others can be hidden on platforms where this is possible.
API functions are marked with `CURL_EXTERN` in the public header files so
that all others can be hidden on platforms where this is possible.
<a name="returncodes"></a>
Return Codes and Informationals
===============================
I've made things simple. Almost every function in libcurl returns a CURLcode,
that must be CURLE_OK if everything is OK or otherwise a suitable error code
as the curl/curl.h include file defines. The very spot that detects an error
must use the Curl_failf() function to set the human-readable error
that must be `CURLE_OK` if everything is OK or otherwise a suitable error
code as the curl/curl.h include file defines. The very spot that detects an
error must use the `Curl_failf()` function to set the human-readable error
description.
In aiding the user to understand what's happening and to debug curl usage, we
must supply a fair amount of informational messages by using the Curl_infof()
function. Those messages are only displayed when the user explicitly asks for
them. They are best used when revealing information that isn't otherwise
obvious.
must supply a fair amount of informational messages by using the
`Curl_infof()` function. Those messages are only displayed when the user
explicitly asks for them. They are best used when revealing information that
isn't otherwise obvious.
<a name="abi"></a>
API/ABI
=======
@ -387,29 +477,31 @@ API/ABI
that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
for our promise to users.
<a name="client"></a>
Client
======
main() resides in src/tool_main.c.
main() resides in `src/tool_main.c`.
src/tool_hugehelp.c is automatically generated by the mkhelp.pl perl script
`src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
to display the complete "manual" and the src/tool_urlglob.c file holds the
functions used for the URL-"globbing" support. Globbing in the sense that the
{} and [] expansion stuff is there.
The client mostly messes around to setup its 'config' struct properly, then
it calls the curl_easy_*() functions of the library and when it gets back
control after the curl_easy_perform() it cleans up the library, checks status
and exits.
it calls the `curl_easy_*()` functions of the library and when it gets back
control after the `curl_easy_perform()` it cleans up the library, checks
status and exits.
When the operation is done, the ourWriteOut() function in src/writeout.c may
be called to report about the operation. That function is using the
curl_easy_getinfo() function to extract useful information from the curl
`curl_easy_getinfo()` function to extract useful information from the curl
session.
It may loop and do all this several times if many URLs were specified on the
command line or config file.
<a name="memorydebug"></a>
Memory Debugging
================
@ -439,6 +531,7 @@ Memory Debugging
the configure script. When --enable-debug is given both features will be
enabled, unless some restriction prevents memory tracking from being used.
<a name="test"></a>
Test Suite
==========
@ -456,29 +549,315 @@ Test Suite
The test suite automatically detects if curl was built with the memory
debugging enabled, and if it was it will detect memory leaks, too.
Building Releases
=================
<a name="asyncdns"></a>
Asynchronous name resolves
==========================
There's no magic to this. When you consider everything stable enough to be
released, do this:
libcurl can be built to do name resolves asynchronously, using either the
normal resolver in a threaded manner or by using c-ares.
1. Tag the source code accordingly.
<a name="cares"></a>
[c-ares][3]
------
2. run the 'maketgz' script (using 'make distcheck' will give you a pretty
good view on the status of the current sources). maketgz requires a
version number and creates the release archive. maketgz uses 'make dist'
for the actual archive building, why you need to fill in the Makefile.am
files properly for which files that should be included in the release
archives.
### Build libcurl to use a c-ares
3. When that's complete, sign the output files.
1. ./configure --enable-ares=/path/to/ares/install
2. make
4. Upload
### c-ares on win32
5. Update web site and changelog on site
First I compiled c-ares. I changed the default C runtime library to be the
single-threaded rather than the multi-threaded (this seems to be required to
prevent linking errors later on). Then I simply build the areslib project
(the other projects adig/ahost seem to fail under MSVC).
6. Send announcement to the mailing lists
Next was libcurl. I opened lib/config-win32.h and I added a:
`#define USE_ARES 1`
NOTE: you must have curl checked out from git to be able to do a proper
release build. The release tarballs do not have everything setup in order to
do releases properly.
Next thing I did was I added the path for the ares includes to the include
path, and the libares.lib to the libraries.
Lastly, I also changed libcurl to be single-threaded rather than
multi-threaded, again this was to prevent some duplicate symbol errors. I'm
not sure why I needed to change everything to single-threaded, but when I
didn't I got redefinition errors for several CRT functions (malloc, stricmp,
etc.)
<a name="curl_off_t"></a>
`curl_off_t`
==========
curl_off_t is a data type provided by the external libcurl include
headers. It is the type meant to be used for the [`curl_easy_setopt()`][1]
options that end with LARGE. The type is 64bit large on most modern
platforms.
curlx
=====
The libcurl source code offers a few functions by source only. They are not
part of the official libcurl API, but the source files might be useful for
others so apps can optionally compile/build with these sources to gain
additional functions.
We provide them through a single header file for easy access for apps:
"curlx.h"
`curlx_strtoofft()`
-------------------
A macro that converts a string containing a number to a curl_off_t number.
This might use the curlx_strtoll() function which is provided as source
code in strtoofft.c. Note that the function is only provided if no
strtoll() (or equivalent) function exist on your platform. If curl_off_t
is only a 32 bit number on your platform, this macro uses strtol().
`curlx_tvnow()`
---------------
returns a struct timeval for the current time.
`curlx_tvdiff()`
--------------
returns the difference between two timeval structs, in number of
milliseconds.
`curlx_tvdiff_secs()`
---------------------
returns the same as curlx_tvdiff but with full usec resolution (as a
double)
Future
------
Several functions will be removed from the public curl_ name space in a
future libcurl release. They will then only become available as curlx_
functions instead. To make the transition easier, we already today provide
these functions with the curlx_ prefix to allow sources to get built properly
with the new function names. The functions this concerns are:
- `curlx_getenv`
- `curlx_strequal`
- `curlx_strnequal`
- `curlx_mvsnprintf`
- `curlx_msnprintf`
- `curlx_maprintf`
- `curlx_mvaprintf`
- `curlx_msprintf`
- `curlx_mprintf`
- `curlx_mfprintf`
- `curlx_mvsprintf`
- `curlx_mvprintf`
- `curlx_mvfprintf`
<a name="contentencoding"></a>
Content Encoding
================
## About content encodings
[HTTP/1.1][4] specifies that a client may request that a server encode its
response. This is usually used to compress a response using one of a set of
commonly available compression techniques. These schemes are 'deflate' (the
zlib algorithm), 'gzip' and 'compress'. A client requests that the sever
perform an encoding by including an Accept-Encoding header in the request
document. The value of the header should be one of the recognized tokens
'deflate', ... (there's a way to register new schemes/tokens, see sec 3.5 of
the spec). A server MAY honor the client's encoding request. When a response
is encoded, the server includes a Content-Encoding header in the
response. The value of the Content-Encoding header indicates which scheme was
used to encode the data.
A client may tell a server that it can understand several different encoding
schemes. In this case the server may choose any one of those and use it to
encode the response (indicating which one using the Content-Encoding header).
It's also possible for a client to attach priorities to different schemes so
that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
information on the Accept-Encoding header.
## Supported content encodings
The 'deflate' and 'gzip' content encoding are supported by libcurl. Both
regular and chunked transfers work fine. The zlib library is required for
this feature.
## The libcurl interface
To cause libcurl to request a content encoding use:
[`curl_easy_setopt`][1](curl, [`CURLOPT_ACCEPT_ENCODING`][5], string)
where string is the intended value of the Accept-Encoding header.
Currently, libcurl only understands how to process responses that use the
"deflate" or "gzip" Content-Encoding, so the only values for
[`CURLOPT_ACCEPT_ENCODING`][5] that will work (besides "identity," which does
nothing) are "deflate" and "gzip" If a response is encoded using the
"compress" or methods, libcurl will return an error indicating that the
response could not be decoded. If <string> is NULL no Accept-Encoding header
is generated. If <string> is a zero-length string, then an Accept-Encoding
header containing all supported encodings will be generated.
The [`CURLOPT_ACCEPT_ENCODING`][5] must be set to any non-NULL value for
content to be automatically decoded. If it is not set and the server still
sends encoded content (despite not having been asked), the data is returned
in its raw form and the Content-Encoding type is not checked.
## The curl interface
Use the [--compressed][6] option with curl to cause it to ask servers to
compress responses using any format supported by curl.
<a name="hostip"></a>
hostip.c explained
==================
The main compile-time defines to keep in mind when reading the host*.c source
file are these:
## `CURLRES_IPV6`
this host has getaddrinfo() and family, and thus we use that. The host may
not be able to resolve IPv6, but we don't really have to take that into
account. Hosts that aren't IPv6-enabled have CURLRES_IPV4 defined.
## `CURLRES_ARES`
is defined if libcurl is built to use c-ares for asynchronous name
resolves. This can be Windows or *nix.
## `CURLRES_THREADED`
is defined if libcurl is built to use threading for asynchronous name
resolves. The name resolve will be done in a new thread, and the supported
asynch API will be the same as for ares-builds. This is the default under
(native) Windows.
If any of the two previous are defined, `CURLRES_ASYNCH` is defined too. If
libcurl is not built to use an asynchronous resolver, `CURLRES_SYNCH` is
defined.
## host*.c sources
The host*.c sources files are split up like this:
- hostip.c - method-independent resolver functions and utility functions
- hostasyn.c - functions for asynchronous name resolves
- hostsyn.c - functions for synchronous name resolves
- asyn-ares.c - functions for asynchronous name resolves using c-ares
- asyn-thread.c - functions for asynchronous name resolves using threads
- hostip4.c - IPv4 specific functions
- hostip6.c - IPv6 specific functions
The hostip.h is the single united header file for all this. It defines the
`CURLRES_*` defines based on the config*.h and curl_setup.h defines.
<a name="memoryleak"></a>
Track Down Memory Leaks
=======================
## Single-threaded
Please note that this memory leak system is not adjusted to work in more
than one thread. If you want/need to use it in a multi-threaded app. Please
adjust accordingly.
## Build
Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with
--enable-debug fixes this). 'make clean' first, then 'make' so that all
files actually are rebuilt properly. It will also make sense to build
libcurl with the debug option (usually -g to the compiler) so that debugging
it will be easier if you actually do find a leak in the library.
This will create a library that has memory debugging enabled.
## Modify Your Application
Add a line in your application code:
`curl_memdebug("dump");`
This will make the malloc debug system output a full trace of all resource
using functions to the given file name. Make sure you rebuild your program
and that you link with the same libcurl you built for this purpose as
described above.
## Run Your Application
Run your program as usual. Watch the specified memory trace file grow.
Make your program exit and use the proper libcurl cleanup functions etc. So
that all non-leaks are returned/freed properly.
## Analyze the Flow
Use the tests/memanalyze.pl perl script to analyze the dump file:
tests/memanalyze.pl dump
This now outputs a report on what resources that were allocated but never
freed etc. This report is very fine for posting to the list!
If this doesn't produce any output, no leak was detected in libcurl. Then
the leak is mostly likely to be in your code.
<a name="multi_socket"></a>
`multi_socket`
==============
Implementation of the `curl_multi_socket` API
The main ideas of this API are simply:
1 - The application can use whatever event system it likes as it gets info
from libcurl about what file descriptors libcurl waits for what action
on. (The previous API returns `fd_sets` which is very select()-centric).
2 - When the application discovers action on a single socket, it calls
libcurl and informs that there was action on this particular socket and
libcurl can then act on that socket/transfer only and not care about
any other transfers. (The previous API always had to scan through all
the existing transfers.)
The idea is that [`curl_multi_socket_action()`][7] calls a given callback
with information about what socket to wait for what action on, and the
callback only gets called if the status of that socket has changed.
We also added a timer callback that makes libcurl call the application when
the timeout value changes, and you set that with [`curl_multi_setopt()`][9]
and the [`CURLMOPT_TIMERFUNCTION`][10] option. To get this to work,
Internally, there's an added a struct to each easy handle in which we store
an "expire time" (if any). The structs are then "splay sorted" so that we
can add and remove times from the linked list and yet somewhat swiftly
figure out both how long time there is until the next nearest timer expires
and which timer (handle) we should take care of now. Of course, the upside
of all this is that we get a [`curl_multi_timeout()`][8] that should also
work with old-style applications that use [`curl_multi_perform()`][11].
We created an internal "socket to easy handles" hash table that given
a socket (file descriptor) return the easy handle that waits for action on
that socket. This hash is made using the already existing hash code
(previously only used for the DNS cache).
To make libcurl able to report plain sockets in the socket callback, we had
to re-organize the internals of the [`curl_multi_fdset()`][12] etc so that
the conversion from sockets to `fd_sets` for that function is only done in
the last step before the data is returned. I also had to extend c-ares to
get a function that can return plain sockets, as that library too returned
only `fd_sets` and that is no longer good enough. The changes done to c-ares
are available in c-ares 1.3.1 and later.
[1]: http://curl.haxx.se/libcurl/c/curl_easy_setopt.html
[2]: http://curl.haxx.se/libcurl/c/curl_easy_init.html
[3]: http://c-ares.haxx.se/
[4]: https://tools.ietf.org/html/rfc7230 "RFC 7230"
[5]: http://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html
[6]: http://curl.haxx.se/docs/manpage.html#--compressed
[7]: http://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
[8]: http://curl.haxx.se/libcurl/c/curl_multi_timeout.html
[9]: http://curl.haxx.se/libcurl/c/curl_multi_setopt.html
[10]: http://curl.haxx.se/libcurl/c/CURLMOPT_TIMERFUNCTION.html
[11]: http://curl.haxx.se/libcurl/c/curl_multi_perform.html
[12]: http://curl.haxx.se/libcurl/c/curl_multi_fdset.html

View File

@ -21,9 +21,6 @@
###########################################################################
AUTOMAKE_OPTIONS = foreign nostdinc
DOCS = README.encoding README.memoryleak README.ares README.curlx \
README.hostip README.multi_socket README.httpauth README.curl_off_t
CMAKE_DIST = CMakeLists.txt curl_config.h.cmake
EXTRA_DIST = Makefile.b32 Makefile.m32 Makefile.vc6 config-win32.h \
@ -31,7 +28,7 @@ EXTRA_DIST = Makefile.b32 Makefile.m32 Makefile.vc6 config-win32.h \
makefile.dj config-dos.h libcurl.plist libcurl.rc config-amigaos.h \
makefile.amiga Makefile.netware nwlib.c nwos.c config-win32ce.h \
config-os400.h setup-os400.h config-symbian.h Makefile.Watcom \
config-tpf.h $(DOCS) mk-ca-bundle.pl mk-ca-bundle.vbs $(CMAKE_DIST) \
config-tpf.h mk-ca-bundle.pl mk-ca-bundle.vbs $(CMAKE_DIST) \
firefox-db2pem.sh config-vxworks.h Makefile.vxworks checksrc.pl \
objnames-test08.sh objnames-test10.sh objnames.inc checksrc.whitelist

View File

@ -1,69 +0,0 @@
_ _ ____ _
___| | | | _ \| |
/ __| | | | |_) | |
| (__| |_| | _ <| |___
\___|\___/|_| \_\_____|
How To Build libcurl to Use c-ares For Asynch Name Resolves
===========================================================
c-ares:
http://c-ares.haxx.se/
NOTE
The latest libcurl version requires c-ares 1.6.0 or later.
Once upon the time libcurl built fine with the "original" ares. That is no
longer true. You need to use c-ares.
Build c-ares
============
1. unpack the c-ares archive
2. cd c-ares-dir
3. ./configure
4. make
5. make install
Build libcurl to use c-ares in the curl source tree
===================================================
1. name or symlink the c-ares source directory 'ares' in the curl source
directory
2. ./configure --enable-ares
Optionally, you can point out the c-ares install tree root with the the
--enable-ares option.
3. make
Build libcurl to use an installed c-ares
========================================
1. ./configure --enable-ares=/path/to/ares/install
2. make
c-ares on win32
===============
(description brought by Dominick Meglio)
First I compiled c-ares. I changed the default C runtime library to be the
single-threaded rather than the multi-threaded (this seems to be required to
prevent linking errors later on). Then I simply build the areslib project (the
other projects adig/ahost seem to fail under MSVC).
Next was libcurl. I opened lib/config-win32.h and I added a:
#define USE_ARES 1
Next thing I did was I added the path for the ares includes to the include
path, and the libares.lib to the libraries.
Lastly, I also changed libcurl to be single-threaded rather than
multi-threaded, again this was to prevent some duplicate symbol errors. I'm
not sure why I needed to change everything to single-threaded, but when I
didn't I got redefinition errors for several CRT functions (malloc, stricmp,
etc.)
I would have modified the MSVC++ project files, but I only have VC.NET and it
uses a different format than VC6.0 so I didn't want to go and change
everything and remove VC6.0 support from libcurl.

View File

@ -1,68 +0,0 @@
curl_off_t explained
====================
curl_off_t is a data type provided by the external libcurl include headers. It
is the type meant to be used for the curl_easy_setopt() options that end with
LARGE. The type is 64bit large on most modern platforms.
Transition from < 7.19.0 to >= 7.19.0
-------------------------------------
Applications that used libcurl before 7.19.0 that are rebuilt with a libcurl
that is 7.19.0 or later may or may not have to worry about anything of
this. We have made a significant effort to make the transition really seamless
and transparent.
You have have to take notice if you are in one of the following situations:
o Your app is using or will after the transition use a libcurl that is built
with LFS (large file support) disabled even though your system otherwise
supports it.
o Your app is using or will after the transition use a libcurl that doesn't
support LFS at all, but your system and compiler support 64bit data types.
In both these cases, the curl_off_t type will now (after the transition) be
64bit where it previously was 32bit. This will cause a binary incompatibility
that you MAY need to deal with.
Benefits
--------
This new way has several benefits:
o Platforms without LFS support can still use libcurl to do >32 bit file
transfers and range operations etc as long as they have >32 bit data-types
supported.
o Applications will no longer easily build with the curl_off_t size
mismatched, which has been a very frequent (and annoying) problem with
libcurl <= 7.18.2
Historically
------------
Previously, before 7.19.0, the curl_off_t type would be rather strongly
connected to the size of the system off_t type, where currently curl_off_t is
independent of that.
The strong connection to off_t made it troublesome for application authors
since when they did mistakes, they could get curl_off_t type of different
sizes in the app vs libcurl, and that caused strange effects that were hard to
track and detect by users of libcurl.
SONAME
------
We opted to not bump the soname for the library unconditionally, simply
because soname bumping is causing a lot of grief and moaning all over the
community so we try to keep that at minimum. Also, our selected design path
should be 100% backwards compatible for the vast majority of all libcurl
users.
Enforce SONAME bump
-------------------
If configure doesn't detect your case where a bump is necessary, re-run it
with the --enable-soname-bump command line option!

View File

@ -1,61 +0,0 @@
_ _ ____ _
___| | | | _ \| |
/ __| | | | |_) | |
| (__| |_| | _ <| |___
\___|\___/|_| \_\_____|
Source Code Functions Apps Might Use
====================================
The libcurl source code offers a few functions by source only. They are not
part of the official libcurl API, but the source files might be useful for
others so apps can optionally compile/build with these sources to gain
additional functions.
We provide them through a single header file for easy access for apps:
"curlx.h"
curlx_strtoofft()
A macro that converts a string containing a number to a curl_off_t number.
This might use the curlx_strtoll() function which is provided as source
code in strtoofft.c. Note that the function is only provided if no
strtoll() (or equivalent) function exist on your platform. If curl_off_t
is only a 32 bit number on your platform, this macro uses strtol().
curlx_tvnow()
returns a struct timeval for the current time.
curlx_tvdiff()
returns the difference between two timeval structs, in number of
milliseconds.
curlx_tvdiff_secs()
returns the same as curlx_tvdiff but with full usec resolution (as a
double)
FUTURE
======
Several functions will be removed from the public curl_ name space in a
future libcurl release. They will then only become available as curlx_
functions instead. To make the transition easier, we already today provide
these functions with the curlx_ prefix to allow sources to get built properly
with the new function names. The functions this concerns are:
curlx_getenv
curlx_strequal
curlx_strnequal
curlx_mvsnprintf
curlx_msnprintf
curlx_maprintf
curlx_mvaprintf
curlx_msprintf
curlx_mprintf
curlx_mfprintf
curlx_mvsprintf
curlx_mvprintf
curlx_mvfprintf

View File

@ -1,60 +0,0 @@
Content Encoding Support for libcurl
* About content encodings:
HTTP/1.1 [RFC 2616] specifies that a client may request that a server encode
its response. This is usually used to compress a response using one of a set
of commonly available compression techniques. These schemes are `deflate' (the
zlib algorithm), `gzip' and `compress' [sec 3.5, RFC 2616]. A client requests
that the sever perform an encoding by including an Accept-Encoding header in
the request document. The value of the header should be one of the recognized
tokens `deflate', ... (there's a way to register new schemes/tokens, see sec
3.5 of the spec). A server MAY honor the client's encoding request. When a
response is encoded, the server includes a Content-Encoding header in the
response. The value of the Content-Encoding header indicates which scheme was
used to encode the data.
A client may tell a server that it can understand several different encoding
schemes. In this case the server may choose any one of those and use it to
encode the response (indicating which one using the Content-Encoding header).
It's also possible for a client to attach priorities to different schemes so
that the server knows which it prefers. See sec 14.3 of RFC 2616 for more
information on the Accept-Encoding header.
* Current support for content encoding:
Support for the 'deflate' and 'gzip' content encoding are supported by
libcurl. Both regular and chunked transfers should work fine. The library
zlib is required for this feature. 'deflate' support was added by James
Gallagher, and support for the 'gzip' encoding was added by Dan Fandrich.
* The libcurl interface:
To cause libcurl to request a content encoding use:
curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, <string>)
where <string> is the intended value of the Accept-Encoding header.
Currently, libcurl only understands how to process responses that use the
"deflate" or "gzip" Content-Encoding, so the only values for
CURLOPT_ACCEPT_ENCODING that will work (besides "identity," which does
nothing) are "deflate" and "gzip" If a response is encoded using the
"compress" or methods, libcurl will return an error indicating that the
response could not be decoded. If <string> is NULL no Accept-Encoding header
is generated. If <string> is a zero-length string, then an Accept-Encoding
header containing all supported encodings will be generated.
The CURLOPT_ACCEPT_ENCODING must be set to any non-NULL value for content to
be automatically decoded. If it is not set and the server still sends encoded
content (despite not having been asked), the data is returned in its raw form
and the Content-Encoding type is not checked.
* The curl interface:
Use the --compressed option with curl to cause it to ask servers to compress
responses using any format supported by curl.
James Gallagher <jgallagher@gso.uri.edu>
Dan Fandrich <dan@coneharvesters.com>

View File

@ -1,35 +0,0 @@
hostip.c explained
==================
The main COMPILE-TIME DEFINES to keep in mind when reading the host*.c
source file are these:
CURLRES_IPV6 - this host has getaddrinfo() and family, and thus we use
that. The host may not be able to resolve IPv6, but we don't really have to
take that into account. Hosts that aren't IPv6-enabled have CURLRES_IPV4
defined.
CURLRES_ARES - is defined if libcurl is built to use c-ares for asynchronous
name resolves. This can be Windows or *nix.
CURLRES_THREADED - is defined if libcurl is built to use threading for
asynchronous name resolves. The name resolve will be done in a new thread,
and the supported asynch API will be the same as for ares-builds. This is
the default under (native) Windows.
If any of the two previous are defined, CURLRES_ASYNCH is defined too. If
libcurl is not built to use an asynchronous resolver, CURLRES_SYNCH is
defined.
The host*.c sources files are split up like this:
hostip.c - method-independent resolver functions and utility functions
hostasyn.c - functions for asynchronous name resolves
hostsyn.c - functions for synchronous name resolves
asyn-ares.c - functions for asynchronous name resolves using c-ares
asyn-thread.c - functions for asynchronous name resolves using threads
hostip4.c - IPv4 specific functions
hostip6.c - IPv6 specific functions
The hostip.h is the single united header file for all this. It defines the
CURLRES_* defines based on the config*.h and curl_setup.h defines.

View File

@ -1,74 +0,0 @@
1. PUT/POST without a known auth to use (possibly no auth required):
(When explicitly set to use a multi-pass auth when doing a POST/PUT,
libcurl should immediately go the Content-Length: 0 bytes route to avoid
the first send all data phase, step 2. If told to use a single-pass auth,
goto step 3.)
Issue the proper PUT/POST request immediately, with the correct
Content-Length and Expect: headers.
If a 100 response is received or the wait for one times out, start sending
the request-body.
If a 401 (or 407 when talking through a proxy) is received, then:
If we have "more than just a little" data left to send, close the
connection. Exactly what "more than just a little" means will have to be
determined. Possibly the current transfer speed should be taken into
account as well.
NOTE: if the size of the POST data is less than MAX_INITIAL_POST_SIZE (when
CURLOPT_POSTFIELDS is used), libcurl will send everything in one single
write() (all request-headers and request-body) and thus it will
unconditionally send the full post data here.
2. PUT/POST with multi-pass auth but not yet completely negotiated:
Send a PUT/POST request, we know that it will be rejected and thus we claim
Content-Length zero to avoid having to send the request-body. (This seems
to be what IE does.)
3. PUT/POST as the last step in the auth negotiation, that is when we have
what we believe is a completed negotiation:
Send a full and proper PUT/POST request (again) with the proper
Content-Length and a following request-body.
NOTE: this may very well be the second (or even third) time the whole or at
least parts of the request body is sent to the server. Since the data may
be provided to libcurl with a callback, we need a way to tell the app that
the upload is to be restarted so that the callback will provide data from
the start again. This requires an API method/mechanism that libcurl
doesn't have today. See below.
Data Rewind
It will be troublesome for some apps to deal with a rewind like this in all
circumstances. I'm thinking for example when using 'curl' to upload data
from stdin. If libcurl ends up having to rewind the reading for a request
to succeed, of course a lack of this callback or if it returns failure, will
cause the request to fail completely.
The new callback is set with CURLOPT_IOCTLFUNCTION (in an attempt to add a
more generic function that might be used for other IO-related controls in
the future):
curlioerr curl_ioctl(CURL *handle, curliocmd cmd, void *clientp);
And in the case where the read is to be rewinded, it would be called with a
cmd named CURLIOCMD_RESTARTREAD. The callback would then return CURLIOE_OK,
if things are fine, or CURLIOE_FAILRESTART if not.
Backwards Compatibility
The approach used until now, that issues a HEAD on the given URL to trigger
the auth negotiation could still be supported and encouraged, but it would
be up to the app to first fetch a URL with GET/HEAD to negotiate on, since
then a following PUT/POST wouldn't need to negotiate authentication and
thus avoid double-sending data.
Optionally, we keep the current approach if some option is set
(CURLOPT_HEADBEFOREAUTH or similar), since it seems to work fairly well for
POST on most servers.

View File

@ -1,55 +0,0 @@
_ _ ____ _
___| | | | _ \| |
/ __| | | | |_) | |
| (__| |_| | _ <| |___
\___|\___/|_| \_\_____|
How To Track Down Suspected Memory Leaks in libcurl
===================================================
Single-threaded
Please note that this memory leak system is not adjusted to work in more
than one thread. If you want/need to use it in a multi-threaded app. Please
adjust accordingly.
Build
Rebuild libcurl with -DCURLDEBUG (usually, rerunning configure with
--enable-debug fixes this). 'make clean' first, then 'make' so that all
files actually are rebuilt properly. It will also make sense to build
libcurl with the debug option (usually -g to the compiler) so that debugging
it will be easier if you actually do find a leak in the library.
This will create a library that has memory debugging enabled.
Modify Your Application
Add a line in your application code:
curl_memdebug("dump");
This will make the malloc debug system output a full trace of all resource
using functions to the given file name. Make sure you rebuild your program
and that you link with the same libcurl you built for this purpose as
described above.
Run Your Application
Run your program as usual. Watch the specified memory trace file grow.
Make your program exit and use the proper libcurl cleanup functions etc. So
that all non-leaks are returned/freed properly.
Analyze the Flow
Use the tests/memanalyze.pl perl script to analyze the dump file:
tests/memanalyze.pl dump
This now outputs a report on what resources that were allocated but never
freed etc. This report is very fine for posting to the list!
If this doesn't produce any output, no leak was detected in libcurl. Then
the leak is mostly likely to be in your code.

View File

@ -1,53 +0,0 @@
Implementation of the curl_multi_socket API
The main ideas of the new API are simply:
1 - The application can use whatever event system it likes as it gets info
from libcurl about what file descriptors libcurl waits for what action
on. (The previous API returns fd_sets which is very select()-centric).
2 - When the application discovers action on a single socket, it calls
libcurl and informs that there was action on this particular socket and
libcurl can then act on that socket/transfer only and not care about
any other transfers. (The previous API always had to scan through all
the existing transfers.)
The idea is that curl_multi_socket_action() calls a given callback with
information about what socket to wait for what action on, and the callback
only gets called if the status of that socket has changed.
We also added a timer callback that makes libcurl call the application when
the timeout value changes, and you set that with curl_multi_setopt() and the
CURLMOPT_TIMERFUNCTION option. To get this to work, Internally, there's an
added a struct to each easy handle in which we store an "expire time" (if
any). The structs are then "splay sorted" so that we can add and remove
times from the linked list and yet somewhat swiftly figure out both how long
time there is until the next nearest timer expires and which timer (handle)
we should take care of now. Of course, the upside of all this is that we get
a curl_multi_timeout() that should also work with old-style applications
that use curl_multi_perform().
We created an internal "socket to easy handles" hash table that given
a socket (file descriptor) return the easy handle that waits for action on
that socket. This hash is made using the already existing hash code
(previously only used for the DNS cache).
To make libcurl able to report plain sockets in the socket callback, we had
to re-organize the internals of the curl_multi_fdset() etc so that the
conversion from sockets to fd_sets for that function is only done in the
last step before the data is returned. I also had to extend c-ares to get a
function that can return plain sockets, as that library too returned only
fd_sets and that is no longer good enough. The changes done to c-ares are
available in c-ares 1.3.1 and later.
We have done a test runs with up to 9000 connections (with a single active
one). The curl_multi_socket_action() invoke then takes less than 10
microseconds in average (using the read-only-1-byte-at-a-time hack). We are
now below the 60 microseconds "per socket action" goal (the extra 50 is the
time libevent needs).
Documentation
http://curl.haxx.se/libcurl/c/curl_multi_socket_action.html
http://curl.haxx.se/libcurl/c/curl_multi_timeout.html
http://curl.haxx.se/libcurl/c/curl_multi_setopt.html