The INTERNALS document suggested that compatibility should be maintained with perl version 4, but this was untrue - scripts such as chksource.pl and runtests.pl use perl5-isms.
                                  _   _ ____  _
                              ___| | | |  _ \| |
                             / __| | | | |_) | |
                            | (__| |_| |  _ <| |___
                             \___|\___/|_| \_\_____|

INTERNALS

The project is split in two: the library and the client. The client part uses
the library, but the library is designed to allow other applications to use
it.

The largest amount of code and complexity is in the library part.

GIT
===
All changes to the sources are committed to the git repository as soon as
they're somewhat verified to work. Changes shall be committed as independently
as possible so that individual changes can be more easily spotted and tracked
afterwards.

Tagging shall be used extensively, and by the time we release new archives we
should tag the sources with a name similar to the released version number.

Portability
===========

We write curl and libcurl to compile with C89 compilers, on 32-bit and larger
machines. Most of libcurl assumes more or less POSIX compliance, but that's
not a requirement.

We write libcurl to build and work with lots of third party tools, and we
want it to remain functional and buildable with these and later versions
(older versions may still work, but that is not something we work hard to
maintain):

OpenSSL      0.9.6
GnuTLS       1.2
zlib         1.1.4
libssh2      0.16
c-ares       1.6.0
libidn       0.4.1
cyassl       1.4.0
openldap     2.0
MIT krb5 lib 1.2.4
qsossl       V5R2M0
NSS          3.11.x
axTLS        1.2.7
Heimdal      ?

* = only partly functional, but that's due to bugs in the third party lib, not
    because of libcurl code

On systems where configure runs, we aim at working on them all - if they have
a suitable C compiler. On systems that don't run configure, we strive to keep
curl running fine on:

Windows      98
AS/400       V5R2M0
Symbian      9.1
Windows CE   ?
TPF          ?

When writing code (mostly for generating stuff included in release tarballs)
we use a few "build tools" and we make sure that we remain functional with
these versions:

GNU Libtool  1.4.2
GNU Autoconf 2.57
GNU Automake 1.7 (we currently avoid 1.10 due to Solaris-related bugs)
GNU M4       1.4
perl         5.004
roffit       0.5
groff        ? (any version that supports "groff -Tps -man [in] [out]")
ps2pdf (gs)  ?

Windows vs Unix
===============

There are a few differences in how to program curl the unix way compared to
the Windows way. The four perhaps most notable details are:

1. Different function names for socket operations.

   In curl, this is solved with defines and macros, so that the source looks
   the same in all places except for the header file that defines them. The
   macros in use are sclose(), sread() and swrite().

2. Windows requires a couple of init calls for the socket stuff.

   That's taken care of by the curl_global_init() call, but if other libs
   also do it there might be reasons for applications to alter that
   behaviour.

3. The file descriptors for network communication and file operations are
   not easily interchangeable as in unix.

   We avoid this by not trying any funny tricks on file descriptors.

4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
   destroying binary data, although you do want that conversion if it is
   text coming through... (sigh)

   We set stdout to binary under Windows.

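A minimal sketch of how items 1 and 4 can look in a single header-like
fragment. The macro names sread(), swrite() and sclose() come from the text
above; the macro bodies, the SET_BINMODE name and the send_all() helper are
assumptions made for illustration and are not copied from the curl sources.
For brevity the helper takes an int descriptor, which glosses over the
Windows SOCKET type.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <winsock2.h>
#include <stdio.h>
#include <io.h>      /* _setmode() */
#include <fcntl.h>   /* _O_BINARY */
#define sread(fd, buf, len)  recv((fd), (char *)(buf), (int)(len), 0)
#define swrite(fd, buf, len) send((fd), (const char *)(buf), (int)(len), 0)
#define sclose(fd)           closesocket(fd)
/* hypothetical name: switch a stdio stream to binary mode */
#define SET_BINMODE(stream)  (void)_setmode(_fileno(stream), _O_BINARY)
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#define sread(fd, buf, len)  recv((fd), (buf), (len), 0)
#define swrite(fd, buf, len) send((fd), (buf), (len), 0)
#define sclose(fd)           close(fd)
#define SET_BINMODE(stream)  ((void)0)   /* nothing to do on unix */
#endif

/* Hypothetical helper: the calling code looks the same on every platform
   because it only ever uses the macros. Returns 0 when all bytes went out. */
static int send_all(int fd, const char *data, size_t len)
{
  assert(data != NULL);
  while(len > 0) {
    long n = (long)swrite(fd, data, len);
    if(n <= 0)
      return -1;
    data += n;
    len -= (size_t)n;
  }
  return 0;
}
```

Only the block between #ifdef and #endif knows which platform it is built
for; everything below it is written once.
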
Inside the source code, we make an effort to avoid '#ifdef [Your OS]'. All
conditionals that deal with features *should* instead be in the format
'#ifdef HAVE_THAT_WEIRD_FUNCTION'. Since Windows can't run configure scripts,
we maintain two curl_config-win32.h files (one in lib/ and one in src/) that
are supposed to look exactly like a curl_config.h file would have looked on
a Windows machine!

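As an illustration of a feature-based conditional, here is a small sketch.
HAVE_STRDUP is used only as an example of a feature macro and dup_string()
is a hypothetical wrapper; neither is claimed to match what the curl sources
actually contain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* In a real build the define would come from curl_config.h, written by
   configure or maintained by hand for Windows. It is left undefined here,
   so the portable fallback below is what gets compiled. */
/* #define HAVE_STRDUP 1 */

/* Hypothetical wrapper: use the system function when the build says it
   exists, otherwise fall back to a portable replacement. The caller must
   free() the result. */
static char *dup_string(const char *src)
{
#ifdef HAVE_STRDUP
  return strdup(src);
#else
  size_t len;
  char *copy;
  assert(src != NULL);
  len = strlen(src) + 1;
  copy = malloc(len);
  if(copy)
    memcpy(copy, src, len);
  return copy;
#endif
}
```

The code asks "does this function exist?" and never "which OS is this?", so
a new platform only needs a correct config header.
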
Generally speaking: always remember that this will be compiled on dozens of
operating systems. Don't walk on the edge.

Library
=======

There are plenty of entry points to the library, namely each publicly defined
function that libcurl offers to applications. All of those functions are
rather small and easy-to-follow. All the ones prefixed with 'curl_easy' are
put in the lib/easy.c file.

curl_global_init() and curl_global_cleanup() should be called by the
application to initialize and clean up global stuff in the library. As of
today, they handle the global SSL initialization if SSL is enabled and the
initialization of the socket layer on Windows machines. libcurl itself has no
"global" scope.

All printf()-style functions use the supplied clones in lib/mprintf.c. This
makes sure we stay absolutely platform independent.

curl_easy_init() allocates an internal struct and makes some initializations.
The returned handle does not reveal internals. This is the 'SessionHandle'
struct which works as an "anchor" struct for all curl_easy functions. All
connections performed will get connect-specific data allocated that should be
used for things related to particular connections/requests.

curl_easy_setopt() takes three arguments: the handle, and then the option,
which must be passed as a pair of parameter-ID and parameter-value. The list
of options is documented in the man page. This function mainly sets things in
the 'SessionHandle' struct.

curl_easy_perform() does a whole lot of things:

It starts off in the lib/easy.c file by calling Curl_perform() and the main
work then continues in lib/url.c. The flow continues with a call to
Curl_connect() to connect to the remote site.

o Curl_connect()

  ... analyzes the URL, separates the different components and connects to
  the remote host. This may involve using a proxy and/or using SSL. The
  Curl_resolv() function in lib/hostip.c is used for looking up host names
  (it then uses the proper underlying method, which may vary between
  platforms and builds).

  When Curl_connect() is done, we are connected to the remote site. Then it
  is time to tell the server to get a document/file. Curl_do() arranges this.

  Curl_connect() makes sure there's an allocated and initiated 'connectdata'
  struct that is used for this particular connection only (although there
  may be several requests performed on the same connection). A bunch of
  things are inited/inherited from the SessionHandle struct.

o Curl_do()

  Curl_do() makes sure the proper protocol-specific function is called. The
  functions are named after the protocols they handle: Curl_ftp(),
  Curl_http(), Curl_dict(), etc. They all reside in their respective files
  (ftp.c, http.c and dict.c). HTTPS is handled by Curl_http() and FTPS by
  Curl_ftp().

  The protocol-specific functions of course deal with protocol-specific
  negotiations and setup. They have access to the Curl_sendf() function
  (from lib/sendf.c) to send printf-style formatted data to the remote host,
  and when they're ready to make the actual file transfer they call the
  Curl_setup_transfer() function (in lib/transfer.c) to set up the transfer
  and then return.

  If this DO function fails and the connection is being re-used, libcurl will
  then close this connection, set up a new connection and re-issue the DO
  request on that. This is because there is no way to be perfectly sure that
  we have discovered a dead connection before the DO function and thus we
  might wrongly be re-using a connection that was closed by the remote peer.

  Some time during the DO function, the Curl_setup_transfer() function must
  be called with some basic info about the upcoming transfer: what socket(s)
  to read/write and the expected file transfer sizes (if known).

o Transfer()

  Curl_perform() then calls Transfer() in lib/transfer.c, which performs the
  entire file transfer.

  During transfer, the progress functions in lib/progress.c are called at a
  frequent interval (or at the user's choice, a specified callback might get
  called). The speedcheck functions in lib/speedcheck.c are also used to
  verify that the transfer is as fast as required.

o Curl_done()

  Called after a transfer is done. This function takes care of everything
  that has to be done after a transfer. It attempts to leave matters in a
  state so that Curl_do() can be called again on the same connection (in a
  persistent connection case). The connection might also soon be closed with
  Curl_disconnect().

o Curl_disconnect()

  When doing normal connections and transfers, no one ever tries to close any
  connections so this is not normally called when curl_easy_perform() is
  used. This function is only used when we are certain that no more transfers
  are going to be made on the connection. The connection can also be closed
  by force, or this function can be called to make sure that libcurl doesn't
  keep too many connections alive at the same time (there's a default amount
  of 5 but that can be changed with the CURLOPT_MAXCONNECTS option).

  This function cleans up all resources that are associated with a single
  connection.

Curl_perform() is the function that does the main "connect - do - transfer -
done" loop. It loops if there's a Location: to follow.

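The shape of that loop can be sketched with stub stages. Everything in the
block below (struct request, the step_*() functions, perform() and the
max_follow limit) is hypothetical and only models the control flow; it is
not how lib/url.c is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one request: 'redirects_left' models how many
   more Location: headers the server will answer with. */
struct request {
  int redirects_left;
  int connects, dos, transfers, dones;  /* counters, for illustration */
};

static int step_connect(struct request *r)  { r->connects++;  return 0; }
static int step_do(struct request *r)       { r->dos++;       return 0; }
static int step_transfer(struct request *r) { r->transfers++; return 0; }

/* Returns 1 if the response asked us to follow a new URL. */
static int step_done(struct request *r)
{
  r->dones++;
  if(r->redirects_left > 0) {
    r->redirects_left--;
    return 1;
  }
  return 0;
}

/* The "connect - do - transfer - done" loop. It goes around again for every
   redirect, up to max_follow times. Returns 0 on success, -1 if a stage
   failed and -2 if there were too many redirects. */
static int perform(struct request *r, int max_follow)
{
  int followed = 0;
  assert(r != NULL);
  for(;;) {
    if(step_connect(r) || step_do(r) || step_transfer(r))
      return -1;
    if(!step_done(r))
      return 0;
    if(++followed > max_follow)
      return -2;
  }
}
```

A request that is redirected twice runs all four stages three times before
perform() returns.
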
When completed, curl_easy_cleanup() should be called to free up used
resources. It runs Curl_disconnect() on all open connections.

A quick roundup on internal function sequences (many of these call
protocol-specific function-pointers):

Curl_connect - connects to a remote site and does initial connect fluff
 This also checks for an existing connection to the requested site and uses
 that one if it is possible.

 Curl_do - starts a transfer
  Curl_handler::do_it() - transfers data
 Curl_done - ends a transfer

Curl_disconnect - disconnects from a remote site. This is called when the
 disconnect is really requested, which doesn't necessarily have to be
 exactly after Curl_done in case we want to keep the connection open for
 a while.

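To make "protocol-specific function-pointers" concrete, here is a reduced
sketch of a handler table. The struct layout, the stub functions and the
lookup are assumptions for illustration; the real Curl_handler struct has
more members and differs in detail. The port numbers are the well-known
defaults for each scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much reduced protocol handler: the generic code calls
   through the pointer without knowing which protocol it is driving. */
struct handler {
  const char *scheme;
  int (*do_it)(const char *path);
  int default_port;
};

/* Stub protocol functions; a real one would talk to the server. */
static int http_do(const char *path) { return path[0] == '/' ? 0 : -1; }
static int ftp_do(const char *path)  { return path[0] ? 0 : -1; }

static const struct handler handlers[] = {
  { "http",  http_do, 80  },
  { "https", http_do, 443 },  /* HTTPS shares the HTTP function */
  { "ftp",   ftp_do,  21  },
  { "ftps",  ftp_do,  990 }   /* FTPS shares the FTP function */
};

static const struct handler *find_handler(const char *scheme)
{
  size_t i;
  assert(scheme != NULL);
  for(i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
    if(strcmp(handlers[i].scheme, scheme) == 0)
      return &handlers[i];
  }
  return NULL;
}

/* Generic "do" step: look up the handler and call through the pointer.
   Returns -1 for an unsupported scheme. */
static int generic_do(const char *scheme, const char *path)
{
  const struct handler *h = find_handler(scheme);
  if(!h)
    return -1;
  return h->do_it(path);
}
```

Adding a protocol then means adding one table entry and its functions, with
no change to the generic code.
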
HTTP(S)

  HTTP offers a lot and is the protocol in curl that uses the most lines of
  code. There is a special file (lib/formdata.c) that offers all the
  multipart post functions.

  base64-functions for user+password stuff (and more) are in lib/base64.c and
  all functions for parsing and sending cookies are found in lib/cookie.c.

  HTTPS uses almost exactly the same procedure as HTTP, with only two
  exceptions: the connect procedure is different and the function used to
  read or write from the socket is different, although the latter fact is
  hidden in the source by the use of Curl_read() for reading and Curl_write()
  for writing data to the remote server.

  http_chunks.c contains functions that understand HTTP 1.1 chunked transfer
  encoding.

  An interesting detail with the HTTP(S) request is the Curl_add_buffer()
  series of functions we use. They append data to one single buffer, and when
  the building is done the entire request is sent off in one single write.
  This is done to overcome problems with flawed firewalls and lame servers.

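The idea of building the whole request in one buffer before sending can be
sketched as follows. struct sendbuf, sendbuf_add() and sendbuf_adds() are
hypothetical names; this is a generic growable buffer and not the actual
Curl_add_buffer() implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer. Start with all members zero/NULL. */
struct sendbuf {
  char *data;
  size_t used;
  size_t size;
};

/* Append len bytes, growing the allocation as needed. Returns 0 on success
   and -1 if memory runs out, in which case the buffer is left unchanged. */
static int sendbuf_add(struct sendbuf *b, const void *in, size_t len)
{
  assert(b != NULL && in != NULL);
  if(len == 0)
    return 0;
  if(b->used + len > b->size) {
    size_t newsize = (b->used + len) * 2;
    char *p = realloc(b->data, newsize);
    if(!p)
      return -1;
    b->data = p;
    b->size = newsize;
  }
  memcpy(b->data + b->used, in, len);
  b->used += len;
  return 0;
}

/* Convenience variant for zero-terminated strings. */
static int sendbuf_adds(struct sendbuf *b, const char *str)
{
  return sendbuf_add(b, str, strlen(str));
}
```

The caller adds the request line and each header in turn, then hands
b.data and b.used to a single write call and frees b.data afterwards.
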
FTP

  The Curl_if2ip() function can be used for getting the IP number of a
  specified network interface, and it resides in lib/if2ip.c.

  Curl_ftpsendf() is used for sending FTP commands to the remote server. It
  was made a separate function to prevent us programmers from forgetting that
  they must be CRLF terminated. They must also be sent in one single write()
  to make firewalls and similar happy.

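A sketch of why a dedicated function helps: the CRLF is appended in exactly
one place and the result is one contiguous buffer that can go out in a
single write. ftp_format_cmd() is a hypothetical helper, not the
Curl_ftpsendf() source, and it uses C99 vsnprintf() for brevity where curl
would use its own printf clones.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one FTP command into 'out' and terminate it
   with CRLF. Returns the number of bytes to send, or -1 if the command does
   not fit in the buffer. */
static int ftp_format_cmd(char *out, size_t outlen, const char *fmt, ...)
{
  va_list ap;
  int n;
  assert(out != NULL && fmt != NULL);
  if(outlen < 3)
    return -1;
  va_start(ap, fmt);
  n = vsnprintf(out, outlen - 2, fmt, ap);  /* keep room for CRLF */
  va_end(ap);
  if(n < 0 || (size_t)n >= outlen - 2)
    return -1;                              /* error or truncated */
  memcpy(out + n, "\r\n", 3);               /* CRLF plus terminating zero */
  return n + 2;
}
```

The caller passes the returned length to one write on the control
connection, so no command can leave without its line ending.
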
Kerberos

  The kerberos support is mainly in lib/krb4.c and lib/security.c.

TELNET

  Telnet is implemented in lib/telnet.c.

FILE

  The file:// protocol is dealt with in lib/file.c.

LDAP

  Everything LDAP is in lib/ldap.c and lib/openldap.c.

GENERAL

  URL encoding and decoding, called escaping and unescaping in the source
  code, is found in lib/escape.c.

  While transferring data in Transfer() a few functions might get used.
  curl_getdate() in lib/parsedate.c is for HTTP date comparisons (and more).

  lib/getenv.c offers curl_getenv() which is for reading environment
  variables in a neat platform independent way. That's used in the client,
  but also in lib/url.c when checking the proxy environment variables. Note
  that contrary to the normal unix getenv(), this returns an allocated buffer
  that must be free()ed after use.

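The ownership rule in that last note can be shown with a small stand-in.
env_copy() is a hypothetical function written for this example; treating an
empty value like an unset one is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: return a heap-allocated copy of the variable's
   value, or NULL if it is unset, empty, or memory runs out. The caller owns
   the result and must free() it. */
static char *env_copy(const char *name)
{
  const char *value;
  char *copy;
  size_t len;
  assert(name != NULL);
  value = getenv(name);
  if(!value || !value[0])
    return NULL;
  len = strlen(value) + 1;
  copy = malloc(len);
  if(copy)
    memcpy(copy, value, len);
  return copy;
}
```

Because the caller gets its own copy, the value stays valid even if the
environment is modified later, at the price of one free() per lookup.
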
  lib/netrc.c holds the .netrc parser.

  lib/timeval.c features replacement functions for systems that don't have
  gettimeofday() and a few support functions for timeval conversions.

  A function named curl_version() that returns the full curl version string
  is found in lib/version.c.

Persistent Connections
======================

The persistent connection support in libcurl requires some considerations on
how to do things inside of the library.

o The 'SessionHandle' struct returned in the curl_easy_init() call must never
  hold connection-oriented data. It is meant to hold the root data as well as
  all the options etc that the library-user may choose.
o The 'SessionHandle' struct holds the "connection cache" (an array of
  pointers to 'connectdata' structs). There's one connectdata struct
  allocated for each connection that libcurl knows about. Note that when you
  use the multi interface, the multi handle will hold the connection cache
  and not the particular easy handle. This is of course to allow all easy
  handles in a multi stack to be able to share and re-use connections.
o This enables the 'curl handle' to be reused on subsequent transfers.
o When we are about to perform a transfer with curl_easy_perform(), we first
  check for an already existing connection in the cache that we can use,
  otherwise we create a new one and add it to the cache. If the cache is
  already full when we add a new connection, we close one of the present
  ones. We select which one to close dependent on the close policy that may
  have been previously set.
o When the transfer operation is complete, we try to leave the connection
  open. Particular options may tell us not to, and protocols may signal
  closure on connections and then we don't keep it open of course.
o When curl_easy_cleanup() is called, we close all still opened connections,
  unless of course the multi interface "owns" the connections.

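The cache behaviour in the list above can be sketched like this. All names
are hypothetical, the cache stores structs instead of pointers to keep the
example short, and "evict the least recently used entry" is an assumed
close policy chosen for the sketch. The size of 5 mirrors the default
mentioned for CURLOPT_MAXCONNECTS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 5

struct conn {
  char host[64];
  int port;
  int in_use;             /* slot holds a live connection */
  unsigned long used_at;  /* logical clock value of the last use */
};

struct conncache {        /* zero-initialize before first use */
  struct conn slots[CACHE_SIZE];
  unsigned long clock;
};

/* Find a cached connection to host:port and mark it as used, or NULL. */
static struct conn *cache_find(struct conncache *c, const char *host,
                               int port)
{
  int i;
  for(i = 0; i < CACHE_SIZE; i++) {
    struct conn *k = &c->slots[i];
    if(k->in_use && k->port == port && strcmp(k->host, host) == 0) {
      k->used_at = ++c->clock;
      return k;
    }
  }
  return NULL;
}

/* Re-use a connection if possible. Otherwise take a free slot, or replace
   the least recently used entry when the cache is full. '*reused' tells the
   caller which case happened. */
static struct conn *cache_get(struct conncache *c, const char *host,
                              int port, int *reused)
{
  struct conn *k;
  int i;
  assert(c != NULL && host != NULL && reused != NULL);
  assert(strlen(host) < sizeof(c->slots[0].host));
  k = cache_find(c, host, port);
  *reused = (k != NULL);
  if(k)
    return k;
  k = &c->slots[0];
  for(i = 0; i < CACHE_SIZE; i++) {
    if(!c->slots[i].in_use) {
      k = &c->slots[i];
      break;
    }
    if(c->slots[i].used_at < k->used_at)
      k = &c->slots[i];
  }
  /* a real implementation would close the replaced connection here */
  strcpy(k->host, host);
  k->port = port;
  k->in_use = 1;
  k->used_at = ++c->clock;
  return k;
}
```

Note that the cache lives outside any single transfer, which is what lets a
later transfer on the same handle find and re-use the connection.
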
You do realize that the curl handle must be re-used in order for the
persistent connections to work.

multi interface/non-blocking
============================

We make an effort to provide a non-blocking interface to the library, the
multi interface. To make that interface work as well as possible, no
low-level function within libcurl may be written to work in a blocking
manner.

One of the primary reasons we introduced c-ares support was to allow the name
resolve phase to be perfectly non-blocking as well.

The ultimate goal is to provide the easy interface simply by wrapping the
multi interface functions and thus treat everything internally as if the
multi interface is the single interface we have.

The FTP and the SFTP/SCP protocols are thus perfect examples of how we adapt
and adjust the code to allow non-blocking operations even on multi-stage
protocols. They are built around state machines that return when they could
block waiting for data. The DICT, LDAP and TELNET protocols are crappy
examples and they are subject to rewrite in the future to better fit the
libcurl protocol family.

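A state machine "that returns when it could block" can be sketched in a few
lines. The states, result codes and machine_step() below are hypothetical
and far simpler than the real FTP code. The reply codes 220 and 230 are the
standard FTP "service ready" and "user logged in" responses.

```c
#include <assert.h>
#include <stddef.h>

enum state  { ST_WAIT_GREETING, ST_WAIT_LOGIN_REPLY, ST_DONE };
enum result { RES_OK, RES_AGAIN, RES_ERROR };

struct machine {
  enum state state;
  int sent_login;  /* set once the (pretend) login command went out */
};

/* Drive the machine with whatever is available right now. 'reply' is the
   numeric server response, or 0 when nothing has arrived yet. The function
   never waits: without input it returns RES_AGAIN and keeps its state, so
   the caller can come back later. */
static enum result machine_step(struct machine *m, int reply)
{
  assert(m != NULL);
  if(m->state == ST_DONE)
    return RES_OK;
  if(reply == 0)
    return RES_AGAIN;            /* would block: hand control back */
  switch(m->state) {
  case ST_WAIT_GREETING:
    if(reply != 220)
      return RES_ERROR;
    m->sent_login = 1;           /* a real machine would send USER here */
    m->state = ST_WAIT_LOGIN_REPLY;
    return RES_AGAIN;
  case ST_WAIT_LOGIN_REPLY:
    if(reply != 230)
      return RES_ERROR;
    m->state = ST_DONE;
    return RES_OK;
  default:
    return RES_ERROR;
  }
}
```

Because all progress is stored in the struct, one thread can interleave
many such machines by stepping each one only when its socket is readable.
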
SSL libraries
=============

Originally libcurl supported SSLeay for SSL/TLS transports. That was then
extended to its successor OpenSSL and has since also been extended to several
other SSL/TLS libraries, and we expect and hope to further extend the support
in future libcurl versions.

To deal with this internally in the best way possible, we have a generic SSL
function API as provided by the sslgen.[ch] system, and those are the only
SSL functions we must use from within libcurl. sslgen is then crafted to use
the appropriate lower-level function calls of whatever SSL library is in
use.

Library Symbols
===============

All symbols used internally in libcurl must use a 'Curl_' prefix if they're
used in more than a single file. Single-file symbols must be made static.
Public ("exported") symbols must use a 'curl_' prefix. (There are exceptions,
but they are to be changed to follow this pattern in future versions.) Public
API functions are marked with CURL_EXTERN in the public header files so that
all others can be hidden on platforms where this is possible.

Return Codes and Informationals
===============================

I've made things simple. Almost every function in libcurl returns a CURLcode,
which must be CURLE_OK if everything is OK or otherwise a suitable error code
as defined in the curl/curl.h include file. The very spot that detects an
error must use the Curl_failf() function to set the human-readable error
description.

To aid the user in understanding what's happening and in debugging curl
usage, we must supply a fair amount of informational messages by using the
Curl_infof() function. Those messages are only displayed when the user
explicitly asks for them. They are best used when revealing information that
isn't otherwise obvious.

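The convention can be sketched in miniature. The errcode enum, struct
session, failf(), infof() and the two functions using them are hypothetical
stand-ins written for this example; they are not libcurl's CURLcode values
or the real Curl_failf() and Curl_infof(). C99 vsnprintf() is used for
brevity.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

typedef enum { ERR_OK = 0, ERR_BAD_PORT } errcode;

struct session {
  char errbuf[128];  /* human-readable description of the last error */
  int verbose;       /* informational messages only when asked for */
  int infos_shown;   /* counter, for illustration */
};

/* Store the error description; called at the spot that detects the error. */
static void failf(struct session *s, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(s->errbuf, sizeof(s->errbuf), fmt, ap);
  va_end(ap);
}

/* Show an informational message, but only in verbose mode. */
static void infof(struct session *s, const char *fmt, ...)
{
  va_list ap;
  if(!s->verbose)
    return;
  va_start(ap, fmt);
  fputs("* ", stderr);
  vfprintf(stderr, fmt, ap);
  va_end(ap);
  s->infos_shown++;
}

static errcode check_port(struct session *s, long port)
{
  assert(s != NULL);
  if(port < 1 || port > 65535) {
    failf(s, "Port number out of range: %ld", port);
    return ERR_BAD_PORT;
  }
  infof(s, "Using port %ld\n", port);
  return ERR_OK;
}

/* Callers higher up only pass the code along; the message is already set. */
static errcode setup(struct session *s, long port)
{
  errcode rc = check_port(s, port);
  if(rc != ERR_OK)
    return rc;
  return ERR_OK;
}
```

Setting the message where the error is detected keeps the most specific
information, while the return code alone is enough for every caller above.
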
API/ABI
=======

We make an effort to not export or show internals or how internals work, as
that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
for our promise to users.

Client
======

main() resides in src/main.c together with most of the client code.

src/hugehelp.c is automatically generated by the mkhelp.pl perl script to
display the complete "manual" and the src/urlglob.c file holds the functions
used for the URL-"globbing" support. Globbing in the sense that the {} and []
expansion stuff is there.

The client mostly messes around to set up its 'config' struct properly, then
it calls the curl_easy_*() functions of the library and when it gets back
control after the curl_easy_perform() it cleans up the library, checks status
and exits.

When the operation is done, the ourWriteOut() function in src/writeout.c may
be called to report about the operation. That function uses the
curl_easy_getinfo() function to extract useful information from the curl
session.

Recent versions may loop and do all this several times if many URLs were
specified on the command line or config file.

Memory Debugging
================

The file lib/memdebug.c contains debug-versions of a few functions: functions
such as malloc, free, fopen, fclose, etc that somehow deal with resources
that might give us problems if we "leak" them. The functions in the memdebug
system do nothing fancy, they do their normal function and then log
information about what they just did. The logged data can then be analyzed
after a complete session.

memanalyze.pl is the perl script present in tests/ that analyzes a log file
generated by the memory tracking system. It detects if resources are
allocated but never freed and other kinds of errors related to resource
management.

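The "do the normal thing, then log it" approach can be sketched as below.
The wrapper names, the live_allocs counter and the log line layout are
assumptions for illustration and are not claimed to match lib/memdebug.c or
the format memanalyze.pl expects. Redefining malloc and free as macros is
the trick being shown; it relies on the wrappers being defined before the
macros.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static FILE *memlog;      /* where to log; NULL disables logging */
static long live_allocs;  /* allocations that have not been freed yet */

static void *dbg_malloc(size_t size, int line, const char *source)
{
  void *p;
  assert(source != NULL);
  p = malloc(size);       /* do the normal thing first... */
  if(p)
    live_allocs++;
  if(memlog)              /* ...then log what just happened */
    fprintf(memlog, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, p);
  return p;
}

static void dbg_free(void *p, int line, const char *source)
{
  if(p)
    live_allocs--;
  if(memlog)
    fprintf(memlog, "MEM %s:%d free(%p)\n", source, line, p);
  free(p);
}

/* From here on, ordinary-looking calls go through the wrappers and record
   the file and line of the call site. */
#define malloc(size) dbg_malloc(size, __LINE__, __FILE__)
#define free(ptr)    dbg_free(ptr, __LINE__, __FILE__)
```

A non-zero live_allocs at exit, or a malloc line in the log without a
matching free line, points at a leak and at the call site that caused it.
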
Internally, the preprocessor symbol DEBUGBUILD guards code which is only
compiled for debug enabled builds, and the symbol CURLDEBUG is used to
differentiate code which is _only_ used for memory tracking/debugging.

Use -DCURLDEBUG when compiling to enable memory debugging. This is also
switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
when compiling to enable a debug build or run configure with --enable-debug.

curl --version will list the 'Debug' feature for debug enabled builds, and
will list the 'TrackMemory' feature for curl debug memory tracking capable
builds. These features are independent and can be controlled when running
the configure script. When --enable-debug is given both features will be
enabled, unless some restriction prevents memory tracking from being used.

Test Suite
==========

The test suite is placed in its own subdirectory directly off the root in the
curl archive tree, and it contains a bunch of scripts and a lot of test case
data.

The main test script is runtests.pl, which will invoke test servers like
httpserver.pl and ftpserver.pl before all the test cases are performed. The
test suite currently only runs on unix-like platforms.

You'll find a description of the test suite in the tests/README file, and of
the test case data files in the tests/FILEFORMAT file.

The test suite automatically detects if curl was built with the memory
debugging enabled, and if it was it will detect memory leaks, too.

Building Releases
=================

There's no magic to this. When you consider everything stable enough to be
released, do this:

1. Tag the source code accordingly.

2. Run the 'maketgz' script (using 'make distcheck' will give you a pretty
   good view on the status of the current sources). maketgz requires a
   version number and creates the release archive. maketgz uses 'make dist'
   for the actual archive building, which is why you need to fill in the
   Makefile.am files properly with the files that should be included in the
   release archives.

3. When that's complete, sign the output files.

4. Upload

5. Update web site and changelog on site

6. Send announcement to the mailing lists

NOTE: you must have curl checked out from git to be able to do a proper
release build. The release tarballs do not have everything set up in order to
do releases properly.