diff --git a/docs/libcurl/Makefile.am b/docs/libcurl/Makefile.am index 81ef28f86..27f8d7816 100644 --- a/docs/libcurl/Makefile.am +++ b/docs/libcurl/Makefile.am @@ -15,7 +15,8 @@ man_MANS = curl_easy_cleanup.3 curl_easy_getinfo.3 curl_easy_init.3 \ curl_multi_perform.3 curl_multi_remove_handle.3 curl_share_cleanup.3 \ curl_share_init.3 curl_share_setopt.3 libcurl.3 libcurl-easy.3 \ libcurl-multi.3 libcurl-share.3 libcurl-errors.3 curl_easy_strerror.3 \ - curl_multi_strerror.3 curl_share_strerror.3 curl_global_init_mem.3 + curl_multi_strerror.3 curl_share_strerror.3 curl_global_init_mem.3 \ + libcurl-tutorial.3 HTMLPAGES = curl_easy_cleanup.html curl_easy_getinfo.html \ curl_easy_init.html curl_easy_perform.html curl_easy_setopt.html \ @@ -30,7 +31,8 @@ HTMLPAGES = curl_easy_cleanup.html curl_easy_getinfo.html \ curl_share_cleanup.html curl_share_init.html curl_share_setopt.html \ libcurl.html libcurl-multi.html libcurl-easy.html libcurl-share.html \ libcurl-errors.html curl_easy_strerror.html curl_multi_strerror.html \ - curl_share_strerror.html curl_global_init_mem.html + curl_share_strerror.html curl_global_init_mem.html \ + libcurl-tutorial.html PDFPAGES = curl_easy_cleanup.pdf curl_easy_getinfo.pdf \ curl_easy_init.pdf curl_easy_perform.pdf curl_easy_setopt.pdf \ @@ -45,7 +47,7 @@ PDFPAGES = curl_easy_cleanup.pdf curl_easy_getinfo.pdf \ curl_share_init.pdf curl_share_setopt.pdf libcurl.pdf \ libcurl-multi.pdf libcurl-easy.pdf libcurl-share.pdf \ libcurl-errors.pdf curl_easy_strerror.pdf curl_multi_strerror.pdf \ - curl_share_strerror.pdf curl_global_init_mem.pdf + curl_share_strerror.pdf curl_global_init_mem.pdf libcurl-tutorial.pdf CLEANFILES = $(HTMLPAGES) $(PDFPAGES) diff --git a/docs/libcurl/libcurl-tutorial.3 b/docs/libcurl/libcurl-tutorial.3 new file mode 100644 index 000000000..f1716b692 --- /dev/null +++ b/docs/libcurl/libcurl-tutorial.3 @@ -0,0 +1,1171 @@ +.\" ************************************************************************** +.\" * _ _ ____ _ +.\" * Project ___| | | | _ \| | +.\" * / __| | | | |_) | | +.\" * | (__| |_| | _ <| |___ +.\" * \___|\___/|_| \_\_____| +.\" * +.\" * Copyright (C) 1998 - 2004, Daniel Stenberg, , et al. +.\" * +.\" * This software is licensed as described in the file COPYING, which +.\" * you should have received as part of this distribution. The terms +.\" * are also available at http://curl.haxx.se/docs/copyright.html. +.\" * +.\" * You may opt to use, copy, modify, merge, publish, distribute and/or sell +.\" * copies of the Software, and permit persons to whom the Software is +.\" * furnished to do so, under the terms of the COPYING file. +.\" * +.\" * This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY +.\" * KIND, either express or implied. +.\" * +.\" * $Id$ +.\" ************************************************************************** +.\" +.TH libcurl-tutorial 3 "18 Jun 2004" "libcurl" "libcurl programming" +.SH NAME +libcurl-tutorial \- libcurl programming tutorial +.SH "Objective" +This document attempts to describe the general principles and some basic +approaches to consider when programming with libcurl. The text will focus +mainly on the C interface but might apply fairly well on other interfaces as +well as they usually follow the C one pretty closely. + +This document will refer to 'the user' as the person writing the source code +that uses libcurl. That would probably be you or someone in your position. +What will be generally referred to as 'the program' will be the collected +source code that you write that is using libcurl for transfers. The program +is outside libcurl and libcurl is outside of the program. + +To get the more details on all options and functions described herein, please +refer to their respective man pages. + +.SH "Building" +There are many different ways to build C programs. This chapter will assume a +unix-style build process. If you use a different build system, you can still +read this to get general information that may apply to your environment as +well. +.IP "Compiling the Program" +Your compiler needs to know where the libcurl headers are located. Therefore +you must set your compiler's include path to point to the directory where you +installed them. The 'curl-config'[3] tool can be used to get this information: + +$ curl-config --cflags + +.IP "Linking the Program with libcurl" +When having compiled the program, you need to link your object files to create +a single executable. For that to succeed, you need to link with libcurl and +possibly also with other libraries that libcurl itself depends on. Like the +OpenSSL libraries, but even some standard OS libraries may be needed on the +command line. To figure out which flags to use, once again the 'curl-config' +tool comes to the rescue: + +$ curl-config --libs + +.IP "SSL or Not" +libcurl can be built and customized in many ways. One of the things that +varies from different libraries and builds is the support for SSL-based +transfers, like HTTPS and FTPS. If OpenSSL was detected properly at +build-time, libcurl will be built with SSL support. To figure out if an +installed libcurl has been built with SSL support enabled, use 'curl-config' +like this: + +$ curl-config --feature + +And if SSL is supported, the keyword 'SSL' will be written to stdout, +possibly together with a few other features that can be on and off on +different libcurls. + +See also the "Features libcurl Provides" further down. + +.SH "Portable Code in a Portable World" +The people behind libcurl have put a considerable effort to make libcurl work +on a large amount of different operating systems and environments. + +You program libcurl the same way on all platforms that libcurl runs on. There +are only very few minor considerations that differs. If you just make sure to +write your code portable enough, you may very well create yourself a very +portable program. libcurl shouldn't stop you from that. + +.SH "Global Preparation" +The program must initialize some of the libcurl functionality globally. That +means it should be done exactly once, no matter how many times you intend to +use the library. Once for your program's entire life time. This is done using + + curl_global_init() + +and it takes one parameter which is a bit pattern that tells libcurl what to +initialize. Using CURL_GLOBAL_ALL will make it initialize all known internal +sub modules, and might be a good default option. The current two bits that +are specified are: +.RS +.IP "CURL_GLOBAL_WIN32" +which only does anything on Windows machines. When used on +a Windows machine, it'll make libcurl initialize the win32 socket +stuff. Without having that initialized properly, your program cannot use +sockets properly. You should only do this once for each application, so if +your program already does this or of another library in use does it, you +should not tell libcurl to do this as well. +.IP CURL_GLOBAL_SSL +which only does anything on libcurls compiled and built +SSL-enabled. On these systems, this will make libcurl initialize OpenSSL +properly for this application. This is only needed to do once for each +application so if your program or another library already does this, this +bit should not be needed. +.RE + +libcurl has a default protection mechanism that detects if curl_global_init() +hasn't been called by the time curl_easy_perform() is called and if that is +the case, libcurl runs the function itself with a guessed bit pattern. Please +note that depending solely on this is not considered nice nor very good. + +When the program no longer uses libcurl, it should call curl_global_cleanup(), +which is the opposite of the init call. It will then do the reversed +operations to cleanup the resources the curl_global_init() call initialized. + +Repeated calls to curl_global_init() and curl_global_cleanup() should be +avoided. They should only be called once each. + +.SH "Features libcurl Provides" +It is considered best-practice to determine libcurl features run-time rather +than build-time (if possible of course). By calling curl_version_info() and +checking tout he details of the returned struct, your program can figure out +exactly what the currently running libcurl supports. + +.SH "Handle the Easy libcurl" +libcurl first introduced the so called easy interface. All operations in the +easy interface are prefixed with 'curl_easy'. + +Recent libcurl versions also offer the multi interface. More about that +interface, what it is targeted for and how to use it is detailed in a separate +chapter further down. You still need to understand the easy interface first, +so please continue reading for better understanding. + +To use the easy interface, you must first create yourself an easy handle. You +need one handle for each easy session you want to perform. Basically, you +should use one handle for every thread you plan to use for transferring. You +must never share the same handle in multiple threads. + +Get an easy handle with + + easyhandle = curl_easy_init(); + +It returns an easy handle. Using that you proceed to the next step: setting +up your preferred actions. A handle is just a logic entity for the upcoming +transfer or series of transfers. + +You set properties and options for this handle using curl_easy_setopt(). They +control how the subsequent transfer or transfers will be made. Options remain +set in the handle until set again to something different. Alas, multiple +requests using the same handle will use the same options. + +Many of the options you set in libcurl are "strings", pointers to data +terminated with a zero byte. Keep in mind that when you set strings with +curl_easy_setopt(), libcurl will not copy the data. It will merely point to +the data. You MUST make sure that the data remains available for libcurl to +use until finished or until you use the same option again to point to +something else. + +One of the most basic properties to set in the handle is the URL. You set +your preferred URL to transfer with CURLOPT_URL in a manner similar to: + +.nf + curl_easy_setopt(handle, CURLOPT_URL, "http://domain.com/"); +.fi + +Let's assume for a while that you want to receive data as the URL identifies a +remote resource you want to get here. Since you write a sort of application +that needs this transfer, I assume that you would like to get the data passed +to you directly instead of simply getting it passed to stdout. So, you write +your own function that matches this prototype: + + size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp); + +You tell libcurl to pass all data to this function by issuing a function +similar to this: + + curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data); + +You can control what data your function get in the forth argument by setting +another property: + + curl_easy_setopt(easyhandle, CURLOPT_FILE, &internal_struct); + +Using that property, you can easily pass local data between your application +and the function that gets invoked by libcurl. libcurl itself won't touch the +data you pass with CURLOPT_FILE. + +libcurl offers its own default internal callback that'll take care of the data +if you don't set the callback with CURLOPT_WRITEFUNCTION. It will then simply +output the received data to stdout. You can have the default callback write +the data to a different file handle by passing a 'FILE *' to a file opened for +writing with the CURLOPT_FILE option. + +Now, we need to take a step back and have a deep breath. Here's one of those +rare platform-dependent nitpicks. Did you spot it? On some platforms[2], +libcurl won't be able to operate on files opened by the program. Thus, if you +use the default callback and pass in a an open file with CURLOPT_FILE, it will +crash. You should therefore avoid this to make your program run fine virtually +everywhere. + +There are of course many more options you can set, and we'll get back to a few +of them later. Let's instead continue to the actual transfer: + + success = curl_easy_perform(easyhandle); + +The \fIcurl_easy_perform(3)\fP will connect to the remote site, do the +necessary commands and receive the transfer. Whenever it receives data, it +calls the callback function we previously set. The function may get one byte +at a time, or it may get many kilobytes at once. libcurl delivers as much as +possible as often as possible. Your callback function should return the number +of bytes it "took care of". If that is not the exact same amount of bytes that +was passed to it, libcurl will abort the operation and return with an error +code. + +When the transfer is complete, the function returns a return code that informs +you if it succeeded in its mission or not. If a return code isn't enough for +you, you can use the CURLOPT_ERRORBUFFER to point libcurl to a buffer of yours +where it'll store a human readable error message as well. + +If you then want to transfer another file, the handle is ready to be used +again. Mind you, it is even preferred that you re-use an existing handle if +you intend to make another transfer. libcurl will then attempt to re-use the +previous + +.SH "Multi-threading Issues" +libcurl is completely thread safe, except for two issues: signals and alarm +handlers. Signals are needed for a SIGPIPE handler, and the alarm() Bacall +is used to catch timeouts (mostly during ENS lookup). + +If you are accessing HTTPS or FTPS URLs in a multi-threaded manner, you are +then of course using OpenSSL multi-threaded and it has itself a few +requirements on this. Basilio, you need to provide one or two functions to +allow it to function properly. For all details, see this: + + http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION + +When using multiple threads you should set the CURLOPT_NOSIGNAL option to +TRUE for all handles. Everything will work fine except that timeouts are not +honored during the DNS lookup - which you can work around by building libcurl +with c-ares support. c-ares is a library that provides asynchronous name +resolves. Unfortunately, c-ares does not yet support IPv6. + +Also, note that CURLOPT_DNS_USE_GLOBAL_CACHE is not thread-safe. + +.SH "When It Doesn't Work" +There will always be times when the transfer fails for some reason. You might +have set the wrong libcurl option or misunderstood what the libcurl option +actually does, or the remote server might return non-standard replies that +confuse the library which then confuses your program. + +There's one golden rule when these things occur: set the CURLOPT_VERBOSE +option to TRUE. It'll cause the library to spew out the entire protocol +details it sends, some internal info and some received protocol data as well +(especially when using FTP). If you're using HTTP, adding the headers in the +received output to study is also a clever way to get a better understanding +why the server behaves the way it does. Include headers in the normal body +output with CURLOPT_HEADER set TRUE. + +Of course there are bugs left. We need to get to know about them to be able +to fix them, so we're quite dependent on your bug reports! When you do report +suspected bugs in libcurl, please include as much details you possibly can: a +protocol dump that CURLOPT_VERBOSE produces, library version, as much as +possible of your code that uses libcurl, operating system name and version, +compiler name and version etc. + +If CURLOPT_VERBOSE is not enough, you increase the level of debug data your +application receive by using the CURLOPT_DEBUGFUNCTION. + +Getting some in-depth knowledge about the protocols involved is never wrong, +and if you're trying to do funny things, you might very well understand +libcurl and how to use it better if you study the appropriate RFC documents +at least briefly. + +.SH "Upload Data to a Remote Site" +libcurl tries to keep a protocol independent approach to most transfers, thus +uploading to a remote FTP site is very similar to uploading data to a HTTP +server with a PUT request. + +Of course, first you either create an easy handle or you re-use one existing +one. Then you set the URL to operate on just like before. This is the remote +URL, that we now will upload. + +Since we write an application, we most likely want libcurl to get the upload +data by asking us for it. To make it do that, we set the read callback and +the custom pointer libcurl will pass to our read callback. The read callback +should have a prototype similar to: + + size_t function(char *bufptr, size_t size, size_t nitems, void *userp); + +Where bufptr is the pointer to a buffer we fill in with data to upload and +size*nitems is the size of the buffer and therefore also the maximum amount +of data we can return to libcurl in this call. The 'userp' pointer is the +custom pointer we set to point to a struct of ours to pass private data +between the application and the callback. + + curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function); + + curl_easy_setopt(easyhandle, CURLOPT_INFILE, &filedata); + +Tell libcurl that we want to upload: + + curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, TRUE); + +A few protocols won't behave properly when uploads are done without any prior +knowledge of the expected file size. So, set the upload file size using the +CURLOPT_INFILESIZE_LARGE for all known file sizes like this[1]: + +.nf + /* in this example, file_size must be an off_t variable */ + curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE_LARGE, file_size); +.fi + +When you call curl_easy_perform() this time, it'll perform all the necessary +operations and when it has invoked the upload it'll call your supplied +callback to get the data to upload. The program should return as much data as +possible in every invoke, as that is likely to make the upload perform as +fast as possible. The callback should return the number of bytes it wrote in +the buffer. Returning 0 will signal the end of the upload. + +.SH "Passwords" +Many protocols use or even require that user name and password are provided +to be able to download or upload the data of your choice. libcurl offers +several ways to specify them. + +Most protocols support that you specify the name and password in the URL +itself. libcurl will detect this and use them accordingly. This is written +like this: + + protocol://user:password@example.com/path/ + +If you need any odd letters in your user name or password, you should enter +them URL encoded, as %XX where XX is a two-digit hexadecimal number. + +libcurl also provides options to set various passwords. The user name and +password as shown embedded in the URL can instead get set with the +CURLOPT_USERPWD option. The argument passed to libcurl should be a char * to +a string in the format "user:password:". In a manner like this: + + curl_easy_setopt(easyhandle, CURLOPT_USERPWD, "myname:thesecret"); + +Another case where name and password might be needed at times, is for those +users who need to authenticate themselves to a proxy they use. libcurl offers +another option for this, the CURLOPT_PROXYUSERPWD. It is used quite similar +to the CURLOPT_USERPWD option like this: + + curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "myname:thesecret"); + +There's a long time unix "standard" way of storing ftp user names and +passwords, namely in the $HOME/.netrc file. The file should be made private +so that only the user may read it (see also the "Security Considerations" +chapter), as it might contain the password in plain text. libcurl has the +ability to use this file to figure out what set of user name and password to +use for a particular host. As an extension to the normal functionality, +libcurl also supports this file for non-FTP protocols such as HTTP. To make +curl use this file, use the CURLOPT_NETRC option: + + curl_easy_setopt(easyhandle, CURLOPT_NETRC, TRUE); + +And a very basic example of how such a .netrc file may look like: + +.nf + machine myhost.mydomain.com + login userlogin + password secretword +.fi + +All these examples have been cases where the password has been optional, or +at least you could leave it out and have libcurl attempt to do its job +without it. There are times when the password isn't optional, like when +you're using an SSL private key for secure transfers. + +To pass the known private key password to libcurl: + + curl_easy_setopt(easyhandle, CURLOPT_SSLKEYPASSWD, "keypassword"); + +.SH "HTTP Authentication" +The previous chapter showed how to set user name and password for getting +URLs that require authentication. When using the HTTP protocol, there are +many different ways a client can provide those credentials to the server and +you can control what way libcurl will (attempt to) use. The default HTTP +authentication method is called 'Basic', which is sending the name and +password in clear-text in the HTTP request, base64-encoded. This is insecure. + +At the time of this writing libcurl can be built to use: Basic, Digest, NTLM, +Negotiate, GSS-Negotiate and SPNEGO. You can tell libcurl which one to use +with CURLOPT_HTTPAUTH as in: + + curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH, CURLAUTH_DIGEST); + +And when you send authentication to a proxy, you can also set authentication +type the same way but instead with CURLOPT_PROXYAUTH: + + curl_easy_setopt(easyhandle, CURLOPT_PROXYAUTH, CURLAUTH_NTLM); + +Both these options allow you to set multiple types (by ORing them together), +to make libcurl pick the most secure one out of the types the server/proxy +claims to support. This method does however add a round-trip since libcurl +must first ask the server what it supports: + + curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH, + CURLAUTH_DIGEST|CURLAUTH_BASIC); + +For convenience, you can use the 'CURLAUTH_ANY' define (instead of a list +with specific types) which allows libcurl to use whatever method it wants. + +When asking for multiple types, libcurl will pick the available one it +considers "best" in its own internal order of preference. + +.SH "HTTP POSTing" +We get many questions regarding how to issue HTTP POSTs with libcurl the +proper way. This chapter will thus include examples using both different +versions of HTTP POST that libcurl supports. + +The first version is the simple POST, the most common version, that most HTML +pages using the
tag uses. We provide a pointer to the data and tell +libcurl to post it all to the remote site: + +.nf + char *data="name=daniel&project=curl"; + curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, data); + curl_easy_setopt(easyhandle, CURLOPT_URL, "http://posthere.com/"); + + curl_easy_perform(easyhandle); /* post away! */ +.fi + +Simple enough, huh? Since you set the POST options with the +CURLOPT_POSTFIELDS, this automatically switches the handle to use POST in the +upcoming request. + +Ok, so what if you want to post binary data that also requires you to set the +Content-Type: header of the post? Well, binary posts prevents libcurl from +being able to do strlen() on the data to figure out the size, so therefore we +must tell libcurl the size of the post data. Setting headers in libcurl +requests are done in a generic way, by building a list of our own headers and +then passing that list to libcurl. + +.nf + struct curl_slist *headers=NULL; + headers = curl_slist_append(headers, "Content-Type: text/xml"); + + /* post binary data */ + curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, binaryptr); + + /* set the size of the postfields data */ + curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDSIZE, 23); + + /* pass our list of custom made headers */ + curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers); + + curl_easy_perform(easyhandle); /* post away! */ + + curl_slist_free_all(headers); /* free the header list */ +.fi + +While the simple examples above cover the majority of all cases where HTTP +POST operations are required, they don't do multi-part formposts. Multi-part +formposts were introduced as a better way to post (possibly large) binary +data and was first documented in the RFC1867. They're called multi-part +because they're built by a chain of parts, each being a single unit. Each +part has its own name and contents. You can in fact create and post a +multi-part formpost with the regular libcurl POST support described above, but +that would require that you build a formpost yourself and provide to +libcurl. To make that easier, libcurl provides curl_formadd(). Using this +function, you add parts to the form. When you're done adding parts, you post +the whole form. + +The following example sets two simple text parts with plain textual contents, +and then a file with binary contents and upload the whole thing. + +.nf + struct curl_httppost *post=NULL; + struct curl_httppost *last=NULL; + curl_formadd(&post, &last, + CURLFORM_COPYNAME, "name", + CURLFORM_COPYCONTENTS, "daniel", CURLFORM_END); + curl_formadd(&post, &last, + CURLFORM_COPYNAME, "project", + CURLFORM_COPYCONTENTS, "curl", CURLFORM_END); + curl_formadd(&post, &last, + CURLFORM_COPYNAME, "logotype-image", + CURLFORM_FILECONTENT, "curl.png", CURLFORM_END); + + /* Set the form info */ + curl_easy_setopt(easyhandle, CURLOPT_HTTPPOST, post); + + curl_easy_perform(easyhandle); /* post away! */ + + /* free the post data again */ + curl_formfree(post); +.fi + +Multipart formposts are chains of parts using MIME-style separators and +headers. It means that each one of these separate parts get a few headers set +that describe the individual content-type, size etc. To enable your +application to handicraft this formpost even more, libcurl allows you to +supply your own set of custom headers to such an individual form part. You can +of course supply headers to as many parts you like, but this little example +will show how you set headers to one specific part when you add that to the +post handle: + +.nf + struct curl_slist *headers=NULL; + headers = curl_slist_append(headers, "Content-Type: text/xml"); + + curl_formadd(&post, &last, + CURLFORM_COPYNAME, "logotype-image", + CURLFORM_FILECONTENT, "curl.xml", + CURLFORM_CONTENTHEADER, headers, + CURLFORM_END); + + curl_easy_perform(easyhandle); /* post away! */ + + curl_formfree(post); /* free post */ + curl_slist_free_all(post); /* free custom header list */ +.fi + +Since all options on an easyhandle are "sticky", they remain the same until +changed even if you do call curl_easy_perform(), you may need to tell curl to +go back to a plain GET request if you intend to do such a one as your next +request. You force an easyhandle to back to GET by using the CURLOPT_HTTPGET +option: + + curl_easy_setopt(easyhandle, CURLOPT_HTTPGET, TRUE); + +Just setting CURLOPT_POSTFIELDS to "" or NULL will *not* stop libcurl from +doing a POST. It will just make it POST without any data to send! + +.SH "Showing Progress" + +For historical and traditional reasons, libcurl has a built-in progress meter +that can be switched on and then makes it presents a progress meter in your +terminal. + +Switch on the progress meter by, oddly enough, set CURLOPT_NOPROGRESS to +FALSE. This option is set to TRUE by default. + +For most applications however, the built-in progress meter is useless and +what instead is interesting is the ability to specify a progress +callback. The function pointer you pass to libcurl will then be called on +irregular intervals with information about the current transfer. + +Set the progress callback by using CURLOPT_PROGRESSFUNCTION. And pass a +pointer to a function that matches this prototype: + +.nf + int progress_callback(void *clientp, + double dltotal, + double dlnow, + double ultotal, + double ulnow); +.fi + +If any of the input arguments is unknown, a 0 will be passed. The first +argument, the 'clientp' is the pointer you pass to libcurl with +CURLOPT_PROGRESSDATA. libcurl won't touch it. + +.SH "libcurl with C++" + +There's basically only one thing to keep in mind when using C++ instead of C +when interfacing libcurl: + + "The Callbacks Must Be Plain C" + +So if you want a write callback set in libcurl, you should put it within +\&'extern'. Similar to this: + +.nf + extern "C" { + size_t write_data(void *ptr, size_t size, size_t nmemb, + void *ourpointer) + { + /* do what you want with the data */ + } + } +.fi + +This will of course effectively turn the callback code into C. There won't be +any "this" pointer available etc. + +.SH "Proxies" + +What "proxy" means according to Merriam-Webster: "a person authorized to act +for another" but also "the agency, function, or office of a deputy who acts as +a substitute for another". + +Proxies are exceedingly common these days. Companies often only offer +Internet access to employees through their HTTP proxies. Network clients or +user-agents ask the proxy for documents, the proxy does the actual request +and then it returns them. + +libcurl has full support for HTTP proxies, so when a given URL is wanted, +libcurl will ask the proxy for it instead of trying to connect to the actual +host identified in the URL. + +The fact that the proxy is a HTTP proxy puts certain restrictions on what can +actually happen. A requested URL that might not be a HTTP URL will be still +be passed to the HTTP proxy to deliver back to libcurl. This happens +transparently, and an application may not need to know. I say "may", because +at times it is very important to understand that all operations over a HTTP +proxy is using the HTTP protocol. For example, you can't invoke your own +custom FTP commands or even proper FTP directory listings. + +.IP "Proxy Options" + +To tell libcurl to use a proxy at a given port number: + + curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080"); + +Some proxies require user authentication before allowing a request, and you +pass that information similar to this: + + curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password"); + +If you want to, you can specify the host name only in the CURLOPT_PROXY +option, and set the port number separately with CURLOPT_PROXYPORT. + +.IP "Environment Variables" + +libcurl automatically checks and uses a set of environment variables to +know what proxies to use for certain protocols. The names of the variables +are following an ancient de facto standard and are built up as +"[protocol]_proxy" (note the lower casing). Which makes the variable +'http_proxy' checked for a name of a proxy to use when the input URL is +HTTP. Following the same rule, the variable named 'ftp_proxy' is checked +for FTP URLs. Again, the proxies are always HTTP proxies, the different +names of the variables simply allows different HTTP proxies to be used. + +The proxy environment variable contents should be in the format +\&"[protocol://][user:password@]machine[:port]". Where the protocol:// part is +simply ignored if present (so http://proxy and bluerk://proxy will do the +same) and the optional port number specifies on which port the proxy operates +on the host. If not specified, the internal default port number will be used +and that is most likely *not* the one you would like it to be. + +There are two special environment variables. 'all_proxy' is what sets proxy +for any URL in case the protocol specific variable wasn't set, and +\&'no_proxy' defines a list of hosts that should not use a proxy even though a +variable may say so. If 'no_proxy' is a plain asterisk ("*") it matches all +hosts. + +.IP "SSL and Proxies" + +SSL is for secure point-to-point connections. This involves strong encryption +and similar things, which effectively makes it impossible for a proxy to +operate as a "man in between" which the proxy's task is, as previously +discussed. Instead, the only way to have SSL work over a HTTP proxy is to ask +the proxy to tunnel trough everything without being able to check or fiddle +with the traffic. + +Opening an SSL connection over a HTTP proxy is therefor a matter of asking the +proxy for a straight connection to the target host on a specified port. This +is made with the HTTP request CONNECT. ("please mr proxy, connect me to that +remote host"). + +Because of the nature of this operation, where the proxy has no idea what kind +of data that is passed in and out through this tunnel, this breaks some of the +very few advantages that come from using a proxy, such as caching. Many +organizations prevent this kind of tunneling to other destination port numbers +than 443 (which is the default HTTPS port number). + +.IP "Tunneling Through Proxy" +As explained above, tunneling is required for SSL to work and often even +restricted to the operation intended for SSL; HTTPS. + +This is however not the only time proxy-tunneling might offer benefits to +you or your application. + +As tunneling opens a direct connection from your application to the remote +machine, it suddenly also re-introduces the ability to do non-HTTP +operations over a HTTP proxy. You can in fact use things such as FTP +upload or FTP custom commands this way. + +Again, this is often prevented by the administrators of proxies and is +rarely allowed. + +Tell libcurl to use proxy tunneling like this: + + curl_easy_setopt(easyhandle, CURLOPT_HTTPPROXYTUNNEL, TRUE); + +In fact, there might even be times when you want to do plain HTTP +operations using a tunnel like this, as it then enables you to operate on +the remote server instead of asking the proxy to do so. libcurl will not +stand in the way for such innovative actions either! + +.IP "Proxy Auto-Config" + +Netscape first came up with this. It is basically a web page (usually using a +\&.pac extension) with a javascript that when executed by the browser with the +requested URL as input, returns information to the browser on how to connect +to the URL. The returned information might be "DIRECT" (which means no proxy +should be used), "PROXY host:port" (to tell the browser where the proxy for +this particular URL is) or "SOCKS host:port" (to direct the browser to a SOCKS +proxy). + +libcurl has no means to interpret or evaluate javascript and thus it doesn't +support this. If you get yourself in a position where you face this nasty +invention, the following advice have been mentioned and used in the past: + +- Depending on the javascript complexity, write up a script that translates it +to another language and execute that. + +- Read the javascript code and rewrite the same logic in another language. + +- Implement a javascript interpreted, people have successfully used the +Mozilla javascript engine in the past. + +- Ask your admins to stop this, for a static proxy setup or similar. + +.SH "Persistence Is The Way to Happiness" + +Re-cycling the same easy handle several times when doing multiple requests is +the way to go. + +After each single curl_easy_perform() operation, libcurl will keep the +connection alive and open. A subsequent request using the same easy handle to +the same host might just be able to use the already open connection! This +reduces network impact a lot. + +Even if the connection is dropped, all connections involving SSL to the same +host again, will benefit from libcurl's session ID cache that drastically +reduces re-connection time. + +FTP connections that are kept alive saves a lot of time, as the command- +response round-trips are skipped, and also you don't risk getting blocked +without permission to login again like on many FTP servers only allowing N +persons to be logged in at the same time. + +libcurl caches DNS name resolving results, to make lookups of a previously +looked up name a lot faster. + +Other interesting details that improve performance for subsequent requests +may also be added in the future. + +Each easy handle will attempt to keep the last few connections alive for a +while in case they are to be used again. You can set the size of this "cache" +with the CURLOPT_MAXCONNECTS option. Default is 5. It is very seldom any +point in changing this value, and if you think of changing this it is often +just a matter of thinking again. + +When the connection cache gets filled, libcurl must close an existing +connection in order to get room for the new one. To know which connection to +close, libcurl uses a "close policy" that you can affect with the +CURLOPT_CLOSEPOLICY option. There's only two polices implemented as of this +writing (libcurl 7.9.4) and they are: + +.RS +.IP CURLCLOSEPOLICY_LEAST_RECENTLY_USED +simply close the one that hasn't been used for the longest time. This is the +default behavior. +.IP CURLCLOSEPOLICY_OLDEST +closes the oldest connection, the one that was created the longest time ago. +.RE + +There are, or at least were, plans to support a close policy that would call +a user-specified callback to let the user be able to decide which connection +to dump when this is necessary and therefor is the CURLOPT_CLOSEFUNCTION an +existing option still today. Nothing ever uses this though and this will not +be used within the foreseeable future either. + +To force your upcoming request to not use an already existing connection (it +will even close one first if there happens to be one alive to the same host +you're about to operate on), you can do that by setting CURLOPT_FRESH_CONNECT +to TRUE. In a similar spirit, you can also forbid the upcoming request to be +"lying" around and possibly get re-used after the request by setting +CURLOPT_FORBID_REUSE to TRUE. + +.SH "HTTP Headers Used by libcurl" +When you use libcurl to do HTTP requests, it'll pass along a series of headers +automatically. It might be good for you to know and understand these ones. + +.IP "Host" +This header is required by HTTP 1.1 and even many 1.0 servers and should be +the name of the server we want to talk to. This includes the port number if +anything but default. + +.IP "Pragma" +\&"no-cache". Tells a possible proxy to not grab a copy from the cache but to +fetch a fresh one. + +.IP "Accept" +\&"*/*". + +.IP "Expect:" +When doing multi-part formposts, libcurl will set this header to +\&"100-continue" to ask the server for an "OK" message before it proceeds with +sending the data part of the post. + +.SH "Customizing Operations" +There is an ongoing development today where more and more protocols are built +upon HTTP for transport. This has obvious benefits as HTTP is a tested and +reliable protocol that is widely deployed and have excellent proxy-support. + +When you use one of these protocols, and even when doing other kinds of +programming you may need to change the traditional HTTP (or FTP or...) +manners. You may need to change words, headers or various data. + +libcurl is your friend here too. + +.IP CUSTOMREQUEST +If just changing the actual HTTP request keyword is what you want, like when +GET, HEAD or POST is not good enough for you, CURLOPT_CUSTOMREQUEST is there +for you. It is very simple to use: + + curl_easy_setopt(easyhandle, CURLOPT_CUSTOMREQUEST, "MYOWNRUQUEST"); + +When using the custom request, you change the request keyword of the actual +request you are performing. Thus, by default you make GET request but you can +also make a POST operation (as described before) and then replace the POST +keyword if you want to. You're the boss. + +.IP "Modify Headers" +HTTP-like protocols pass a series of headers to the server when doing the +request, and you're free to pass any amount of extra headers that you +think fit. Adding headers are this easy: + +.nf + struct curl_slist *headers=NULL; /* init to NULL is important */ + + headers = curl_slist_append(headers, "Hey-server-hey: how are you?"); + headers = curl_slist_append(headers, "X-silly-content: yes"); + + /* pass our list of custom made headers */ + curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers); + + curl_easy_perform(easyhandle); /* transfer http */ + + curl_slist_free_all(headers); /* free the header list */ +.fi + +\&... and if you think some of the internally generated headers, such as +Accept: or Host: don't contain the data you want them to contain, you can +replace them by simply setting them too: + +.nf + headers = curl_slist_append(headers, "Accept: Agent-007"); + headers = curl_slist_append(headers, "Host: munged.host.line"); +.fi + +.IP "Delete Headers" +If you replace an existing header with one with no contents, you will prevent +the header from being sent. Like if you want to completely prevent the +\&"Accept:" header to be sent, you can disable it with code similar to this: + + headers = curl_slist_append(headers, "Accept:"); + +Both replacing and canceling internal headers should be done with careful +consideration and you should be aware that you may violate the HTTP protocol +when doing so. + +.IP "Enforcing chunked transfer-encoding" + +By making sure a request uses the custom header "Transfer-Encoding: chunked" +when doing a non-GET HTTP operation, libcurl will switch over to "chunked" +upload, even though the size of the data to upload might be known. By default, +libcurl usually switches over to chunked upload automatically if the upload +data size is unknown. + +.IP "HTTP Version" + +There's only one aspect left in the HTTP requests that we haven't yet +mentioned how to modify: the version field. All HTTP requests includes the +version number to tell the server which version we support. libcurl speak +HTTP 1.1 by default. Some very old servers don't like getting 1.1-requests +and when dealing with stubborn old things like that, you can tell libcurl +to use 1.0 instead by doing something like this: + + curl_easy_setopt(easyhandle, CURLOPT_HTTP_VERSION, CURLHTTP_VERSION_1_0); + +.IP "FTP Custom Commands" + +Not all protocols are HTTP-like, and thus the above may not help you when +you want to make for example your FTP transfers to behave differently. + +Sending custom commands to a FTP server means that you need to send the +commands exactly as the FTP server expects them (RFC959 is a good guide +here), and you can only use commands that work on the control-connection +alone. All kinds of commands that requires data interchange and thus needs +a data-connection must be left to libcurl's own judgment. Also be aware +that libcurl will do its very best to change directory to the target +directory before doing any transfer, so if you change directory (with CWD +or similar) you might confuse libcurl and then it might not attempt to +transfer the file in the correct remote directory. + +A little example that deletes a given file before an operation: + +.nf + headers = curl_slist_append(headers, "DELE file-to-remove"); + + /* pass the list of custom commands to the handle */ + curl_easy_setopt(easyhandle, CURLOPT_QUOTE, headers); + + curl_easy_perform(easyhandle); /* transfer ftp data! */ + + curl_slist_free_all(headers); /* free the header list */ +.fi + +If you would instead want this operation (or chain of operations) to happen +_after_ the data transfer took place the option to curl_easy_setopt() would +instead be called CURLOPT_POSTQUOTE and used the exact same way. + +The custom FTP command will be issued to the server in the same order they are +added to the list, and if a command gets an error code returned back from the +server, no more commands will be issued and libcurl will bail out with an +error code (CURLE_FTP_QUOTE_ERROR). Note that if you use CURLOPT_QUOTE to send +commands before a transfer, no transfer will actually take place when a quote +command has failed. + +If you set the CURLOPT_HEADER to true, you will tell libcurl to get +information about the target file and output "headers" about it. The headers +will be in "HTTP-style", looking like they do in HTTP. + +The option to enable headers or to run custom FTP commands may be useful to +combine with CURLOPT_NOBODY. If this option is set, no actual file content +transfer will be performed. + +.IP "FTP Custom CUSTOMREQUEST" +If you do what list the contents of a FTP directory using your own defined FTP +command, CURLOPT_CUSTOMREQUEST will do just that. "NLST" is the default one +for listing directories but you're free to pass in your idea of a good +alternative. + +.SH "Cookies Without Chocolate Chips" +In the HTTP sense, a cookie is a name with an associated value. A server sends +the name and value to the client, and expects it to get sent back on every +subsequent request to the server that matches the particular conditions +set. The conditions include that the domain name and path match and that the +cookie hasn't become too old. + +In real-world cases, servers send new cookies to replace existing one to +update them. Server use cookies to "track" users and to keep "sessions". + +Cookies are sent from server to clients with the header Set-Cookie: and +they're sent from clients to servers with the Cookie: header. + +To just send whatever cookie you want to a server, you can use CURLOPT_COOKIE +to set a cookie string like this: + + curl_easy_setopt(easyhandle, CURLOPT_COOKIE, "name1=var1; name2=var2;"); + +In many cases, that is not enough. You might want to dynamically save +whatever cookies the remote server passes to you, and make sure those cookies +are then use accordingly on later requests. + +One way to do this, is to save all headers you receive in a plain file and +when you make a request, you tell libcurl to read the previous headers to +figure out which cookies to use. Set header file to read cookies from with +CURLOPT_COOKIEFILE. + +The CURLOPT_COOKIEFILE option also automatically enables the cookie parser in +libcurl. Until the cookie parser is enabled, libcurl will not parse or +understand incoming cookies and they will just be ignored. However, when the +parser is enabled the cookies will be understood and the cookies will be kept +in memory and used properly in subsequent requests when the same handle is +used. Many times this is enough, and you may not have to save the cookies to +disk at all. Note that the file you specify to CURLOPT_COOKIEFILE doesn't +have to exist to enable the parser, so a common way to just enable the parser +and not read able might be to use a file name you know doesn't exist. + +If you rather use existing cookies that you've previously received with your +Netscape or Mozilla browsers, you can make libcurl use that cookie file as +input. The CURLOPT_COOKIEFILE is used for that too, as libcurl will +automatically find out what kind of file it is and act accordingly. + +The perhaps most advanced cookie operation libcurl offers, is saving the +entire internal cookie state back into a Netscape/Mozilla formatted cookie +file. We call that the cookie-jar. When you set a file name with +CURLOPT_COOKIEJAR, that file name will be created and all received cookies +will be stored in it when curl_easy_cleanup() is called. This enabled cookies +to get passed on properly between multiple handles without any information +getting lost. + +.SH "FTP Peculiarities We Need" + +FTP transfers use a second TCP/IP connection for the data transfer. This is +usually a fact you can forget and ignore but at times this fact will come +back to haunt you. libcurl offers several different ways to custom how the +second connection is being made. + +libcurl can either connect to the server a second time or tell the server to +connect back to it. The first option is the default and it is also what works +best for all the people behind firewalls, NATs or IP-masquerading setups. +libcurl then tells the server to open up a new port and wait for a second +connection. This is by default attempted with EPSV first, and if that doesn't +work it tries PASV instead. (EPSV is an extension to the original FTP spec +and does not exist nor work on all FTP servers.) + +You can prevent libcurl from first trying the EPSV command by setting +CURLOPT_FTP_USE_EPSV to FALSE. + +In some cases, you will prefer to have the server connect back to you for the +second connection. This might be when the server is perhaps behind a firewall +or something and only allows connections on a single port. libcurl then +informs the remote server which IP address and port number to connect to. +This is made with the CURLOPT_FTPPORT option. If you set it to "-", libcurl +will use your system's "default IP address". If you want to use a particular +IP, you can set the full IP address, a host name to resolve to an IP address +or even a local network interface name that libcurl will get the IP address +from. + +When doing the "PORT" approach, libcurl will attempt to use the EPRT and the +LPRT before trying PORT, as they work with more protocols. You can disable +this behavior by setting CURLOPT_FTP_USE_EPRT to FALSE. + +.SH "Headers Equal Fun" + +Some protocols provide "headers", meta-data separated from the normal +data. These headers are by default not included in the normal data stream, +but you can make them appear in the data stream by setting CURLOPT_HEADER to +TRUE. + +What might be even more useful, is libcurl's ability to separate the headers +from the data and thus make the callbacks differ. You can for example set a +different pointer to pass to the ordinary write callback by setting +CURLOPT_WRITEHEADER. + +Or, you can set an entirely separate function to receive the headers, by +using CURLOPT_HEADERFUNCTION. + +The headers are passed to the callback function one by one, and you can +depend on that fact. It makes it easier for you to add custom header parsers +etc. + +"Headers" for FTP transfers equal all the FTP server responses. They aren't +actually true headers, but in this case we pretend they are! ;-) + +.SH "Post Transfer Information" + + [ curl_easy_getinfo ] + +.SH "Security Considerations" + +libcurl is in itself not insecure. If used the right way, you can use libcurl +to transfer data pretty safely. + +There are of course many things to consider that may loosen up this +situation: + +.IP "Command Lines" +If you use a command line tool (such as curl) that uses libcurl, and you give +option to the tool on the command line those options can very likely get read +by other users of your system when they use 'ps' or other tools to list +currently running processes. + +To avoid this problem, never feed sensitive things to programs using command +line options. + +.IP ".netrc" +\&.netrc is a pretty handy file/feature that allows you to login quickly and +automatically to frequently visited sites. The file contains passwords in +clear text and is a real security risk. In some cases, your .netrc is also +stored in a home directory that is NFS mounted or used on another network +based file system, so the clear text password will fly through your network +every time anyone reads that file! + +To avoid this problem, don't use .netrc files and never store passwords in +plain text anywhere. + +.IP "Clear Text Passwords" +Many of the protocols libcurl supports send name and password unencrypted as +clear text (HTTP Basic authentication, FTP, TELNET etc). It is very easy for +anyone on your network or a network nearby yours, to just fire up a network +analyzer tool and eavesdrop on your passwords. Don't let the fact that HTTP +uses base64 encoded passwords fool you. They may not look readable at a first +glance, but they very easily "deciphered" by anyone within seconds. + +To avoid this problem, use protocols that don't let snoopers see your +password: HTTPS, FTPS and FTP-kerberos are a few examples. HTTP Digest +authentication allows this too, but isn't supported by libcurl as of this +writing. + +.IP "Showing What You Do" +On a related issue, be aware that even in situations like when you have +problems with libcurl and ask someone for help, everything you reveal in order +to get best possible help might also impose certain security related +risks. Host names, user names, paths, operating system specifics etc (not to +mention passwords of course) may in fact be used by intruders to gain +additional information of a potential target. + +To avoid this problem, you must of course use your common sense. Often, you +can just edit out the sensitive data or just search/replace your true +information with faked data. + +.SH "Multiple Transfers Using the multi Interface" + +The easy interface as described in detail in this document is a synchronous +interface that transfers one file at a time and doesn't return until its +done. + +The multi interface on the other hand, allows your program to transfer +multiple files in both directions at the same time, without forcing you to +use multiple threads. + +To use this interface, you are better off if you first understand the basics +of how to use the easy interface. The multi interface is simply a way to make +multiple transfers at the same time, by adding up multiple easy handles in to +a "multi stack". + +You create the easy handles you want and you set all the options just like +you have been told above, and then you create a multi handle with +curl_multi_init() and add all those easy handles to that multi handle with +curl_multi_add_handle(). + +When you've added the handles you have for the moment (you can still add new +ones at any time), you start the transfers by call curl_multi_perform(). + +curl_multi_perform() is asynchronous. It will only execute as little as +possible and then return back control to your program. It is designed to +never block. If it returns CURLM_CALL_MULTI_PERFORM you better call it again +soon, as that is a signal that it still has local data to send or remote data +to receive. + +The best usage of this interface is when you do a select() on all possible +file descriptors or sockets to know when to call libcurl again. This also +makes it easy for you to wait and respond to actions on your own +application's sockets/handles. You figure out what to select() for by using +curl_multi_fdset(), that fills in a set of fd_set variables for you with the +particular file descriptors libcurl uses for the moment. + +When you then call select(), it'll return when one of the file handles signal +action and you then call curl_multi_perform() to allow libcurl to do what it +wants to do. Take note that libcurl does also feature some time-out code so +we advice you to never use very long timeouts on select() before you call +curl_multi_perform(), which thus should be called unconditionally every now +and then even if none of its file descriptors have signaled ready. Another +precaution you should use: always call curl_multi_fdset() immediately before +the select() call since the current set of file descriptors may change when +calling a curl function. + +If you want to stop the transfer of one of the easy handles in the stack, you +can use curl_multi_remove_handle() to remove individual easy +handles. Remember that easy handles should be curl_easy_cleanup()ed. + +When a transfer within the multi stack has finished, the counter of running +transfers (as filled in by curl_multi_perform()) will decrease. When the +number reaches zero, all transfers are done. + +curl_multi_info_read() can be used to get information about completed +transfers. It then returns the CURLcode for each easy transfer, to allow you +to figure out success on each individual transfer. + +.SH "SSL, Certificates and Other Tricks" + + [ seeding, passwords, keys, certificates, ENGINE, ca certs ] + +.SH "Sharing Data Between Easy Handles" + + [ fill in ] + +.SH "Footnotes" + +.IP "[1]" +libcurl 7.10.3 and later have the ability to switch over to chunked +Transfer-Encoding in cases were HTTP uploads are done with data of an unknown +size. +.IP "[2]" +This happens on Windows machines when libcurl is built and used as a +DLL. However, you can still do this on Windows if you link with a static +library. +.IP "[3]" +The curl-config tool is generated at build-time (on unix-like systems) and +should be installed with the 'make install' or similar instruction that +installs the library, header files, man pages etc.