$Id$
                                  _   _ ____  _
                              ___| | | |  _ \| |
                             / __| | | | |_) | |
                            | (__| |_| |  _ <| |___
                             \___|\___/|_| \_\_____|

                          PROGRAMMING WITH LIBCURL

About this Document

This document attempts to describe the general principles and some basic
approaches to consider when programming with libcurl. The text focuses
mainly on the C/C++ interface but should apply fairly well to other
interfaces too, as they usually follow the C one pretty closely.

This document will refer to 'the user' as the person writing the source code
that uses libcurl. That would probably be you or someone in your position.
What will generally be referred to as 'the program' is the collected
source code that you write that uses libcurl for transfers. The program
is outside libcurl and libcurl is outside of the program.

To get more details on all options and functions described herein, please
refer to their respective man pages.

Building

There are many different ways to build C programs. This chapter assumes a
unix-style build process. If you use a different build system, you can still
read this to get general information that may apply to your environment as
well.

Compiling the Program

Your compiler needs to know where the libcurl headers are
located. Therefore you must set your compiler's include path to point to
the directory where you installed them. The 'curl-config'[3] tool can be
used to get this information:

    $ curl-config --cflags

Linking the Program with libcurl

After compiling the program, you need to link your object files to
create a single executable. For that to succeed, you need to link with
libcurl and possibly also with other libraries that libcurl itself depends
on, such as the OpenSSL libraries. Even some standard OS libraries may be
needed on the command line. To figure out which flags to use, once again
the 'curl-config' tool comes to the rescue:

    $ curl-config --libs

SSL or Not

libcurl can be built and customized in many ways. One of the things that
varies between different libraries and builds is the support for SSL-based
transfers, like HTTPS and FTPS. If OpenSSL was detected properly at
build-time, libcurl will be built with SSL support. To figure out if an
installed libcurl has been built with SSL support enabled, use
'curl-config' like this:

    $ curl-config --feature

If SSL is supported, the keyword 'SSL' will be written to stdout,
possibly together with a few other features that can be enabled or disabled
in different libcurl builds.


Portable Code in a Portable World

The people behind libcurl have put considerable effort into making libcurl
work on a large number of different operating systems and environments.

You program libcurl the same way on all platforms that libcurl runs on. There
are only a very few minor considerations that differ. If you just make sure to
write your code portably enough, you may very well create yourself a very
portable program. libcurl shouldn't stop you from that.


Global Preparation

The program must initialize some of the libcurl functionality globally. That
means it should be done exactly once, no matter how many times you intend to
use the library. Once for your program's entire lifetime. This is done using

    curl_global_init()

and it takes one parameter which is a bit pattern that tells libcurl what to
initialize. Using CURL_GLOBAL_ALL will make it initialize all known internal
sub modules, and might be a good default option. The two bits that are
currently specified are:

CURL_GLOBAL_WIN32 which only does anything on Windows machines. When used on
a Windows machine, it'll make libcurl initialize the win32 socket
stuff. Without having that initialized properly, your program cannot use
sockets properly. You should only do this once for each application, so if
your program already does this or if another library in use does it, you
should not tell libcurl to do this as well.

CURL_GLOBAL_SSL which only does anything on libcurls compiled and built
SSL-enabled. On these systems, this will make libcurl init OpenSSL properly
for this application. This only needs to be done once for each application, so
if your program or another library already does this, this bit should not be
needed.

libcurl has a default protection mechanism that detects if curl_global_init()
hasn't been called by the time curl_easy_perform() is called and if that is
the case, libcurl runs the function itself with a guessed bit pattern. Please
note that depending solely on this is not considered nice nor very good.

When the program no longer uses libcurl, it should call
curl_global_cleanup(), which is the opposite of the init call. It will then
do the reversed operations to clean up the resources the curl_global_init()
call initialized.

Repeated calls to curl_global_init() and curl_global_cleanup() should be
avoided. They should be called once each.

Handle the Easy libcurl

libcurl version 7 is oriented around the so-called easy interface. All
operations in the easy interface are prefixed with 'curl_easy'.

Future libcurls will also offer the multi interface. More about that
interface, what it is targeted for and how to use it is still only debated on
the libcurl mailing list and developer web pages. Join up to discuss and
figure out!

To use the easy interface, you must first create yourself an easy handle. You
need one handle for each easy session you want to perform. Basically, you
should use one handle for every thread you plan to use for transferring. You
must never share the same handle in multiple threads.

Get an easy handle with

    easyhandle = curl_easy_init();

It returns an easy handle. Using that you proceed to the next step: setting
up your preferred actions. A handle is just a logical entity for the upcoming
transfer or series of transfers. One of the most basic properties to set in
the handle is the URL. You set your preferred URL to transfer with
CURLOPT_URL in a manner similar to:

    curl_easy_setopt(easyhandle, CURLOPT_URL, "http://curl.haxx.se/");

Let's assume for a while that you want to receive data, as the URL identifies
a remote resource you want to get here. Since you write a sort of application
that needs this transfer, I assume that you would like to get the data passed
to you directly instead of simply getting it passed to stdout. So, you write
your own function that matches this prototype:

    size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp);

You tell libcurl to pass all data to this function by issuing a call
similar to this:

    curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data);

You can control what data your function gets in the fourth argument by
setting another property:

    curl_easy_setopt(easyhandle, CURLOPT_FILE, &internal_struct);

Using that property, you can easily pass local data between your application
and the function that gets invoked by libcurl. libcurl itself won't touch the
data you pass with CURLOPT_FILE.

libcurl offers its own default internal callback that'll take care of the
data if you don't set the callback with CURLOPT_WRITEFUNCTION. It will then
simply output the received data to stdout. You can have the default callback
write the data to a different file handle by passing a 'FILE *' to a file
opened for writing with the CURLOPT_FILE option.

Now, we need to take a step back and take a deep breath. Here's one of those
rare platform-dependent nitpicks. Did you spot it? On some platforms[2],
libcurl won't be able to operate on files opened by the program. Thus, if you
use the default callback and pass in an open file with CURLOPT_FILE, it
will crash. You should therefore avoid this to make your program run fine
virtually everywhere.

There are of course many more options you can set, and we'll get back to a
few of them later. Let's instead continue to the actual transfer:

    success = curl_easy_perform(easyhandle);

curl_easy_perform() will connect to the remote site, do the necessary
commands and receive the transfer. Whenever it receives data, it calls the
callback function we previously set. The function may get one byte at a time,
or it may get many kilobytes at once. libcurl delivers as much as possible as
often as possible. Your callback function should return the number of bytes
it "took care of". If that is not the exact same number of bytes that was
passed to it, libcurl will abort the operation and return with an error code.
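
As an illustration of the return-value rule above, here is a minimal sketch
of a write callback that collects everything it receives in memory. The
'struct membuf' type and its field names are made up for this example and are
not part of libcurl; only the callback prototype comes from the text above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct, not part of libcurl: a growable buffer that the
   callback fills. The pointer set with CURLOPT_FILE would point at one. */
struct membuf {
  char *data;   /* received bytes, kept zero terminated */
  size_t len;   /* number of bytes stored, terminator not counted */
};

/* Matches the write_data prototype shown earlier. */
size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp)
{
  size_t chunk = size * nmemb;
  struct membuf *mem = (struct membuf *)userp;
  char *grown = realloc(mem->data, mem->len + chunk + 1);

  if(!grown)
    return 0; /* fewer bytes than offered: libcurl aborts the transfer */

  memcpy(grown + mem->len, buffer, chunk);
  mem->data = grown;
  mem->len += chunk;
  mem->data[mem->len] = '\0';

  return chunk; /* we took care of everything we were given */
}
```

The program would start with a zeroed struct, pass its address as the fourth
argument data, and free() the 'data' member once the transfer is done.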

When the transfer is complete, the function returns a return code that
informs you if it succeeded in its mission or not. If a return code isn't
enough for you, you can use the CURLOPT_ERRORBUFFER to point libcurl to a
buffer of yours where it'll store a human readable error message as well.

If you then want to transfer another file, the handle is ready to be used
again. Mind you, it is even preferred that you re-use an existing handle if
you intend to make another transfer. libcurl will then attempt to re-use the
previous connection.


When It Doesn't Work

There will always be times when the transfer fails for some reason. You might
have set the wrong libcurl option or misunderstood what the libcurl option
actually does, or the remote server might return non-standard replies that
confuse the library which then confuses your program.

There's one golden rule when these things occur: set the CURLOPT_VERBOSE
option to TRUE. It'll cause the library to spew out the entire protocol
details it sends, some internal info and some received protocol data as well
(especially when using FTP). If you're using HTTP, adding the headers in the
received output to study is also a clever way to get a better understanding
of why the server behaves the way it does. Include headers in the normal body
output with CURLOPT_HEADER set to TRUE.

Of course there are bugs left. We need to get to know about them to be able
to fix them, so we're quite dependent on your bug reports! When you do report
suspected bugs in libcurl, please include as many details as you possibly
can: a protocol dump that CURLOPT_VERBOSE produces, library version, as much
as possible of your code that uses libcurl, operating system name and version,
compiler name and version etc.

Getting some in-depth knowledge about the protocols involved is never wrong,
and if you're trying to do funny things, you might very well understand
libcurl and how to use it better if you study the appropriate RFC documents
at least briefly.


Upload Data to a Remote Site

libcurl tries to keep a protocol independent approach to most transfers, thus
uploading to a remote FTP site is very similar to uploading data to a HTTP
server with a PUT request.

Of course, first you either create an easy handle or you re-use an existing
one. Then you set the URL to operate on just like before. This is the remote
URL that we will now upload to.

Since we write an application, we most likely want libcurl to get the upload
data by asking us for it. To make it do that, we set the read callback and
the custom pointer libcurl will pass to our read callback. The read callback
should have a prototype similar to:

    size_t function(char *bufptr, size_t size, size_t nitems, void *userp);

Where bufptr is the pointer to a buffer we fill in with data to upload and
size*nitems is the size of the buffer and therefore also the maximum amount
of data we can return to libcurl in this call. The 'userp' pointer is the
custom pointer we set to point to a struct of ours to pass private data
between the application and the callback.

    curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function);

    curl_easy_setopt(easyhandle, CURLOPT_INFILE, &filedata);

Tell libcurl that we want to upload:

    curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, TRUE);

A few protocols won't behave properly when uploads are done without any prior
knowledge of the expected file size. HTTP PUT is one example [1]. So, set the
upload file size using the CURLOPT_INFILESIZE like this:

    curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE, file_size);

When you call curl_easy_perform() this time, it'll perform all the necessary
operations and when it has started the upload it'll call your supplied
callback to get the data to upload. The program should return as much data as
possible in every invocation, as that is likely to make the upload perform as
fast as possible. The callback should return the number of bytes it wrote in
the buffer. Returning 0 will signal the end of the upload.
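
A read callback along those lines might look like the sketch below, which
feeds libcurl from a block of memory. The 'struct upload_src' type is invented
for the example; a callback reading from a file would follow the same pattern.

```c
#include <string.h>

/* Hypothetical struct, not part of libcurl: upload data kept in memory. */
struct upload_src {
  const char *data;  /* next byte to hand over */
  size_t left;       /* bytes not yet given to libcurl */
};

/* Matches the read callback prototype shown above. */
size_t read_function(char *bufptr, size_t size, size_t nitems, void *userp)
{
  struct upload_src *src = (struct upload_src *)userp;
  size_t room = size * nitems;                     /* buffer capacity */
  size_t n = src->left < room ? src->left : room;  /* fill as much as fits */

  memcpy(bufptr, src->data, n);
  src->data += n;
  src->left -= n;

  return n; /* becomes 0 once all data is handed over, ending the upload */
}
```

The address of a filled-in 'struct upload_src' would be what the program
passes as the custom pointer, and its initial 'left' value is also the size to
announce as the upload size.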


Passwords

Many protocols use or even require that user name and password are provided
to be able to download or upload the data of your choice. libcurl offers
several ways to specify them.

Most protocols let you specify the name and password in the URL
itself. libcurl will detect this and use them accordingly. This is written
like this:

    protocol://user:password@example.com/path/

If you need any odd letters in your user name or password, you should enter
them URL encoded, as %XX where XX is a two-digit hexadecimal number.
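
To make the %XX rule concrete, here is a small sketch of a helper that
encodes a user name or password before it is put into a URL. The function
name 'encode_credential' is hypothetical and the helper is not a libcurl
function. As a deliberately conservative choice for this example, it encodes
every byte except ASCII letters, digits and the four characters - . _ ~

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libcurl function. Returns a malloc()ed copy of
   'in' where every byte outside the safe set is written as %XX, or NULL if
   memory runs out. The caller frees the result. */
char *encode_credential(const char *in)
{
  static const char hex[] = "0123456789ABCDEF";
  static const char safe[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
  size_t len = strlen(in);
  char *out = malloc(len * 3 + 1); /* worst case: every byte becomes %XX */
  char *p = out;
  size_t i;

  if(!out)
    return NULL;

  for(i = 0; i < len; i++) {
    unsigned char c = (unsigned char)in[i];
    if(strchr(safe, c)) {
      *p++ = (char)c;          /* safe character, copy as-is */
    }
    else {
      *p++ = '%';
      *p++ = hex[c >> 4];      /* high four bits */
      *p++ = hex[c & 15];      /* low four bits */
    }
  }
  *p = '\0';
  return out;
}
```

With this helper, a password such as "p@ss:w/rd" would be written in the URL
as "p%40ss%3Aw%2Frd".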

libcurl also provides options to set various passwords. The user name and
password as shown embedded in the URL can instead get set with the
CURLOPT_USERPWD option. The argument passed to libcurl should be a char * to
a string in the format "user:password". In a manner like this:

    curl_easy_setopt(easyhandle, CURLOPT_USERPWD, "myname:thesecret");

Another case where name and password might be needed at times is for those
users who need to authenticate themselves to a proxy they use. libcurl offers
another option for this, the CURLOPT_PROXYUSERPWD. It is used quite similarly
to the CURLOPT_USERPWD option, like this:

    curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "myname:thesecret");

There's a long time unix "standard" way of storing ftp user names and
passwords, namely in the $HOME/.netrc file. The file should be made private
so that only the user may read it (see also the "Security Considerations"
chapter), as it might contain the password in plain text. libcurl has the
ability to use this file to figure out what set of user name and password to
use for a particular host. As an extension to the normal functionality,
libcurl also supports this file for non-FTP protocols such as HTTP. To make
curl use this file, use the CURLOPT_NETRC option:

    curl_easy_setopt(easyhandle, CURLOPT_NETRC, TRUE);

And a very basic example of what such a .netrc file may look like:

    machine myhost.mydomain.com
    login userlogin
    password secretword

All these examples have been cases where the password has been optional, or
at least you could leave it out and have libcurl attempt to do its job
without it. There are times when the password isn't optional, like when
you're using an SSL private key for secure transfers.

You can in this situation either pass a password to libcurl to use to unlock
the private key, or you can let libcurl prompt the user for it. If you prefer
to ask the user, then you can provide your own callback function that will be
called when libcurl wants the password. That way, you can control how the
question will appear to the user.

To pass the known private key password to libcurl:

    curl_easy_setopt(easyhandle, CURLOPT_SSLKEYPASSWD, "keypassword");

To make a password callback:

    int enter_passwd(void *ourp, const char *prompt, char *buffer, int len);
    curl_easy_setopt(easyhandle, CURLOPT_PASSWDFUNCTION, enter_passwd);


HTTP POSTing

We get many questions regarding how to issue HTTP POSTs with libcurl the
proper way. This chapter will thus include examples using both
versions of HTTP POST that libcurl supports.

The first version is the simple POST, the most common version, that most HTML
pages using the <form> tag use. We provide a pointer to the data and tell
libcurl to post it all to the remote site:

    char *data="name=daniel&project=curl";
    curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, data);
    curl_easy_setopt(easyhandle, CURLOPT_URL, "http://posthere.com/");

    curl_easy_perform(easyhandle); /* post away! */

Simple enough, huh? Ok, so what if you want to post binary data that also
requires you to set the Content-Type: header of the post? Well, binary posts
prevent libcurl from being able to do strlen() on the data to figure out the
size, so therefore we must tell libcurl the size of the post data. Setting
headers in libcurl requests is done in a generic way, by building a list of
our own headers and then passing that list to libcurl.

    struct curl_slist *headers=NULL;
    headers = curl_slist_append(headers, "Content-Type: text/xml");

    /* post binary data */
    curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, binaryptr);

    /* set the size of the postfields data */
    curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDSIZE, 23);

    /* pass our list of custom made headers */
    curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);

    curl_easy_perform(easyhandle); /* post away! */

    curl_slist_free_all(headers); /* free the header list */
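
The reason strlen() cannot be trusted for binary data can be shown with a
tiny sketch. The 8-byte 'payload' array and the two function names are made up
for this example; the point is only that the size given with
CURLOPT_POSTFIELDSIZE has to be the real byte count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical binary payload: 8 bytes with a zero byte in the middle. */
static const char payload[8] = { 'a', 'b', 'c', '\0', 'd', 'e', 'f', 'g' };

/* What strlen() reports: it stops counting at the first zero byte. */
size_t payload_strlen(void)
{
  return strlen(payload);
}

/* The real size, which is the value to tell libcurl about. */
size_t payload_size(void)
{
  return sizeof(payload);
}
```

Here strlen() sees only 3 of the 8 bytes, so a post that relied on it would
be cut short. Since the size option takes a long, the program would pass
(long)payload_size().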

While the simple examples above cover the majority of all cases where HTTP
POST operations are required, they don't do multipart formposts. Multipart
formposts were introduced as a better way to post (possibly large) binary
data and were first documented in RFC 1867. They're called multipart
because they're built by a chain of parts, each being a single unit. Each
part has its own name and contents. You can in fact create and post a
multipart formpost with the regular libcurl POST support described above, but
that would require that you build a formpost yourself and provide it to
libcurl. To make that easier, libcurl provides curl_formadd(). Using this
function, you add parts to the form. When you're done adding parts, you post
the whole form.

The following example sets two simple text parts with plain textual contents,
and then a file with binary contents and uploads the whole thing.

    struct HttpPost *post=NULL;
    struct HttpPost *last=NULL;
    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "name",
                 CURLFORM_COPYCONTENTS, "daniel", CURLFORM_END);
    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "project",
                 CURLFORM_COPYCONTENTS, "curl", CURLFORM_END);
    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "logotype-image",
                 CURLFORM_FILECONTENT, "curl.png", CURLFORM_END);

    /* Set the form info */
    curl_easy_setopt(easyhandle, CURLOPT_HTTPPOST, post);

    curl_easy_perform(easyhandle); /* post away! */

    /* free the post data again */
    curl_formfree(post);

The multipart formposts are a chain of parts using MIME-style separators and
headers. That means that each of these separate parts gets a few headers set
that describe its individual content-type, size etc. Now, to enable your
application to handicraft this formpost even more, libcurl allows you to
supply your own custom headers to an individual form part. You can of course
supply headers to as many parts as you like, but this little example will show
how you set headers to one specific part when you add that to the post
handle:

    struct curl_slist *headers=NULL;
    headers = curl_slist_append(headers, "Content-Type: text/xml");

    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "logotype-image",
                 CURLFORM_FILECONTENT, "curl.xml",
                 CURLFORM_CONTENTHEADER, headers,
                 CURLFORM_END);

    curl_easy_perform(easyhandle); /* post away! */

    curl_formfree(post); /* free post */
    curl_slist_free_all(headers); /* free custom header list */


Showing Progress


libcurl with C++

There's basically only one thing to keep in mind when using C++ instead of C
when interfacing libcurl:

    "The Callbacks Must Be Plain C"

So if you want a write callback set in libcurl, you should put it within
'extern "C"'. Similar to this:

    extern "C" {
      size_t write_data(void *ptr, size_t size, size_t nmemb,
                        void *ourpointer)
      {
        /* do what you want with the data */
        return size * nmemb; /* tell libcurl all the data was handled */
      }
    }

This will of course effectively turn the callback code into C. There won't be
any "this" pointer available etc.


Proxies

What "proxy" means according to Merriam-Webster: "a person authorized to act
for another" but also "the agency, function, or office of a deputy who acts
as a substitute for another".

Proxies are exceedingly common these days. Companies often only offer
internet access to employees through their HTTP proxies. Network clients or
user-agents ask the proxy for documents, the proxy does the actual request
and then it returns them.

libcurl has full support for HTTP proxies, so when a given URL is wanted,
libcurl will ask the proxy for it instead of trying to connect to the actual
host identified in the URL.

The fact that the proxy is a HTTP proxy puts certain restrictions on what can
actually happen. A requested URL that might not be a HTTP URL will still
be passed to the HTTP proxy to deliver back to libcurl. This happens
transparently, and an application may not need to know. I say "may", because
at times it is very important to understand that all operations over a HTTP
proxy use the HTTP protocol. For example, you can't invoke your own
custom FTP commands or even get proper FTP directory listings.

To tell libcurl to use a proxy at a given port number:

    curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080");

Some proxies require user authentication before allowing a request, and you
pass that information similar to this:

    curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password");

[ environment variables, SSL, tunneling, automatic proxy config (.pac) ]


Security Considerations

[ ps output, netrc plain text, plain text protocols / base64 ]


Certificates and Other SSL Tricks


Future



-----
Footnotes:

[1] = HTTP PUT without knowing the size prior to transfer is indeed possible,
      but libcurl does not support the chunked transfers on upload that are
      necessary for this feature to work. We'd greatly appreciate patches
      that bring this functionality...

[2] = This happens on Windows machines when libcurl is built and used as a
      DLL. However, you can still do this on Windows if you link with a static
      library.

[3] = The curl-config tool is generated at build-time (on unix-like systems)
      and should be installed with the 'make install' or similar instruction
      that installs the library, header files, man pages etc.