.\" **************************************************************************
.\" *                                  _   _ ____  _
.\" *  Project                     ___| | | |  _ \| |
.\" *                             / __| | | | |_) | |
.\" *                            | (__| |_| |  _ <| |___
.\" *                             \___|\___/|_| \_\_____|
.\" *
.\" * Copyright (C) 1998 - 2006, Daniel Stenberg, <daniel@haxx.se>, et al.
.\" *
.\" * This software is licensed as described in the file COPYING, which
.\" * you should have received as part of this distribution. The terms
.\" * are also available at http://curl.haxx.se/docs/copyright.html.
.\" *
.\" * You may opt to use, copy, modify, merge, publish, distribute and/or sell
.\" * copies of the Software, and permit persons to whom the Software is
.\" * furnished to do so, under the terms of the COPYING file.
.\" *
.\" * This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY
.\" * KIND, either express or implied.
.\" *
.\" * $Id$
.\" **************************************************************************
.\"
.TH libcurl-tutorial 3 "9 May 2005" "libcurl" "libcurl programming"
.SH NAME
libcurl-tutorial \- libcurl programming tutorial
.SH "Objective"
This document attempts to describe the general principles and some basic
approaches to consider when programming with libcurl. The text will focus
mainly on the C interface but might apply fairly well to other interfaces as
well, as they usually follow the C one pretty closely.

This document will refer to 'the user' as the person writing the source code
that uses libcurl. That would probably be you or someone in your position.
What will be generally referred to as 'the program' will be the collected
source code that you write that is using libcurl for transfers. The program
is outside libcurl and libcurl is outside of the program.

To get more details on all options and functions described herein, please
refer to their respective man pages.

.SH "Building"
There are many different ways to build C programs. This chapter will assume a
unix-style build process. If you use a different build system, you can still
read this to get general information that may apply to your environment as
well.
.IP "Compiling the Program"
Your compiler needs to know where the libcurl headers are located. Therefore
you must set your compiler's include path to point to the directory where you
installed them. The 'curl-config'[3] tool can be used to get this information:

  $ curl-config --cflags

.IP "Linking the Program with libcurl"
Once you have compiled the program, you need to link your object files to
create a single executable. For that to succeed, you need to link with libcurl
and possibly also with other libraries that libcurl itself depends on, such as
the OpenSSL libraries. Some standard OS libraries may also be needed on the
command line. To figure out which flags to use, once again the 'curl-config'
tool comes to the rescue:

  $ curl-config --libs

.IP "SSL or Not"
libcurl can be built and customized in many ways. One of the things that
varies between libraries and builds is the support for SSL-based
transfers, like HTTPS and FTPS. If OpenSSL was detected properly at
build-time, libcurl will be built with SSL support. To figure out if an
installed libcurl has been built with SSL support enabled, use 'curl-config'
like this:

  $ curl-config --feature

If SSL is supported, the keyword 'SSL' will be written to stdout,
possibly together with a few other features that can be on or off in
different libcurls.

See also the "Features libcurl Provides" chapter further down.
.IP "autoconf macro"
When you write your configure script to detect libcurl and set up variables
accordingly, we offer a prewritten macro that probably does everything you
need in this area. See the docs/libcurl/libcurl.m4 file; it includes docs on
how to use it.

.SH "Portable Code in a Portable World"
The people behind libcurl have put considerable effort into making libcurl
work on a large number of different operating systems and environments.

You program libcurl the same way on all platforms that libcurl runs on. There
are only a few minor considerations that differ. If you just make sure to
write your code portably enough, you may very well create yourself a very
portable program. libcurl shouldn't stop you from that.

.SH "Global Preparation"
The program must initialize some of the libcurl functionality globally. That
means it should be done exactly once, no matter how many times you intend to
use the library. Once for your program's entire lifetime. This is done using

  curl_global_init()

and it takes one parameter which is a bit pattern that tells libcurl what to
initialize. Using \fICURL_GLOBAL_ALL\fP will make it initialize all known
internal sub modules, and might be a good default option. The two bits
currently specified are:
.RS
.IP "CURL_GLOBAL_WIN32"
which only does anything on Windows machines. When used on
a Windows machine, it'll make libcurl initialize the win32 socket
stuff. Without having that initialized properly, your program cannot use
sockets properly. You should only do this once for each application, so if
your program already does this or if another library in use does it, you
should not tell libcurl to do this as well.
.IP CURL_GLOBAL_SSL
which only does anything on libcurls compiled and built
SSL-enabled. On these systems, this will make libcurl initialize OpenSSL
properly for this application. This needs to be done only once for each
application, so if your program or another library already does this, this
bit should not be needed.
.RE

libcurl has a default protection mechanism that detects if
\fIcurl_global_init(3)\fP hasn't been called by the time
\fIcurl_easy_perform(3)\fP is called and if that is the case, libcurl runs the
function itself with a guessed bit pattern. Please note that depending solely
on this is not considered good practice.

When the program no longer uses libcurl, it should call
\fIcurl_global_cleanup(3)\fP, which is the opposite of the init call. It will
then do the reversed operations to clean up the resources the
\fIcurl_global_init(3)\fP call initialized.

Repeated calls to \fIcurl_global_init(3)\fP and \fIcurl_global_cleanup(3)\fP
should be avoided. They should only be called once each.

.SH "Features libcurl Provides"
It is considered best practice to determine libcurl features at run-time
rather than at build-time (if possible of course). By calling
curl_version_info() and checking out the details of the returned struct, your
program can figure out exactly what the currently running libcurl supports.

.SH "Handle the Easy libcurl"
libcurl first introduced the so-called easy interface. All operations in the
easy interface are prefixed with 'curl_easy'.

Recent libcurl versions also offer the multi interface. More about that
interface, what it is targeted for and how to use it is detailed in a separate
chapter further down. You still need to understand the easy interface first,
so please continue reading for better understanding.

To use the easy interface, you must first create yourself an easy handle. You
need one handle for each easy session you want to perform. Basically, you
should use one handle for every thread you plan to use for transferring. You
must never share the same handle in multiple threads.

Get an easy handle with

  easyhandle = curl_easy_init();

It returns an easy handle. Using that you proceed to the next step: setting
up your preferred actions. A handle is just a logical entity for the upcoming
transfer or series of transfers.

You set properties and options for this handle using
\fIcurl_easy_setopt(3)\fP. They control how the subsequent transfer or
transfers will be made. Options remain set in the handle until set again to
something different. Thus, multiple requests using the same handle will use
the same options.

Many of the options you set in libcurl are "strings", pointers to data
terminated with a zero byte. Keep in mind that when you set strings with
\fIcurl_easy_setopt(3)\fP, libcurl will not copy the data. It will merely
point to the data. You MUST make sure that the data remains available for
libcurl to use until finished or until you use the same option again to point
to something else.

One of the most basic properties to set in the handle is the URL. You set
your preferred URL to transfer with CURLOPT_URL in a manner similar to:

.nf
    curl_easy_setopt(easyhandle, CURLOPT_URL, "http://domain.com/");
.fi

Let's assume for a while that you want to receive data as the URL identifies a
remote resource you want to get here. Since you write a sort of application
that needs this transfer, I assume that you would like to get the data passed
to you directly instead of simply getting it passed to stdout. So, you write
your own function that matches this prototype:

  size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp);

You tell libcurl to pass all data to this function by issuing a function
call similar to this:

  curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data);

You can control what data your function gets in the fourth argument by
setting another property:

  curl_easy_setopt(easyhandle, CURLOPT_WRITEDATA, &internal_struct);

Using that property, you can easily pass local data between your application
and the function that gets invoked by libcurl. libcurl itself won't touch the
data you pass with \fICURLOPT_WRITEDATA\fP.

libcurl offers its own default internal callback that'll take care of the data
if you don't set the callback with \fICURLOPT_WRITEFUNCTION\fP. It will then
simply output the received data to stdout. You can have the default callback
write the data to a different file handle by passing a 'FILE *' to a file
opened for writing with the \fICURLOPT_WRITEDATA\fP option.

Now, we need to take a step back and take a deep breath. Here's one of those
rare platform-dependent nitpicks. Did you spot it? On some platforms[2],
libcurl won't be able to operate on files opened by the program. Thus, if you
use the default callback and pass in an open file with
\fICURLOPT_WRITEDATA\fP, it will crash. You should therefore avoid this to
make your program run fine virtually everywhere.

(\fICURLOPT_WRITEDATA\fP was formerly known as \fICURLOPT_FILE\fP. Both names
still work and do the same thing).

If you're using libcurl as a win32 DLL, you MUST use the
\fICURLOPT_WRITEFUNCTION\fP if you set \fICURLOPT_WRITEDATA\fP, or you will
experience crashes.

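One way to follow that advice is sketched below, assuming the program opened
the file itself and passes the resulting 'FILE *' with CURLOPT_WRITEDATA. The
name write_to_file is hypothetical and not part of libcurl; only the prototype
comes from the text above. Because the callback is compiled into the program,
the file is only ever written by the same C runtime that opened it.

```c
#include <stdio.h>

/* Hypothetical callback name; the prototype matches the write_data()
   prototype shown earlier. userp is expected to be the FILE * that the
   program passed with CURLOPT_WRITEDATA. */
size_t write_to_file(void *buffer, size_t size, size_t nmemb, void *userp)
{
  FILE *out = (FILE *)userp;

  /* fwrite() returns the number of complete items written; multiply by
     size to report the number of bytes taken care of */
  return fwrite(buffer, size, nmemb, out) * size;
}
```

It would be registered with CURLOPT_WRITEFUNCTION in the same way as the
write_data example above.
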
There are of course many more options you can set, and we'll get back to a few
of them later. Let's instead continue to the actual transfer:

  success = curl_easy_perform(easyhandle);

\fIcurl_easy_perform(3)\fP will connect to the remote site, do the necessary
commands and receive the transfer. Whenever it receives data, it calls the
callback function we previously set. The function may get one byte at a time,
or it may get many kilobytes at once. libcurl delivers as much as possible as
often as possible. Your callback function should return the number of bytes it
\&"took care of". If that is not exactly the number of bytes that was
passed to it, libcurl will abort the operation and return with an error code.

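The return value rule is easiest to see in a callback that collects the
received data in memory. This is only a sketch: the struct and function names
are made up for illustration, and a pointer to the struct is assumed to have
been passed with CURLOPT_WRITEDATA.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct the program owns; a pointer to it would be passed
   with CURLOPT_WRITEDATA and arrives in the callback as userp. */
struct memory {
  char *data;   /* received bytes, kept zero terminated */
  size_t size;  /* number of bytes stored, not counting the terminator */
};

/* Same prototype as write_data() above. Returns the number of bytes
   taken care of; any other value makes libcurl abort the transfer. */
size_t write_to_memory(void *buffer, size_t size, size_t nmemb, void *userp)
{
  size_t bytes = size * nmemb;
  struct memory *mem = (struct memory *)userp;
  char *grown = realloc(mem->data, mem->size + bytes + 1);

  if(!grown)
    return 0; /* out of memory: a short count aborts the transfer */

  memcpy(grown + mem->size, buffer, bytes);
  mem->data = grown;
  mem->size += bytes;
  mem->data[mem->size] = 0;
  return bytes;
}
```

The struct should start out as { NULL, 0 } and the program frees the data
member when it is done with it.
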
When the transfer is complete, the function returns a return code that informs
you if it succeeded in its mission or not. If a return code isn't enough for
you, you can use the CURLOPT_ERRORBUFFER to point libcurl to a buffer of yours
where it'll store a human readable error message as well.

If you then want to transfer another file, the handle is ready to be used
again. Mind you, it is even preferred that you re-use an existing handle if
you intend to make another transfer. libcurl will then attempt to re-use the
previous connection.

.SH "Multi-threading Issues"
The first basic rule is that you must \fBnever\fP share a libcurl handle (be
it easy or multi or whatever) between multiple threads. Only use one handle in
one thread at a time.

libcurl is completely thread safe, except for two issues: signals and SSL/TLS
handlers. Signals are used for timing out name resolves (during DNS lookup)
when libcurl is built without c-ares support and is not running on Windows.

If you are accessing HTTPS or FTPS URLs in a multi-threaded manner, you are
then of course using OpenSSL/GnuTLS multi-threaded and those libs have their
own requirements on this issue. Basically, you need to provide one or two
functions to allow it to function properly. For all details, see:

OpenSSL

 http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION

GnuTLS

 http://www.gnu.org/software/gnutls/manual/html_node/Multi_002dthreaded-applications.html

When using multiple threads you should set the CURLOPT_NOSIGNAL option to TRUE
for all handles. Everything will or might work fine except that timeouts are
not honored during the DNS lookup, which you can work around by building
libcurl with c-ares support. c-ares is a library that provides asynchronous
name resolves. Unfortunately, c-ares does not yet fully support IPv6. On some
platforms, libcurl simply will not function properly multi-threaded unless
this option is set.

Also, note that CURLOPT_DNS_USE_GLOBAL_CACHE is not thread-safe.

.SH "When It Doesn't Work"
There will always be times when the transfer fails for some reason. You might
have set the wrong libcurl option or misunderstood what the libcurl option
actually does, or the remote server might return non-standard replies that
confuse the library which then confuses your program.

There's one golden rule when these things occur: set the CURLOPT_VERBOSE
option to TRUE. It'll cause the library to spew out all the protocol
details it sends, some internal info and some received protocol data as well
(especially when using FTP). If you're using HTTP, adding the headers in the
received output to study is also a clever way to get a better understanding
of why the server behaves the way it does. Include headers in the normal body
output with CURLOPT_HEADER set to TRUE.

Of course there are bugs left. We need to get to know about them to be able
to fix them, so we're quite dependent on your bug reports! When you do report
suspected bugs in libcurl, please include as many details as you possibly can:
a protocol dump that CURLOPT_VERBOSE produces, library version, as much as
possible of your code that uses libcurl, operating system name and version,
compiler name and version etc.

If CURLOPT_VERBOSE is not enough, you can increase the level of debug data
your application receives by using the CURLOPT_DEBUGFUNCTION.

Getting some in-depth knowledge about the protocols involved is never wrong,
and if you're trying to do funny things, you might very well understand
libcurl and how to use it better if you study the appropriate RFC documents
at least briefly.

.SH "Upload Data to a Remote Site"
libcurl tries to keep a protocol independent approach to most transfers, thus
uploading to a remote FTP site is very similar to uploading data to a HTTP
server with a PUT request.

Of course, first you either create an easy handle or you re-use an existing
one. Then you set the URL to operate on just like before. This is the remote
URL that we will now upload to.

Since we write an application, we most likely want libcurl to get the upload
data by asking us for it. To make it do that, we set the read callback and
the custom pointer libcurl will pass to our read callback. The read callback
should have a prototype similar to:

  size_t function(char *bufptr, size_t size, size_t nitems, void *userp);

Where bufptr is the pointer to a buffer we fill in with data to upload and
size*nitems is the size of the buffer and therefore also the maximum amount
of data we can return to libcurl in this call. The 'userp' pointer is the
custom pointer we set to point to a struct of ours to pass private data
between the application and the callback.

  curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function);

  curl_easy_setopt(easyhandle, CURLOPT_INFILE, &filedata);

Tell libcurl that we want to upload:

  curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, TRUE);

A few protocols won't behave properly when uploads are done without any prior
knowledge of the expected file size. So, set the upload file size using the
CURLOPT_INFILESIZE_LARGE for all known file sizes like this[1]:

.nf
    /* in this example, file_size must be a curl_off_t variable */
    curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE_LARGE, file_size);
.fi

When you call \fIcurl_easy_perform(3)\fP this time, it'll perform all the
necessary operations and when it has started the upload it'll call your
supplied callback to get the data to upload. The program should return as much
data as possible in every invocation, as that is likely to make the upload
perform as fast as possible. The callback should return the number of bytes it
wrote in the buffer. Returning 0 will signal the end of the upload.

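A read callback following those rules could look like the sketch below, which
uploads from a block of memory. The struct and function names are
hypothetical; a pointer to the struct is assumed to be the custom pointer
that libcurl hands back as userp.

```c
#include <string.h>

/* Hypothetical struct describing what is left to upload */
struct upload {
  const char *data; /* next byte to send */
  size_t left;      /* bytes not yet handed to libcurl */
};

/* Same prototype as the read callback described above */
size_t read_from_memory(char *bufptr, size_t size, size_t nitems,
                        void *userp)
{
  struct upload *up = (struct upload *)userp;
  size_t room = size * nitems;                       /* buffer capacity */
  size_t chunk = up->left < room ? up->left : room;  /* fill it if we can */

  memcpy(bufptr, up->data, chunk);
  up->data += chunk;
  up->left -= chunk;
  return chunk; /* 0 once everything has been sent: end of upload */
}
```

Each call hands over as much as fits in the buffer, and the final call
returns 0 to signal that the upload is complete.
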
.SH "Passwords"
Many protocols use, or even require, that a user name and password be provided
to be able to download or upload the data of your choice. libcurl offers
several ways to specify them.

Most protocols let you specify the name and password in the URL
itself. libcurl will detect this and use them accordingly. This is written
like this:

  protocol://user:password@example.com/path/

If you need any odd letters in your user name or password, you should enter
them URL encoded, as %XX where XX is a two-digit hexadecimal number.

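The %XX encoding can be sketched in a few lines of plain C. The helper name
below is hypothetical, and the choice to leave only ASCII letters, digits and
the four characters - . _ ~ unencoded is an assumption made for this example,
not something libcurl mandates.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed copy of in where every byte
   other than an ASCII letter, digit, '-', '.', '_' or '~' is written as
   %XX. The caller frees the result. Returns NULL if out of memory. */
char *encode_userinfo(const char *in)
{
  static const char hex[] = "0123456789ABCDEF";
  static const char safe[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-._~";
  size_t len = strlen(in);
  char *out = malloc(len * 3 + 1); /* worst case: every byte grows to 3 */
  char *p = out;
  size_t i;

  if(!out)
    return NULL;

  for(i = 0; i < len; i++) {
    unsigned char c = (unsigned char)in[i];
    if(strchr(safe, c)) {
      *p++ = (char)c;        /* safe byte: copy as is */
    }
    else {
      *p++ = '%';            /* everything else becomes %XX */
      *p++ = hex[c >> 4];
      *p++ = hex[c & 15];
    }
  }
  *p = 0;
  return out;
}
```

With this, a password such as p@ss:w/rd would be written into the URL as
p%40ss%3Aw%2Frd.
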
libcurl also provides options to set various passwords. The user name and
password as shown embedded in the URL can instead get set with the
CURLOPT_USERPWD option. The argument passed to libcurl should be a char * to
a string in the format "user:password", in a manner like this:

  curl_easy_setopt(easyhandle, CURLOPT_USERPWD, "myname:thesecret");

Another case where name and password might be needed at times, is for those
users who need to authenticate themselves to a proxy they use. libcurl offers
another option for this, the CURLOPT_PROXYUSERPWD. It is used quite similarly
to the CURLOPT_USERPWD option like this:

  curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "myname:thesecret");

There's a long-standing unix "standard" way of storing ftp user names and
passwords, namely in the $HOME/.netrc file. The file should be made private
so that only the user may read it (see also the "Security Considerations"
chapter), as it might contain the password in plain text. libcurl has the
ability to use this file to figure out what set of user name and password to
use for a particular host. As an extension to the normal functionality,
libcurl also supports this file for non-FTP protocols such as HTTP. To make
libcurl use this file, use the CURLOPT_NETRC option:

  curl_easy_setopt(easyhandle, CURLOPT_NETRC, TRUE);

And a very basic example of what such a .netrc file may look like:

.nf
    machine myhost.mydomain.com
    login userlogin
    password secretword
.fi

All these examples have been cases where the password has been optional, or
at least you could leave it out and have libcurl attempt to do its job
without it. There are times when the password isn't optional, like when
you're using an SSL private key for secure transfers.

To pass the known private key password to libcurl:

  curl_easy_setopt(easyhandle, CURLOPT_SSLKEYPASSWD, "keypassword");

.SH "HTTP Authentication"
The previous chapter showed how to set user name and password for getting
URLs that require authentication. When using the HTTP protocol, there are
many different ways a client can provide those credentials to the server and
you can control which way libcurl will (attempt to) use. The default HTTP
authentication method is called 'Basic', which sends the name and
password in clear-text in the HTTP request, base64-encoded. This is insecure.

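Why base64 offers no protection can be seen from how little code it takes to
produce it. The helper below is a hypothetical, stand-alone sketch and not a
libcurl function; it encodes a "user:password" string the way it would appear
after "Authorization: Basic" in a request, and anyone who sees the request can
reverse it just as easily.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: base64-encodes the zero terminated string in and
   returns a malloc()ed, zero terminated result that the caller frees. */
char *base64_encode(const char *in)
{
  static const char table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t len = strlen(in);
  char *out = malloc(((len + 2) / 3) * 4 + 1);
  char *p = out;
  size_t i;

  if(!out)
    return NULL;

  for(i = 0; i < len; i += 3) {
    size_t n = len - i < 3 ? len - i : 3;  /* input bytes in this group */
    unsigned long v = (unsigned long)(unsigned char)in[i] << 16;
    if(n > 1)
      v |= (unsigned long)(unsigned char)in[i + 1] << 8;
    if(n > 2)
      v |= (unsigned long)(unsigned char)in[i + 2];

    /* three input bytes become four output characters, padded with '=' */
    *p++ = table[(v >> 18) & 63];
    *p++ = table[(v >> 12) & 63];
    *p++ = n > 1 ? table[(v >> 6) & 63] : '=';
    *p++ = n > 2 ? table[v & 63] : '=';
  }
  *p = 0;
  return out;
}
```

No key or secret is involved at any point, which is why Basic should be
treated as sending the password in the clear.
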
At the time of this writing libcurl can be built to use: Basic, Digest, NTLM,
Negotiate, GSS-Negotiate and SPNEGO. You can tell libcurl which one to use
with CURLOPT_HTTPAUTH as in:

  curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH, CURLAUTH_DIGEST);

And when you send authentication to a proxy, you can also set authentication
type the same way but instead with CURLOPT_PROXYAUTH:

  curl_easy_setopt(easyhandle, CURLOPT_PROXYAUTH, CURLAUTH_NTLM);

Both these options allow you to set multiple types (by ORing them together),
to make libcurl pick the most secure one out of the types the server/proxy
claims to support. This method does however add a round-trip since libcurl
must first ask the server what it supports:

  curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH,
                   CURLAUTH_DIGEST|CURLAUTH_BASIC);

For convenience, you can use the 'CURLAUTH_ANY' define (instead of a list
with specific types) which allows libcurl to use whatever method it wants.

When asking for multiple types, libcurl will pick the available one it
considers "best" in its own internal order of preference.

.SH "HTTP POSTing"
We get many questions regarding how to issue HTTP POSTs with libcurl the
proper way. This chapter will thus include examples using both
versions of HTTP POST that libcurl supports.

The first version is the simple POST, the most common version, that most HTML
pages using the <form> tag use. We provide a pointer to the data and tell
libcurl to post it all to the remote site:

.nf
    char *data="name=daniel&project=curl";
    curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, data);
    curl_easy_setopt(easyhandle, CURLOPT_URL, "http://posthere.com/");

    curl_easy_perform(easyhandle); /* post away! */
.fi

Simple enough, huh? Since you set the POST options with the
CURLOPT_POSTFIELDS, this automatically switches the handle to use POST in the
upcoming request.

Ok, so what if you want to post binary data that also requires you to set the
Content-Type: header of the post? Well, binary posts prevent libcurl from
being able to do strlen() on the data to figure out the size, so therefore we
must tell libcurl the size of the post data. Setting headers in libcurl
requests is done in a generic way, by building a list of our own headers and
then passing that list to libcurl.

.nf
    struct curl_slist *headers=NULL;
    headers = curl_slist_append(headers, "Content-Type: text/xml");

    /* post binary data */
    curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, binaryptr);

    /* set the size of the postfields data */
    curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDSIZE, 23L);

    /* pass our list of custom made headers */
    curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);

    curl_easy_perform(easyhandle); /* post away! */

    curl_slist_free_all(headers); /* free the header list */
.fi

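The reason the size has to be given explicitly for binary data can be shown
with a tiny sketch. The payload and function names are made up for
illustration; the point is only that strlen() stops at the first zero byte
while the real size does not.

```c
#include <string.h>

/* Hypothetical binary payload with a zero byte in the middle */
const char payload[] = { 'a', 'b', 0, 'c', 'd' };

/* What strlen() would report: it stops at the first zero byte */
size_t size_seen_by_strlen(void)
{
  return strlen(payload);
}

/* The real size, which is what CURLOPT_POSTFIELDSIZE should be given */
size_t real_payload_size(void)
{
  return sizeof(payload);
}
```

Here strlen() sees only the two bytes before the zero, while all five bytes
need to be posted.
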
While the simple examples above cover the majority of all cases where HTTP
POST operations are required, they don't do multi-part formposts. Multi-part
formposts were introduced as a better way to post (possibly large) binary data
and were first documented in RFC 1867. They're called multi-part because
they're built by a chain of parts, each being a single unit. Each part has its
own name and contents. You can in fact create and post a multi-part formpost
with the regular libcurl POST support described above, but that would require
that you build a formpost yourself and provide it to libcurl. To make that
easier, libcurl provides \fIcurl_formadd(3)\fP. Using this function, you add
parts to the form. When you're done adding parts, you post the whole form.

The following example sets two simple text parts with plain textual contents,
and then a file with binary contents and uploads the whole thing.

.nf
    struct curl_httppost *post=NULL;
    struct curl_httppost *last=NULL;
    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "name",
                 CURLFORM_COPYCONTENTS, "daniel", CURLFORM_END);
    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "project",
                 CURLFORM_COPYCONTENTS, "curl", CURLFORM_END);
    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "logotype-image",
                 CURLFORM_FILECONTENT, "curl.png", CURLFORM_END);

    /* Set the form info */
    curl_easy_setopt(easyhandle, CURLOPT_HTTPPOST, post);

    curl_easy_perform(easyhandle); /* post away! */

    /* free the post data again */
    curl_formfree(post);
.fi

Multipart formposts are chains of parts using MIME-style separators and
headers. It means that each one of these separate parts gets a few headers set
that describe the individual content-type, size etc. To enable your
application to handcraft this formpost even more, libcurl allows you to
supply your own set of custom headers to such an individual form part. You can
of course supply headers to as many parts as you like, but this little example
will show how you set headers to one specific part when you add that to the
post handle:

.nf
    struct curl_slist *headers=NULL;
    headers = curl_slist_append(headers, "Content-Type: text/xml");

    curl_formadd(&post, &last,
                 CURLFORM_COPYNAME, "logotype-image",
                 CURLFORM_FILECONTENT, "curl.xml",
                 CURLFORM_CONTENTHEADER, headers,
                 CURLFORM_END);

    curl_easy_perform(easyhandle); /* post away! */

    curl_formfree(post); /* free post */
    curl_slist_free_all(headers); /* free custom header list */
.fi

Since all options on an easyhandle are "sticky", they remain the same until
changed even if you do call \fIcurl_easy_perform(3)\fP. You may therefore need
to tell curl to go back to a plain GET request if that is what you intend to
do as your next request. You force an easyhandle to go back to GET by using
the CURLOPT_HTTPGET option:

  curl_easy_setopt(easyhandle, CURLOPT_HTTPGET, TRUE);

Just setting CURLOPT_POSTFIELDS to "" or NULL will *not* stop libcurl from
doing a POST. It will just make it POST without any data to send!

.SH "Showing Progress"
|
|
|
|
|
|
|
|
For historical and traditional reasons, libcurl has a built-in progress meter
|
|
|
|
that can be switched on and then makes it presents a progress meter in your
|
|
|
|
terminal.
|
|
|
|
|
|
|
|
Switch on the progress meter by, oddly enough, set CURLOPT_NOPROGRESS to
|
|
|
|
FALSE. This option is set to TRUE by default.
|
|
|
|
|
|
|
|
For most applications however, the built-in progress meter is useless and
|
|
|
|
what instead is interesting is the ability to specify a progress
|
|
|
|
callback. The function pointer you pass to libcurl will then be called on
|
|
|
|
irregular intervals with information about the current transfer.
|
|
|
|
|
|
|
|
Set the progress callback by using CURLOPT_PROGRESSFUNCTION. And pass a
|
|
|
|
pointer to a function that matches this prototype:
|
|
|
|
|
|
|
|
.nf
|
|
|
|
int progress_callback(void *clientp,
|
|
|
|
double dltotal,
|
|
|
|
double dlnow,
|
|
|
|
double ultotal,
|
|
|
|
double ulnow);
|
|
|
|
.fi
|
|
|
|
|
|
|
|
If any of the input arguments is unknown, a 0 will be passed. The first
|
|
|
|
argument, the 'clientp' is the pointer you pass to libcurl with
|
|
|
|
CURLOPT_PROGRESSDATA. libcurl won't touch it.
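
As a minimal sketch of such a callback, the function below stores the download
percentage in a struct owned by the program. The names my_progress and
struct progress_state are made up for this illustration and are not part of
libcurl.

```c
#include <stddef.h>

/* hypothetical state struct; libcurl only passes the pointer through */
struct progress_state {
  double percent;   /* last computed download percentage */
  int calls;        /* number of times the callback ran */
};

/* matches the progress callback prototype shown above */
int my_progress(void *clientp, double dltotal, double dlnow,
                double ultotal, double ulnow)
{
  struct progress_state *st = (struct progress_state *)clientp;
  (void)ultotal; /* upload figures are not used in this sketch */
  (void)ulnow;

  st->calls++;
  /* a total of 0 means "unknown", so avoid dividing by it */
  st->percent = (dltotal > 0.0) ? (dlnow * 100.0 / dltotal) : 0.0;

  return 0; /* returning non-zero makes libcurl abort the transfer */
}
```

To use it, pass my_progress with CURLOPT_PROGRESSFUNCTION, pass the address of
a struct progress_state with CURLOPT_PROGRESSDATA, and set CURLOPT_NOPROGRESS
to FALSE so that the callback actually gets called.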
|
|
|
|
|
|
|
|
.SH "libcurl with C++"
|
|
|
|
|
|
|
|
There's basically only one thing to keep in mind when using C++ instead of C
|
|
|
|
when interfacing libcurl:
|
|
|
|
|
|
|
|
The callbacks CANNOT be non-static class member functions
|
|
|
|
|
|
|
|
Example C++ code:
|
|
|
|
|
|
|
|
.nf
|
|
|
|
class AClass {
public:
  static size_t write_data(void *ptr, size_t size, size_t nmemb,
                           void *ourpointer)
  {
    /* do what you want with the data */
    return size * nmemb; /* tell libcurl that all bytes were handled */
  }
};
|
|
|
|
.fi
|
|
|
|
|
|
|
|
.SH "Proxies"
|
|
|
|
|
|
|
|
What "proxy" means according to Merriam-Webster: "a person authorized to act
|
|
|
|
for another" but also "the agency, function, or office of a deputy who acts as
|
|
|
|
a substitute for another".
|
|
|
|
|
|
|
|
Proxies are exceedingly common these days. Companies often only offer
|
|
|
|
Internet access to employees through their HTTP proxies. Network clients or
|
|
|
|
user-agents ask the proxy for documents, the proxy does the actual request
|
|
|
|
and then it returns them.
|
|
|
|
|
|
|
|
libcurl has full support for HTTP proxies, so when a given URL is wanted,
|
|
|
|
libcurl will ask the proxy for it instead of trying to connect to the actual
|
|
|
|
host identified in the URL.
|
|
|
|
|
|
|
|
The fact that the proxy is a HTTP proxy puts certain restrictions on what can
|
|
|
|
actually happen. A requested URL that might not be a HTTP URL will still
be passed to the HTTP proxy to deliver back to libcurl. This happens
transparently, and an application may not need to know. I say "may", because
at times it is very important to understand that all operations over a HTTP
proxy use the HTTP protocol. For example, you can't invoke your own
|
|
|
|
custom FTP commands or even proper FTP directory listings.
|
|
|
|
|
|
|
|
.IP "Proxy Options"
|
|
|
|
|
|
|
|
To tell libcurl to use a proxy at a given port number:
|
|
|
|
|
|
|
|
curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080");
|
|
|
|
|
|
|
|
Some proxies require user authentication before allowing a request, and you
|
|
|
|
pass that information similar to this:
|
|
|
|
|
|
|
|
curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password");
|
|
|
|
|
|
|
|
If you want to, you can specify the host name only in the CURLOPT_PROXY
|
|
|
|
option, and set the port number separately with CURLOPT_PROXYPORT.
|
|
|
|
|
|
|
|
.IP "Environment Variables"
|
|
|
|
|
|
|
|
libcurl automatically checks and uses a set of environment variables to
|
|
|
|
know what proxies to use for certain protocols. The names of the variables
follow an ancient de facto standard and are built up as
"[protocol]_proxy" (note the lower casing). This makes the variable
\&'http_proxy' checked for a name of a proxy to use when the input URL is
HTTP. Following the same rule, the variable named 'ftp_proxy' is checked
for FTP URLs. Again, the proxies are always HTTP proxies; the different
names of the variables simply allow different HTTP proxies to be used.
|
|
|
|
|
|
|
|
The proxy environment variable contents should be in the format
|
|
|
|
\&"[protocol://][user:password@]machine[:port]". Where the protocol:// part is
|
|
|
|
simply ignored if present (so http://proxy and bluerk://proxy will do the
|
|
|
|
same) and the optional port number specifies on which port the proxy operates
|
|
|
|
on the host. If not specified, the internal default port number will be used
|
|
|
|
and that is most likely *not* the one you would like it to be.
|
|
|
|
|
|
|
|
There are two special environment variables. 'all_proxy' is what sets the proxy
|
|
|
|
for any URL in case the protocol specific variable wasn't set, and
|
|
|
|
\&'no_proxy' defines a list of hosts that should not use a proxy even though a
|
|
|
|
variable may say so. If 'no_proxy' is a plain asterisk ("*") it matches all
|
|
|
|
hosts.
|
|
|
|
|
|
|
|
.IP "SSL and Proxies"
|
|
|
|
|
|
|
|
SSL is for secure point-to-point connections. This involves strong encryption
|
|
|
|
and similar things, which effectively makes it impossible for a proxy to
|
|
|
|
operate as a "man in between", which is the proxy's task as previously
discussed. Instead, the only way to have SSL work over a HTTP proxy is to ask
the proxy to tunnel through everything without being able to check or fiddle
|
|
|
|
with the traffic.
|
|
|
|
|
|
|
|
Opening an SSL connection over a HTTP proxy is therefore a matter of asking the
|
|
|
|
proxy for a straight connection to the target host on a specified port. This
|
|
|
|
is made with the HTTP request CONNECT. ("please mr proxy, connect me to that
|
|
|
|
remote host").
|
|
|
|
|
|
|
|
Because of the nature of this operation, where the proxy has no idea what kind
of data is passed in and out through this tunnel, this breaks some of the
|
|
|
|
very few advantages that come from using a proxy, such as caching. Many
|
|
|
|
organizations prevent this kind of tunneling to other destination port numbers
|
|
|
|
than 443 (which is the default HTTPS port number).
|
|
|
|
|
|
|
|
.IP "Tunneling Through Proxy"
|
|
|
|
As explained above, tunneling is required for SSL to work and often even
|
|
|
|
restricted to the operation intended for SSL; HTTPS.
|
|
|
|
|
|
|
|
This is however not the only time proxy-tunneling might offer benefits to
|
|
|
|
you or your application.
|
|
|
|
|
|
|
|
As tunneling opens a direct connection from your application to the remote
|
|
|
|
machine, it suddenly also re-introduces the ability to do non-HTTP
|
|
|
|
operations over a HTTP proxy. You can in fact use things such as FTP
|
|
|
|
upload or FTP custom commands this way.
|
|
|
|
|
|
|
|
Again, this is often prevented by the administrators of proxies and is
|
|
|
|
rarely allowed.
|
|
|
|
|
|
|
|
Tell libcurl to use proxy tunneling like this:
|
|
|
|
|
|
|
|
curl_easy_setopt(easyhandle, CURLOPT_HTTPPROXYTUNNEL, TRUE);
|
|
|
|
|
|
|
|
In fact, there might even be times when you want to do plain HTTP
|
|
|
|
operations using a tunnel like this, as it then enables you to operate on
|
|
|
|
the remote server instead of asking the proxy to do so. libcurl will not
|
|
|
|
stand in the way for such innovative actions either!
|
|
|
|
|
|
|
|
.IP "Proxy Auto-Config"
|
|
|
|
|
|
|
|
Netscape first came up with this. It is basically a web page (usually using a
|
|
|
|
\&.pac extension) with a javascript that when executed by the browser with the
|
|
|
|
requested URL as input, returns information to the browser on how to connect
|
|
|
|
to the URL. The returned information might be "DIRECT" (which means no proxy
|
|
|
|
should be used), "PROXY host:port" (to tell the browser where the proxy for
|
|
|
|
this particular URL is) or "SOCKS host:port" (to direct the browser to a SOCKS
|
|
|
|
proxy).
|
|
|
|
|
|
|
|
libcurl has no means to interpret or evaluate javascript and thus it doesn't
|
|
|
|
support this. If you get yourself in a position where you face this nasty
|
|
|
|
invention, the following advice has been mentioned and used in the past:
|
|
|
|
|
|
|
|
- Depending on the javascript complexity, write up a script that translates it
|
|
|
|
to another language and execute that.
|
|
|
|
|
|
|
|
- Read the javascript code and rewrite the same logic in another language.
|
|
|
|
|
|
|
|
- Implement a javascript interpreter; people have successfully used the
|
|
|
|
Mozilla javascript engine in the past.
|
|
|
|
|
|
|
|
- Ask your admins to stop this, for a static proxy setup or similar.
|
|
|
|
|
|
|
|
.SH "Persistence Is The Way to Happiness"
|
|
|
|
|
|
|
|
Re-cycling the same easy handle several times when doing multiple requests is
|
|
|
|
the way to go.
|
|
|
|
|
|
|
|
After each single \fIcurl_easy_perform(3)\fP operation, libcurl will keep the
|
|
|
|
connection alive and open. A subsequent request using the same easy handle to
|
|
|
|
the same host might just be able to use the already open connection! This
|
|
|
|
reduces network impact a lot.
|
|
|
|
|
|
|
|
Even if the connection is dropped, later SSL connections to the same host
will benefit from libcurl's session ID cache, which drastically
reduces re-connection time.
|
|
|
|
|
|
|
|
FTP connections that are kept alive save a lot of time, as the
command-response round-trips are skipped, and you also don't risk being
refused permission to log in again, as can happen on the many FTP servers
that only allow N persons to be logged in at the same time.
|
|
|
|
|
|
|
|
libcurl caches DNS name resolving results, to make lookups of a previously
|
|
|
|
looked up name a lot faster.
|
|
|
|
|
|
|
|
Other interesting details that improve performance for subsequent requests
|
|
|
|
may also be added in the future.
|
|
|
|
|
|
|
|
Each easy handle will attempt to keep the last few connections alive for a
|
|
|
|
while in case they are to be used again. You can set the size of this "cache"
|
|
|
|
with the CURLOPT_MAXCONNECTS option. Default is 5. There is very seldom any
|
|
|
|
point in changing this value, and if you think of changing this it is often
|
|
|
|
just a matter of thinking again.
|
|
|
|
|
|
|
|
When the connection cache gets filled, libcurl must close an existing
|
|
|
|
connection in order to get room for the new one. To know which connection to
|
|
|
|
close, libcurl uses a "close policy" that you can affect with the
|
|
|
|
CURLOPT_CLOSEPOLICY option. There are only two policies implemented as of this
|
|
|
|
writing (libcurl 7.9.4) and they are:
|
|
|
|
|
|
|
|
.RS
|
|
|
|
.IP CURLCLOSEPOLICY_LEAST_RECENTLY_USED
|
|
|
|
simply close the one that hasn't been used for the longest time. This is the
|
|
|
|
default behavior.
|
|
|
|
.IP CURLCLOSEPOLICY_OLDEST
|
|
|
|
closes the oldest connection, the one that was created the longest time ago.
|
|
|
|
.RE
|
|
|
|
|
|
|
|
There are, or at least were, plans to support a close policy that would call
a user-specified callback to let the user decide which connection
to dump when this is necessary, and therefore the CURLOPT_CLOSEFUNCTION
option still exists today. Nothing ever uses this though and this will not
be used within the foreseeable future either.
|
|
|
|
|
|
|
|
To force your upcoming request to not use an already existing connection (it
will even close one first if there happens to be one alive to the same host
you're about to operate on), set CURLOPT_FRESH_CONNECT
to TRUE. In a similar spirit, you can also forbid the connection used by the
upcoming request from "lying" around and possibly getting re-used after the
request by setting CURLOPT_FORBID_REUSE to TRUE.
|
|
|
|
|
|
|
|
.SH "HTTP Headers Used by libcurl"
|
|
|
|
When you use libcurl to do HTTP requests, it'll pass along a series of headers
|
|
|
|
automatically. It might be good for you to know and understand these ones.
|
|
|
|
|
|
|
|
.IP "Host"
|
|
|
|
This header is required by HTTP 1.1 and even many 1.0 servers and should be
|
|
|
|
the name of the server we want to talk to. This includes the port number if
|
|
|
|
anything but default.
|
|
|
|
|
|
|
|
.IP "Pragma"
|
|
|
|
\&"no-cache". Tells a possible proxy to not grab a copy from the cache but to
|
|
|
|
fetch a fresh one.
|
|
|
|
|
|
|
|
.IP "Accept"
|
|
|
|
\&"*/*".
|
|
|
|
|
|
|
|
.IP "Expect:"
|
|
|
|
When doing multi-part formposts, libcurl will set this header to
|
|
|
|
\&"100-continue" to ask the server for an "OK" message before it proceeds with
|
|
|
|
sending the data part of the post.
|
|
|
|
|
|
|
|
.SH "Customizing Operations"
|
|
|
|
There is an ongoing development today where more and more protocols are built
|
|
|
|
upon HTTP for transport. This has obvious benefits as HTTP is a tested and
|
|
|
|
reliable protocol that is widely deployed and has excellent proxy-support.
|
|
|
|
|
|
|
|
When you use one of these protocols, and even when doing other kinds of
|
|
|
|
programming you may need to change the traditional HTTP (or FTP or...)
|
|
|
|
manners. You may need to change words, headers or various data.
|
|
|
|
|
|
|
|
libcurl is your friend here too.
|
|
|
|
|
|
|
|
.IP CUSTOMREQUEST
|
|
|
|
If just changing the actual HTTP request keyword is what you want, like when
|
|
|
|
GET, HEAD or POST is not good enough for you, CURLOPT_CUSTOMREQUEST is there
|
|
|
|
for you. It is very simple to use:
|
|
|
|
|
|
|
|
curl_easy_setopt(easyhandle, CURLOPT_CUSTOMREQUEST, "MYOWNREQUEST");
|
|
|
|
|
|
|
|
When using the custom request, you change the request keyword of the actual
|
|
|
|
request you are performing. Thus, by default you make a GET request but you can
|
|
|
|
also make a POST operation (as described before) and then replace the POST
|
|
|
|
keyword if you want to. You're the boss.
|
|
|
|
|
|
|
|
.IP "Modify Headers"
|
|
|
|
HTTP-like protocols pass a series of headers to the server when doing the
|
|
|
|
request, and you're free to pass any amount of extra headers that you
|
|
|
|
think fit. Adding headers is this easy:
|
|
|
|
|
|
|
|
.nf
|
|
|
|
struct curl_slist *headers=NULL; /* init to NULL is important */
|
|
|
|
|
|
|
|
headers = curl_slist_append(headers, "Hey-server-hey: how are you?");
|
|
|
|
headers = curl_slist_append(headers, "X-silly-content: yes");
|
|
|
|
|
|
|
|
/* pass our list of custom made headers */
|
|
|
|
curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);
|
|
|
|
|
|
|
|
curl_easy_perform(easyhandle); /* transfer http */
|
|
|
|
|
|
|
|
curl_slist_free_all(headers); /* free the header list */
|
|
|
|
.fi
|
|
|
|
|
|
|
|
\&... and if you think some of the internally generated headers, such as
|
|
|
|
Accept: or Host: don't contain the data you want them to contain, you can
|
|
|
|
replace them by simply setting them too:
|
|
|
|
|
|
|
|
.nf
|
|
|
|
headers = curl_slist_append(headers, "Accept: Agent-007");
|
|
|
|
headers = curl_slist_append(headers, "Host: munged.host.line");
|
|
|
|
.fi
|
|
|
|
|
|
|
|
.IP "Delete Headers"
|
|
|
|
If you replace an existing header with one with no contents, you will prevent
|
|
|
|
the header from being sent. For example, if you want to completely prevent the
\&"Accept:" header from being sent, you can disable it with code similar to this:
|
|
|
|
|
|
|
|
headers = curl_slist_append(headers, "Accept:");
|
|
|
|
|
|
|
|
Both replacing and canceling internal headers should be done with careful
|
|
|
|
consideration and you should be aware that you may violate the HTTP protocol
|
|
|
|
when doing so.
|
|
|
|
|
|
|
|
.IP "Enforcing chunked transfer-encoding"
|
|
|
|
|
|
|
|
By making sure a request uses the custom header "Transfer-Encoding: chunked"
|
|
|
|
when doing a non-GET HTTP operation, libcurl will switch over to "chunked"
|
|
|
|
upload, even though the size of the data to upload might be known. By default,
|
|
|
|
libcurl usually switches over to chunked upload automatically if the upload
|
|
|
|
data size is unknown.
|
|
|
|
|
|
|
|
.IP "HTTP Version"
|
|
|
|
|
|
|
|
There's only one aspect left in the HTTP requests that we haven't yet
|
|
|
|
mentioned how to modify: the version field. All HTTP requests include the
version number to tell the server which version we support. libcurl speaks HTTP
|
|
|
|
1.1 by default. Some very old servers don't like getting 1.1-requests and when
|
|
|
|
dealing with stubborn old things like that, you can tell libcurl to use 1.0
|
|
|
|
instead by doing something like this:
|
|
|
|
|
|
|
|
curl_easy_setopt(easyhandle, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
|
|
|
|
|
|
|
|
.IP "FTP Custom Commands"
|
|
|
|
|
|
|
|
Not all protocols are HTTP-like, and thus the above may not help you when
|
|
|
|
you want to make, for example, your FTP transfers behave differently.
|
|
|
|
|
|
|
|
Sending custom commands to a FTP server means that you need to send the
|
|
|
|
commands exactly as the FTP server expects them (RFC959 is a good guide
|
|
|
|
here), and you can only use commands that work on the control-connection
|
|
|
|
alone. All kinds of commands that require data interchange and thus need
a data-connection must be left to libcurl's own judgment. Also be aware
|
|
|
|
that libcurl will do its very best to change directory to the target
|
|
|
|
directory before doing any transfer, so if you change directory (with CWD
|
|
|
|
or similar) you might confuse libcurl and then it might not attempt to
|
|
|
|
transfer the file in the correct remote directory.
|
|
|
|
|
|
|
|
A little example that deletes a given file before an operation:
|
|
|
|
|
|
|
|
.nf
|
|
|
|
headers = curl_slist_append(headers, "DELE file-to-remove");
|
|
|
|
|
|
|
|
/* pass the list of custom commands to the handle */
|
|
|
|
curl_easy_setopt(easyhandle, CURLOPT_QUOTE, headers);
|
|
|
|
|
|
|
|
curl_easy_perform(easyhandle); /* transfer ftp data! */
|
|
|
|
|
|
|
|
curl_slist_free_all(headers); /* free the command list */
|
|
|
|
.fi
|
|
|
|
|
|
|
|
If you would instead want this operation (or chain of operations) to happen
_after_ the data transfer took place, the option to \fIcurl_easy_setopt(3)\fP
would instead be called CURLOPT_POSTQUOTE and used the exact same way.
|
|
|
|
|
|
|
|
The custom FTP commands will be issued to the server in the same order they are
|
|
|
|
added to the list, and if a command gets an error code returned back from the
|
|
|
|
server, no more commands will be issued and libcurl will bail out with an
|
|
|
|
error code (CURLE_FTP_QUOTE_ERROR). Note that if you use CURLOPT_QUOTE to send
|
|
|
|
commands before a transfer, no transfer will actually take place when a quote
|
|
|
|
command has failed.
|
|
|
|
|
|
|
|
If you set the CURLOPT_HEADER to true, you will tell libcurl to get
|
|
|
|
information about the target file and output "headers" about it. The headers
|
|
|
|
will be in "HTTP-style", looking like they do in HTTP.
|
|
|
|
|
|
|
|
The option to enable headers or to run custom FTP commands may be useful to
|
|
|
|
combine with CURLOPT_NOBODY. If this option is set, no actual file content
|
|
|
|
transfer will be performed.
|
|
|
|
|
|
|
|
.IP "FTP Custom CUSTOMREQUEST"
|
|
|
|
If you do want to list the contents of a FTP directory using your own defined FTP
|
|
|
|
command, CURLOPT_CUSTOMREQUEST will do just that. "NLST" is the default one
|
|
|
|
for listing directories but you're free to pass in your idea of a good
|
|
|
|
alternative.
|
|
|
|
|
|
|
|
.SH "Cookies Without Chocolate Chips"
|
|
|
|
In the HTTP sense, a cookie is a name with an associated value. A server sends
|
|
|
|
the name and value to the client, and expects it to get sent back on every
|
|
|
|
subsequent request to the server that matches the particular conditions
|
|
|
|
set. The conditions include that the domain name and path match and that the
|
|
|
|
cookie hasn't become too old.
|
|
|
|
|
|
|
|
In real-world cases, servers send new cookies to replace existing ones to
update them. Servers use cookies to "track" users and to keep "sessions".
|
|
|
|
|
|
|
|
Cookies are sent from server to clients with the header Set-Cookie: and
|
|
|
|
they're sent from clients to servers with the Cookie: header.
|
|
|
|
|
|
|
|
To just send whatever cookie you want to a server, you can use CURLOPT_COOKIE
|
|
|
|
to set a cookie string like this:
|
|
|
|
|
|
|
|
curl_easy_setopt(easyhandle, CURLOPT_COOKIE, "name1=var1; name2=var2;");
|
|
|
|
|
|
|
|
In many cases, that is not enough. You might want to dynamically save
|
|
|
|
whatever cookies the remote server passes to you, and make sure those cookies
|
|
|
|
are then used accordingly on later requests.
|
|
|
|
|
|
|
|
One way to do this, is to save all headers you receive in a plain file and
|
|
|
|
when you make a request, you tell libcurl to read the previous headers to
|
|
|
|
figure out which cookies to use. Set the header file to read cookies from with
|
|
|
|
CURLOPT_COOKIEFILE.
|
|
|
|
|
|
|
|
The CURLOPT_COOKIEFILE option also automatically enables the cookie parser in
|
|
|
|
libcurl. Until the cookie parser is enabled, libcurl will not parse or
|
|
|
|
understand incoming cookies and they will just be ignored. However, when the
|
|
|
|
parser is enabled the cookies will be understood and the cookies will be kept
|
|
|
|
in memory and used properly in subsequent requests when the same handle is
|
|
|
|
used. Many times this is enough, and you may not have to save the cookies to
|
|
|
|
disk at all. Note that the file you specify to CURLOPT_COOKIEFILE doesn't
|
|
|
|
have to exist to enable the parser, so a common way to just enable the parser
and not read any cookies might be to use a file name you know doesn't exist.
|
|
|
|
|
|
|
|
If you would rather use existing cookies that you've previously received with your
|
|
|
|
Netscape or Mozilla browsers, you can make libcurl use that cookie file as
|
|
|
|
input. The CURLOPT_COOKIEFILE is used for that too, as libcurl will
|
|
|
|
automatically find out what kind of file it is and act accordingly.
|
|
|
|
|
|
|
|
The perhaps most advanced cookie operation libcurl offers, is saving the
|
|
|
|
entire internal cookie state back into a Netscape/Mozilla formatted cookie
|
|
|
|
file. We call that the cookie-jar. When you set a file name with
|
|
|
|
CURLOPT_COOKIEJAR, that file name will be created and all received cookies
|
|
|
|
will be stored in it when \fIcurl_easy_cleanup(3)\fP is called. This enables
|
|
|
|
cookies to get passed on properly between multiple handles without any
|
|
|
|
information getting lost.
|
|
|
|
|
|
|
|
.SH "FTP Peculiarities We Need"
|
|
|
|
|
|
|
|
FTP transfers use a second TCP/IP connection for the data transfer. This is
|
|
|
|
usually a fact you can forget and ignore but at times this fact will come
|
|
|
|
back to haunt you. libcurl offers several different ways to customize how the
|
|
|
|
second connection is being made.
|
|
|
|
|
|
|
|
libcurl can either connect to the server a second time or tell the server to
|
|
|
|
connect back to it. The first option is the default and it is also what works
|
|
|
|
best for all the people behind firewalls, NATs or IP-masquerading setups.
|
|
|
|
libcurl then tells the server to open up a new port and wait for a second
|
|
|
|
connection. This is by default attempted with EPSV first, and if that doesn't
|
|
|
|
work it tries PASV instead. (EPSV is an extension to the original FTP spec
|
|
|
|
and does not exist nor work on all FTP servers.)
|
|
|
|
|
|
|
|
You can prevent libcurl from first trying the EPSV command by setting
|
|
|
|
CURLOPT_FTP_USE_EPSV to FALSE.
|
|
|
|
|
|
|
|
In some cases, you will prefer to have the server connect back to you for the
|
|
|
|
second connection. This might be when the server is perhaps behind a firewall
|
|
|
|
or something and only allows connections on a single port. libcurl then
|
|
|
|
informs the remote server which IP address and port number to connect to.
|
|
|
|
This is made with the CURLOPT_FTPPORT option. If you set it to "-", libcurl
|
|
|
|
will use your system's "default IP address". If you want to use a particular
|
|
|
|
IP, you can set the full IP address, a host name to resolve to an IP address
|
|
|
|
or even a local network interface name that libcurl will get the IP address
|
|
|
|
from.
|
|
|
|
|
|
|
|
When doing the "PORT" approach, libcurl will attempt to use the EPRT and the
|
|
|
|
LPRT before trying PORT, as they work with more protocols. You can disable
|
|
|
|
this behavior by setting CURLOPT_FTP_USE_EPRT to FALSE.
|
|
|
|
|
|
|
|
.SH "Headers Equal Fun"
|
|
|
|
|
|
|
|
Some protocols provide "headers", meta-data separated from the normal
|
|
|
|
data. These headers are by default not included in the normal data stream,
|
|
|
|
but you can make them appear in the data stream by setting CURLOPT_HEADER to
|
|
|
|
TRUE.
|
|
|
|
|
|
|
|
What might be even more useful, is libcurl's ability to separate the headers
|
|
|
|
from the data and thus make the callbacks differ. You can for example set a
|
|
|
|
different pointer to pass to the ordinary write callback by setting
|
|
|
|
CURLOPT_WRITEHEADER.
|
|
|
|
|
|
|
|
Or, you can set an entirely separate function to receive the headers, by
|
|
|
|
using CURLOPT_HEADERFUNCTION.
|
|
|
|
|
|
|
|
The headers are passed to the callback function one by one, and you can
|
|
|
|
depend on that fact. It makes it easier for you to add custom header parsers
|
|
|
|
etc.
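
Since each call delivers exactly one header line, a custom parser can be quite
small. The sketch below counts the lines and remembers the Content-Type value.
The names on_header and struct header_info are invented for this example, and
the case-sensitive match is a simplification.

```c
#include <stddef.h>
#include <string.h>

/* hypothetical state: counts header lines and keeps Content-Type */
struct header_info {
  int count;
  char content_type[128];
};

/* same signature as the write callback; one header line per call */
size_t on_header(void *ptr, size_t size, size_t nmemb, void *userdata)
{
  /* case-sensitive match keeps the sketch short */
  static const char key[] = "Content-Type:";
  const size_t keylen = sizeof(key) - 1;
  struct header_info *info = (struct header_info *)userdata;
  const char *line = (const char *)ptr;
  size_t len = size * nmemb;   /* the line is NOT zero terminated */

  info->count++;
  if(len > keylen && memcmp(line, key, keylen) == 0) {
    const char *val = line + keylen;
    size_t vlen = len - keylen;

    /* trim leading blanks and the trailing CR/LF */
    while(vlen && (*val == ' ' || *val == '\t')) {
      val++;
      vlen--;
    }
    while(vlen && (val[vlen - 1] == '\r' || val[vlen - 1] == '\n'))
      vlen--;

    if(vlen >= sizeof(info->content_type))
      vlen = sizeof(info->content_type) - 1;
    memcpy(info->content_type, val, vlen);
    info->content_type[vlen] = '\0';
  }
  return len; /* any other value signals an error to libcurl */
}
```

Register the function with CURLOPT_HEADERFUNCTION and pass the address of a
struct header_info with CURLOPT_WRITEHEADER. Note that the count includes the
status line and the blank line that ends the headers.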
|
|
|
|
|
|
|
|
"Headers" for FTP transfers equal all the FTP server responses. They aren't
|
|
|
|
actually true headers, but in this case we pretend they are! ;-)
|
|
|
|
|
|
|
|
.SH "Post Transfer Information"
|
|
|
|
|
|
|
|
[ curl_easy_getinfo ]
|
|
|
|
|
|
|
|
.SH "Security Considerations"
|
|
|
|
|
|
|
|
libcurl is in itself not insecure. If used the right way, you can use libcurl
|
|
|
|
to transfer data pretty safely.
|
|
|
|
|
|
|
|
There are of course many things to consider that may loosen up this
|
|
|
|
situation:
|
|
|
|
|
|
|
|
.IP "Command Lines"
|
|
|
|
If you use a command line tool (such as curl) that uses libcurl, and you give
|
|
|
|
options to the tool on the command line, those options can very likely get read
|
|
|
|
by other users of your system when they use 'ps' or other tools to list
|
|
|
|
currently running processes.
|
|
|
|
|
|
|
|
To avoid this problem, never feed sensitive things to programs using command
|
|
|
|
line options.
|
|
|
|
|
|
|
|
.IP ".netrc"
|
|
|
|
\&.netrc is a pretty handy file/feature that allows you to login quickly and
|
|
|
|
automatically to frequently visited sites. The file contains passwords in
|
|
|
|
clear text and is a real security risk. In some cases, your .netrc is also
|
|
|
|
stored in a home directory that is NFS mounted or used on another network
|
|
|
|
based file system, so the clear text password will fly through your network
|
|
|
|
every time anyone reads that file!
|
|
|
|
|
|
|
|
To avoid this problem, don't use .netrc files and never store passwords in
|
|
|
|
plain text anywhere.
|
|
|
|
|
|
|
|
.IP "Clear Text Passwords"
|
|
|
|
Many of the protocols libcurl supports send name and password unencrypted as
|
|
|
|
clear text (HTTP Basic authentication, FTP, TELNET etc). It is very easy for
|
|
|
|
anyone on your network or a network nearby yours, to just fire up a network
|
|
|
|
analyzer tool and eavesdrop on your passwords. Don't let the fact that HTTP
|
|
|
|
uses base64 encoded passwords fool you. They may not look readable at a first
|
|
|
|
glance, but they are very easily "deciphered" by anyone within seconds.
|
|
|
|
|
|
|
|
To avoid this problem, use protocols that don't let snoopers see your
|
|
|
|
password: HTTPS, FTPS and FTP-kerberos are a few examples. HTTP Digest
authentication allows this too, and libcurl supports it through the
CURLOPT_HTTPAUTH option.
|
|
|
|
|
|
|
|
.IP "Showing What You Do"
|
|
|
|
On a related issue, be aware that even in situations like when you have
|
|
|
|
problems with libcurl and ask someone for help, everything you reveal in order
|
|
|
|
to get the best possible help might also impose certain security related
|
|
|
|
risks. Host names, user names, paths, operating system specifics etc (not to
|
|
|
|
mention passwords of course) may in fact be used by intruders to gain
|
|
|
|
additional information of a potential target.
|
|
|
|
|
|
|
|
To avoid this problem, you must of course use your common sense. Often, you
|
|
|
|
can just edit out the sensitive data or just search/replace your true
|
|
|
|
information with faked data.
|
|
|
|
|
|
|
|
.SH "Multiple Transfers Using the multi Interface"
|
|
|
|
|
|
|
|
The easy interface as described in detail in this document is a synchronous
|
|
|
|
interface that transfers one file at a time and doesn't return until it's
|
|
|
|
done.
|
|
|
|
|
|
|
|
The multi interface on the other hand, allows your program to transfer
|
|
|
|
multiple files in both directions at the same time, without forcing you to
|
|
|
|
use multiple threads.
|
|
|
|
|
|
|
|
To use this interface, you are better off if you first understand the basics
|
|
|
|
of how to use the easy interface. The multi interface is simply a way to make
|
|
|
|
multiple transfers at the same time, by adding up multiple easy handles into
|
|
|
|
a "multi stack".
|
|
|
|
|
|
|
|
You create the easy handles you want and you set all the options just like you
|
|
|
|
have been told above, and then you create a multi handle with
|
|
|
|
\fIcurl_multi_init(3)\fP and add all those easy handles to that multi handle
|
|
|
|
with \fIcurl_multi_add_handle(3)\fP.
|
|
|
|
|
|
|
|
When you've added the handles you have for the moment (you can still add new
ones at any time), you start the transfers by calling
\fIcurl_multi_perform(3)\fP.
|
|
|
|
|
|
|
|
\fIcurl_multi_perform(3)\fP is asynchronous. It will only execute as little as
|
|
|
|
possible and then return back control to your program. It is designed to never
|
|
|
|
block. If it returns CURLM_CALL_MULTI_PERFORM you had better call it again soon,
|
|
|
|
as that is a signal that it still has local data to send or remote data to
|
|
|
|
receive.
|
|
|
|
|
|
|
|
The best usage of this interface is when you do a select() on all possible
|
|
|
|
file descriptors or sockets to know when to call libcurl again. This also
|
|
|
|
makes it easy for you to wait and respond to actions on your own application's
|
|
|
|
sockets/handles. You figure out what to select() for by using
|
|
|
|
\fIcurl_multi_fdset(3)\fP, that fills in a set of fd_set variables for you
|
|
|
|
with the particular file descriptors libcurl uses for the moment.
|
|
|
|
|
|
|
|
When you then call select(), it'll return when one of the file handles signals
action and you then call \fIcurl_multi_perform(3)\fP to allow libcurl to do
what it wants to do. Take note that libcurl does also feature some time-out
code so we advise you to never use very long timeouts on select() before you
|
|
|
|
call \fIcurl_multi_perform(3)\fP, which thus should be called unconditionally
|
|
|
|
every now and then even if none of its file descriptors have signaled
|
|
|
|
ready. Another precaution you should use: always call
|
|
|
|
\fIcurl_multi_fdset(3)\fP immediately before the select() call since the
|
|
|
|
current set of file descriptors may change when calling a curl function.
|
|
|
|
|
|
|
|
If you want to stop the transfer of one of the easy handles in the stack, you
|
|
|
|
can use \fIcurl_multi_remove_handle(3)\fP to remove individual easy
|
|
|
|
handles. Remember that easy handles should be \fIcurl_easy_cleanup(3)\fPed.
|
|
|
|
|
|
|
|
When a transfer within the multi stack has finished, the counter of running
|
|
|
|
transfers (as filled in by \fIcurl_multi_perform(3)\fP) will decrease. When
|
|
|
|
the number reaches zero, all transfers are done.
|
|
|
|
|
|
|
|
\fIcurl_multi_info_read(3)\fP can be used to get information about completed
|
|
|
|
transfers. It then returns the CURLcode for each easy transfer, to allow you
|
|
|
|
to figure out success on each individual transfer.
|
|
|
|
|
|
|
|
.SH "SSL, Certificates and Other Tricks"
|
|
|
|
|
|
|
|
[ seeding, passwords, keys, certificates, ENGINE, ca certs ]
|
|
|
|
|
|
|
|
.SH "Sharing Data Between Easy Handles"
|
|
|
|
|
|
|
|
[ fill in ]
|
|
|
|
|
|
|
|
.SH "Footnotes"
|
|
|
|
|
|
|
|
.IP "[1]"
|
|
|
|
libcurl 7.10.3 and later have the ability to switch over to chunked
|
|
|
|
Transfer-Encoding in cases where HTTP uploads are done with data of an unknown
|
|
|
|
size.
|
|
|
|
.IP "[2]"
|
|
|
|
This happens on Windows machines when libcurl is built and used as a
|
|
|
|
DLL. However, you can still do this on Windows if you link with a static
|
|
|
|
library.
|
|
|
|
.IP "[3]"
|
|
|
|
The curl-config tool is generated at build-time (on unix-like systems) and
|
|
|
|
should be installed with the 'make install' or similar instruction that
|
|
|
|
installs the library, header files, man pages etc.
|