mirror of
https://github.com/moparisthebest/curl
synced 2024-12-21 15:48:49 -05:00
extended the proxy chapter mucho
parent 5b58e61f28
commit 14e9420d2c
@ -137,9 +137,22 @@ Handle the Easy libcurl

It returns an easy handle. Using that you proceed to the next step: setting
up your preferred actions. A handle is just a logical entity for the upcoming
transfer or series of transfers.

You set properties and options for this handle using curl_easy_setopt(). They
control how the subsequent transfer or transfers will be made. Options remain
set in the handle until set again to something different. Hence, multiple
requests using the same handle will use the same options.

Many of the options you set in libcurl are "strings", pointers to data
terminated with a zero byte. Keep in mind that when you set strings with
curl_easy_setopt(), libcurl will not copy the data. It will merely point to
the data. You MUST make sure that the data remains available for libcurl to
use until finished or until you use the same option again to point to
something else.

One of the most basic properties to set in the handle is the URL. You set
your preferred URL to transfer with CURLOPT_URL in a manner similar to:

    curl_easy_setopt(easyhandle, CURLOPT_URL, "http://curl.haxx.se/");

@ -358,12 +371,16 @@ HTTP POSTing

    curl_easy_perform(easyhandle); /* post away! */

Simple enough, huh? Since you set the POST options with
CURLOPT_POSTFIELDS, this automatically switches the handle to use POST in the
upcoming request.

Ok, so what if you want to post binary data that also requires you to set the
Content-Type: header of the post? Well, binary posts prevent libcurl from
being able to do strlen() on the data to figure out the size, so we must tell
libcurl the size of the post data. Setting headers in libcurl requests is done
in a generic way, by building a list of our own headers and then passing that
list to libcurl.

    struct curl_slist *headers=NULL;
    headers = curl_slist_append(headers, "Content-Type: text/xml");

@ -416,14 +433,14 @@ HTTP POSTing
    /* free the post data again */
    curl_formfree(post);

Multipart formposts are chains of parts using MIME-style separators and
headers. It means that each one of these separate parts gets a few headers set
that describe the individual content-type, size etc. To enable your
application to handcraft this formpost even more, libcurl allows you to
supply your own set of custom headers to such an individual form part. You
can of course supply headers to as many parts as you like, but this little
example will show how you set headers to one specific part when you add that
to the post handle:

    struct curl_slist *headers=NULL;
    headers = curl_slist_append(headers, "Content-Type: text/xml");

@ -439,9 +456,22 @@ HTTP POSTing
    curl_formfree(post); /* free post */
    curl_slist_free_all(headers); /* free custom header list */

Since all options on an easyhandle are "sticky", they remain the same until
changed even if you do call curl_easy_perform(). You may therefore need to
tell curl to go back to a plain GET request if that is what you intend to do
as your next request. You force an easyhandle back to GET by using the
CURLOPT_HTTPGET option:

    curl_easy_setopt(easyhandle, CURLOPT_HTTPGET, 1L);

Just setting CURLOPT_POSTFIELDS to "" or NULL will *not* stop libcurl from
doing a POST. It will just make it POST without any data to send!


Showing Progress

[ built-in progress meter, progress callback ]


libcurl with C++

@ -488,16 +518,107 @@ Proxies
proxy is using the HTTP protocol. For example, you can't invoke your own
custom FTP commands or even proper FTP directory listings.

Proxy Options

To tell libcurl to use a proxy at a given port number:

    curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080");

Some proxies require user authentication before allowing a request, and
you pass that information similar to this:

    curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password");

If you want to, you can specify the host name only in the CURLOPT_PROXY
option, and set the port number separately with CURLOPT_PROXYPORT.

Environment Variables

libcurl automatically checks and uses a set of environment variables to know
what proxies to use for certain protocols. The names of the variables follow
an ancient de facto standard and are built up as "[protocol]_proxy" (note the
lower casing). This makes the variable 'http_proxy' checked for the name of a
proxy to use when the input URL is HTTP. Following the same rule, the
variable named 'ftp_proxy' is checked for FTP URLs. Again, the proxies are
always HTTP proxies; the different names of the variables simply allow
different HTTP proxies to be used.

The proxy environment variable contents should be in the format
"[protocol://]machine[:port]", where the protocol:// part is simply
ignored if present (so http://proxy and bluerk://proxy will do the same)
and the optional port number specifies on which port the proxy operates on
the host. If not specified, the internal default port number will be used
and that is most likely *not* the one you would like it to be.

There are two special environment variables. 'all_proxy' sets the proxy
for any URL in case the protocol-specific variable wasn't set, and
'no_proxy' defines a list of hosts that should not use a proxy even though
a variable may say so. If 'no_proxy' is a plain asterisk ("*") it matches
all hosts.
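
To make the naming rule concrete, here is a rough sketch of the lookup order described above. It is an illustration only, not libcurl's own code: the helpers proxy_env_name() and proxy_from_env() are made-up names, and 'no_proxy' matching is left out on purpose:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Build "[protocol]_proxy" in lower case, e.g. "HTTP" -> "http_proxy".
   Returns 0 on success, -1 if the buffer is too small. */
static int proxy_env_name(const char *scheme, char *buf, size_t len)
{
  size_t i;
  size_t n = strlen(scheme);
  if(n + sizeof("_proxy") > len)
    return -1;
  for(i = 0; i < n; i++)
    buf[i] = (char)tolower((unsigned char)scheme[i]);
  memcpy(buf + n, "_proxy", sizeof("_proxy")); /* copies the zero byte too */
  return 0;
}

/* Return the proxy string for a scheme, falling back to all_proxy.
   Returns NULL when neither variable is set. */
static const char *proxy_from_env(const char *scheme)
{
  char name[32];
  const char *value;
  if(proxy_env_name(scheme, name, sizeof(name)) != 0)
    return NULL;
  value = getenv(name);
  if(value && *value)
    return value;
  return getenv("all_proxy");
}
```

With this sketch, an FTP URL would consult 'ftp_proxy' first and only then 'all_proxy'.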

SSL and Proxies

SSL is for secure point-to-point connections. This involves strong
encryption and similar things, which effectively makes it impossible for a
proxy to operate as a "man in between", which is the proxy's task as
previously discussed. Instead, the only way to have SSL work over an HTTP
proxy is to ask the proxy to tunnel everything through without being able
to check the traffic.

Opening an SSL connection over an HTTP proxy is therefore a matter of asking
the proxy for a straight connection to the target host on a specified
port. This is done with the HTTP request CONNECT.

Because of the nature of this operation, where the proxy has no idea what
kind of data is passed in and out through this tunnel, this breaks some
of the advantages a proxy might otherwise offer, such as caching. Many
organizations prevent this kind of tunneling to destination port numbers
other than 443 (which is the default HTTPS port number).
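
For readers who have not seen one, the sketch below formats the kind of CONNECT request a client sends to the proxy. libcurl builds this request itself; the helper name format_connect() is made up, and the choice of HTTP/1.1 with a Host: header is an assumption of this example:

```c
#include <stdio.h>

/* Illustration only: write a CONNECT request for host:port into buf.
   Returns the request length, or -1 if the buffer is too small. */
static int format_connect(char *buf, size_t len, const char *host, int port)
{
  int n = snprintf(buf, len,
                   "CONNECT %s:%d HTTP/1.1\r\n"
                   "Host: %s:%d\r\n"
                   "\r\n",
                   host, port, host, port);
  return (n > 0 && (size_t)n < len) ? n : -1;
}
```

Once the proxy answers with a success status, the bytes that follow are passed through untouched, which is why the proxy cannot inspect or cache them.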

Tunneling Through Proxy

As explained above, tunneling is required for SSL to work and is often even
restricted to the operation intended for SSL: HTTPS.

This is however not the only time proxy-tunneling might offer benefits to
you or your application.

As tunneling opens a direct connection from your application to the remote
machine, it suddenly also re-introduces the ability to do non-HTTP
operations over an HTTP proxy. You can in fact use things such as FTP
upload or FTP custom commands this way.

Again, this is often prevented by the administrators of proxies and is
rarely allowed.

Tell libcurl to use proxy tunneling like this:

    curl_easy_setopt(easyhandle, CURLOPT_HTTPPROXYTUNNEL, 1L);

Proxy Auto-Config

Netscape first came up with this. It is basically a web page (usually using
a .pac extension) with a javascript that when executed by the browser with
the requested URL as input, returns information to the browser on how to
connect to the URL. The returned information might be "DIRECT" (which
means no proxy should be used), "PROXY host:port" (to tell the browser
where the proxy for this particular URL is) or "SOCKS host:port" (to
direct the browser to a SOCKS proxy).

libcurl has no means to interpret or evaluate javascript and thus it
doesn't support this. If you get yourself in a position where you face
this nasty invention, the following advice has been mentioned and used in
the past:

- Depending on the javascript complexity, write up a script that
  translates it to another language and execute that.

- Read the javascript code and rewrite the same logic in another language.

- Implement a javascript interpreter; people have successfully used the
  Mozilla javascript engine in the past.

- Ask your admins to stop this, for a static proxy setup or similar.
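
As a sketch of the second piece of advice, suppose an imaginary .pac file says "hosts under .intranet.example go DIRECT, everything else goes to PROXY proxy-host.com:8080". Rewritten by hand in C, that logic could look like the following. The rule, the host names and the function name proxy_for_host() are all invented for this example:

```c
#include <string.h>

/* Hand-translated version of the imaginary .pac rule described above.
   Returns a string suitable for CURLOPT_PROXY: "" for a direct connection,
   otherwise "host:port" of the proxy. */
static const char *proxy_for_host(const char *host)
{
  static const char suffix[] = ".intranet.example";
  size_t hlen = strlen(host);
  size_t slen = sizeof(suffix) - 1;

  /* DIRECT: host name ends with the intranet suffix */
  if(hlen >= slen && strcmp(host + hlen - slen, suffix) == 0)
    return "";

  /* PROXY proxy-host.com:8080 for everything else */
  return "proxy-host.com:8080";
}
```

The result can be passed to CURLOPT_PROXY before each transfer. An empty string there tells libcurl to use no proxy at all.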


Security Considerations
@ -505,7 +626,7 @@ Security Considerations
[ ps output, netrc plain text, plain text protocols / base64 ]


SSL, Certificates and Other Tricks


Future