They were never officially allowed and slipped in only due to sloppy
parsing. Spaces (ascii 32) should be correctly encoded (to %20) before
being part of a URL.
The new flag bit CURLU_ALLOW_SPACE when a full URL is set, makes libcurl
allow spaces.
Updated test 1560 to verify.
Closes#7073
When the host name in a URL is given as an IPv4 numerical address, the
address can be specified with dotted numericals in four different ways:
a32, a.b24, a.b.c16 or a.b.c.d and each part can be specified in
decimal, octal (0-prefixed) or hexadecimal (0x-prefixed).
Instead of passing on the name as-is and leaving the handling to the
underlying name functions, which made them not work with c-ares but work
with getaddrinfo, this change now makes the curl URL API itself detect
and "normalize" host names specified as IPv4 numericals.
The WHATWG URL Spec says this is an okay way to specify a host name in a
URL. RFC 3896 does not allow them, but curl didn't prevent them before
and it seems other RFC 3896-using tools have not either. Host names used
like this are widely supported by other tools as well due to the
handling being done by getaddrinfo and friends.
I decided to add the functionality into the URL API itself so that all
users of these functions get the benefits, when for example wanting to
compare two URLs. Also, it makes curl built to use c-ares now support
them as well and make curl builds more consistent.
The normalization makes HTTPS and virtual hosted HTTP work fine even
when curl gets the address specified using one of the "obscure" formats.
Test 1560 is extended to verify.
Fixes#6863Closes#6871
configure --enable-debug now enables -Wassign-enum with clang,
identifying several enum "abuses" also fixed.
Reported-by: Gisle Vanem
Bug: 879007f811 (commitcomment-42087553)Closes#5929
When using maximum code optimization level (-O3), valgrind wrongly
detects uses of uninitialized values in strcmp().
Preset buffers with all zeroes to avoid that.
In the "scheme-less" parsing case, we need to strip off credentials
first before we guess scheme based on the host name!
Assisted-by: Jay Satiro
Fixes#4856Closes#4857
The parser would check for a query part before fragment, which caused it
to do wrong when the fragment contains a question mark.
Extended test 1560 to verify.
Reported-by: Alex Konev
Fixes#4412Closes#4413
CURLU_NO_AUTHORITY is intended for use with unknown schemes (i.e. not
"file:///") to override cURL's default demand that an authority exists.
Closes#4349
It needs to parse correctly. Otherwise it could be tricked into letting
through a-f using host names that libcurl would then resolve. Like
'[ab.be]'.
Reported-by: Thomas Vegas
Closes#4315
... to make the host name "usable". Store the scope id and put it back
when extracting a URL out of it.
Also makes curl_url_set() syntax check CURLUPART_HOST.
Fixes#3817Closes#3822
Only allow well formed decimal numbers in the input.
Document that the number MUST be between 1 and 65535.
Add tests to test 1560 to verify the above.
Ref: https://github.com/curl/curl/issues/3753Closes#3762
The function does not return the same value as snprintf() normally does,
so readers may be mislead into thinking the code works differently than
it actually does. A different function name makes this easier to detect.
Reported-by: Tomas Hoger
Assisted-by: Daniel Gustafsson
Fixes#3296Closes#3297
APPENDQUERY + URLENCODE would skip all equals signs but now it only skip
encoding the first to better allow "name=content" for any content.
Reported-by: Alexey Melnichuk
Fixes#3231Closes#3231
The function identifying a leading "scheme" part of the URL considered a
few letters ending with a colon to be a scheme, making something like
"short:80" to become an unknown scheme instead of a short host name and
a port number.
Extended test 1560 to verify.
Also fixed test203 to use file_pwd to make it get the correct path on
windows. Removed test 2070 since it was a duplicate of 203.
Assisted-by: Marcel Raad
Reported-by: Hagai Auro
Fixes#3220Fixes#3233Closes#3223Closes#3235
The function identifying a leading "scheme" part of the URL considered a few
letters ending with a colon to be a scheme, making something like "short:80"
to become an unknown scheme instead of a short host name and a port number.
Extended test 1560 to verify.
Reported-by: Hagai Auro
Fixes#3220Closes#3223
This fixes potential out-of-buffer access on "file:./" URL
$ valgrind curl "file:./"
==24516== Memcheck, a memory error detector
==24516== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==24516== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==24516== Command: /home/even/install-curl-git/bin/curl file:./
==24516==
==24516== Conditional jump or move depends on uninitialised value(s)
==24516== at 0x4C31F9C: strcmp (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24516== by 0x4EBB315: seturl (urlapi.c:801)
==24516== by 0x4EBB568: parseurl (urlapi.c:861)
==24516== by 0x4EBC509: curl_url_set (urlapi.c:1199)
==24516== by 0x4E644C6: parseurlandfillconn (url.c:2044)
==24516== by 0x4E67AEF: create_conn (url.c:3613)
==24516== by 0x4E68A4F: Curl_connect (url.c:4119)
==24516== by 0x4E7F0A4: multi_runsingle (multi.c:1440)
==24516== by 0x4E808E5: curl_multi_perform (multi.c:2173)
==24516== by 0x4E7558C: easy_transfer (easy.c:686)
==24516== by 0x4E75801: easy_perform (easy.c:779)
==24516== by 0x4E75868: curl_easy_perform (easy.c:798)
Was originally spotted by
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10637
Credit to OSS-Fuzz
Closes#3039
In order for this API to fully work for libcurl itself, it now offers a
CURLU_GUESS_SCHEME flag that makes it "guess" scheme based on the host
name prefix just like libcurl always did. If there's no known prefix, it
will guess "http://".
Separately, it relaxes the check of the host name so that IDN host names
can be passed in as well.
Both these changes are necessary for libcurl itself to use this API.
Assisted-by: Daniel Gustafsson
Closes#3018