We were using atol(), which on 32 bit, cannot handle values greater than
2GiB, which is fail.
Switch to a strtoull() wrapper function tailored toward parsing off_t
values. This allows parsing of very large positive integer values. off_t
is a signed type, but in our usages, we never parse or have a need for
negative values, so the function will return -1 on error.
Before:
$ pacman -Si flightgear-data | grep Size
Download Size : 2097152.00 K
Installed Size : 2097152.00 K
After:
$ ./src/pacman/pacman -Si flightgear-data | grep Size
Download Size : 2312592.52 KiB
Installed Size : 5402896.00 KiB
Signed-off-by: Dan McGee <dan@archlinux.org>
Hard to believe there was still more room to improve on this, but I
found an easily correctable oversight tonight. Our databases (both sync
and local) contain many blank lines, and we were not moving onto the
next line right away in these cases; instead we would proceed through
our strcmp() conditional checks as normal.
Some local numbers follow to show the effects of this patch:
Sync `-Ss foobarbaz`:
71,709 blank lines skipped early
~1,505,889 strcmp() calls avoided (21 per line)
~15% speed improvement (.210 --> .179 sec)
Local `-Qs foobarbaz`:
6,823 blank lines skipped early
115,991 strcmp() calls avoided (17 per line)
~6% speed improvement (.080 -> .071 sec)
Signed-off-by: Dan McGee <dan@archlinux.org>
Free "syncpath" and restore umask if we fail to grab a lock.
Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de>
Signed-off-by: Dan McGee <dan@archlinux.org>
This one wasn't all that necessary as we only used it in one place in
the function, which can be checked easily enough at the call site.
Signed-off-by: Dan McGee <dan@archlinux.org>
Let callers of _alpm_download state whether we should delete on fail,
rather than inferring it from context. We still override this decision
and always unlink when a temp file is used.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
* Move is_local standalone field to status enum
* Create VALID/INVALID flag pair
* Create EXISTS/MISSING flag pair
With these additional fields, we can be more intelligent with database
loading and messages to the user. We now only warn once if a sync
database does not exist and do not continue to try to load it once we
have marked it as missing.
The reason for the flags existing in pairs is so the unknown case can be
represented. There should never be a time when both flags in the same
group are true, but if they are both false, it represents the unknown
case. Care is taken to always manipulate both flags at the same time.
Signed-off-by: Dan McGee <dan@archlinux.org>
The precedence goes as follows: signature > sha256sum > md5sum
Add some logic and helper methods to check what we have available when
loading a package, and then only check what is necessary to verify the
package. This should speed up sync database verifies as we no longer
will be doing both a checksum and a signature validation.
Signed-off-by: Dan McGee <dan@archlinux.org>
We did this with depends way back in commit c244cfecf6 in 2007. We
can do it with these fields as well.
Of note is the inclusion of provides even though only '=' is supported-
we'll parse other things, but no guarantees are given as to behavior,
which is more or less similar to before since we only looked for the
equals sign.
Also of note is the non-inclusion of optdepends; this will likely be
resolved down the road.
The biggest benefactors of this change will be the resolving code that
formerly had to parse and reparse several of these fields; it only
happens once now at load time. This does lead to the disadvantage that
we will now always be parsing this information up front even if we never
need it in the split form, but as these are uncommon fields and our
parser is quite efficient it shouldn't be a big concern.
Signed-off-by: Dan McGee <dan@archlinux.org>
This adds a field in the package struct for this checksum type as well
as allowing access via the API to it. The frontend is now able to
display any read value. Note that this does not implement any use or
verification of the value internally.
Signed-off-by: Dan McGee <dan@archlinux.org>
We don't write with extra or unknown whitespace, so there is little
reason for us to trim it when reading either. This also fixes the
hopefully never encountered "paths that start or end with spaces" issue,
for which two pactests have been added. The tests also contain other
evil characters that we have encountered before and handle just fine,
but it doesn't hurt to ensure we don't break such support in the future.
Signed-off-by: Dan McGee <dan@archlinux.org>
As noted by Allan, we failed pretty hard if gpgme was compiled out. With
these changes, only sign001.py fails. This can/will be fixed later once
we beef up the test suite with more signing tests anyway.
Signed-off-by: Dan McGee <dan@archlinux.org>
Restore some sanity to the number of arguments passed to _alpm_download
and curl_download_internal.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
This means creating a new struct which can pass more descriptive data
from the back end sync functions to the downloader. In particular, we're
interested in the download size read from the sync DB. When the remote
server reports a size larger than this (via a content-length header),
abort the transfer.
In cases where the size is unknown, we set a hard upper limit of:
* 25MiB for a sync DB
* 16KiB for a signature
For reference, 25MiB is more than twice the size of all of the current
binary repos (with files) combined, and 16KiB is a truly gargantuan
signature.
Signed-off-by: Dave Reisner <dreisner@archlinux.org>
URLs might end with a slash and follow redirects, or could be a
generated by a script such as /getpkg.php?id=12345. In both cases, we
may have a better filename that we can write to, taken from either
content-disposition header, or the effective URL.
Specific to the first case, we write to a temporary file of the format
'alpmtmp.XXXXXX', where XXXXXX is randomized by mkstemp(3). Since this
is a randomly generated file, we cannot support resuming and the file is
unlinked in the event of an interrupt.
We also run into the possibility of changing out the filename from under
alpm on a -U operation, so callers of _alpm_download can optionally pass
a pointer to a *char to be filled in by curl_download_internal with the
actual filename we wrote to. Any sync operation will pass a NULL pointer
here, as we rely on specific names for packages from a mirror.
Fixes FS#22645.
Signed-off-by: Dave Reisner <d@falconindy.com>
They are placeholders, but important for things like trying to re-sync a
database missing a signature. By using the alpm_db_validity() method at
the right time, a client can take the appropriate action with these
invalid databases as necessary.
In pacman's case, we disallow just about anything that involves looking
at a sync database outside of an '-Sy' operation (although we do check
the validity immediately after). A few operations are still permitted-
'-Q' ops that don't touch sync databases as well as '-R'.
Signed-off-by: Dan McGee <dan@archlinux.org>
This gives us more granularity than the former Never/Optional/Always
trifecta. The frontend still uses these values temporarily but that will
be changed in a future patch.
* Use 'siglevel' consistenly in method names, 'level' as variable name
* The level becomes an enum bitmask value for flexibility
* Signature check methods now return a array of status codes rather than
a simple integer success/failure value. This allows callers to
determine whether things such as an unknown signature are valid.
* Specific signature error codes mostly disappear in favor of the above
returned status code; pm_errno is now set only to PKG_INVALID_SIG or
DB_INVALID_SIG as appropriate.
Signed-off-by: Dan McGee <dan@archlinux.org>
We passed in 'line', but not 'buf.line'. In addition, the macros
building off of READ_NEXT() assume variable names anyway. Since we only
use these macros in one function, might as well simplify them.
Signed-off-by: Dan McGee <dan@archlinux.org>
We can reorganize things a bit to not require reading a directory-only
entry first (or at all). This was noticed while working on some pactest
improvements, but should be a good step forward anyway.
Also make _alpm_splitname() a bit more generic in where it stores the
data it parses.
Signed-off-by: Dan McGee <dan@archlinux.org>
Start by converting all of our flags to a 'status' bitmask (pkgcache
status, grpcache status). Add a new 'valid' flag as well. This will let
us keep track if the database itself has been marked valid in whatever
fashion.
For local databases at the moment we ensure there are no depends files;
for sync databases we ensure the PGP signature is valid if
required/requested. The loading of the pkgcache is prohibited if the
database is invalid.
Signed-off-by: Dan McGee <dan@archlinux.org>
This is another step toward doing both local database validation
(ensuring we don't have depends files) and sync database validation (via
signatures if present) when the database is registered.
Signed-off-by: Dan McGee <dan@archlinux.org>
This is the ideal place to do it as all clients should be checking the
return value and ensuring there are no errors. This is similar to
pkg_load().
We also add an additional step of validation after we download a new
database; a subsequent '-y' operation can potentially invalidate the
original check at registration time.
Note that this implementation is still a bit naive; if a signature is
invalid it is currently impossible to refresh and re-download the file
without manually deleting it first. Similarly, if one downloads a
database and the check fails, the database object is still there and can
be used. These shortcomings will be addressed in a future commit.
Signed-off-by: Dan McGee <dan@archlinux.org>
This doesn't fix the real (bigger) problem of failing to parse sync
databases without directory entries, but it does prevent the parser from
segfaulting when the first desc file encountered did not have a
directory entry, among other conditions.
Signed-off-by: Dan McGee <dan@archlinux.org>
Added a line to the top of each of be_local.c, be_package.c, and
be_sync.c indicating their purposes.
Signed-off-by: Kerrick Staley <mail@kerrickstaley.com>
Signed-off-by: Dan McGee <dan@archlinux.org>
We didn't do due diligence before and ensure prior pm_errno values
weren't influencing what happened in further ALPM calls. I observed one
case of early setup code setting pm_errno to PM_ERR_WRONG_ARGS and that
flag persisting the entire time we were calling library code.
Add a new CHECK_HANDLE() macro that does two things: 1) ensures the
handle variable passed to it is non-NULL and 2) clears any existing
pm_errno flag set on the handle. This macro can replace many places we
used the ASSERT(handle != NULL, ...) pattern before.
Several other other places only need a simple 'set to zero' of the
pm_errno field.
Signed-off-by: Dan McGee <dan@archlinux.org>
This is the last user of our global handle object. Once again the diff
is large but the functional changes are not.
Signed-off-by: Dan McGee <dan@archlinux.org>
This requires a lot of line changes, but not many functional changes as
more often than not our handle variable is already available in some
fashion.
Signed-off-by: Dan McGee <dan@archlinux.org>
This allows us to not require the context (e.g. handle) when calling
this function. Also beef up the checks in the two callers of this
function to bail if the last return code is not ARCHIVE_EOF, which is
the expected value.
This requires a change to one of the pactest return codes and the
overall result of the test, but results in a much safer operating
condition whereby invalid database entries will stop the operation.
Signed-off-by: Dan McGee <dan@archlinux.org>
This will make the patching process less invasive as we start to remove
this variable from all source files.
Signed-off-by: Dan McGee <dan@archlinux.org>
Similar to what we just did for the database; this will make it easy to
always know what handle a given package originated from.
Signed-off-by: Dan McGee <dan@archlinux.org>
This is the first step in a long process to remove our dependence on the
global handle variable we currently share in libalpm, with the goal to
make things a bit more thread-safe and re-entrant.
Signed-off-by: Dan McGee <dan@archlinux.org>
The usefulness of this is rather limited due to it not being compiled
into production builds. When you do choose to see the output, it is
often overwhelming and not helpful. The best bet is to use a debugger
and/or well-placed fprintf() statements.
Signed-off-by: Dan McGee <dan@archlinux.org>
It's your own damn fault if you do this, and this code is remnants from
an old time when we weren't very good at coding.
Signed-off-by: Dan McGee <dan@archlinux.org>
The switch from FUNCTION to DEBUG was ill-advised inside the local
database load. Instead, add a DEBUG level logger to both local and sync
database loads that shows the number of packages processed.
Signed-off-by: Dan McGee <dan@archlinux.org>
This started off removing the "(void)foo" hacks to work around
unused function parameters and ended up fixing every warning
generated by -Wunused-parameter.
Dan: rename to UNUSED.
Signed-off-by: Allan McRae <allan@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
Given that we offer no transparency into the pmpgpsig_t type, we don't
really need to expose it outside of the library, and at this point, we
don't need it at all. Don't decode anything except when checking
signatures. For packages/files not from a sync database, we now just
read the signature file directly anyway.
Also push the decoding logic down further into the check method so we
don't need this hanging out in a less than ideal place. This will make
it easier to conditionally compile things down the road.
Signed-off-by: Dan McGee <dan@archlinux.org>
There's a lot of related moving parts here:
* Iteration through mirrors is moved back to the calling functions. This
allows removal of _alpm_download_single_file and _alpm_download_files.
* The download function gets a few more arguments to influence behavior.
This allows several different scenarios to customize behavior:
- database
- database signature (req'd and optional)
- package
- package via direct URL
- package signature via direct URL (req'd and optional)
* For databases, we need signatures from the same mirror, so structure
the code accordingly.
Some-inspiration-from: Dave Reisner <d@falconindy.com>
Signed-off-by: Dan McGee <dan@archlinux.org>
This does touch a lot of things, and hopefully doesn't break things on
other platforms, but allows us to also clean up a bunch of crud that no
longer needs to be there.
Signed-off-by: Dan McGee <dan@archlinux.org>
There is no reason to not support versions of libarchive that lack
ARCHIVE_COMPRESSION_UU. Distributions should work properly without
this.
Signed-off-by: Rémy Oudompheng <remy@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
We didn't do this sanity check before trying to open an archive. If
the alpm dbpath wasn't set, the sync database dbpath would be NULL,
causing us to hang indefinitely in archive_read_open_filename() rather
than erroring out.
We already have a corresponding check in local_db_populate().
The following program will test this case, and hangs before this patch
without the call to set_dbpath:
int main(int argc, char *argv[]) {
alpm_initialize();
// alpm_option_set_dbpath("/var/lib/pacman/");
pmdb_t *core = alpm_db_register_sync("core");
pmpkg_t *pkg = alpm_db_get_pkg(core, "pacman");
return 0;
}
Signed-off-by: Dan McGee <dan@archlinux.org>
After updating a database, remove the old signature to prevent it
being used in validation if the new signature fails to download.
Signed-off-by: Allan McRae <allan@archlinux.org>
If signature verification is needed, attempt to download a signature
file for a repo when it is updated. Return an error if unable to
download signature only when checking is mandatory, or if signature is
invalid.
TODO: At the moment the database signature is only checked on download.
Should we do anything with a database if it fails to be verified to prevent
its future usage?
Signed-off-by: Allan McRae <allan@archlinux.org>
Implements FS#23103. Also modify libalpm so it ignores this value
without any warning as we know it is likely to exist.
Signed-off-by: Dan McGee <dan@archlinux.org>
This was discussed and more or less agreed upon on the mailing list. A
huge checkin, but if we just do it and let people adjust the pain will
end soon enough. Rebasing should be relatively straighforward for anyone
that sees conflicts; just be sure you use the new return style if
possible.
The following semantic patch was used to do the change, along with some
hand-massaging in order to preserve parenthesis where appropriate:
The semantic match that finds this problem is as follows, although some
hand-massaging was done in order to keep parenthesis where appropriate:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression a;
@@
- return(a);
+ return a;
// </smpl>
A macros_file was also provided with the following content:
Additional steps taken, mainly for ASSERT() macros:
$ sed -i -e 's#return(NULL)#return NULL#' lib/libalpm/*.c
$ sed -i -e 's#return(-1)#return -1#' lib/libalpm/*.c
Signed-off-by: Dan McGee <dan@archlinux.org>
We erroniously dropped the call to _alpm_delta_parse() when macro-izing,
causing segfaults for repos that provide deltas. Addresses FS#23314.
Signed-off-by: Dan McGee <dan@archlinux.org>
repo-add can add a "files" entry into the sync db. Currently we
do nothing with this file, so explicitly skip it to prevent
unknown database file warnings.
Signed-off-by: Allan McRae <allan@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
A lot of these were places that should have used the same message but
didn't, or were very easy to convert to using the same message and
letting some of the burden off of the translators.
Signed-off-by: Dan McGee <dan@archlinux.org>
Ensure we have a local DB version that is up to par with what we expect
before we go down any road that might modify it. This should prevent
stupid mistakes with the 3.5.X upgrade and people not running
pacman-db-upgrade after the transaction as they will need to.
Signed-off-by: Dan McGee <dan@archlinux.org>
* Use stat() and not lstat(); we don't care for the size of the symlink if
it is one, we want the size of the reference file.
* FS#22896, fix local database estimation on platforms that don't abide by
the nlink assumption for number of children.
* Fix a missing newline on an error message.
Signed-off-by: Dan McGee <dan@archlinux.org>
In sync_db_populate() and local_db_populate(), a NULL db->pkgcache is not
caught, allowing the functions to continue instead of exiting.
A later alpm_list_msort() call which uses alpm_list_nth() will thus traverse
invalid pointers in a non-existent db->pkgcache->list.
pm_errno is set to PM_ERR_MEMORY as _alpm_pkghash_create() will only return
NULL when we run out of memory / exceed max hash table size. The local/sync
db_populate() functions are also exited.
Signed-off-by: Pang Yan Han <pangyanhan@gmail.com>
Signed-off-by: Dan McGee <dan@archlinux.org>
Since the sync database never changes size once we initialize it, we
allow it to be filled a bit more. This reduces the overall memory
footprint needed by the hash table.
Signed-off-by: Dan McGee <dan@archlinux.org>
Read the package information for sync/local databases into a pmpkghash_t
structure.
Provide a alpm_db_get_pkgcache_list() method that returns the list from
the hash object. Most usages of alpm_db_get_pkgcache are converted to
this at this stage for ease of implementation. Review whether these are
better accessing the hash table directly at a later stage.
Signed-off-by: Allan McRae <allan@archlinux.org>
This works for both local and sync databases in slightly different ways. For
the local database, we can use the directory hard link count on the local/
folder. For sync databases, we use the archive size coupled with some
computed average per-package sizes to determine an estimate.
This is currently a dead assignment once calculated, but could be used to
set the initial size of a hash table.
Signed-off-by: Dan McGee <dan@archlinux.org>
Signed-off-by: Allan McRae <allan@archlinux.org>
Noted in FS#22697. When I factored out _alpm_parsedate() into a common
function, I didn't move the <locale.h> include properly, causing a build
failure when NLS is disabled and this header isn't automatically included
everywhere.
Signed-off-by: Dan McGee <dan@archlinux.org>
We were returning a package error code rather than a DB one, and we
would leak the archive memory if the database file didn't exist.
Signed-off-by: Dan McGee <dan@archlinux.org>
Instead, go the same route we have always taken with version-release in
libalpm and treat it all as one piece of information. Makepkg is the only
script that knows about epoch as a distinct value; from there on out we will
parse out the components as necessary.
This makes the code a lot simpler as far as epoch handling goes. The
downside here is that we are tossing some compatibility to the wind;
packages using force will have to be rebuilt with an incremented epoch to
keep their special status.
Signed-off-by: Dan McGee <dan@archlinux.org>
In most (all?) cases, we will process all files for a given sync database
entry sequentially. The code currently does an _alpm_pkg_find() for every
file in the database, but we had the "current" package readily available.
Shift some local variables around a bit to expose this to sync_db_read() and
use it if the package is the correct one.
On my system, this cuts calls to _alpm_pkg_find() from 20,769 to 10,349
calls during a -Qu operation, and results in a ~30% speedup of the same
operation (0.35 sec -> 0.27 sec). This benefit should be apparent anywhere
we read in the full contents of the sync databases.
Signed-off-by: Dan McGee <dan@archlinux.org>
There is a lot of swtiching between size_t and int for alpm_list sizes
in the codebase. Start converting these to all be size_t by adjusting
the return type of alpm_list_count and fixing all additional warnings
given by -Wconversion that are generated by this change.
Dan: a few more small changes to ensure things compile, adjusting some
printf format string characters to accommodate the larger size on x86_64.
Signed-off-by: Allan McRae <allan@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
POSIX does not require PATH_MAX be defined when there is not actual
limit to its value. This affects HURD based systems. Work around
this by defining PATH_MAX to 4096 (as on Linux) when this is not
defined.
Also, clean up inclusions of limits.h and remove autoconf check for
this header as we do not use macro shields for its inclusion anyway.
Signed-off-by: Allan McRae <allan@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
The old function was written in a time before we relied on it for nearly
every operation. Since then, we have switched to the archive backend and now
fast parsing is a big deal.
The former function made a per-character call to the libarchive
archive_read_data() function, which resulted in some 21 million calls in a
typical "load all sync dbs" operation. If we instead do some buffering of
our own and read the blocks directly, and then find our newlines from there,
we can cut out the multiple layers of overhead and go from archive to parsed
data much quicker.
Both users of the former function are switched over to the new signature,
made easier by the macros now in place in the sync backend parsing code.
Performance: for a `pacman -Su` (no upgrades available),
_alpm_archive_fgets() goes from being 29% of the total time to 12% The time
spent on the libarchive function being called dropped from 24% to 6%.
This pushes _alpm_pkg_find back to the title of slowest low-level function.
Signed-off-by: Dan McGee <dan@archlinux.org>
This simplifies a lot of the repetative code and makes it obvious where the
tricky or different ones are (e.g. depends, dates). It also makes it
significantly easier to change the way this code works in the future.
There should be no functional change with this patch.
Signed-off-by: Dan McGee <dan@archlinux.org>
Rather than error out, this is easy enough. Looks quite similar to the code
in be_local for creating the local directory.
Signed-off-by: Dan McGee <dan@archlinux.org>
Whether it be "desc", "depends", or "deltas", it really doesn't matter-
treat them all the same and have the ability to read any data from any file
in that list. This continues the work in a44c7b8956.
Signed-off-by: Dan McGee <dan@archlinux.org>
We were including the header in a lot of places it is no longer used.
Additionally, use the correct autoconf macro for determining whether
d_type is available as a member: HAVE_STRUCT_DIRENT_D_TYPE.
Signed-off-by: Dan McGee <dan@archlinux.org>
Signed-off-by: Allan McRae <allan@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
This will allow us to eventually combine the depends and desc entries
within the sync database.
Signed-off-by: Allan McRae <allan@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
This will allow for better control of what was previously the 'force' option
in a PKGBUILD and transferred into the built package.
Signed-off-by: Dan McGee <dan@archlinux.org>
The splitname function is a general utility function and so is better
suited to util.h. Rename it to _alpm_splitname to indicate it is an
internal libalpm function as was the case prior to splitting local and
sync db handling.
Signed-off-by: Allan McRae <allan@archlinux.org>
Read in list of packages for sync db from tar archive.
Breaks reading in _alpm_sync_db_read and a lot of pactests (which
is expected as they do not handle sync db in archives...).
Signed-off-by: Allan McRae <allan@archlinux.org>