Commit Graph

117 Commits

Author SHA1 Message Date
Dan McGee 7f51ba99ae Fix segfaults on opening invalid archive files
"invalid" in this case simply means files that may or may not be
archives. Discovered via a `pacman -Sc` operation with delta files in
the package cache directory, but can be triggered if any file is passed
to `pacman -Ql` that isn't an archive, for instance, or if the sync
database file is not an archive.

Fix it up so we are more careful about calling archive_read_finish()
only on archives that are valid and have not already been closed, and
teach our archive open function to set the returned archive to NULL if
we aren't going to be returning something valid anyway.

Signed-off-by: Dan McGee <dan@archlinux.org>
2012-01-08 11:53:22 -06:00
Dan McGee 2d794cc031 Update sync database size estimation values
As the comment states, this is more like a dartboard than science.

Signed-off-by: Dan McGee <dan@archlinux.org>
2012-01-07 11:28:37 -06:00
Dan McGee 0b9cd9448c pkghash improvements/modifications
This patch changes a variety of small things related to our pkghash
implementation with an eye toward performance, especially on native
32-bit systems.

* Use `unsigned int` rather than `size_t` for hash sizes. We already
  return ERANGE for any attempted creation of a hash greater than 1
  million elements, so unsigned int is more than large enough for our
  purposes. Switching to this type allows 32 bit systems to do native
  math without helper functions from libgcc.
* _alpm_pkghash_create() now internally adds extra padding for
  additional array elements, rather than that being the responsibility of
  the caller.
* #define values are moved into static const values in pkghash.c; a new
  `stride` value is also extracted (but remains set at 1).
* Division and modulus operators are removed from the normal find and
  add paths if possible. We store the upper limit of the number of
  elements in the hash so we no longer need to calculate this every
  element addition. When doing wraparound position calculations, we only
  apply the modulus operator if the value is greater than the number of
  buckets.

Signed-off-by: Dan McGee <dan@archlinux.org>
2012-01-02 19:41:42 -06:00
Dan McGee 177beb91c9 Minor optimization on likely sync package lookup
We have a name_hash value here, so add a cheap compare of it before
falling to the strcmp() call.

Signed-off-by: Dan McGee <dan@archlinux.org>
2012-01-02 13:17:44 -06:00
Dan McGee 819c28bf8c Only compile delta regex once
This reduces the number of regcomp() calls when parsing delta entries in
the database from once per entry to once for the entire context handle
by storing the compiled regex data on the handle itself. Just as we do
with the cURL handle, we initialize it the first time it is needed and
free it when releasing the handle.

A few other small tweaks to the parsing function also take place,
including using the stack to store the transient and short file size
string while parsing it.

When parsing a sync database with 1378 delta entries, this reduces the
time of a `pacman -Sl deltas` operation by 50% from 0.22s to 0.12s.

Signed-off-by: Dan McGee <dan@archlinux.org>
2012-01-02 12:55:55 -06:00
Dave Reisner d6ccd44390 include config.h via Makefiles
Ensures that config.h is always ordered correctly (first) in the
includes. Also means that new source files get this for free without
having to remember to add it.

We opt for -imacros over -include as its more portable, and the
added constraint by -imacros doesn't bother us for config.h.

This also touches the HACKING file to remove the explicit mention of
config.h as part of the includes.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-12-21 18:13:17 -06:00
Dan McGee dee8144ce5 Add helper method for creating and opening archive object
This moves the common setup code of about 5 different callers into one
method. Error messages will now be common and shared in all places;
several paths did not have any messages at all before.

In addition, we now pick an ideal block size for the archive read based
off the larger value of our default buffer size or the st.st_blksize
field. For a filesystem such as NFS, this is often much larger than the
default 8192- values such as 32768 and 131072 are common.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-11-16 14:51:06 -06:00
Dan McGee 6df558177f Convert package and database archive reads to use file descriptors
This gives us a bit more control and over the archive reading process,
and a bit less is done behind the scenes. It also allows us to use
fstat() in preference to stat(), which should avoid some potential race
conditions.

Some reorganization is necessary to move the stat calls after the open()
calls. Error handling and cleanup in general is also improved, as we had
several potential memory and file handle leaks before in some error
paths.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-11-01 10:27:31 -05:00
Dan McGee 3343185473 Introduce ALPM_BUFFER_SIZE constant
This takes the place of three previously used constants:
ARCHIVE_DEFAULT_BYTES_PER_BLOCK, BUFFER_SIZE, and CPBUFSIZE.

In libarchive 3.0, the first constant will be no more, so we can ensure
we are forward-compatible by removing our usage of it now. The rest are
unified for consistency.

By default, we will use the value of BUFSIZ provided by <stdio.h>, which
is 8192 on Linux. If that is undefined, a default value is provided.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-10-27 14:59:24 -05:00
Dan McGee e0acf2f144 Refactor download payload reset and free
This was done to squash a memory leak in the sync database download
code. When we downloaded a database and then reused the payload struct,
we could find ourselves calling get_fullpath() for the signatures and
overwriting non-freed values we had left over from the database
download.

Refactor the payload_free function into a payload_reset function that we
can call that does NOT free the payload itself, so we can reuse payload
structs. This also allows us to move the payload to the stack in some
call paths, relieving us of the need to alloc space.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-09-28 13:01:03 -05:00
Dan McGee d8fab9b441 be_sync: fetch only entry mode, not full stat struct
This saves a lot of unnecessary work since we don't need any of the
other fields in the stat struct.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-09-27 17:49:34 -05:00
Dan McGee c406949226 Ensure database validity flags are set in invalid signature case
We returned the right error code but never set the flags accordingly.
Also, now that we can bail early, ensure we set the error code.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-09-22 16:01:11 -05:00
Dan McGee ec790ced7c signing: add a process and retry loop for database signatures
In reality, there is no retrying that happens as of now because we don't
have any import or changing of the keyring going on, but the code is set
up so we can drop this in our new _alpm_process_siglist() function. Wire
up the basics to the sync database validation code, so we see something
like the following:

    $ pacman -Ss unknowntrust
    error: core: signature from "Dan McGee <dpmcgee@gmail.com>" is unknown trust
    error: core: signature from "Dan McGee <dpmcgee@gmail.com>" is unknown trust
    error: database 'core' is not valid (invalid or corrupted database (PGP signature))

    $ pacman -Ss missingsig
    error: core: missing required signature
    error: core: missing required signature
    error: database 'core' is not valid (invalid or corrupted database (PGP signature))

Yes, there is some double output, but this should be fixable in the
future.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-09-20 10:23:11 -05:00
Dan McGee 994cb4da4f Allow our PGP helper method to pass back the signature results
This will make its way up the call chain eventually to allow trusting
and importing of keys as necessary.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-09-20 10:23:11 -05:00
Dan McGee 83a1e4fee3 Clean up handling of size fields
We currently have csize, isize, and size concepts, and sometimes the
difference isn't clear. Ensure the following holds:

* size (aka csize): always the compressed size of the package; available
  for everything except local packages (where it will return 0)
* isize: always the installed size of the package; available for all
  three package types

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-09-02 19:26:07 -05:00
Dan McGee 234b6ffc2c Parse > 2GiB file sizes correctly
We were using atol(), which on 32 bit, cannot handle values greater than
2GiB, which is fail.

Switch to a strtoull() wrapper function tailored toward parsing off_t
values. This allows parsing of very large positive integer values. off_t
is a signed type, but in our usages, we never parse or have a need for
negative values, so the function will return -1 on error.

Before:
    $ pacman -Si flightgear-data | grep Size
    Download Size  : 2097152.00 K
    Installed Size : 2097152.00 K

After:
    $ ./src/pacman/pacman -Si flightgear-data | grep Size
    Download Size  : 2312592.52 KiB
    Installed Size : 5402896.00 KiB

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-29 19:57:05 -05:00
Dan McGee b221af660d Database read optimizations
Hard to believe there was still more room to improve on this, but I
found an easily correctable oversight tonight. Our databases (both sync
and local) contain many blank lines, and we were not moving onto the
next line right away in these cases; instead we would proceed through
our strcmp() conditional checks as normal.

Some local numbers follow to show the effects of this patch:

Sync `-Ss foobarbaz`:
71,709 blank lines skipped early
~1,505,889 strcmp() calls avoided (21 per line)
~15% speed improvement (.210 --> .179 sec)

Local `-Qs foobarbaz`:
6,823 blank lines skipped early
115,991 strcmp() calls avoided (17 per line)
~6% speed improvement (.080 -> .071 sec)

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-28 23:49:27 -05:00
Lukas Fleischer 455ca55f4e be_sync.c: Fix memory leak in alpm_db_update()
Free "syncpath" and restore umask if we fail to grab a lock.

Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-28 19:40:22 -05:00
Dan McGee e1dce078b2 Remove argument from check_pgp_helper
This one wasn't all that necessary as we only used it in one place in
the function, which can be checked easily enough at the call site.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-25 16:12:49 -05:00
Dan McGee d9e5dab6ac Remove alpm_option_get_* usage from backend
These are all available directly on the handle without indirection.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-19 19:12:21 -05:00
Dave Reisner 24824b54ce dload: add 'unlink_on_fail' to payload struct
Let callers of _alpm_download state whether we should delete on fail,
rather than inferring it from context. We still override this decision
and always unlink when a temp file is used.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-19 09:43:19 -05:00
Dan McGee c885a953eb Enhance and utilize database status flags
* Move is_local standalone field to status enum
* Create VALID/INVALID flag pair
* Create EXISTS/MISSING flag pair

With these additional fields, we can be more intelligent with database
loading and messages to the user. We now only warn once if a sync
database does not exist and do not continue to try to load it once we
have marked it as missing.

The reason for the flags existing in pairs is so the unknown case can be
represented. There should never be a time when both flags in the same
group are true, but if they are both false, it represents the unknown
case. Care is taken to always manipulate both flags at the same time.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-18 10:47:41 -05:00
Dan McGee 11f4a7a48e Only check necessary signatures and checksums
The precedence goes as follows: signature > sha256sum > md5sum

Add some logic and helper methods to check what we have available when
loading a package, and then only check what is necessary to verify the
package. This should speed up sync database verifies as we no longer
will be doing both a checksum and a signature validation.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-15 15:15:11 -05:00
Dan McGee a628feee46 Parse conflicts/provides/replaces at database load time
We did this with depends way back in commit c244cfecf6 in 2007. We
can do it with these fields as well.

Of note is the inclusion of provides even though only '=' is supported-
we'll parse other things, but no guarantees are given as to behavior,
which is more or less similar to before since we only looked for the
equals sign.

Also of note is the non-inclusion of optdepends; this will likely be
resolved down the road.

The biggest benefactors of this change will be the resolving code that
formerly had to parse and reparse several of these fields; it only
happens once now at load time. This does lead to the disadvantage that
we will now always be parsing this information up front even if we never
need it in the split form, but as these are uncommon fields and our
parser is quite efficient it shouldn't be a big concern.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-15 12:56:41 -05:00
Dan McGee ebb2e36cc4 Load and allow access to sha256sum
This adds a field in the package struct for this checksum type as well
as allowing access via the API to it. The frontend is now able to
display any read value. Note that this does not implement any use or
verification of the value internally.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-15 12:11:55 -05:00
Dan McGee cbaff216b3 Don't trim whitespace when reading database entries
We don't write with extra or unknown whitespace, so there is little
reason for us to trim it when reading either. This also fixes the
hopefully never encountered "paths that start or end with spaces" issue,
for which two pactests have been added. The tests also contain other
evil characters that we have encountered before and handle just fine,
but it doesn't hurt to ensure we don't break such support in the future.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-08-02 08:20:34 -04:00
Dan McGee 70d6fe6632 Clean up my debug logger mess
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-07-19 00:23:42 -05:00
Dan McGee 05f7c0280e Fix test suite when GPGME is disabled
As noted by Allan, we failed pretty hard if gpgme was compiled out. With
these changes, only sign001.py fails. This can/will be fixed later once
we beef up the test suite with more signing tests anyway.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-07-18 21:06:24 -05:00
Dave Reisner 3eec745910 absorb some _alpm_download params into payload struct
Restore some sanity to the number of arguments passed to _alpm_download
and curl_download_internal.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
2011-07-05 23:00:02 -04:00
Dave Reisner 6dc71926f9 lib/dload: prevent large file attacks
This means creating a new struct which can pass more descriptive data
from the back end sync functions to the downloader. In particular, we're
interested in the download size read from the sync DB. When the remote
server reports a size larger than this (via a content-length header),
abort the transfer.

In cases where the size is unknown, we set a hard upper limit of:

* 25MiB for a sync DB
* 16KiB for a signature

For reference, 25MiB is more than twice the size of all of the current
binary repos (with files) combined, and 16KiB is a truly gargantuan
signature.

Signed-off-by: Dave Reisner <dreisner@archlinux.org>
2011-07-05 22:58:55 -04:00
Dave Reisner 6c9b82e72a dload: handle irregular URLs
URLs might end with a slash and follow redirects, or could be a
generated by a script such as /getpkg.php?id=12345. In both cases, we
may have a better filename that we can write to, taken from either
content-disposition header, or the effective URL.

Specific to the first case, we write to a temporary file of the format
'alpmtmp.XXXXXX', where XXXXXX is randomized by mkstemp(3). Since this
is a randomly generated file, we cannot support resuming and the file is
unlinked in the event of an interrupt.

We also run into the possibility of changing out the filename from under
alpm on a -U operation, so callers of _alpm_download can optionally pass
a pointer to a *char to be filled in by curl_download_internal with the
actual filename we wrote to. Any sync operation will pass a NULL pointer
here, as we rely on specific names for packages from a mirror.

Fixes FS#22645.

Signed-off-by: Dave Reisner <d@falconindy.com>
2011-07-05 22:58:27 -04:00
Dan McGee c748eadc80 Allow invalid sync DBs to be returned by the library
They are placeholders, but important for things like trying to re-sync a
database missing a signature. By using the alpm_db_validity() method at
the right time, a client can take the appropriate action with these
invalid databases as necessary.

In pacman's case, we disallow just about anything that involves looking
at a sync database outside of an '-Sy' operation (although we do check
the validity immediately after). A few operations are still permitted-
'-Q' ops that don't touch sync databases as well as '-R'.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-07-05 10:13:20 -05:00
Dan McGee 7af0ab1cde signing: move to new signing verification and return scheme
This gives us more granularity than the former Never/Optional/Always
trifecta. The frontend still uses these values temporarily but that will
be changed in a future patch.

* Use 'siglevel' consistenly in method names, 'level' as variable name
* The level becomes an enum bitmask value for flexibility
* Signature check methods now return a array of status codes rather than
  a simple integer success/failure value. This allows callers to
  determine whether things such as an unknown signature are valid.
* Specific signature error codes mostly disappear in favor of the above
  returned status code; pm_errno is now set only to PKG_INVALID_SIG or
  DB_INVALID_SIG as appropriate.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-07-05 10:13:20 -05:00
Dan McGee 1ce7f39ad7 Merge remote-tracking branch 'allan/ALPM'
Conflicts:
	lib/libalpm/be_local.c
	lib/libalpm/be_package.c
	lib/libalpm/conflict.c
	lib/libalpm/diskspace.c
	lib/libalpm/dload.c
	lib/libalpm/remove.c
2011-07-03 14:44:57 -05:00
Dan McGee de8b9a85a5 be_sync: make READ_NEXT() a no-arg macro
We passed in 'line', but not 'buf.line'. In addition, the macros
building off of READ_NEXT() assume variable names anyway. Since we only
use these macros in one function, might as well simplify them.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-07-03 13:07:19 -05:00
Allan McRae afc96f2ab3 Prefix _alpm_errno_t members with ALPM
Signed-off-by: Allan McRae <allan@archlinux.org>
2011-07-02 02:01:39 +10:00
Allan McRae ca43fdd92f Prefix alpm_loglevel_t members with ALPM
Signed-off-by: Allan McRae <allan@archlinux.org>
2011-07-02 02:01:38 +10:00
Dan McGee 23a2d2c16a Make alpm_db_get_sigverify_level() public
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-30 16:16:41 -05:00
Dan McGee 68284da0d7 Add an alpm_db_get_valid() public function
This allows one to check if a database is valid or invalid.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-30 16:05:20 -05:00
Allan McRae 8a04bc25a1 Rename pmpkg_t to alpm_pkg_t
Signed-off-by: Allan McRae <allan@archlinux.org>
2011-06-28 23:26:39 +10:00
Allan McRae 939d5a9511 Rename pmdb_t to alpm_db_t
Signed-off-by: Allan McRae <allan@archlinux.org>
2011-06-28 14:16:12 +10:00
Allan McRae 64c1cf7921 Rename pmhandle_t to alpm_handle_t
Signed-off-by: Allan McRae <allan@archlinux.org>
2011-06-28 14:04:00 +10:00
Dan McGee 777bdc6c50 Make sync DB reading a bit more flexible
We can reorganize things a bit to not require reading a directory-only
entry first (or at all). This was noticed while working on some pactest
improvements, but should be a good step forward anyway.

Also make _alpm_splitname() a bit more generic in where it stores the
data it parses.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-24 14:04:39 -05:00
Dan McGee 4f8ae2bab6 Don't require a transaction for sync DB updates
Instead, just do the required locking directly in the backend in calls
to alpm_db_update().

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-24 04:11:38 -05:00
Dan McGee 79e98316ea Add a 'valid' flag to the database object
Start by converting all of our flags to a 'status' bitmask (pkgcache
status, grpcache status). Add a new 'valid' flag as well. This will let
us keep track if the database itself has been marked valid in whatever
fashion.

For local databases at the moment we ensure there are no depends files;
for sync databases we ensure the PGP signature is valid if
required/requested. The loading of the pkgcache is prohibited if the
database is invalid.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-24 03:46:12 -05:00
Dan McGee 1150d9e15a Move database 'version' check to registration time
This is another step toward doing both local database validation
(ensuring we don't have depends files) and sync database validation (via
signatures if present) when the database is registered.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-24 03:31:32 -05:00
Dan McGee db3b86e7f3 Do database signature checking at load time
This is the ideal place to do it as all clients should be checking the
return value and ensuring there are no errors. This is similar to
pkg_load().

We also add an additional step of validation after we download a new
database; a subsequent '-y' operation can potentially invalidate the
original check at registration time.

Note that this implementation is still a bit naive; if a signature is
invalid it is currently impossible to refresh and re-download the file
without manually deleting it first. Similarly, if one downloads a
database and the check fails, the database object is still there and can
be used. These shortcomings will be addressed in a future commit.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-24 02:34:50 -05:00
Dan McGee d589a7b5db Prevent segfault when parsing unexpected sync database file
This doesn't fix the real (bigger) problem of failing to parse sync
databases without directory entries, but it does prevent the parser from
segfaulting when the first desc file encountered did not have a
directory entry, among other conditions.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-24 01:36:47 -05:00
Kerrick Staley d980bd403d Documented purpose of be_*.c
Added a line to the top of each of be_local.c, be_package.c, and
be_sync.c indicating their purposes.

Signed-off-by: Kerrick Staley <mail@kerrickstaley.com>
Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-20 00:17:54 -05:00
Dan McGee ee015f086f Ensure handle is valid and pm_errno is reset when calling into API
We didn't do due diligence before and ensure prior pm_errno values
weren't influencing what happened in further ALPM calls. I observed one
case of early setup code setting pm_errno to PM_ERR_WRONG_ARGS and that
flag persisting the entire time we were calling library code.

Add a new CHECK_HANDLE() macro that does two things: 1) ensures the
handle variable passed to it is non-NULL and 2) clears any existing
pm_errno flag set on the handle. This macro can replace many places we
used the ASSERT(handle != NULL, ...) pattern before.

Several other other places only need a simple 'set to zero' of the
pm_errno field.

Signed-off-by: Dan McGee <dan@archlinux.org>
2011-06-14 10:01:08 -05:00