rswiki-book/src/Archive-Format.md

\[\[Category Cache\]\]

== Introduction ==

From revision 194 through 377, every file in cache 0 uses an
archive-like format that holds a collection of named files (e.g.
'''BADENC.TXT''', a file in the '''wordenc''' archive that contains bad
words).

=== Diagram ===

------------------------------------------------------------------------

\[http://img263.imageshack.us/img263/9678/68481568.png External Diagram
Image\]

== Usage ==

These files are used by the client for a variety of purposes. Some, such
as the '''DATA''' file, contain data themselves (in this case the
interfaces). Others, such as the '''MAP\_INDEX''' file, contain
information about where to locate the map and landscape files in the
cache.

== Format ==

    tribyte uncompressedSize
    tribyte compressedSize

If the uncompressed and compressed sizes are equal, the whole file is
not compressed but the individual entries are compressed using bzip2. If
they are not equal, the entire file is compressed using bzip2 but the
individual entries are not.

Also note that the magic id at the start of each bzip2 entry is not
included in the cache. If you use an existing API to read the files and
want to add it back, you must prepend the four characters BZh1 to the
data before you decompress.
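A minimal sketch of restoring the header, assuming the headerless entry has already been read into a byte array. The class and method names are hypothetical. The result can then be passed to whichever bzip2 decompressor you use (the Java standard library does not ship one).

```java
import java.nio.charset.StandardCharsets;

class Bzip2Header {
    // The four characters the cache omits from every bzip2 stream.
    static final byte[] MAGIC = "BZh1".getBytes(StandardCharsets.US_ASCII);

    // Returns a new array holding the magic followed by the headerless data.
    static byte[] withHeader(byte[] headerless) {
        byte[] out = new byte[MAGIC.length + headerless.length];
        System.arraycopy(MAGIC, 0, out, 0, MAGIC.length);
        System.arraycopy(headerless, 0, out, MAGIC.length, headerless.length);
        return out;
    }
}
```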

    short fileCount

Each file entry has the format:

    int nameHash
    tribyte uncompressedSize
    tribyte compressedSize

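While looping through the entries, you need to keep track of each file's
offset yourself. This pseudocode demonstrates how:

    // entry data begins after the table, which is 10 bytes per entry
    // (numFiles is the fileCount value read above)
    int offset = buffer.getCurrentOffset() + numFiles * 10;
    for (int i = 0; i < numFiles; i++) {
        // read nameHash, uncompressedSize and compressedSize for entry i
        int thisFileOffset = offset;
        offset += thisFileCompressedSize;
    }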
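The offset bookkeeping can be sketched in runnable form as follows. This is only an illustration under stated assumptions: the class and field names are hypothetical, multi-byte values are assumed to be big-endian (the text does not state the byte order), and the buffer is assumed to already hold the archive body in uncompressed form, positioned at the `fileCount` field.

```java
import java.nio.ByteBuffer;

class ArchiveIndex {
    final int[] nameHashes, uncompressedSizes, compressedSizes, offsets;

    // Reads an unsigned 3-byte value (assumed big-endian).
    static int readTribyte(ByteBuffer buf) {
        return ((buf.get() & 0xFF) << 16) | ((buf.get() & 0xFF) << 8) | (buf.get() & 0xFF);
    }

    // buf must be positioned at the fileCount field.
    ArchiveIndex(ByteBuffer buf) {
        int fileCount = buf.getShort() & 0xFFFF;
        nameHashes = new int[fileCount];
        uncompressedSizes = new int[fileCount];
        compressedSizes = new int[fileCount];
        offsets = new int[fileCount];

        // Entry data starts right after the table: 4 + 3 + 3 = 10 bytes per entry.
        int offset = buf.position() + fileCount * 10;
        for (int i = 0; i < fileCount; i++) {
            nameHashes[i] = buf.getInt();
            uncompressedSizes[i] = readTribyte(buf);
            compressedSizes[i] = readTribyte(buf);
            offsets[i] = offset;          // where this entry's bytes begin
            offset += compressedSizes[i]; // the next entry follows immediately
        }
    }
}
```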

To get a named file by its name, you should first hash the name using
this method:

    public static int hash(String name) {
        int hash = 0;
        name = name.toUpperCase();
        for (int j = 0; j < name.length(); j++) {
            hash = (hash * 61 + name.charAt(j)) - 32;
        }
        return hash;
    }

Then loop through the file entries you loaded earlier to find a matching
hash, and read that entry's compressed size in bytes starting at its
offset. If the archive as a whole is not compressed, decompress the
individual entry.
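A rough sketch of the lookup, assuming the entry table has already been loaded into parallel arrays. The `find` method, its parameters and the class name are hypothetical. It returns the stored bytes only, so any bzip2 decompression of the entry is left to the caller.

```java
import java.util.Arrays;

class ArchiveLookup {
    // Name hash as described in the text.
    static int hash(String name) {
        int hash = 0;
        name = name.toUpperCase();
        for (int j = 0; j < name.length(); j++) {
            hash = (hash * 61 + name.charAt(j)) - 32;
        }
        return hash;
    }

    // Returns the stored bytes of the named entry, or null if no hash matches.
    static byte[] find(String name, byte[] archive, int[] nameHashes,
                       int[] compressedSizes, int[] offsets) {
        int wanted = hash(name);
        for (int i = 0; i < nameHashes.length; i++) {
            if (nameHashes[i] == wanted) {
                return Arrays.copyOfRange(archive, offsets[i], offsets[i] + compressedSizes[i]);
            }
        }
        return null;
    }
}
```

Because the name is upper-cased before hashing, lookups are case-insensitive: `hash("ab")` and `hash("AB")` both give 2047.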

== '''\#194 Archive Format''' ==

The 194 (RuneScape 2 beta) client worked with a very simple cache
format. Each file in the cache was a file on the operating system.

=== Name hashing ===

Every name in the cache was hashed using the following method which is,
incidentally, similar to the way player names are converted to longs.

    public static final long gethash(String string) {
        string = string.trim();
        long l = 0L;
        for (int i = 0; i < string.length() && i < 12; i++) {
            char c = string.charAt(i);
            l *= 37L;
            if (c >= 'A' && c <= 'Z')
                l += (long) (1 + c - 'A');  // letters map to 1..26
            else if (c >= 'a' && c <= 'z')
                l += (long) (1 + c - 'a');
            else if (c >= '0' && c <= '9')
                l += (long) (27 + c - '0'); // digits map to 27..36
        }
        return l;
    }

The resulting long was converted to a string and the file was given that
name.
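As a small worked example, the sketch below derives an on-disk file name from a logical name. It assumes the long was written out in plain decimal, which the text does not specify, and the `fileNameFor` helper and class name are hypothetical.

```java
class Cache194Names {
    // Base-37 name hash: letters map to 1..26, digits to 27..36, anything else to 0.
    static long gethash(String string) {
        string = string.trim();
        long l = 0L;
        for (int i = 0; i < string.length() && i < 12; i++) {
            char c = string.charAt(i);
            l *= 37L;
            if (c >= 'A' && c <= 'Z')
                l += 1 + c - 'A';
            else if (c >= 'a' && c <= 'z')
                l += 1 + c - 'a';
            else if (c >= '0' && c <= '9')
                l += 27 + c - '0';
        }
        return l;
    }

    // Assumption: the file name is the decimal representation of the hash.
    static String fileNameFor(String name) {
        return Long.toString(gethash(name));
    }
}
```

Under that assumption, `fileNameFor("AB")` is `"39"` (1 * 37 + 2) and `fileNameFor("a0")` is `"64"` (1 * 37 + 27).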

=== Files ===

The files in the cache were the ones used in the \[\[JAGGRAB
Protocol\|JAGGRAB Protocol\]\] (i.e. files in cache 0 in old engine
caches) and map (mX\_Y) and landscape (lX\_Y) files. Incidentally, this
naming is very similar to the names of the map and landscape files in
new engine caches.

== '''\#317 Archive Format''' ==

The old engine cache is made up of two types of files.

=== Data file ===

The data file holds all of the files in the cache and is named
'''main\_file\_cache.dat'''. It is therefore very big, typically \~10-20
megabytes.

=== Index file ===

There are several index files, each named '''main\_file\_cache.idx'''
followed by a number. Each index file holds 'pointers' to where a file
is located in the data file, and each index file represents one type of
file.

=== Index file format ===

The index file is made up of 6 byte blocks which hold information about
where a file can be located in the data file. The format of a single
block is as follows:

    tribyte fileSize
    tribyte initialDataBlockId
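A minimal sketch of reading one index block, assuming the whole index file has been loaded into a byte array and that tribytes are big-endian (the byte order is an assumption, as are the class and method names).

```java
class IndexEntry {
    final int fileSize;
    final int initialDataBlockId;

    IndexEntry(int fileSize, int initialDataBlockId) {
        this.fileSize = fileSize;
        this.initialDataBlockId = initialDataBlockId;
    }

    // Reads an unsigned 3-byte value at position p (assumed big-endian).
    static int tribyte(byte[] b, int p) {
        return ((b[p] & 0xFF) << 16) | ((b[p + 1] & 0xFF) << 8) | (b[p + 2] & 0xFF);
    }

    // Each file id owns one 6-byte block, so its block starts at fileId * 6.
    static IndexEntry read(byte[] indexFile, int fileId) {
        int p = fileId * 6;
        return new IndexEntry(tribyte(indexFile, p), tribyte(indexFile, p + 3));
    }
}
```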

=== Data file format ===

The data file is made up of 520 byte blocks. The format of each of these
blocks is as follows:

    short nextFileId
    short currentFilePartId
    tribyte nextDataBlockId
    byte nextFileTypeId
    byte[512] blockData

=== Explanation ===

An example will be used here as it is easier to follow.

Let us say, the client wishes to fetch file type 2, file id 17.

First off, it will open the main\_file\_cache.idx2 file and seek to the
index 17 \* 6 (102). It will then read two tribytes.

    fileSize = 1200
    initialDataBlockId = 4

The client will now open the main\_file\_cache.dat file and seek to the
index 4 \* 520 (2080). The values it reads will be:

    nextFileId = 17
    currentFilePartId = 0
    nextDataBlockId = 5
    nextFileTypeId = 2
    blockData = ...

It will read the first 512 bytes of the file and then knows that there
are 688 bytes left. Therefore, it has to read the next block.

    nextFileId = 17
    currentFilePartId = 1
    nextDataBlockId = 6
    nextFileTypeId = 2
    blockData = ...

It reads these next 512 bytes of the file and now knows that there are
176 bytes left. So for a final time, it will read the next block.

    nextFileId = 18
    currentFilePartId = 2
    nextDataBlockId = 7
    nextFileTypeId = 2
    blockData = ...

It can ignore most of these values (the next ones are meaningless at
this stage) and read the final 176 bytes. The whole 1200 byte file has
now been read.
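The walk through the block chain can be sketched as below. This is an illustration rather than the client's actual code: the class and method names are hypothetical, the data file is assumed to be fully loaded into a byte array, tribytes are assumed big-endian, and the file id, part id and type id fields in each header are skipped rather than validated.

```java
class DataFileReader {
    static final int BLOCK_SIZE = 520;   // 8-byte header + 512 bytes of data
    static final int HEADER_SIZE = 8;
    static final int PAYLOAD_SIZE = 512;

    // Follows the chain starting at initialDataBlockId until fileSize bytes are read.
    static byte[] readFile(byte[] dataFile, int fileSize, int initialDataBlockId) {
        byte[] out = new byte[fileSize];
        int block = initialDataBlockId;
        int read = 0;
        while (read < fileSize) {
            int p = block * BLOCK_SIZE;
            // Header layout: short, short, tribyte nextDataBlockId (at p + 4), byte.
            int nextBlock = ((dataFile[p + 4] & 0xFF) << 16)
                          | ((dataFile[p + 5] & 0xFF) << 8)
                          | (dataFile[p + 6] & 0xFF);
            // The last block is only partly used: copy whatever is still missing.
            int n = Math.min(PAYLOAD_SIZE, fileSize - read);
            System.arraycopy(dataFile, p + HEADER_SIZE, out, read, n);
            read += n;
            block = nextBlock;
        }
        return out;
    }
}
```

For the example above, `readFile(data, 1200, 4)` copies 512 bytes from block 4, 512 from block 5 and the final 176 from block 6.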