rswiki-book/Archive-Format.mediawiki

[[Category Cache]]

== Introduction ==

Since 194 all way up until 377, all the files in cache 0 have an archive-like format which contains a collection of named files (e.g. '''BADENC.TXT''' is a file which contains bad words in the '''wordenc''' archive).

=== Diagram ===

----

[http://img263.imageshack.us/img263/9678/68481568.png External Diagram Image]

== Usage ==

These files are used by the client for a variety of purposes. Some, such as the '''DATA''' file contain data themselves (in this case the interfaces). Others, such as the '''MAP_INDEX''' file, contain information about where to locate the map and landscape files in the cache.

== Format ==

 tribyte uncompressedsize
 tribyte compressedsize

If the uncompressed and compressed sizes are equal, the whole file is not compressed but the individual entries are compressed using bzip2. If they are not equal, the entire file is compressed using bzip2 but the individual entries are not.

Also note, the magic id at the start of the bzip2 entries are not included in the cache. If you use an existing API to read the files and want to add this back, you must append the four characters: BZh1 before you decompress.

 short fileCount

Each file entry has the format:

 int nameHash
 tribyte uncompressedSize
 tribyte compressedSize

When you are looping through the files, you need to keep track of the file offset yourself. This psuedocode demonstrates how:

 int offset = buffer.getCurrentOffset() + numFiles * 10;
 for(int i = 0; i < numFiles; i++) {
    // read values
    int thisFileOffset = offset;
    offset += thisFileCompressedSize;
 }

To get a named file by its name, you should first hash the name using this method:

 public static int hash(String name) {
    int hash = 0;
    name = name.toUpperCase();
    for(int j = 0; j < name.length(); j++) {
        hash = (hash * 61 + name.charAt(j)) - 32;
    }
    return hash;
 }

Then, loop through the file entries you loaded earlier to find a matching hash. Read the compressed file size from the offset. If the whole file is not compressed, you should decompress the individual entry.


== '''#194 Archive Format''' ==

The 194 (RuneScape 2 beta) client worked with a very simple cache format. Each file in the cache was a file on the operating system.

=== Name hashing ===

Every name in the cache was hashed using the following method which is, incidentally, similar to the way player names are converted to longs.

 public static final long gethash(String string) {
     string = string.trim();
     long l = 0L;
     for (int i = 0; i < string.length() && i < 12; i++) {
         char c = string.charAt(i);
         l *= 37L;
         if (c >= 'A' && c <= 'Z')
             l += (long) ('\001' + c - 'A');
         else if (c >= 'a' && c <= 'z')
             l += (long) ('\001' + c - 'a');
         else if (c >= '0' && c <= '9')
             l += (long) ('\033' + c - '0');
     }
     return l;
 }

The resulting long was converted to a string and the file was given that name.

=== Files ===

The files in the cache were the ones used in the [[JAGGRAB Protocol|JAGGRAB Protocol]] (i.e. files in cache 0 in old engine caches) and map (mX_Y) and landscape (lX_Y) files. Incidentally, this naming is very similar to the names of the map and landscape files in new engine caches.


== '''#317 Archive Format''' ==

The old engine cache is made up two types of files.

=== Data file ===

The data file holds all of the files in the cache and is named '''main_file_cache.dat'''. It is therefore very big, typically ~10-20 megabytes..

=== Index file ===

There are several index files, named '''main_file_cache.idx''' and then postfixed with a number. Each index file holds 'pointers' to where a file is located in the main cache. Each index file represents a type of file.

=== Index file format ===

The index file is made up of 6 byte blocks which hold information about where a file can be located in the data file. The format of a single block is as follows:

 tribyte fileSize
 tribyte initialDataBlockId

=== Data file format ===

The data file is made up of 520 byte blocks. The format of each of these blocks is as follows:

 short nextFileId
 short currentFilePartId
 tribyte nextDataBlockId
 byte nextFileTypeId
 byte[512] blockData

=== Explanation ===

An example will be used here as it is easier to follow.

Let us say, the client wishes to fetch file type 2, file id 17.

First off, it will open the main_file_cache.idx2 file and seek to the index 17 * 6 (102). It will then read two tribytes.

 fileSize = 1200
 intialDataBlockId = 4

The client will now open the main_file_cache.dat file and seek to the index 4 * 520 (2080). The values it reads will be:

 nextFileId = 17
 currentFilePartId = 0
 nextDataBlockId = 5
 nextFileTypeId = 2
 blockData = ...

It will read the first 512 bytes of the file and then knows that there is 688 bytes left. Therefore, it has to read the next block.

 nextFileId = 17
 currentFilePartId = 1
 nextDataBlockId = 6
 nextFileTypeId = 2
 blockData ...

It reads these next 512 bytes of the file and now knows that there are 176 bytes left. So for a final time, it will read the next block.

 nextFileId = 18
 currentFilePartId = 2
 nextDataBlockId = 7
 nextFileTypeId = 2
 blockData = ...

It can ignore most of these values (the next ones are meaningless at this stage) and read the final 176 bytes. The whole 1200 byte file has now been read.
Create MediaWiki page 'Archive Format' 2012-08-27 22:20:17 -04:00			`[[Category Cache]]`

			`== Introduction ==`

			`Since 194 all way up until 377, all the files in cache 0 have an archive-like format which contains a collection of named files (e.g. '''BADENC.TXT''' is a file which contains bad words in the '''wordenc''' archive).`

			`=== Diagram ===`

			`----`

			`[http://img263.imageshack.us/img263/9678/68481568.png External Diagram Image]`

			`== Usage ==`

			`These files are used by the client for a variety of purposes. Some, such as the '''DATA''' file contain data themselves (in this case the interfaces). Others, such as the '''MAP_INDEX''' file, contain information about where to locate the map and landscape files in the cache.`

			`== Format ==`

			`tribyte uncompressedsize`
			`tribyte compressedsize`

			`If the uncompressed and compressed sizes are equal, the whole file is not compressed but the individual entries are compressed using bzip2. If they are not equal, the entire file is compressed using bzip2 but the individual entries are not.`

			`Also note, the magic id at the start of the bzip2 entries are not included in the cache. If you use an existing API to read the files and want to add this back, you must append the four characters: BZh1 before you decompress.`

			`short fileCount`

			`Each file entry has the format:`

			`int nameHash`
			`tribyte uncompressedSize`
			`tribyte compressedSize`

			`When you are looping through the files, you need to keep track of the file offset yourself. This psuedocode demonstrates how:`

			`int offset = buffer.getCurrentOffset() + numFiles * 10;`
			`for(int i = 0; i < numFiles; i++) {`
			`// read values`
			`int thisFileOffset = offset;`
			`offset += thisFileCompressedSize;`
			`}`

			`To get a named file by its name, you should first hash the name using this method:`

			`public static int hash(String name) {`
			`int hash = 0;`
			`name = name.toUpperCase();`
			`for(int j = 0; j < name.length(); j++) {`
			`hash = (hash * 61 + name.charAt(j)) - 32;`
			`}`
			`return hash;`
			`}`

			`Then, loop through the file entries you loaded earlier to find a matching hash. Read the compressed file size from the offset. If the whole file is not compressed, you should decompress the individual entry.`


			`== '''#194 Archive Format''' ==`

			`The 194 (RuneScape 2 beta) client worked with a very simple cache format. Each file in the cache was a file on the operating system.`

			`=== Name hashing ===`

			`Every name in the cache was hashed using the following method which is, incidentally, similar to the way player names are converted to longs.`

			`public static final long gethash(String string) {`
			`string = string.trim();`
			`long l = 0L;`
			`for (int i = 0; i < string.length() && i < 12; i++) {`
			`char c = string.charAt(i);`
			`l *= 37L;`
			`if (c >= 'A' && c <= 'Z')`
			`l += (long) ('\001' + c - 'A');`
			`else if (c >= 'a' && c <= 'z')`
			`l += (long) ('\001' + c - 'a');`
			`else if (c >= '0' && c <= '9')`
			`l += (long) ('\033' + c - '0');`
			`}`
			`return l;`
			`}`

			`The resulting long was converted to a string and the file was given that name.`

			`=== Files ===`

			`The files in the cache were the ones used in the [[JAGGRAB Protocol\|JAGGRAB Protocol]] (i.e. files in cache 0 in old engine caches) and map (mX_Y) and landscape (lX_Y) files. Incidentally, this naming is very similar to the names of the map and landscape files in new engine caches.`


			`== '''#317 Archive Format''' ==`

			`The old engine cache is made up two types of files.`

			`=== Data file ===`

			`The data file holds all of the files in the cache and is named '''main_file_cache.dat'''. It is therefore very big, typically ~10-20 megabytes..`

			`=== Index file ===`

			`There are several index files, named '''main_file_cache.idx''' and then postfixed with a number. Each index file holds 'pointers' to where a file is located in the main cache. Each index file represents a type of file.`

			`=== Index file format ===`

			`The index file is made up of 6 byte blocks which hold information about where a file can be located in the data file. The format of a single block is as follows:`

			`tribyte fileSize`
			`tribyte initialDataBlockId`

			`=== Data file format ===`

			`The data file is made up of 520 byte blocks. The format of each of these blocks is as follows:`

			`short nextFileId`
			`short currentFilePartId`
			`tribyte nextDataBlockId`
			`byte nextFileTypeId`
			`byte[512] blockData`

			`=== Explanation ===`

			`An example will be used here as it is easier to follow.`

			`Let us say, the client wishes to fetch file type 2, file id 17.`

			`First off, it will open the main_file_cache.idx2 file and seek to the index 17 * 6 (102). It will then read two tribytes.`

			`fileSize = 1200`
			`intialDataBlockId = 4`

			`The client will now open the main_file_cache.dat file and seek to the index 4 * 520 (2080). The values it reads will be:`

			`nextFileId = 17`
			`currentFilePartId = 0`
			`nextDataBlockId = 5`
			`nextFileTypeId = 2`
			`blockData = ...`

			`It will read the first 512 bytes of the file and then knows that there is 688 bytes left. Therefore, it has to read the next block.`

			`nextFileId = 17`
			`currentFilePartId = 1`
			`nextDataBlockId = 6`
			`nextFileTypeId = 2`
			`blockData ...`

			`It reads these next 512 bytes of the file and now knows that there are 176 bytes left. So for a final time, it will read the next block.`

			`nextFileId = 18`
			`currentFilePartId = 2`
			`nextDataBlockId = 7`
			`nextFileTypeId = 2`
			`blockData = ...`

			`It can ignore most of these values (the next ones are meaningless at this stage) and read the final 176 bytes. The whole 1200 byte file has now been read.`