ARZL

From Vita Development Wiki

Revision as of 20:44, 30 January 2022

LZRA (ARZL in big-endian) is a compression and encoding format used on the PS Vita. It is used, for example, to store files used by the SKBL, such as the NSKBL and some Tzs modules. It is also used for the GIM texture data in /sce_sys/right/right.suprx.

Naming

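It appears to belong to the "LZ" family of algorithms; see the LZ algorithms overview (https://fr.slideshare.net/rajanstvinod/cjb0912010-lz-algorithms). It might be similar to:

- LZMA (Lempel-Ziv-Markov chain), 1998. See https://tukaani.org/xz/ and https://www.7-zip.org/sdk.html.
- LZHAM (Lempel-Ziv-Huffman-Arithmetic-Markov), 2010. See https://code.google.com/archive/p/lzham/wikis/DetailedVersionHistory.wiki.
- LZR (Lempel-Ziv-Renau), used on the PSP, 2004. The PSP also uses KL3E and KL4E. LZR modifies LZ77 so that pointers can reference anything that has already been encoded, without being limited by the search length. See psxtract (it is unclear whether the PS3 uses the same algorithm) or pspdecrypt:
  - BenHur's code: pspdecrypt/libLZR.c (https://github.com/John-K/pspdecrypt/blob/master/libLZR.c), rfactortools/libLZR.c (https://github.com/Grumbel/rfactortools/blob/master/other/quickbms/src/compression/libLZR.c)
  - TPUnix's code: npdpc/tlzrc.c (https://github.com/tpunix/kirk_engine/blob/master/npdpc/tlzrc.c)
- LZRC (LZMA-based, exact variant unknown), used on the PS3, 2006. See:
  - psxtract by Hykem: PSXtract/Windows/lz.cpp (https://github.com/xdotnano/PSXtract/blob/master/Windows/lz.cpp), PSXtract/Windows/lz.c (https://github.com/libretro/PSXtract/blob/master/Windows/lz.c), psxtract/Linux/lz.c (https://github.com/ErikPshat/psxtract_hykem/blob/master/Linux/lz.c)
  - sign_np by Hykem: sign_np/tlzrc.c (https://github.com/swarzesherz/sign_np/blob/master/tlzrc.c)
  - make_npdata by Hykem: make_npdata/Windows/lz.cpp (https://github.com/ErikPshat/make_npdata-hykem/blob/master/Windows/src/lz.cpp), make_npdata/Linux/lz.cpp (https://github.com/ErikPshat/make_npdata-hykem/blob/master/Linux/lz.cpp)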

SKBL function names contain a typo: the format is called ARLZ instead of ARZL (for example sceArlzDecode).

Header

The ARZL header is simply the ASCII string "ARZL" (bytes 41 52 5A 4C).
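A minimal sketch of a magic check, assuming the header starts at offset 0 of the buffer (arzl_has_magic and arzl_magic_le32 are hypothetical helpers, not SKBL functions). Reading the same four bytes as a little-endian 32-bit word, as an ARM CPU would, gives 0x4C5A5241, whose bytes read from the most significant end spell "LZRA"; this is presumably where the alternate name comes from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if buf begins with the magic "ARZL". */
static int arzl_has_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x41, 0x52, 0x5A, 0x4C };
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}

/* Hypothetical helper: the first four bytes read as a little-endian word. */
static uint32_t arzl_magic_le32(const unsigned char *buf)
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```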

Encoding

To encode data into ARZL:
1) Apply the ARM filter. See the ARM Filter section.
2) ARZL-encode the filtered data.

Decoding

To decode ARZL data:
1) ARZL-decode it. See SKBL#sceArlzDecode.
2) Remove the ARM filter. See the ARM Filter section and SKBL#sceArlzArmFilter.
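A minimal sketch of both pipelines, to make the ordering explicit: the filter is applied before compression and removed after decompression. The real entry points (such as SKBL's sceArlzDecode and sceArlzArmFilter) have signatures that are not documented here, so the callback types, the arzl_pack/arzl_unpack names and the toy stand-ins are all hypothetical, and the stand-ins are not ARZL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback types; the real codec and filter signatures are unknown here. */
typedef size_t (*arzl_codec_fn)(unsigned char *dst, size_t dst_cap,
                                const unsigned char *src, size_t src_len);
typedef void (*arm_filter_fn)(unsigned char *buf, size_t len);

/* Encode: apply the ARM filter to the plain data in place, then compress it. */
static size_t arzl_pack(unsigned char *plain, size_t plain_len,
                        unsigned char *out, size_t out_cap,
                        arm_filter_fn apply_filter, arzl_codec_fn encode)
{
    apply_filter(plain, plain_len);  /* note: modifies the caller's buffer */
    return encode(out, out_cap, plain, plain_len);
}

/* Decode: decompress into out, then remove the ARM filter from the result in place. */
static size_t arzl_unpack(const unsigned char *packed, size_t packed_len,
                          unsigned char *out, size_t out_cap,
                          arzl_codec_fn decode, arm_filter_fn remove_filter)
{
    size_t n = decode(out, out_cap, packed, packed_len);  /* 0 means failure here */
    remove_filter(out, n);
    return n;
}

/* Toy stand-ins so the sketch runs; NOT ARZL. The "codec" copies bytes and the
   "filter" inverts every byte, which makes it its own inverse. */
static size_t toy_copy_codec(unsigned char *dst, size_t dst_cap,
                             const unsigned char *src, size_t src_len)
{
    if (src_len > dst_cap)
        return 0;
    memcpy(dst, src, src_len);
    return src_len;
}

static void toy_invert_filter(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0xFF;
}
```

Passing the filter and codec as parameters keeps the ordering logic separate from the (unknown) codec itself; a real implementation would plug in the actual ARZL encoder/decoder and the matching filter version.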

ARM Filter

The data that ARZL encodes (and that the ARZL decoder outputs) is not the raw data but filtered data: an ARM filter is applied to make compression more efficient, not to obfuscate it.

Although there are three versions of the ARM filter, the basic operation is the same: the filter consists of bit swaps plus some deterministic changes that depend on the offset within the buffer.
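Reading the version 0 code, the offset-dependent change only touches words that look like a Thumb BL instruction pair (first halfword 0xF000 | high 11 bits, second halfword 0xF800 | low 11 bits). A BL stores its target relative to its own position, so calls to the same function from different places have different bytes; adding half the position just past the pair turns the offset into an absolute halfword target, which makes those calls identical and easier to compress. The sketch below is our reconstruction, not SKBL code: thumb_bl_filter, load_le32 and store_le32 are hypothetical names, and the apply direction (adding the offset) is inferred as the inverse of the documented removal.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit load/store, so the sketch does not depend on host byte order. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Hypothetical version-0-style filter. apply != 0 adds the halfword position
   (relative BL offset -> absolute target); apply == 0 subtracts it, like
   arzl_arm_filter_remove. */
static void thumb_bl_filter(unsigned char *buffer, size_t len, int apply)
{
    size_t pos = 0;

    while (len - pos >= 4) {
        uint32_t data = load_le32(buffer + pos);
        pos += 4;
        if ((data & 0xF800F800u) == 0xF800F000u) {
            /* BL pair: high 11 bits in the first halfword, low 11 bits in the second */
            uint32_t v = ((data & 0x7FFu) << 11) | ((data >> 16) & 0x7FFu);
            uint32_t half_pos = (uint32_t)(pos >> 1);
            v = apply ? v + half_pos : v - half_pos;
            store_le32(buffer + pos - 4,
                       0xF800F000u | ((v & 0x7FFu) << 16) | ((v >> 11) & 0x7FFu));
        } else if ((data >> 27) == 30) {
            /* second halfword looks like a BL prefix: step back to start there */
            pos -= 2;
        }
    }
}
```

For example, a BL at offset 0 and another at offset 8, both calling offset 0x100, are stored as 00 F0 7E F8 and 00 F0 7A F8; after filtering with apply = 1 both become 00 F0 80 F8. Filtering with apply = 1 and then apply = 0 restores the input, because every decision in the loop looks only at the top five bits of each halfword, which the filter never changes.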

Version 0

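#include <stdint.h>
#include <string.h>

/* Removes ARM filter version 0 in place. Like the original, this assumes a
   little-endian host (as on the ARM target). */
int arzl_arm_filter_remove(unsigned char *buffer, int len) {
  unsigned char *buf = buffer;
  unsigned char *bufend = buffer + len;
  uint32_t data;
  int change_stride;

  /* Stop when fewer than 4 bytes remain (the original do/while could read past the end). */
  while (bufend - buf >= 4) {
    memcpy(&data, buf, 4);  /* avoids an unaligned uint32_t access */
    buf += 4;
    change_stride = (data & 0xF800F800) >> 27;  /* top 5 bits of the second halfword */
    if ((data & 0xF800F800) == 0xF800F000) {
      /* Thumb BL pair: join the two 11-bit halves, subtract half the offset past the pair */
      data = (((data >> 16) & 0xFFC007FF) | ((data & 0x7FF) << 11)) - ((buf - buffer) >> 1);
      data = ((((data & 0x7FF) << 16) | 0xF800F000) & 0xFFFFF800) | ((data >> 11) & 0x7FF);
      memcpy(buf - 4, &data, 4);
    } else if (change_stride == 30) {
      buf -= 2;  /* second halfword looks like a BL prefix: realign onto it */
    }
  }
  return 0;  /* no status is documented; 0 is a placeholder */
}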

Version 1

ARM filter version 1 is the same as version 0 except that the offset information is added instead of subtracted. Only the changed line is shown:

data = (((data >> 16) & 0xFFC007FF) | ((data & 0x7FF) << 11)) + ((buf - buffer) >> 1);
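Written as arithmetic (our reading of the code; the symbols are introduced here), for a BL pair starting at byte offset $o$ in the buffer, with first halfword $h_1$ and second halfword $h_2$, the filter-removal step computes the following, using − for version 0 and + for version 1:

```latex
\begin{aligned}
v    &= (h_1 \bmod 2^{11}) \cdot 2^{11} + (h_2 \bmod 2^{11}) \\
v'   &= \left(v \mp \tfrac{o+4}{2}\right) \bmod 2^{22} \\
h_1' &= \texttt{0xF000} + \left\lfloor v' / 2^{11} \right\rfloor, \qquad
h_2' = \texttt{0xF800} + (v' \bmod 2^{11})
\end{aligned}
```

Here $(o+4)/2$ is the `(buf - buffer) >> 1` term; it is exact because $o$ is always even, since the scan moves in steps of 2 or 4 bytes.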

Version 2

ARM filter version 2 is the same as version 0, with one additional branch that swaps two nibbles under certain conditions. The condition was found through a learning process and may be overfitted.

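/* Version 2 only: an extra branch in the version 0 if/else chain.
   Swaps bits 0-3 with bits 24-27 of the little-endian word. */
else if ((data & 0x8000FBF0) == 0x0000F2C0) {
  data = (data & 0xF0FFFFF0) | ((data & 0xF) << 24) | ((data >> 24) & 0xF);
  *((uint32_t *)buf - 1) = data;
}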
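Pulled out as a standalone function (arzl_v2_nibble_swap is a hypothetical name), the branch is easy to test. Bits 0-3 and 24-27 lie outside the condition's mask, so a swapped word still matches and swapping twice restores it. As our own interpretation, not something the source states: the masked pattern appears to match the first halfword of the Thumb-2 MOVT instruction (encoding T1, 11110 i 101100 imm4), in which case the swap exchanges imm4 with the destination register field Rd.

```c
#include <stdint.h>

/* Hypothetical standalone form of the version 2 branch: if the little-endian
   word matches the condition, swap bits 0-3 with bits 24-27. */
static uint32_t arzl_v2_nibble_swap(uint32_t data)
{
    if ((data & 0x8000FBF0u) == 0x0000F2C0u)
        data = (data & 0xF0FFFFF0u) | ((data & 0xFu) << 24) | ((data >> 24) & 0xFu);
    return data;
}
```

For example, 0x0312F2C5 becomes 0x0512F2C3 and back again, while a word such as 0xBF00BF00 does not match and is returned unchanged.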

Tools

TODO