Bugs

From Vita Development Wiki
Revision as of 15:20, 11 August 2023 by CreepNT (talk | contribs) (Add SceTimerForUsleep accessible from NS)

The PS Vita has bugs. Some bugs can lead to Vulnerabilities. Others lead to nothing useful (yet) but can serve as examples of what not to do.

Exploitable bugs

See Vulnerabilities.

Non-exploitable bugs

TrustZone

SceTimerForUsleep is accessible from NS

The SceTimerForUsleep timer (Word Timer 7) is reserved for the TrustZone kernel. However, this timer's MMIO range is not blacklisted from access by ARM cores in Non-Secure state. An attacker in Non-Secure state can thus change the timer's configuration, even if it is being used from Secure state.

This timer is used for the implementation of sceKernelUsleep, which can be roughly summed up as "program the timer to send an interrupt after delay has elapsed and wait for the interrupt". By stopping the timer while core X is executing sceKernelUsleep() in Secure state, an attacker on another core in Non-Secure state can hang core X (as it waits for an interrupt that will never be delivered). In theory, this could be used to create or enlarge a ToCToU window; in practice, this attack is useless because only the SMC 0x121 handler (inside SceDriverTzs) uses sceKernelUsleep() and it isn't vulnerable to ToCToU. This was only checked in 3.65 and may not be true in older firmwares.

For obvious reasons, interrupts are not disabled while waiting in sceKernelUsleep. Thus, this vulnerability does not allow an attacker to prevent Secure state interrupt handling on a core.

Kernel

Syscall table collision between modules

Because there is a delay between the time a syscall slot is allocated and the time it is written to, collisions can occur. For example, assume there is one empty syscall slot left. Two modules that each export syscalls are loaded, and both of them are assigned the final free slot. A user library is loaded that imports from the first module, then from the second module. At this point, the function pointer exported by the first module is replaced with the one exported by the second module.

It is unlikely that this would lead to any security vulnerability, but it could cause system instability. However, if the system has that many syscalls loaded (let us assume more than 3000), it may already be in an unstable state.
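The collision scenario can be modeled with a toy table. This is only a sketch: the table size, the helper names (alloc_slot, write_slot) and the two-step split are assumptions made for illustration, not the actual SceKernelModulemgr logic.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_COUNT 4 /* hypothetical table size, kept tiny for illustration */

typedef int (*syscall_fn)(void);

static int export_a(void) { return 1; } /* stands in for module A's export */
static int export_b(void) { return 2; } /* stands in for module B's export */

/* Step 1: pick the first empty slot. Nothing marks it as reserved. */
int alloc_slot(const syscall_fn *table) {
    for (int i = 0; i < SLOT_COUNT; i++)
        if (table[i] == NULL)
            return i;
    return -1;
}

/* Step 2: runs later, when a user library imports the function. */
void write_slot(syscall_fn *table, int slot, syscall_fn fn) {
    table[slot] = fn;
}

void example_usage(void) {
    /* Every slot but the last one is already in use. */
    syscall_fn table[SLOT_COUNT] = { export_a, export_a, export_a, NULL };

    int slot_a = alloc_slot(table); /* module A is assigned the final slot */
    int slot_b = alloc_slot(table); /* module B is assigned the same slot */
    assert(slot_a == 3 && slot_b == 3);

    write_slot(table, slot_a, export_a); /* import from module A */
    write_slot(table, slot_b, export_b); /* import from module B */
    assert(table[slot_a] == export_b);   /* A's pointer was replaced */
}
```

Reserving the slot at allocation time (rather than at write time) would close the window in this model.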

Kernel heap pointer leak in sceKernelGetLibraryInfoByNID

Discovered on 2019-12-17 by Princess of Sleeping.

SceKernelModulemgr#sceKernelGetLibraryInfoByNID leaks a kernel heap pointer, but it is probably not useful for kernel exploitation.

SceKernelLibraryInfo.libname is a pointer to kernel memory. See SceKernelModulemgr#Types.

PoC code:

SceUID modids[0x80];
SceSize num = 0x80;
SceKernelLibraryInfo libinfo;
libinfo.size = sizeof(libinfo);
// Get the list of loaded modules
sceKernelGetModuleList(~0, modids, (int *)&num);
// Query a library by NID; libinfo.libname comes back as a kernel heap pointer
sceKernelGetLibraryInfoByNID(modids[num - 2], 0xCAE9ACE6, &libinfo);
sceClibPrintf("LEAKED KERNEL HEAP POINTER !!! ---> 0x%X <--- !!!\n", (unsigned int)libinfo.libname);

Not fixed as of FW 3.600.011.

SceIofilemgr misses internal NULL pointer checks

SceIofilemgr's syscall wrappers perform various sanity checks on usermode arguments, but some internal functions called by these syscalls do not perform proper checks.

For example, you can simply trigger a Kernel DABT by running the following code:

sceIoDevctl(NULL, 0, NULL, 0, NULL, 0);

Confirmed in FW 2.10. FWs >=3.60 have proper checks.

sceAppMgrDestroyAppByAppId triggers kernel panic

Triggering a usermode exception immediately after calling sceAppMgrDestroyAppByAppId causes ?SceKernelThreadMgr? to get confused and trigger a kernel exception.

sceKernelCreateThread in thumb mode

SceKernelThreadMgr#sceKernelCreateThreadForUser checks the memory attributes to see if the entry point is executable. However, in Thumb mode the function pointer always has bit 0 set, so if the entry point is located in the last 4 bytes of a memory page, the checked 4-byte range spills into the next page. The following check then fails and the function returns 0x80020006.

res = sceKernelIsEqualAccessibleRangeProcBySWForDriver(pid, memory_attr, entry, 4);
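The range arithmetic behind this failure can be sketched as follows. This is a minimal illustration, assuming 4 KiB pages and an example address; range_in_one_page is a hypothetical helper, not a kernel function, and the real check inspects memory attributes rather than page numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u /* assumption: 4 KiB pages */

/* True if the byte range [addr, addr + size) stays inside one page. */
bool range_in_one_page(uint32_t addr, uint32_t size) {
    return (addr / PAGE_SIZE) == ((addr + size - 1u) / PAGE_SIZE);
}

void example_usage(void) {
    uint32_t code_addr = 0x81000FFCu;      /* last 4 bytes of a page */
    uint32_t thumb_entry = code_addr | 1u; /* Thumb function pointer */

    assert(!range_in_one_page(thumb_entry, 4));      /* spills into the next page */
    assert(range_in_one_page(thumb_entry & ~1u, 4)); /* fits once bit 0 is cleared */
}
```

In this model, clearing bit 0 of the entry point before checking the range would avoid the spurious failure.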

sceNetRecvfromForDriver 0xC0022005 error on kernel call

This is because the internal function always sets the is_user flag in the parameter, so passing a kernel memory pointer as the data argument of SceNetPs#sceNetRecvfromForDriver results in an error in SceSysmem#sceKernelCopyToUserDomainForKernel or SceSysmem#sceKernelCopyToUserTextDomainForKernel.

// Offsets are for FW 3.60

// Patch by function hook
SceUID target = -1;
tai_hook_ref_t FUN_8100d5a8_ref;
int FUN_8100d5a8_patch(void *a1, void *a2, void *a3, int a4, void *a5, void *a6) {
	// Only patch calls made by the target thread
	if (target == sceKernelGetThreadIdForDriver())
		*(int *)((char *)a3 + 5 * 4) = 1; // 0:user 1:kernel 2~:kpanic
	return TAI_CONTINUE(int, FUN_8100d5a8_ref, a1, a2, a3, a4, a5, a6);
}


// Patch by code injection (recommended)
int patch_netrecv_0xC0022005(void) {
/*
        810067b2 c0 ef 10 00     vmov.i32   d16,#0              -> DD F8 30 C0   ldr ip, [sp, #0x30]
        810067b6 19 68           ldr        r1, [r3]
        810067b8 a2 60           str        r2, [r4, #8]
        810067ba da f8 0c 30     ldr.w      r3, [sl, #0xc]

        810067be c4 e9 07 55     strd       r5, r5, [r4,#0x1c]  -> C4 E9 07 C5   strd ip, r5, [r4,#0x1c]
        810067c2 a5 61           str        r5, [r4, #0x18]

        810067c4 e3 60           str        r3, [r4, #0xc]
        810067c6 61 62           str        r1, [r4, #0x24]

        810067c8 c4 ed 04 0b     vstr.64    d16, [r4,#0x10]     -> C4 E9 04 55   strd r5, r5, [r4,#0x10]
*/
	SceUID module_id;
	void *patch_point;
	char inst[0x20];
	module_id = sceKernelSearchModuleByNameForDriver("SceNetPs");
	module_get_offset(0x10005, module_id, 0, 0x67b2, (uintptr_t *)&patch_point);
	memcpy(inst, patch_point, 0x1E);
	memcpy(&(inst[0x0]), (const char[4]){0xDD, 0xF8, 0x30, 0xC0}, 4);
	memcpy(&(inst[0xC]), (const char[4]){0xC4, 0xE9, 0x07, 0xC5}, 4);
	memcpy(&(inst[0x16]), (const char[4]){0xC4, 0xE9, 0x04, 0x55}, 4);
	taiInjectDataForKernel(0x10005, module_id, 0, 0x67B2, inst, 0x1E);
	return 0;
}

Illegal alignment check of kernel allocator

Discovered on 2021-08-30 by Princess of Sleeping.

For example, if an illegal alignment such as 0x880 (which is not a power of two) is passed as the alignment argument of kernel malloc, the function does not reject it and does not return NULL.

This affects at least SceNetPs malloc and system malloc internal/external.
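A check of the kind the allocator appears to be missing could look like the sketch below. It assumes that legal alignments are non-zero powers of two, which is the usual convention but is not stated on this page; is_power_of_two is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if align is a non-zero power of two. */
bool is_power_of_two(size_t align) {
    return align != 0 && (align & (align - 1)) == 0;
}

void example_usage(void) {
    assert(is_power_of_two(0x800));
    assert(!is_power_of_two(0x880)); /* 0x800 + 0x80: two bits set */
    assert(!is_power_of_two(0));
}
```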

Ignored sceGUIDGetNameCore error propagation

Discovered on 2022-03-10 by Princess of Sleeping.

sceGUIDGetNameCore, which is called internally by SceSysmem#sceGUIDGetNameForDriver or SceSysmem#sceGUIDGetName2ForDriver, always returns 0 even if an error occurs in the function.

void unsafe_calling_example_1(void) {
    int res;
    const char *name;

    // Use some tricks to reach sceGUIDGetNameCore with invalid guid.
    res = sceGUIDGetName((invalid_guid | 1) & ~0xC0000000, &name);

    // res is always 0, even if the call failed internally.
    // sceGUIDGetNameCore normally initializes name to NULL, but if the internal check fails too early, name is never written and its value is undefined.
}

void unsafe_calling_example_2(void) {
    const char *name;

    // Use some tricks to reach sceGUIDGetNameCore with invalid guid.
    name = sceGUIDGetName2((invalid_guid | 1) & ~0xC0000000);

    // The internal result of sceGUIDGetNameCore is always 0, even if it failed.
    // sceGUIDGetNameCore normally initializes name to NULL, but if the internal check fails too early, name is never written and its value is undefined.
    // If sceGUIDGetNameCore failed internally, the value of name is *(uint32_t *)(unsafe_calling_example_2_current_sp - 0x10)
}

void safe_calling_example_1(void) {
    int res;
    const char *name = NULL; // Initialize with NULL in advance

    // Use some tricks to reach sceGUIDGetNameCore with invalid guid.
    res = sceGUIDGetName((invalid_guid | 1) & ~0xC0000000, &name);

    // res is always 0, even if the call failed internally.

    if(NULL == name){
        sceKernelPrintf("Failed %s\n", "sceGUIDGetName");
    }
}

void safe_calling_example_2(void) {
    int res;
    const char *name;

    // Add guid valid check
    res = some_guid_valid_check(invalid_guid);
    if (res < 0)
        return; // If the guid is invalid, do not call sceGUIDGetName2.

    // Use some tricks to reach sceGUIDGetNameCore with invalid guid.
    name = sceGUIDGetName2((invalid_guid | 1) & ~0xC0000000);

    // name is never NULL here.
}

Incomplete register restore on intr handler

Discovered on 2023-03-08 by Princess of Sleeping.

Confirmed on fw 1.810.

In the example below, an interrupt occurs while rw_data is being loaded. The interrupt handler services it, but when it returns it does not restore the DACR value that was active when the interrupt occurred. Instead, it reloads the DACR from the ThreadCB.

The value stored in the ThreadCB is the kernel's default client setting of 0x55550000. So when the interrupt handler returns and the code tries to write to rx_data, a DABT occurs.

int resolve_something(void *rx_data, void *rw_data){

	SceUInt32 dacr = sceKernelGetDACR();
	sceKernelSetDACR(dacr | 0xFFFF0000);

	int write_data = *(int *)rw_data;

	// An interrupt occurs here; the register restore of the interrupt handler sets the DACR to 0x55550000.

	*(int *)rx_data = write_data; // Triggers a DABT because RX memory has no write permission.

	sceKernelSetDACR(dacr);

	return 0;
}

To fix this, disable interrupts around the DACR change.

int resolve_something(void *rx_data, void *rw_data){

	asm volatile ("cpsid aif\n");
	SceUInt32 dacr = sceKernelGetDACR();
	sceKernelSetDACR(dacr | 0xFFFF0000);

	int write_data = *(int *)rw_data;
	*(int *)rx_data = write_data;

	sceKernelSetDACR(dacr);
	asm volatile ("cpsie aif\n");

	return 0;
}

DACR corrupted by sceKernelIsAccessibleRangeProc

If a pid argument is specified, sceKernelIsAccessibleRangeProc switches to the target process's MMU mapping, but it does not restore the DACR correctly when it finishes.

Simplified code:

int sceKernelIsAccessibleRangeProc(int pid, int perm, const void *addr, int size){

	if(pid != 0){
		SceUInt32 dacr = sceKernelGetDACR();
		set_process_mmu(pid);
		sceKernelIsAccessibleRangeProc_core(perm, addr, size);
		sceKernelSetDACR(dacr & 0x55555555); // Not the saved value: every manager (0b11) domain is downgraded to client (0b01)
	}else{
		// ...
	}
}

Be careful if you are on a Development Kit: the callback of sceKernelPrintf calls sceKernelIsAccessibleRangeProc (SceSysmem::sceKernelPrintf -> SceDeci4pSTtyp::handler -> call sceKernelIsAccessibleRangeProc for n/s format).

Also, if something crashes when you write to RX memory after setting the DACR to 0xFFFFFFFF, suspect this bug. This is not the only function that switches the MMU mapping in this way.
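The effect of the mask on a saved DACR value can be checked with plain arithmetic. The sketch below relies only on the ARM encoding of the 16 two-bit domain fields (0b00 no access, 0b01 client, 0b11 manager); buggy_restore and domain_field are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Value written back by the simplified code: the saved DACR, masked. */
uint32_t buggy_restore(uint32_t saved_dacr) {
    return saved_dacr & 0x55555555u;
}

/* Extracts the two-bit access field of one of the 16 domains. */
unsigned domain_field(uint32_t dacr, unsigned domain) {
    return (dacr >> (domain * 2u)) & 3u;
}

void example_usage(void) {
    uint32_t saved = 0xFFFFFFFFu; /* caller made every domain "manager" */
    uint32_t after = buggy_restore(saved);

    assert(after == 0x55555555u);
    assert(domain_field(saved, 15) == 3u); /* manager before the call */
    assert(domain_field(after, 15) == 1u); /* client after the call */
}
```

With client access, page permissions are enforced again, which is why a later write to RX memory faults.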

Limited buffer size in dbginfo handler for sceKernelPrintf*

The handler properly converts dbginfo such as 0:0xAAAAAAAA55555555(something_func:335):0xA5A5A5A5(file.c):Hi\n and outputs it to the tty. However, its buffer size is limited, so if the function name or file name is too long, the conversion is cut off and incorrect output is written to the tty.
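The truncation can be reproduced with any fixed-size formatting buffer. The sketch below is not the handler's code: the "(func:line)" format, the buffer sizes and format_location are assumptions chosen for illustration. It shows how the return value of snprintf lets a caller detect that the text was cut off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Writes "(func:line)" to out. Returns true if the text did not fit. */
bool format_location(char *out, size_t out_size, const char *func, int line) {
    int needed = snprintf(out, out_size, "(%s:%d)", func, line);
    return needed < 0 || (size_t)needed >= out_size;
}

void example_usage(void) {
    char small[16];
    char large[32];

    assert(format_location(small, sizeof(small), "something_func", 335));
    assert(strcmp(small, "(something_func") == 0); /* line number is lost */

    assert(!format_location(large, sizeof(large), "something_func", 335));
    assert(strcmp(large, "(something_func:335)") == 0);
}
```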

Wrong range control in vnode lock/unlock

If your thread tries to lock a vnode while another thread holds the lock on it, vp->waiter is incremented. However, waiter is a 32-bit member, while the lock function tries to increment it over a 64-bit range.
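One possible consequence of a 64-bit increment applied to a 32-bit member is sketched below. The struct layout, the neighbouring field and the little-endian load/add/store model are all assumptions made for illustration; the actual vnode layout and instruction sequence are not described here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a 32-bit counter followed by an unrelated field. */
struct vnode_model {
    uint32_t waiter;
    uint32_t neighbour;
};

/* Models a little-endian 64-bit load/add/store that starts at waiter. */
void increment_waiter_64(struct vnode_model *vp) {
    uint64_t wide = ((uint64_t)vp->neighbour << 32) | vp->waiter;
    wide += 1;
    vp->waiter = (uint32_t)wide;
    vp->neighbour = (uint32_t)(wide >> 32);
}

void example_usage(void) {
    struct vnode_model ok = { 1u, 7u };
    increment_waiter_64(&ok);
    assert(ok.waiter == 2u && ok.neighbour == 7u); /* usually harmless */

    struct vnode_model wrap = { 0xFFFFFFFFu, 7u };
    increment_waiter_64(&wrap);
    assert(wrap.waiter == 0u && wrap.neighbour == 8u); /* carry leaks out */
}
```

Even without a carry, the wide store rewrites the neighbouring field, so a concurrent update to that field could be lost in this model.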

R_ARM_CALL/R_ARM_JUMP24 relocations not performed properly

Discovered on 2023-06-17 by CreepNT.

There is a bug in the SceKernelModulemgr routine that handles relocation types R_ARM_CALL (28) and R_ARM_JUMP24 (29):

//S, A and P correspond to the relocation variables detailed in the "ELF for the Arm® Architecture" document.
int displacement = (A - P) + S;
unsigned opcode = read_opcode_from_address(P);

if ((opcode & 0xF0000000) == 0x0) { //<- bug here
   opcode = (opcode & 0xFEFFFFFF) | (displacement & 0x2) << 23; //write bit 1 of displacement in 'H' bit of BLX
}
opcode = (opcode & 0xFF000000) | (displacement >> 2) & 0xFFFFFF; //write displacement in imm24 (bottom 2 bits not needed due to code alignment)

write_opcode_to_address(P, opcode);

The if-gated code is supposed to handle the special case of the BLX instruction, which has an additional bit (H) of storage for the offset to target function (because ARM code is 4-byte aligned but Thumb code is 2-byte aligned). The BLX instruction should be identified because it has cond=0xF, but this code checks for cond=0x0 instead (EQ).

This bug will thus cause all relocated BLEQ instructions to turn into BEQ instructions - fortunately, this has no consequence because the instructions are equivalent.

However, and most importantly, it also results in some BLX instructions not being properly relocated (as H is not set/cleared when it should be). One of three scenarios happens when a BLX is "relocated":

  • H has the correct value: everything goes fine
  • H is set but should be clear: BLX will skip the first instruction of the target function
  • H is clear but should be set: BLX will jump to the instruction right before the start of the target function

When an improperly relocated BLX is executed, the program may end up crashing (e.g. UNDEFINED abort), behave unexpectedly (function doesn't actually run, argument is corrupted, etc) or appear to work properly, depending on the exact situation.

This bug exists since at least firmware 0.920 and has never been fixed. It is plausible it exists since an earlier (or even the first) revision of the OS.
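The difference between the buggy and the intended condition check can be compared on concrete opcodes. The sketch below reimplements the pseudocode above as a standalone function; relocate_branch and its blx_cond parameter are hypothetical names, with blx_cond = 0xF modeling the intended check and blx_cond = 0x0 modeling the bug.

```c
#include <assert.h>
#include <stdint.h>

/* Patches a branch opcode with the given displacement.
   blx_cond is the cond value that is treated as BLX. */
uint32_t relocate_branch(uint32_t opcode, int32_t displacement, uint32_t blx_cond) {
    uint32_t disp = (uint32_t)displacement;
    if ((opcode >> 28) == blx_cond) {
        /* Copy bit 1 of the displacement into the H bit (bit 24). */
        opcode = (opcode & 0xFEFFFFFFu) | ((disp & 0x2u) << 23);
    }
    /* Store bits 2..25 of the displacement in imm24. */
    return (opcode & 0xFF000000u) | ((disp >> 2) & 0x00FFFFFFu);
}

void example_usage(void) {
    /* BLX with a target 6 bytes away: H must be set. */
    assert(relocate_branch(0xFA000000u, 6, 0xFu) == 0xFB000001u); /* intended */
    assert(relocate_branch(0xFA000000u, 6, 0x0u) == 0xFA000001u); /* H left clear */

    /* BLEQ with a target 8 bytes away: the bug clears bit 24, giving BEQ. */
    assert(relocate_branch(0x0B000000u, 8, 0xFu) == 0x0B000002u);
    assert(relocate_branch(0x0B000000u, 8, 0x0u) == 0x0A000002u);
}
```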

Kernel Boot Loader

Out of range access in SKBL

Discovered on 2022-01-20 by Princess of Sleeping.

To decode the ARZL-encoded TrustZone SceSysmem, SKBL maps Compati SRAM (PA 0x1C000000) to a TrustZone VA with a size of 2MiB. It then calls SKBL#sceArlzDecode with an improper argument: the destination maximum size is 0x1000000 although only 0x200000 bytes are mapped. If glitches during decoding make the output exceed 2MiB, the size check still passes and the decoder accesses memory outside the range of the device, which can trigger a Data abort exception.

Moreover, even if SKBL#sceArlzDecode returns an error code, that value is passed unchecked as an argument of SKBL#sceArlzArmFilter, so accesses spanning up to 0x80560201 bytes will occur.

if (sceKernelCpuId() == 0) {
  sceKernelMMUMapSections(*(void **)(param_1 + 0x60), 0x1061D007, 0xC, 0x1C000000, 0x200000 /* mapping size */, 0x1C000000);
  res = sceArlzDecode(0x1C000000 /* dst */, 0x1000000 /* dst max size */, &ARZL_encoded_SceSysmem[4] /* src */, NULL);
  size = sceArlzArmFilter(0x1C000000, res, 0);
  g_Tzs_SceSysmem_start_address = 0x1C000000;
  g_Tzs_SceSysmem_end_address = 0x1C000000 + size;
}

It is currently just a bug, as no glitching has been attempted and a Data abort exception is not useful.
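The two missing checks can be summarized in a small predicate. This is a sketch under assumptions: decode_result_usable is a hypothetical helper, and treating any value with the top bit set as an error code is inferred from the 0x80560201 value quoted above rather than from SKBL itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAPPED_SIZE 0x200000u /* 2MiB mapping size */

/* True if a decoder return value can be used as a byte count for the mapping. */
bool decode_result_usable(uint32_t res) {
    if (res & 0x80000000u)
        return false; /* assumption: values with the top bit set are error codes */
    return res <= MAPPED_SIZE;
}

void example_usage(void) {
    assert(!decode_result_usable(0x80560201u)); /* value quoted in the text above */
    assert(!decode_result_usable(0x300000u));   /* larger than the mapping */
    assert(decode_result_usable(0x100000u));
}
```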

Incorrect mapping size specified to MapASLR

Discovered on 2023-07-25 by CreepNT. This bug also affects NSKBL.

Since System Software version ?, SKBL and NSKBL randomize the virtual address of objects allocated during boot that remain mapped after KBL ends. To achieve this, the MapASLR routine uses the ASLR seed from KBL Param and the size of each mapping. Using an internal bitmap that keeps track of the previously allocated virtual memory pages, it finds a virtual address aligned to vsize such that enough pages to fit vsize bytes are free after it, then marks the whole range as allocated.

However, the first call to MapASLR (for SceKernelL2PageTable000) is performed with vsize=0x1000 instead of vsize=0x2000. This results in an improper update of the bitmap: some virtual memory that should be considered allocated remains marked as free.

This bug will usually not result in any noticeable behavior because all other allocations are performed properly. In addition, all other vsizes are >= 0x2000, so they are more strictly aligned, which reduces the risk of overlap. However, due to the random nature of this algorithm, certain ASLR seeds might cause a kernel panic (in SKBL) due to two allocations overlapping (this should be caught later during Sysmem start, as the Memblock objects created to back these mappings should conflict).

Probably present since ASLR was introduced in SKBL & NSKBL. Not fixed as of System Software 3.74.
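The bookkeeping error can be modeled with a small page bitmap. The sketch below is not the MapASLR code: the region size, the offsets and the helper names (mark_allocated, range_is_free) are assumptions chosen to show how recording 0x1000 bytes for a 0x2000-byte object leaves a page that a later allocation can claim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u
#define NUM_PAGES 16u /* hypothetical size of the tracked region */

/* Marks every page overlapping [offset, offset + vsize) as allocated. */
void mark_allocated(bool *used, uint32_t offset, uint32_t vsize) {
    assert(vsize > 0 && offset + vsize <= NUM_PAGES * PAGE_SIZE);
    for (uint32_t p = offset / PAGE_SIZE; p <= (offset + vsize - 1u) / PAGE_SIZE; p++)
        used[p] = true;
}

/* True if no page overlapping [offset, offset + vsize) is allocated. */
bool range_is_free(const bool *used, uint32_t offset, uint32_t vsize) {
    assert(vsize > 0 && offset + vsize <= NUM_PAGES * PAGE_SIZE);
    for (uint32_t p = offset / PAGE_SIZE; p <= (offset + vsize - 1u) / PAGE_SIZE; p++)
        if (used[p])
            return false;
    return true;
}

void example_usage(void) {
    bool used[NUM_PAGES] = { false };

    /* The object really occupies 0x2000 bytes at 0x5000, but only 0x1000 is recorded. */
    mark_allocated(used, 0x5000, 0x1000);

    /* A later 0x2000-aligned request at 0x6000 looks free, yet page 6 is in use. */
    assert(range_is_free(used, 0x6000, 0x2000));

    /* With the right size recorded, the same request is rejected. */
    mark_allocated(used, 0x5000, 0x2000);
    assert(!range_is_free(used, 0x6000, 0x2000));
}
```

If the first object happens to land on a 0x2000-aligned address, no 0x2000-aligned request can start on its unrecorded second page, which matches the observation that stricter alignment reduces the risk of overlap.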

Non-Secure Kernel Boot Loader (NSKBL)

Null dereference in the NSKBL kernel panic handler

(2021/06/19 by Princess of Sleeping) The kernel panic handler accesses the SceSysroot pointer, but that pointer is still NULL during early boot, so a panic at that stage results in a NULL access to SceSysroot.

CelesteBlue: If I understood correctly, this means that as long as NSKBL is running, a non-secure Kernel panic from any cause will end up in a DABT exception at NSKBL level.

CreepNT: The global SceSysroot pointer is initialized during sceKernelSysrootStart (soon after the MMU is brought up) and any panic after this point will not DABT. If a panic happens before this point while the MMU is disabled, no DABT will occur either, since 0 is a valid (physical) address (but since bogus "Sysroot" data will be read, the system may e.g. PABT if bogus data is interpreted as a function pointer). This only leaves a tiny window, during which the MMU is enabled but sceKernelSysrootStart has not been executed, where a panic will cause a DABT (but since there is basically no code in that window that can panic, a DABT should never happen because of this bug).

Present in FW 3.600.011, 3.650.011.

Out-of-bounds write in sceKernelSysrootStart

(2023/04/25 by CreepNT)

In old firmwares, during the execution of the sceKernelSysrootStart function, 0x80 bytes are allocated from the Sysroot heap and then divided into 4 blocks of 0x20 bytes each. Each CPU then loads the address of its block into the TPIDRPRW register.

Later in this same function, the sceKernelTlsKernelSet subroutine is called. However, it expects TPIDRPRW to hold a pointer to a ThreadCB (Thread object), which is much larger than 0x20 bytes, and does the following: *(uint32_t*)(TPIDRPRW + 0x74) = 0;. This results in an out-of-bounds write at offsets 0x74, 0x94, 0xB4 and 0xD4 of the Sysroot heap for CPU0, CPU1, CPU2 and CPU3 respectively.
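The four offsets follow directly from the block size and the fixed write offset. The sketch below assumes, as the listed offsets suggest, that the 0x80-byte allocation sits at the start of the Sysroot heap; stray_write_offset is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE   0x20u /* size of each per-CPU block */
#define WRITE_OFFSET 0x74u /* offset written through TPIDRPRW */

/* Offset of the stray write, relative to the start of the 0x80-byte allocation. */
uint32_t stray_write_offset(uint32_t cpu) {
    return cpu * BLOCK_SIZE + WRITE_OFFSET;
}

void example_usage(void) {
    assert(stray_write_offset(0) == 0x74u);
    assert(stray_write_offset(1) == 0x94u);
    assert(stray_write_offset(2) == 0xB4u);
    assert(stray_write_offset(3) == 0xD4u);
}
```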

However, this bug has no real consequence (and was probably never noticed) because the data at these offsets is:

  • 0x74: inside the TPIDRPRW block, which is unused
  • 0x94: inside a 0x28 bytes allocation which has not been written to yet
  • 0xB4: inside heap padding (all allocations are rounded up to 32/64B boundary)
  • 0xD4: offset 0x14 inside the SceKblParam structure, which is unused

Exists since at least System Software version 0.920.050. Fixed in System Software 0.990 - due to a rework of the kernel TLS system, sceKernelTlsKernelSet was removed; thus the invalid call is no longer performed.

Incorrect mapping size specified to MapASLR

See the description in SKBL.

Shell

Unvalidated IPMI arguments lead to DoS

(2023/05/31 by CreepNT, reported by M Ibrahim) At least one IPMI server (SceDownload) does not validate the number of IPMI::DataInfo (input) or IPMI::BufferInfo (output) arguments received before using them. This leads to garbage being used as pointers and dereferenced, which in turn crashes the SceShell process with a Data Abort Exception.

It might be possible to use this as a vector for a data-only attack on Shell.

Present in firmware 3.60.