Venezia
VENEZIA (Vision Enabling Engine / Zen-Inspired Architecture), also called VIP, is a multi-core subsystem designed by Toshiba. It combines several Media Processing Engines (MPEs) suited to image processing with several image processing accelerators.
Description
VENEZIA is a multi-grain parallel architecture that exploits parallelism at three levels: application level (multi-core), instruction level (VLIW) and data level (SIMD).
VENEZIA includes four of Toshiba's Media Processing Engines (MPEs), which allows up to four image recognition applications to run simultaneously. Each MPE pairs a Toshiba 32-bit RISC CPU core, the MeP-c5, with an image recognition coprocessor, the IVC2 (the second generation of the Image recognition VLIW Coprocessor). The MPE is a 3-way VLIW machine that can issue up to three instructions at once: one CPU instruction and up to two coprocessor instructions. The IVC2 provides SIMD (Single Instruction stream, Multiple Data stream) instructions that operate simultaneously on eight 8-bit, four 16-bit or two 32-bit data elements. Because the IVC2 can execute up to two SIMD instructions simultaneously, up to 16 8-bit data elements can be processed at once.
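The element counts above are consistent with a 64-bit SIMD operand, since eight 8-bit, four 16-bit and two 32-bit elements each total 64 bits. The following C sketch works through that arithmetic. The 64-bit width is an inference from those counts rather than a documented figure, and the names `IVC2_SIMD_BITS`, `IVC2_SIMD_ISSUE`, `simd_lanes` and `simd_peak_elems` are hypothetical.

```c
#include <assert.h>

/* Assumed operand width, inferred from 8 x 8-bit = 4 x 16-bit = 2 x 32-bit. */
#define IVC2_SIMD_BITS 64u
/* The IVC2 can execute up to two SIMD instructions simultaneously. */
#define IVC2_SIMD_ISSUE 2u

/* Number of elements one SIMD instruction handles for a given element size. */
static unsigned simd_lanes(unsigned elem_bits)
{
    assert(elem_bits != 0 && IVC2_SIMD_BITS % elem_bits == 0);
    return IVC2_SIMD_BITS / elem_bits;
}

/* Elements handled at once when both SIMD slots are in use. */
static unsigned simd_peak_elems(unsigned elem_bits)
{
    return IVC2_SIMD_ISSUE * simd_lanes(elem_bits);
}
```

With these definitions, `simd_lanes(8)` gives 8 and `simd_peak_elems(8)` gives 16, matching the figures quoted in the description.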
Eight MeP cores serve as the Venezia media co-processor. Their CPU ID registers read:
0x00300500 0x00310500 0x00320500 0x00330500 0x00340500 0x00350500 0x00360500 0x00370500
This means they are MeP-c5 cores, uniquely numbered 48 to 55 (0x30 to 0x37 in bits 16 to 23 of each value).
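As a minimal sketch of that decoding, the C helpers below pull the core number out of bits 16 to 23 and return the low 16 bits, which are 0x0500 for every value listed. The field boundaries are read off the eight values above and are not taken from a register specification; `mep_core_number`, `mep_core_type` and `venezia_cpu_ids` are hypothetical names.

```c
#include <stdint.h>

/* The eight CPU ID values listed above. */
static const uint32_t venezia_cpu_ids[8] = {
    0x00300500u, 0x00310500u, 0x00320500u, 0x00330500u,
    0x00340500u, 0x00350500u, 0x00360500u, 0x00370500u,
};

/* Core number, assumed to occupy bits 16..23 of the CPU ID value. */
static unsigned mep_core_number(uint32_t cpu_id)
{
    return (cpu_id >> 16) & 0xFFu;
}

/* Low 16 bits of the CPU ID value (0x0500 for all eight cores). */
static unsigned mep_core_type(uint32_t cpu_id)
{
    return cpu_id & 0xFFFFu;
}
```

Applied to the table, `mep_core_number` yields 48 for the first entry and 55 for the last.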
Features
- Toshiba original 32-bit RISC multi-core subsystem
- Four media processing engines (MPE0, MPE1, MPE2 and MPE3)
- Each MPE features:
  - 32-bit RISC core MeP-c5
  - 16 KB (2-way set-associative) instruction cache and 16 KB (2-way set-associative) data cache
  - 64 KB (16 KB × 4 banks) data RAM used for efficient image data processing
  - Image recognition VLIW co-processor IVC2. VLIW (Very Long Instruction Word) technology issues up to three instructions simultaneously (one MeP instruction and up to two IVC2 instructions can be encoded in a 64-bit VLIW code).
  - SIMD (Single Instruction stream, Multiple Data stream) instructions that perform simultaneous operations on eight 8-bit, four 16-bit or two 32-bit data elements. Each IVC2 can execute up to two SIMD instructions simultaneously. Some SIMD instructions can store 256 bits of operation results in accumulators for high-speed carry processing (e.g. 8-bit data + 8-bit data → 32-bit accumulator).
  - DMA controller used for transferring data RAM contents
- Operating frequency: 266.7 MHz maximum
- 256 KB (4-way set-associative) L2 cache shared among the four MPEs
- Two timer channels
- JTAG debug port. It is also connected to VENEZIA, so a single ICE can debug both the control MeP and VENEZIA
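The accumulator remark in the feature list can be read as eight lanes of 32 bits each, which gives the quoted 256 bits. The C model below illustrates that reading only: it assumes unsigned elements and one accumulator lane per 8-bit element, neither of which is confirmed here, and `acc256_t` and `acc_add_u8` are hypothetical names rather than IVC2 instructions.

```c
#include <stdint.h>

#define LANES 8 /* eight 8-bit elements per SIMD operand */

/* Software model of a 256-bit accumulator: eight 32-bit lanes. */
typedef struct {
    uint32_t lane[LANES];
} acc256_t;

/* Accumulate a[i] + b[i] into each 32-bit lane. The wide lanes absorb
   the carries that an 8-bit result would lose. */
static void acc_add_u8(acc256_t *acc, const uint8_t a[LANES],
                       const uint8_t b[LANES])
{
    for (int i = 0; i < LANES; i++) {
        acc->lane[i] += (uint32_t)a[i] + (uint32_t)b[i];
    }
}
```

Accumulating 255 + 255 twice leaves 1020 in a lane, a value that an 8-bit or even a 9-bit result could not hold.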
Running Code
Running your own MeP code is straightforward. The code appears to be copied by DMA into the MeP's private memory before the core is reset and starts executing it.
MeP payload:
.text
_start:
    jmp 0x800018
    jmp 0x800018
    jmp 0x800018
    jmp 0x800018
    jmp 0x800018
    jmp 0x800018
_init:
    movh $0,0xf184
    add3 $0,$0,0
    mov $1,0x8
    sw $1,0($0)
_writecpuid:
    ldc $2,17
    srl $2,14
    ldc $1,17
    add3 $2,$2,$0
    add3 $2,$2,4
    sw $1,0($2)
.wait:
    bra .wait
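The address arithmetic in `_writecpuid` can be followed in C. The payload loads 0xF1840000 into `$0`, stores 8 at that address, then stores the CPU ID at base + (id >> 14) + 4. The sketch below mirrors that calculation; treating 0xF1840000 as the MeP-side view of the memory the ARM code calls `spram` is an assumption, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Base address built by "movh $0,0xf184" followed by "add3 $0,$0,0". */
#define MEP_SPRAM_BASE 0xF1840000u

/* Address the payload stores the CPU ID to: base + (id >> 14) + 4. */
static uint32_t payload_id_slot_addr(uint32_t cpu_id)
{
    return MEP_SPRAM_BASE + (cpu_id >> 14) + 4u;
}

/* The same slot expressed as a 32-bit word index from the base. */
static unsigned payload_id_slot_index(uint32_t cpu_id)
{
    return (unsigned)(((cpu_id >> 14) + 4u) / 4u);
}
```

Shifting right by 14 rather than 16 turns the core number into a byte offset of four bytes per core, so a core with ID 0x00300500 writes to 0xF18400C4, which is word 49 from the base.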
On ARM:
// the following code has hard coded offsets in 1.69
int restart_vnz() {
    volatile unsigned int *regs, *spram; // r4@1
    int v1; // r6@1
    unsigned int img_paddr; // r3@2
    int dram_base; // r0@2
    int size;
    unsigned int v4; // r0@3
    unsigned int v5; // r3@3
    int v7; // r1@6
    int v8; // r2@6

    spram = *(volatile unsigned int **)0x0190F67C;
    regs = *(volatile unsigned int **)0x0190F670;
    printf("ScePervasive2Reg: 0x%08X, spram: 0x%08X\n", regs, spram);
    ScePervasiveForDriver_0xFB01A2DD();
    img_paddr = 0x40800000; // paddr of region
    dram_base = 0x40300000;
    size = 0x400000;
    regs[192] = 0x1D001000;
    if ( dram_base < img_paddr ) {
        v7 = img_paddr + size;
        v8 = dram_base + 0x2500000;
        if ( img_paddr + size < v8 ) {
            regs[193] = dram_base;
            regs[194] = img_paddr;
            regs[195] = v7;
            regs[196] = v8;
        }
    }
    v4 = img_paddr >> 23 << 23;
    regs[224] = v4;
    regs[228] = 0x1E000000;
    regs[229] = 0x20000000;
    regs[230] = 0x22000000;
    regs[231] = img_paddr & 0xFE000000;
    regs[232] = 0x18000000;
    regs[233] = 0x20000000;
    regs[234] = img_paddr & 0xF8000000;
    regs[235] = (img_paddr + 0x8000000) & 0xF8000000;
    regs[236] = 0x1E000000;
    regs[237] = 0x20000000;
    regs[238] = 0x22000000;
    regs[239] = img_paddr & 0xFE000000;
    regs[256] = v4;
    regs[898] = 0xFFFF0000;
    spram[0] = 1;
    SceCpuForDriver_0xE813EBB2_clean_l2();
    regs[898];
    __asm__("dsb sy" ::: "memory");
    ScePervasiveForDriver_0xA7E64C6F_reset_vnz();
    v1 = 0;
    while (spram[0] != 8) {
        printf("waiting...");
        sceKernelDelayThread(100);
        if (v1++ > 20)
            break;
    }
    return 0;
}

int vnz_hack() {
    volatile unsigned int *spram_vaddr;
    unsigned int *img_vaddr;
    int i;
    int ret;

    spram_vaddr = *(volatile unsigned int **)0x0190F67C;
    img_vaddr = *(unsigned int **)0x0190F664;
    printf("Vaddr: %08X\n", img_vaddr);
    ret = make_smc_call(1, 1, 0, 0, 0x110);
    printf("Reset VNZ: 0x%08X\n", ret);
    memset(spram_vaddr, 0, 256);
    memcpy(img_vaddr, YOUR_MEP_CODE, YOUR_MEP_CODE_LEN);
    ret = restart_vnz();
    printf("Restart VNZ: 0x%08X\n", ret);
    return 0;
}
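Several register values in `restart_vnz()` are just `img_paddr` rounded down to a power-of-two boundary: `>> 23 << 23` is 8 MiB alignment, `& 0xFE000000` is 32 MiB and `& 0xF8000000` is 128 MiB. The C sketch below reproduces those computations and the nested range check so the values can be checked by hand. What each register does with them is not known here, and `align_down` and `image_in_window` are hypothetical helper names.

```c
#include <stdint.h>

/* Round addr down to a multiple of 2^bits (bits must be below 32). */
static uint32_t align_down(uint32_t addr, unsigned bits)
{
    return (addr >> bits) << bits;
}

/* Same condition as the nested ifs in restart_vnz(): the image must start
   above dram_base and end below dram_base + 0x2500000. */
static int image_in_window(uint32_t dram_base, uint32_t img_paddr,
                           uint32_t size)
{
    return dram_base < img_paddr &&
           img_paddr + size < dram_base + 0x2500000u;
}
```

For the hard-coded `img_paddr` of 0x40800000, the 8 MiB alignment leaves the address unchanged, the 32 MiB and 128 MiB alignments both give 0x40000000, and `(img_paddr + 0x8000000) & 0xF8000000` gives 0x48000000. The range check passes with the hard-coded base and size, so `regs[193]` through `regs[196]` are written.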