Documentation about the VPU in the JZ4780 programming manual is limited, referring to related documentation which is not publicly available.
These are some notes from studying the Ingenic MPlayer code and other resources including the paper "A Hybrid Scheme Based on Pipelining and Multitasking in Mobile Application Processors for Advanced Video Coding".
From jz47_vae_map.c and the manual:
Region | Address | Size | Description |
CPM | 0x10000000 | 0x00001000 | Clock/power |
VPU/SCH | 0x13200000 | 0x00001000 | Scheduler |
GP0/VDMA | 0x13210000 | 0x00001000 | VPU DMA |
GP1 | 0x13220000 | 0x00001000 | VPU DMA |
GP2 | 0x13230000 | 0x00001000 | VPU DMA |
EFE | 0x13240000 | | YUV encoder front-end |
MC/MCE | 0x13250000 | 0x00001000 | Motion comp./est. |
DBLK0 | 0x13270000 | 0x00001000 | Deblock |
VMAU | 0x13280000 | 0x0000F000 | Pixel recovery |
SDE | 0x13290000 | 0x00010000 | Bitstream parser |
AUX | 0x132A0000 | 0x00004000 | XBurst core |
TCSM0 | 0x132B0000 | 0x00004000 | Shared memory (16K) |
TCSM1 | 0x132C0000 | 0x0000C000 | Shared memory (48K) |
DBLK1 | 0x132D0000 | 0x00001000 | Deblock |
JPGC | 0x132E0000 | | JPEG codec |
SRAM | 0x132F0000 | 0x00007000 | Scratch RAM (28K) |
Offsets indicated for each VPU peripheral region in the following sections are specified relative to the VPU peripheral base (0x13200000).
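As a small illustration of this addressing convention, the following sketch converts a region-relative offset into a physical address. The names VPU_BASE and vpu_phys_addr are introduced here purely for illustration and are not taken from the Ingenic sources.

```c
#include <stdint.h>

/* VPU peripheral base, as given in the region table. */
#define VPU_BASE 0x13200000u

/* Hypothetical helper: physical address of a register given its offset
   relative to the VPU peripheral base. */
static inline uint32_t vpu_phys_addr(uint32_t offset)
{
    return VPU_BASE + offset;
}
```

For example, an offset of 0x90000 lands at the start of the SDE region (0x13290000).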
The CPM (clock and power management) registers provide more generally applicable functionality. Registers of pertinence to the VPU are described here.
CPM_CLKGR0 (clock gate register #0) is at offset 0x20.
Flag | Bit | Description |
IPU | 29 | Stop IPU clock |
CPM_VPU_SWRST (soft reset and bus control) is at offset 0xC4.
Flag | Bit | Description |
SR | 31 | Soft reset |
STP | 30 | Stop request |
ACK | 29 | Stop acknowledgement |
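The three flags suggest a stop/acknowledge/reset handshake. The following is only a sketch of how such a handshake might be driven: the ordering of the steps is an assumption made from the flag names, and vpu_soft_reset is a hypothetical name. The pointer argument stands for a mapping of CPM_VPU_SWRST.

```c
#include <stdint.h>

#define CPM_VPU_SR  (1u << 31)  /* soft reset */
#define CPM_VPU_STP (1u << 30)  /* stop request */
#define CPM_VPU_ACK (1u << 29)  /* stop acknowledgement */

/* Assumed sequence: request a stop, wait for the acknowledgement, assert
   the soft reset while withdrawing the stop request, then release the
   reset. swrst points to the CPM_VPU_SWRST register. */
static void vpu_soft_reset(volatile uint32_t *swrst)
{
    *swrst |= CPM_VPU_STP;
    while (!(*swrst & CPM_VPU_ACK))
        ;                                   /* busy-wait for ACK */
    *swrst = (*swrst | CPM_VPU_SR) & ~CPM_VPU_STP;
    *swrst &= ~(CPM_VPU_SR | CPM_VPU_STP);
}
```

On real hardware the busy-wait would probably need a timeout, which is omitted here for brevity.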
CPM_CPSPR (scratch pad) is at offset 0x34.
CPM_LPCR (low power control) is at offset 0x04.
CPM_OPCR (oscillator and power control) is at offset 0x24.
Register | Offset | Description |
SCH_GLBC | 0x00000 | |
VPU_DCNT | 0x00008 | |
VPU_CCNT | 0x0000C | |
VPU_DBGC | 0x00010 | |
VPU_DWD | 0x00014 | |
VPU_CWD | 0x00018 | |
VPU_DWA | 0x0001C | |
VPU_CWA | 0x00020 | |
SCH_TLBA | 0x00030 | TLB |
SCH_STAT | 0x00034 | VPU/scheduler status |
SCH_SLDE0 | 0x00040 | |
SCH_SLDE1 | 0x00044 | |
SCH_SLDE2 | 0x00048 | |
SCH_SLDE3 | 0x0004C | |
SCH_TLBC | 0x00050 | TLB virtual address match |
SCH_TLBV | 0x00054 | TLB |
SCH_SCHC | 0x00060 | Schedule control? |
SCH_BND/_SCHCS | 0x00064 | |
SCH_SCHG0 | 0x00068 | |
SCH_SCHG1/E0 | 0x0006C | |
SCH_SCHE1 | 0x00070 | SCH1_DSA |
SCH_SCHE2 | 0x00074 | SCH2_DSA |
SCH_SCHE3 | 0x00078 | SCH3_DSA |
SCH_SCHE4 | 0x0007C | SCH4_DSA |
Some registers in this region are defined by libjzcommon/t_vputlb.h. They are given with a VPU prefix above.
Some additional registers appear to exist, defined by macros in jzm_vpu.h:
Register | Offset | Description |
MSCOPE_START | 0x00024 | Start involves "mbnum" |
MSCOPE_STOP | 0x00028 | Stop involves writing zero |
Field | Bits | Description |
GLBC_SLDE | 31 | |
GLBC_TLBE | 30 | |
GLBC_TLBINV | 29 | |
... | ... | |
TLBE_JPGC | 26 | TLB |
TLBE_DBLK | 25 | TLB |
TLBE_SDE | 24 | TLB |
TLBE_EFE | 23 | TLB |
TLBE_VDMA | 22 | TLB |
TLBE_MCE | 21 | TLB |
INTE_ACFGERR | 20 | Interrupt |
... | ... | |
INTE_TLBERR | 18 | Interrupt |
INTE_BSERR | 17 | Interrupt |
INTE_ENDF | 16 | Interrupt |
GLBC_HIMAP | 15 | Interrupt |
... | ... | |
GLBC_HIAXI | 9 | |
GLBC_EPRI | 8..7 | Priority? |
... | ... |
The register is set to GLBC_HIAXI before the ACFG fields are set in VDMA_TASKRG.
The definitions in libjzcommon/t_vputlb.h do not agree with the above. Instead, the following fields are defined:
Field | Bits | Description |
GLBC_TLBE | 31 | Enable? |
GLBC_ENGM | 11..10 | |
GLBC_EPRI | 9..8 | |
GLBC_DPRI | 6..4 | |
GLBC_CPRI | 2..0 |
The SET_VPU_GLBC macro employs these definitions and is used, whereas the only active definition from the other set is GLBC_HIAXI, which would correspond to GLBC_EPRI as 0b10.
Field | Bits | Description |
VTAG | 31..20 | |
MASK | 19..8 | |
... | ... | |
VLD | 0 |
The definitions in libjzcommon/t_vputlb.h do not agree completely with the above. Instead, the following fields are defined:
Field | Bits | Description |
VTAG | 31..22 | Virtual |
PTAG | 21..12 | Physical |
PSIZE | 2..1 | Page size |
VALID | 0 | Valid |
Page sizes (PSIZE) are 4M (0), 8M (1), 16M (2), 32M (3).
TLB virtual address match register, possibly comparable to EntryHi in the MIPS architecture TLB.
Field | Bits | Description |
VPN | 31..12 | Virtual page number |
RIDX | 11..4 | Index |
... | ... | |
INVLD | 1 | Invalid |
RETRY | 0 | Retry |
Field | Bits | Description |
... | ... | |
CNM | 27..16 | |
... | ... | |
GCN | 11..0 |
Scheduler channel control.
Field | Bits | Description |
CH4_ACT | 31 | From t_vputlb.h |
CH4_PE1 | 30 | From t_vputlb.h |
CH4_PCH1 | 29..28 | From t_vputlb.h |
CH4_GS | 27 | |
CH4_PE0 | 26 | Enable? |
CH4_PCH0 | 25..24 | |
CH3_ACT | 23 | From t_vputlb.h |
... | ... | |
CH3_GS | 19 | |
CH3_PE0 | 18 | Enable? |
CH3_PCH0 | 17..16 | |
CH2_ACT | 15 | From t_vputlb.h |
... | ... | |
CH2_GS | 11 | |
CH2_PE | 10 | Enable? |
CH2_PCH | 9..8 | |
CH1_ACT | 7 | From t_vputlb.h |
... | ... | |
CH1_GS | 3 | |
CH1_PE | 2 | Enable? |
CH1_PCH | 1..0 |
From t_vputlb.h:
Field | Bits | Description |
SN | 4 | |
ID | 3..0 |
Scheduler group control?
Field | Bits | Description |
CH4_HID | 31..28 | |
CH3_HID | 27..24 | |
CH2_HID | 23..20 | |
CH1_HID | 19..16 | |
... | ... | |
DEPTH | 11..8 | |
G1F4 | 7 | |
G1F3 | 6 | |
G1F2 | 5 | |
G1F1 | 4 | |
G0F4 | 3 | |
G0F3 | 2 | |
G0F2 | 1 | |
G0F1 | 0 |
The HID fields employ values corresponding to hardware units. Files such as libh264/jzm_vpu.h provide definitions of the appropriate values as follows:
Symbol | Value | Alternative Symbol |
HID_SCH | 0x0 | HID_CFGC |
HID_VDMA | 0x1 | HID_GP0 |
| 0x2 | HID_GP1 |
| 0x3 | HID_GP2 |
HID_EFE | 0x4 | |
HID_MCE | 0x5 | |
HID_DBLK | 0x7 | |
HID_VMAU | 0x8 | |
HID_SDE | 0x9 | |
HID_AUX | 0xA | |
HID_TCSM | 0xB | HID_TCSM0 |
| 0xC | HID_TCSM1 |
| 0xD | HID_DBLK2 |
HID_JPGC | 0xE | |
HID_SRAM | 0xF |
Alternative symbols are defined in libjzcommon/t_vputlb.h.
These values are provided to macros in order to populate the appropriate channel field in SCH_BND. The GEN_VDMA_ACFG macro employed by files such as libmpeg2/soc/jzm_mpeg2_dec.c may feature such values as in the following example:
GEN_VDMA_ACFG(chn, REG_SCH_BND, 0, (SCH_CH3_HID(HID_DBLK) | SCH_CH2_HID(HID_VMAU) | SCH_CH1_HID(HID_MCE) | SCH_DEPTH(MPEG2_FIFO_DEP)));
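To make the field layout concrete, the following sketch packs a SCH_BND value from the bit positions tabulated above. The function sch_bnd_pack is hypothetical and is not the definition of the SCH_CHn_HID or SCH_DEPTH macros. The depth of 4 used in the note below is an arbitrary value, since the value of MPEG2_FIFO_DEP is not given here.

```c
#include <stdint.h>

/* Hardware unit identifiers from the table above. */
enum { HID_MCE = 0x5, HID_DBLK = 0x7, HID_VMAU = 0x8 };

/* Hypothetical packer for SCH_BND:
   CH4_HID 31..28, CH3_HID 27..24, CH2_HID 23..20, CH1_HID 19..16,
   DEPTH 11..8. The flag bits 7..0 are left clear. */
static inline uint32_t sch_bnd_pack(uint32_t ch4, uint32_t ch3, uint32_t ch2,
                                    uint32_t ch1, uint32_t depth)
{
    return ((ch4 & 0xFu) << 28) |
           ((ch3 & 0xFu) << 24) |
           ((ch2 & 0xFu) << 20) |
           ((ch1 & 0xFu) << 16) |
           ((depth & 0xFu) << 8);
}
```

With channel 3 bound to DBLK, channel 2 to VMAU, channel 1 to MCE and a depth of 4, the packed value is 0x07850400.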
The operation in t_vputlb.h, employing the name SCH_SCHCS, defines the register differently:
Field | Bits | Description |
CS4 | 27..24 | |
CS3 | 19..16 | |
CS2 | 11..8 | |
CS1 | 3..0 |
However, this operation does not appear to be used.
These are defined in jzm_vpu.h as follows:
Register | Offset | Description |
VDMA_LOCK | 0x10000 | (Obsolete - see below) |
VDMA_UNLK | 0x10004 | (Obsolete - see below) |
VDMA_TASKRG | 0x10008 | VDMA DHA |
VDMA_TASKST | 0x1000C | VDMA status |
However, VDMA_LOCK and VDMA_UNLK do not appear to be referenced elsewhere, and they might therefore not be valid or relevant. Nevertheless, they are provided here for reference.
Code such as libmpeg2/slice.c does appear to use VDMA_TASKRG and VDMA_TASKST. Meanwhile, the 2D DMA mechanism employs offsets 0x10000 and 0x10004 differently for GP0, along with the corresponding offsets for GP1 and GP2.
Field | Bits | Description |
ACFG_DHA | 31..7 | Descriptor physical address |
DESC_DHA | 31..16 | (Alternative definition) |
... | ... | |
ACFG_CLR | 3 | Clear error? |
ACFG_SAFE | 2 | |
DESC? | 1 | (Combined with ACFG_RUN) |
ACFG_RUN | 0 | Run? |
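As an illustration of these fields, the sketch below composes a VDMA_TASKRG value that points at a descriptor and sets the run bit. The name vdma_taskrg_word is hypothetical. Since ACFG_DHA occupies bits 31..7, the sketch assumes that the descriptor address is 128-byte aligned and simply discards the low seven bits.

```c
#include <stdint.h>

#define ACFG_DHA_MASK 0xFFFFFF80u  /* bits 31..7: descriptor address */
#define ACFG_RUN      (1u << 0)    /* bit 0: run */

/* Hypothetical helper: task register word for a descriptor at the given
   physical address, with ACFG_RUN set. */
static inline uint32_t vdma_taskrg_word(uint32_t desc_pa)
{
    return (desc_pa & ACFG_DHA_MASK) | ACFG_RUN;
}
```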
Scheduled task status? The bits appear to correspond to those in VDMA_TASKRG.
Field | Bits | Description |
... | ... | |
ACFG_ERR | 3 | |
ACFG_END | 2 | |
DESC_END | 1 | |
VPU_BUSY | 0 |
Register | Offset | Description |
EFE_CTRL | 0x40000 | |
EFE_GEOM | 0x40004 | |
EFE_COEF_BA | 0x4000C | |
EFE_RAWY_SBA | 0x40010 | |
EFE_RAWC_SBA | 0x40014 | |
EFE_RAWU_SBA | 0x40014 | |
EFE_TOPMV_BA | 0x40018 | |
EFE_TOPPA_BA | 0x4001C | |
EFE_MECHN_BA | 0x40020 | |
EFE_MAUCHN_BA | 0x40024 | |
EFE_DBLKCHN_BA | 0x40028 | |
EFE_SDECHN_BA | 0x4002C | |
EFE_RAW_DBA | 0x40030 | |
EFE_RAWV_SBA | 0x40034 | |
EFE_RAW_STRD | 0x40038 | |
EFE_DBG_INFO | 0x4003C | |
EFE_MVRP | 0x40100 | |
EFE_STAT | 0x40110 |
MCE stands for Motion Compensation and Estimation.
Register | Offset | Description |
MCE_CTRL | 0x50000 | Control |
MCE_CH1_STAT | 0x50004 | |
MCE_CH2_STAT | 0x50804 | |
MCE_MVPA | 0x5000C | |
MCE_IWTA | 0x5000C | |
MCE_CH1_PINFO | 0x50020 | |
MCE_CH2_PINFO | 0x50820 | |
MCE_CH1_WINFO | 0x50024 | |
MCE_CH2_WINFO1 | 0x50824 | |
MCE_CH2_WINFO2 | 0x50828 | |
MCE_CH1_WTRND | 0x5002C | |
MCE_CH2_WTRND | 0x5082C | |
MCE_CH1_BINFO | 0x50030 | |
MCE_CH2_BINFO | 0x50830 | |
MCE_CH1_IINFO1 | 0x50034 | |
MCE_CH1_IINFO2 | 0x50038 | |
MCE_CH2_IINFO1 | 0x50834 | |
MCE_CH2_IINFO2 | 0x50838 | |
MCE_CH1_TAP1L | 0x5003C | |
MCE_CH1_TAP2L | 0x50040 | |
MCE_CH1_TAP1M | 0x50044 | |
MCE_CH1_TAP2M | 0x50048 | |
MCE_CH1_STRD | 0x5004C | Stride |
MCE_CH2_STRD | 0x5084C | Stride |
MCE_GEOM | 0x50050 | Geometry |
MCE_DDC | 0x50054 | |
MCE_DSA | 0x50058 | |
MCE_ESTIC | 0x5005C | |
MCE_CH1_RLUT | 0x50300 | |
MCE_CH2_RLUT | 0x50B00 | |
MCE_CH1_CLUT | 0x50400 | |
MCE_CH1_ILUT | 0x50500 | |
MCE_CH2_ILUT | 0x50D00 |
Defined in...
The librv9 definitions differ from the libvp8 definitions.
Field | Bits | Description |
EBMS | 31..28 | ESMS in the librv9 files |
ESMS | 27..24 | ERMS in the librv9 files |
EARM | 23 | |
EPMV | 22 | PMVE in the librv9 files |
ESA | 21..20 | |
EBME | 19 | EMET in the librv9 files |
... | 18 | |
CAE | 17 | |
CSF | 16 | Defined in librv9 files |
PGC | 15..12 | |
CH2EN | 11 | |
PRI | 10..9 | |
CKGE | 8 | |
OFA | 7 | |
ROT | 6 | ROTE in the librv9 files |
ROTADIR | 5 | |
WM | 4 | |
CCF | 3 | |
IRQE | 2 | IRQ enable? |
RST | 1 | Reset? |
EN | 0 | Enable? |
In vp8.c, all fields are set to zero apart from...
PGC = 0xF, CH2EN = 1, PRI = 3, CKGE = 1, CCF = 1, EN = 1
When decoding a frame, the following additional fields appear to be set:
CAE = 1, OFA = 1
In rv9_p0_mc.c, all fields are set to zero apart from...
CAE = 1, PGC = 0xF, CH2EN = 1, PRI = 3, CKGE = 1, OFA = 1, CCF = 1, EN = 1
Defined in libvp8/t_motion_p0.h and other files.
Field | Bits | Description |
REF | 27..16 | |
RAW | 15..8 | |
DST | 7..0 | Stride |
Defined in libvp8/t_motion_p0.h and other files.
Field | Bits | Description |
FH | 27..16 | Frame height? |
FW | 11..0 | Frame width? |
VMAU apparently stands for Vector Matrix Arithmetic Unit. The JZ4780 manual indicates that it is used for pixel recovery.
Register | Offset | Description |
VMAU_MCBP | 0x80000 | |
VMAU_QTPARA | 0x80004 | |
VMAU_MAIN_ADDR | 0x80008 | |
VMAU_NCCHN_ADDR | 0x8000C | |
VMAU_CHN_LEN | 0x80010 | |
VMAU_ACBP | 0x80014 | |
VMAU_CPREDM_TLV | 0x80018 | |
VMAU_YPREDM0 | 0x8001C | |
VMAU_YPREDM1 | 0x80020 | |
VMAU_GBL_RUN | 0x80040 | |
VMAU_GBL_CTR | 0x80044 | |
VMAU_STATUS | 0x80048 | |
VMAU_CCHN_ADDR | 0x8004C | |
VMAU_VIDEO_TYPE | 0x80050 | |
VMAU_Y_GS | 0x80054 | |
VMAU_DEC_DONE | 0x80058 | |
VMAU_ENC_DONE | 0x8005C | |
VMAU_POS | 0x80060 | |
VMAU_MCF_STA | 0x80064 | |
VMAU_DEC_YADDR | 0x80068 | |
VMAU_DEC_UADDR | 0x8006C | |
VMAU_DEC_VADDR | 0x80070 | |
VMAU_DEC_STR | 0x80074 | |
VMAU_MEML | 0x84000 | |
VMAU_QT | 0x88000 | Quantisation table (256 bytes) |
Register | Offset | Description |
DBLK_DHA | 0x70000 | |
DBLK_TRIG | 0x70060 | |
DBLK_CTRL | 0x70064 | |
DBLK_VTR | 0x70068 | |
DBLK_FSTA | 0x7006C | |
DBLK_GSTA | 0x70070 | |
DBLK_GSIZE | 0x70074 | |
DBLK_GENDA | 0x70078 | |
DBLK_GPOS | 0x7007C | |
DBLK_GPIC_STR | 0x70080 | |
DBLK_GPIC_YA | 0x70084 | |
DBLK_GPIC_CA | 0x70088 | |
DBLK_GP_ENDA | 0x7008C | |
DBLK_SLICE_ENDA | 0x70090 | |
DBLK_BLK_CTRL | 0x70094 | |
DBLK_BLK_FIFO | 0x70098 |
SDE might conceivably stand for "stream decoder".
Also informed by libmpeg2/slice.c.
Register | Offset | Description |
SDE_STAT | 0x90000 | |
SDE_SL_CTRL | 0x90004 | |
SDE_SL_GEOM | 0x90008 | |
SDE_GL_CTRL | 0x9000C | |
SDE_CODEC_ID | 0x90010 | SDE identifier |
SDE_CFG0 | 0x90014 | SDE configuration |
SDE_CFG1 | 0x90018 | |
SDE_CFG2 | 0x9001C | Bitstream buffer address (bsaddr) |
SDE_CFG3 | 0x90020 | |
SDE_CFG4 | 0x90024 | |
SDE_CFG5 | 0x90028 | |
SDE_CFG6 | 0x9002C | |
SDE_CFG7 | 0x90030 | |
SDE_CFG8 | 0x90034 | |
SDE_CFG9 | 0x90038 | |
SDE_CFG10 | 0x9003C | |
SDE_CFG11 | 0x90040 | |
SDE_CFG12 | 0x90044 | |
SDE_CFG13 | 0x90048 | |
SDE_CFG14 | 0x9004C | |
SDE_CFG15 | 0x90050 | |
SDE_CTX_TBL | 0x92000 | |
SDE_CQP_TBL | 0x93800 |
Register | Offset | Description |
AUX_CTRL | 0xA0000 | Control the AUX core |
AUX_SPINLK | 0xA0004 | Spinlock |
AUX_SPIN1 | 0xA0008 | Spinlock access |
AUX_SPIN2 | 0xA000C | Spinlock access |
AUX_MIRQP | 0xA0010 | Message IRQ pending for main core |
AUX_MSG | 0xA0014 | Message word initiating IRQ |
CORE_MIRQP | 0xA0018 | Message IRQ pending for AUX core |
CORE_MSG | 0xA001C | Message word initiating IRQ |
Message IRQs are cleared by clearing the appropriate MIRQP register.
Field | Bits | Description |
SLEEP | 31 | Sleep status |
... | 30..9 | ... |
BTB_INV | 8 | Invalidate BTB |
... | 7..4 | ... |
MIRQ_EN | 3 | Enable message IRQ |
NMI_DIS | 2 | Only wake AUX with NMI |
SW_NMI | 1 | Issue NMI to AUX |
SW_RST | 0 | Hold AUX in reset state |
If NMI_DIS is clear, an NMI condition will reset AUX and start execution at 0xF4000000. Otherwise, if NMI_DIS is set, AUX will continue from the next instruction after a WAIT instruction.
Field | Bits | Description |
... | 31..2 | ... |
LOCK | 1..0 | Lock status |
The LOCK field is written via a mechanism connected to the AUX_SPIN1 and AUX_SPIN2 registers. Each of those registers retains a value that is committed to LOCK when the register is read, provided that LOCK is zero at that point.
Despite the presence of these spinlock registers, it seems as if some coprocessor #0 registers are used instead. See, for example, libmpeg4/jzsoc/jz4760_dcsc.h. The pertinent registers are as follows:
Register | CP0 Register | Description |
SPINLOCK | 12 select 5 | Spinlock |
SPINATOMIC | 12 select 6 | Spinlock access |
Here, the SPINLOCK register's LOCK field (3..0) can only be cleared so that the lock can be taken. Writing a value to the corresponding VAL field (3..0) in SPINATOMIC causes LOCK to be updated with the written value if LOCK was already zero.
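The described behaviour can be modelled in plain C to show how a lock might be taken and released. This is a software model only, not hardware access code: struct spin_model and the function names are invented for illustration, and the use of a non-zero owner identifier is an assumption.

```c
#include <stdint.h>

/* Software model: 'lock' stands in for the LOCK field of SPINLOCK. */
struct spin_model { uint32_t lock; };

/* Model of writing VAL to SPINATOMIC: LOCK takes the written value only
   if it was already zero. */
static void spinatomic_write(struct spin_model *m, uint32_t val)
{
    if (m->lock == 0)
        m->lock = val & 0xFu;
}

/* Model of clearing LOCK through SPINLOCK, releasing the lock. */
static void spinlock_clear(struct spin_model *m)
{
    m->lock = 0;
}

/* Attempt to take the lock using a non-zero owner id. Returns 1 if LOCK
   holds the caller's id afterwards, 0 otherwise. */
static int spin_try_acquire(struct spin_model *m, uint32_t id)
{
    spinatomic_write(m, id);
    return m->lock == (id & 0xFu);
}
```

In this model a second contender fails to acquire the lock until the holder clears it, which is the property that the conditional update provides.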
Despite provision of the above registers, the following registers appear to be used by libmpeg4/jzsoc/jz4760_dcsc.h for spinlock purposes:
Register | CP0 Register | Description |
DCSC_SPINLK | 20 select 2 | Spinlock |
DCSC_SPIN0 | 20 select 3 | Spinlock access |
DCSC_SPIN1 | 20 select 4 | Spinlock access |
Some documented registers are also available for similar purposes:
Register | CP0 Register | Description |
Cores_Status | 12 select 3 | Sleep and IRQ status |
CORE_MBOX0 | 20 select 0 | Initiate IRQ to core 0 |
CORE_MBOX1 | 20 select 1 | Initiate IRQ to core 1 |
Field | Bits | Description |
... | 31..2 | ... |
SPIN1 | 1..0 | Lock status |
Field | Bits | Description |
... | 31..2 | ... |
SPIN2 | 1..0 | Lock status |
Field | Bits | Description |
... | 31..1 | ... |
MIRQP | 0 | Message IRQ pending |
Field | Bits | Description |
MESG | 31..0 | Message word |
Field | Bits | Description |
... | 31..1 | ... |
MIRQP | 0 | Message IRQ pending |
Field | Bits | Description |
MESG | 31..0 | Message word |
Register | Offset | Description |
JPGC_TRIG | 0xE0000 | |
JPGC_GLBI | 0xE0004 | |
JPGC_STAT | 0xE0008 | |
JPGC_BSA | 0xE000C | |
JPGC_P0A | 0xE0010 | |
JPGC_P1A | 0xE0014 | |
JPGC_P2A | 0xE0018 | |
JPGC_P3A | 0xE001C | |
JPGC_NMCU | 0xE0028 | |
JPGC_NRSM | 0xE002C | |
JPGC_P0C | 0xE0030 | |
JPGC_P1C | 0xE0034 | |
JPGC_P2C | 0xE0038 | |
JPGC_P3C | 0xE003C | |
JPGC_MCUS | 0xE0064 | |
JPGC_ZIGM0 | 0xE1000 | |
JPGC_ZIGM1 | 0xE1100 | |
JPGC_HUFB | 0xE1200 | |
JPGC_HUFM | 0xE1300 | |
JPGC_QMEM | 0xE1400 | |
JPGC_HUFE | 0xE1800 | |
JPGC_HUFS | 0xE1800 |
According to jz47_vae_map.c, reset is performed as follows:
According to libjzcommon/t_vputlb.h, the following operations are defined.
This only seems to be used in libvp8/vp8.c:
SET_VPU_GLBC(1, 0, 0, 4, 4);
This sets GLBC_TLBE as 1, GLBC_ENGM as 0, GLBC_EPRI as 0, GLBC_DPRI as 4, GLBC_CPRI as 4.
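The resulting register value can be worked out from the t_vputlb.h field positions given earlier. The following sketch uses a hypothetical function, glbc_pack, which mirrors the argument order of SET_VPU_GLBC but is not its actual definition.

```c
#include <stdint.h>

/* Hypothetical packer using the t_vputlb.h layout:
   GLBC_TLBE 31, GLBC_ENGM 11..10, GLBC_EPRI 9..8,
   GLBC_DPRI 6..4, GLBC_CPRI 2..0. */
static inline uint32_t glbc_pack(uint32_t tlbe, uint32_t engm, uint32_t epri,
                                 uint32_t dpri, uint32_t cpri)
{
    return ((tlbe & 0x1u) << 31) |
           ((engm & 0x3u) << 10) |
           ((epri & 0x3u) << 8)  |
           ((dpri & 0x7u) << 4)  |
           (cpri & 0x7u);
}
```

Under this layout, the arguments (1, 0, 0, 4, 4) yield 0x80000044, and an EPRI of 0b10 alone yields bit 9, the position of GLBC_HIAXI in the jzm_vpu.h definitions.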
SET_VPU_TLB(entry, valid, psize, vtag, ptag)
This appears to set the SCH_SLDE0..SCH_SLDE3 registers with validity, page size, virtual tag, physical tag. The tags appear to be the uppermost ten bits of addresses, but in the MPlayer code an identity mapping appears to be used, employing physical addresses as virtual addresses.
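A TLB entry value can be sketched from the t_vputlb.h field layout. The function tlb_entry below is hypothetical. Unlike SET_VPU_TLB, which takes tags, it accepts full addresses and derives each tag from the uppermost ten bits, which is an assumption made for convenience.

```c
#include <stdint.h>

/* Hypothetical packer using the t_vputlb.h layout:
   VTAG 31..22, PTAG 21..12, PSIZE 2..1, VALID 0.
   PSIZE: 0 = 4M, 1 = 8M, 2 = 16M, 3 = 32M. */
static inline uint32_t tlb_entry(uint32_t valid, uint32_t psize,
                                 uint32_t vaddr, uint32_t paddr)
{
    uint32_t vtag = vaddr >> 22;   /* uppermost ten bits */
    uint32_t ptag = paddr >> 22;   /* uppermost ten bits */

    return (vtag << 22) | (ptag << 12) |
           ((psize & 0x3u) << 1) | (valid & 0x1u);
}
```

An identity mapping of a 4M page at 0x0FC00000 would then be encoded as 0x0FC3F001.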
SET_VPU_SCHC(sch4_act, sch4_pe1, sch4_pch1, sch4_pe0, sch4_pch0, sch3_act, sch3_pe0, sch3_pch0, sch2_act, sch2_pe0, sch2_pch0, sch1_act, sch1_pe0, sch1_pch0)
This is used in libh264/jzsoc/h264_p1.c and libh264/jzsoc/h264_cavlc_p1.c, with initialisation setting the ACT fields to 0, the PE fields to 1, and the PCH fields to 3, 2, 1, 1 and 0 respectively. All fields are set to 0 at the end or if an error occurs.
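The initialisation value can be reconstructed from the SCH_SCHC field table. The sketch below uses a hypothetical function, schc_pack, with the same argument order as SET_VPU_SCHC. It leaves the GS bits clear, and it assumes that the PCH values 3, 2, 1, 1 and 0 apply to CH4_PCH1, CH4_PCH0, CH3_PCH0, CH2_PCH and CH1_PCH in that order.

```c
#include <stdint.h>

/* Hypothetical packer following the SCH_SCHC field table. */
static inline uint32_t schc_pack(
    uint32_t ch4_act, uint32_t ch4_pe1, uint32_t ch4_pch1,
    uint32_t ch4_pe0, uint32_t ch4_pch0,
    uint32_t ch3_act, uint32_t ch3_pe0, uint32_t ch3_pch0,
    uint32_t ch2_act, uint32_t ch2_pe,  uint32_t ch2_pch,
    uint32_t ch1_act, uint32_t ch1_pe,  uint32_t ch1_pch)
{
    return ((ch4_act & 1u) << 31) | ((ch4_pe1 & 1u) << 30) |
           ((ch4_pch1 & 3u) << 28) |
           ((ch4_pe0 & 1u) << 26) | ((ch4_pch0 & 3u) << 24) |
           ((ch3_act & 1u) << 23) | ((ch3_pe0 & 1u) << 18) |
           ((ch3_pch0 & 3u) << 16) |
           ((ch2_act & 1u) << 15) | ((ch2_pe & 1u) << 10) |
           ((ch2_pch & 3u) << 8) |
           ((ch1_act & 1u) << 7) | ((ch1_pe & 1u) << 2) |
           (ch1_pch & 3u);
}
```

With the initialisation values described above, the packed value under these assumptions is 0x76050504.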
The "A Hybrid Scheme..." paper indicates that the JZ4770 has its main "J1" core running at 1000 MHz with the AUX core running at 500 MHz.
The different JZ-series manuals provide the following details about memory layout for the VPU facilities:
Region | Main | AUX | Accelerator |
TCSM0 | 0xF4000000 | 0x132B0000 | 0x132B0000 |
TCSM1 | 0x132C0000 | 0xF4000000 | 0x132C0000 |
SRAM | 0x132F0000 | 0x132F0000 | 0x132F0000 |
SRAM is Scratch RAM. The SRAM appears to differ between the JZ4760 and JZ4780, having 32K in the former at 0x132D0000, but only 16K in the latter. The MPlayer code (for example, libmpeg4/jzsoc/jz4760_tcsm_init.c) suggests an SRAM size of only 7K. However, elsewhere (for example, librv9/jzsoc/rv9_sram.h) the size is at least 16K, and in jz47_vae_map.c it is given as 0x7000 bytes (28K), which is consistent with what is written in the "A Hybrid Scheme..." paper:
"There are three types of on-chip memories including two tightly coupled shared memories TCSM0 (16 KB) and TCSM1 (48 KB) along with SRAM (scratch RAM 28 KB)."
This is reflected in the jz47_vae_map.c definitions and is presumably definitive. Other files also seem to employ compatible definitions. For example:
Each region has its own DMA channel:
The AUX core, at least according to the more helpful JZ4760 programming manual, "can only access physical address space". However, it seems that it employs the following fixed memory mappings within TCSM1:
Virtual Region | Physical Region |
0xF4000000..0xF4001FFF | 0x132C0000..0x132C1FFF |
0xF4002000..0xF4003FFF | 0x132C2000..0x132C3FFF |
0xF4004000..0xF4005FFF | 0x132C4000..0x132C5FFF |
0xF4006000..0xF4007FFF | 0x132C6000..0x132C7FFF |
0xF4008000..0xF4009FFF | 0x132C8000..0x132C9FFF |
0xF400A000..0xF400BFFF | 0x132CA000..0x132CBFFF |
Thus, the main core and the AUX core each address "their own" TCSM via the 0xF4000000 region, with the DMA channels being employed to move data around.
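The fixed mapping in the table amounts to a constant offset across the 48K of TCSM1. The following sketch expresses that translation from the AUX core's point of view. The macro and function names are invented here, and returning zero for addresses outside the window is merely a convention of this sketch.

```c
#include <stdint.h>

#define AUX_TCSM1_VBASE 0xF4000000u  /* AUX view of TCSM1 */
#define TCSM1_PBASE     0x132C0000u  /* physical base of TCSM1 */
#define TCSM1_SIZE      0x0000C000u  /* 48K */

/* Hypothetical helper: translate an AUX virtual address within the TCSM1
   window to the physical address, or return 0 if outside the window. */
static inline uint32_t aux_virt_to_phys(uint32_t vaddr)
{
    if (vaddr < AUX_TCSM1_VBASE || vaddr - AUX_TCSM1_VBASE >= TCSM1_SIZE)
        return 0;
    return TCSM1_PBASE + (vaddr - AUX_TCSM1_VBASE);
}
```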
Physical addresses outside the above mappings are accessible via the unmapped kernel mode memory regions:
Physical Region | Cached Region | Uncached Region |
0x132B0000 | 0x932B0000 | 0xB32B0000 |
0x132C0000 | 0x932C0000 | 0xB32C0000 |
0x132F0000 | 0x932F0000 | 0xB32F0000 |
This is featured in libmpeg4/jzsoc/mpeg4_tcsm0.h.
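The cached and uncached addresses in the table follow the usual MIPS32 convention for the unmapped KSEG0 and KSEG1 segments. A minimal sketch of the conversion is given below, with helper names chosen here for illustration rather than taken from mpeg4_tcsm0.h.

```c
#include <stdint.h>

/* KSEG0 (cached, unmapped) and KSEG1 (uncached, unmapped) each cover the
   lowest 512M of physical memory. */
static inline uint32_t phys_to_kseg0(uint32_t paddr)
{
    return (paddr & 0x1FFFFFFFu) | 0x80000000u;
}

static inline uint32_t phys_to_kseg1(uint32_t paddr)
{
    return (paddr & 0x1FFFFFFFu) | 0xA0000000u;
}
```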
Some files define code for the p1_main section, such as libh264/jzsoc/h264_p1.c. A common linker script appears to be used to define the positioning of such code, found in the following locations:
The p1_main section appears at 0xF4000000, which for the AUX core appears to address TCSM1. So, code in this section may be intended for the AUX core and define a dedicated process running on that core.
According to libmpeg4/jzsoc/jz4760_dcsc.h, AUX is reset by writing to bit 0 (SW_RST) of AUX_CTRL.
According to libmpeg4/jzsoc/jz4760_dcsc.h, AUX is started as follows:
Since the NMI_DIS field will be clear in both of the above operations, the NMI initiation will cause the AUX core to start executing code at 0xF4000000.
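The following sketch shows one way in which such operations could be written using the AUX_CTRL field positions. It is an assumption, not a transcription of jz4760_dcsc.h: in particular, the idea that starting AUX involves writing SW_NMI alone (thereby also releasing SW_RST) is a guess, and the function names are hypothetical. The pointer argument stands for a mapping of AUX_CTRL.

```c
#include <stdint.h>

#define AUX_CTRL_SW_RST  (1u << 0)  /* hold AUX in reset state */
#define AUX_CTRL_SW_NMI  (1u << 1)  /* issue NMI to AUX */
#define AUX_CTRL_NMI_DIS (1u << 2)  /* only wake AUX with NMI */
#define AUX_CTRL_MIRQ_EN (1u << 3)  /* enable message IRQ */

/* Hold the AUX core in reset. NMI_DIS is left clear. */
static void aux_reset(volatile uint32_t *aux_ctrl)
{
    *aux_ctrl = AUX_CTRL_SW_RST;
}

/* Assumed start operation: release reset and issue an NMI with NMI_DIS
   clear, so that execution begins at 0xF4000000. */
static void aux_start(volatile uint32_t *aux_ctrl)
{
    *aux_ctrl = AUX_CTRL_SW_NMI;
}
```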
Files: libh264/jzsoc/h264_p1.c, libh264/jzsoc/h264_cavlc_p1.c
P1_START_BASE is defined in libh264/jzsoc/mem_def_falcon.h and included files as...
GP0_DHA_BASE + GP0_DHA_SIZE
= (TCSM0_1VALUE + 4) + (4 * 4)
= ((C_TCSM0_BASE + 16) + 4) + (4 * 4)
...which corresponds to 0x132B0000 + 36.
The word at P1_START_BASE is set to zero at the start of processing and to AUX_END_VALUE (0x5A5A) at the end of processing, after a message word of 1 has been posted to AUX_MSG.
File: libmpeg2/decode.c
Resets the VPU using the CPM_VPU_SWRST register. Calls mpeg2_slice.
File: libmpeg2/slice.c
Calls (depending on the code parameter) M2D_SliceInit and M2D_SliceInit_ext.
Initialisation:
In the above, s->des_pa is the physical address of the allocated frame, defined in decode.c as follows:
mpeg2dec->decoder.vdma_base = (uint8_t *) jz4740_alloc_frame(128, 0x2000);
There is then a loop whose continuation condition is that SCH_STAT bit 0 is clear. In other words, it presumably terminates when SCH_STAT bit 0 is set. Presumably, SCH_STAT bit 0 has a similar role to VDMA_TASKST bit 0, which is VPU_BUSY.
File: libmpeg2/soc/jzm_mpeg2_dec.c
Sets up the DMA configuration using GEN_VDMA_ACFG.
The jz4760e_2ddma_hw.h file appears to describe DMA for transferring data between the TCSM and SRAM areas. However, its definitions appear to employ the VDMA region, together with other neighbouring regions, differently to that described in jzm_vpu.h. The JZ4770 manual does, however, provide a description of the descriptor formats.
Register | Offset | Description |
GPx_DHA | 0x*0000 | Descriptor head address |
GPx_DCS | 0x*0004 | DMA command/status |
Field | Bits | Description |
DHA | 31..2 | Descriptor head (physical) address |
... | 1..0 | ... |
Technically, the whole register represents the address at word (4-byte) resolution.
Field | Bits | Description |
BTN | 31..16 | Transfer number byte |
NDN | 15..8 | Transfer node number |
... | 7..3 | ... |
END | 2 | End of transmission |
RST | 1 | DMA software reset |
SUP | 0 | DMA start up |
The DCS register employs bit 2 as an apparent completion indicator.
The descriptors employed by the GP0, GP1 and GP2 channels also appear to be different to those employed in the VDMA initialisation:
Offset | Bits | Description |
0 | 31..0 | TSA - transfer source address |
4 | 31..0 | TDA - transfer destination address |
8 | 31..30 | TYP - transfer size type |
: | 29..16 | TST - transfer source stride |
: | 15..14 | FRM - DDR page size optimisation |
: | 13..0 | TDT - transfer destination stride |
12 | 31 | TAG - "current node link" |
: | 30..16 | TRN - row width in bytes |
: | 15..0 | NUM - transfer total in bytes |
The transfer size types (TYP) can be...
The page size optimisation (FRM) can be...
The "current node link" (TAG) appears to be 1 at the final descriptor, 0 otherwise.
The file libffmpeg2/jzsoc/ffmpeg2_p1.c provides an example of a transfer from TCSM0 to TCSM1:
set_gp0_dha(TCSM1_PADDR(DDMA_GP0_SET));
*((volatile int *)(DDMA_GP0_SET + TSA)) = TCSM0_PADDR(*tcsm1_fifo_rp);
*((volatile int *)(DDMA_GP0_SET + TDA)) = TCSM1_PADDR(dMB);
*((volatile int *)(DDMA_GP0_SET + STRD)) = GP_STRD(TASK_BUF_LEN, 0, TASK_BUF_LEN);
*((volatile int *)(DDMA_GP0_SET + UNIT)) = GP_UNIT(1, TASK_BUF_LEN, sizeof(struct MPEG2_MB_DecARGs));
set_gp0_dcs();
poll_gp0_end();
This employs physical addresses for the descriptor, source address and destination address. GP_STRD defines the source and destination strides with no DDR optimisation. GP_UNIT defines the final descriptor with the row width as the same as the stride, with the total transfer being defined by the size of the referenced structure.
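The descriptor words used in such a transfer can be sketched from the descriptor table above. The structure and functions below are hypothetical and are not the definitions of GP_STRD or GP_UNIT. Unlike the three-argument GP_STRD seen in the example, gp_strd_pack also takes the TYP field explicitly, and the sketch assumes a single descriptor whose row width equals both strides.

```c
#include <stdint.h>

/* One GP DMA descriptor: four 32-bit words at offsets 0, 4, 8 and 12. */
struct gp_desc {
    uint32_t tsa;   /* transfer source address */
    uint32_t tda;   /* transfer destination address */
    uint32_t strd;  /* TYP 31..30, TST 29..16, FRM 15..14, TDT 13..0 */
    uint32_t unit;  /* TAG 31, TRN 30..16, NUM 15..0 */
};

static inline uint32_t gp_strd_pack(uint32_t typ, uint32_t src_stride,
                                    uint32_t frm, uint32_t dst_stride)
{
    return ((typ & 0x3u) << 30) | ((src_stride & 0x3FFFu) << 16) |
           ((frm & 0x3u) << 14) | (dst_stride & 0x3FFFu);
}

static inline uint32_t gp_unit_pack(uint32_t tag, uint32_t row_bytes,
                                    uint32_t total_bytes)
{
    return ((tag & 0x1u) << 31) | ((row_bytes & 0x7FFFu) << 16) |
           (total_bytes & 0xFFFFu);
}

/* Build a single, final descriptor (TAG = 1) for a linear transfer in
   which the row width equals both strides and no DDR optimisation is
   requested. */
static struct gp_desc gp_desc_single(uint32_t src_pa, uint32_t dst_pa,
                                     uint32_t row_bytes, uint32_t total_bytes)
{
    struct gp_desc d;

    d.tsa = src_pa;
    d.tda = dst_pa;
    d.strd = gp_strd_pack(0, row_bytes, 0, row_bytes);
    d.unit = gp_unit_pack(1, row_bytes, total_bytes);
    return d;
}
```

A 128-byte transfer in 64-byte rows would thus use a stride word of 0x00400040 and a unit word of 0x80400080.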
The macro GEN_VDMA_ACFG in jzm_vpu.h appears to be involved in writing descriptor fields in the following format:
Offset | Bits | Description |
0 | 31..0 | Value |
4 | 31 | ACFG_VLD |
: | 30 | ACFG_TERM (terminate) |
: | ... | |
: | 19..0 | ACFG_IDX - register offset (bits 1..0 ignored) |
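A descriptor chain in this format can be sketched as a sequence of value/command word pairs. The function acfg_emit below is a hypothetical stand-in written from the table above, not the definition of GEN_VDMA_ACFG. The mask name and the choice to clear bits 1..0 of the index explicitly are assumptions.

```c
#include <stdint.h>

#define ACFG_VLD      (1u << 31)
#define ACFG_TERM     (1u << 30)
#define ACFG_IDX_MASK 0x000FFFFCu  /* bits 19..0 with bits 1..0 cleared */

/* Append one entry to a descriptor chain: the value to be written,
   followed by a command word holding the target register offset.
   Returns the position for the next entry. */
static uint32_t *acfg_emit(uint32_t *chn, uint32_t reg_offset, int term,
                           uint32_t value)
{
    chn[0] = value;
    chn[1] = ACFG_VLD | (term ? ACFG_TERM : 0u) |
             (reg_offset & ACFG_IDX_MASK);
    return chn + 2;
}
```

A chain is built by calling the function repeatedly and setting the terminate flag on the last entry only.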
Decoder initialisation such as in the following files employs this macro to set up DMA descriptors:
Some files in the MPlayer sources provide useful information.
Mapping of memory regions. This appears to contain the definitive regions for the JZ4780.
IPU-related memory mapping and allocation.
VPU register definitions and codec data. libh264/jzm_vpu.h appears to be the most complete of the definitions, with an older version appearing in libmpeg2/soc/jzm_vpu.h and libvc1/soc/jzm_vpu.h.
Note that libjzcommon/t_vputlb.h provides differing definitions but ones that appear to be actually used, whereas those in jzm_vpu.h may not be. Another variant of t_vputlb.h exists in libmpeg4/jzsoc alongside files such as t_motion.h and t_intpid.h.