Video-Related Notes

The JZ4780 programming manual documents the VPU only sparingly, referring instead to related documentation that is not publicly available.

These are some notes from studying the Ingenic MPlayer code and other resources including the paper "A Hybrid Scheme Based on Pipelining and Multitasking in Mobile Application Processors for Advanced Video Coding".

  1. Memory Regions
  2. CPM Registers
    1. Reset Operation
  3. VPU/SCH Registers
    1. SCH_GLBC
    2. SCH_SLDx
    3. SCH_TLBC
    4. SCH_TLBV
    5. SCH_SCHC
    6. SCH_SCHEx
    7. SCH_BND/SCH_SCHCS
      1. SCH_SCHCS
  4. VDMA Registers
    1. VDMA_TASKRG
    2. VDMA_TASKST
  5. EFE Registers
  6. MCE Registers
    1. MCE_CTRL
    2. MCE_CHx_STRD
    3. MCE_GEOM
  7. VMAU Registers
  8. DBLK Registers
  9. SDE Registers
  10. AUX Registers
    1. AUX_CTRL
    2. AUX_SPINLK
    3. AUX_SPIN1
    4. AUX_SPIN2
    5. AUX_MIRQP
    6. AUX_MSG
    7. CORE_MIRQP
    8. CORE_MSG
  11. JPGC Registers
  12. VPU Operations
    1. Soft Reset
    2. Reset
    3. GLBC
    4. TLB
    5. Scheduling
  13. Clocks
  14. Memory Layout
  15. Code Positioning
  16. AUX Operations
    1. Reset
    2. Start
  17. Codecs
    1. H.264
    2. MPEG2
      1. mpeg2_parse
      2. mpeg2_slice
      3. M2D_SliceInit
      4. M2D_SliceInit_ext
  18. 2D DMA
    1. GPx_DHA
    2. GPx_DCS
    3. Descriptors
  19. VDMA
  20. Files
    1. jz47_vae_map.c
    2. jz47_soc_mem.c
    3. jzm_vpu.h

Memory Regions

From jz47_vae_map.c and the manual:

Region Address Size Description
CPM 0x10000000 0x00001000 Clock/power
VPU/SCH 0x13200000 0x00001000 Scheduler
GP0/VDMA 0x13210000 0x00001000 VPU DMA
GP1 0x13220000 0x00001000 VPU DMA
GP2 0x13230000 0x00001000 VPU DMA
EFE 0x13240000 - YUV encoder front-end
MC/MCE 0x13250000 0x00001000 Motion comp./est.
DBLK0 0x13270000 0x00001000 Deblock
VMAU 0x13280000 0x0000F000 Pixel recovery
SDE 0x13290000 0x00010000 Bitstream parser
AUX 0x132A0000 0x00004000 XBurst core
TCSM0 0x132B0000 0x00004000 Shared memory (16K)
TCSM1 0x132C0000 0x0000C000 Shared memory (48K)
DBLK1 0x132D0000 0x00001000 Deblock
JPGC 0x132E0000 - JPEG codec
SRAM 0x132F0000 0x00007000 Scratch RAM (28K)

Register offsets given in the VPU-related sections below are relative to the VPU peripheral base (0x13200000). CPM register offsets are relative to the CPM base (0x10000000).

CPM Registers

The CPM (clock and power management) registers provide functionality that is not specific to the VPU. Only the registers pertinent to the VPU are described here.

CPM_CLKGR0 (clock gate register #0) is at offset 0x20.

Flag Bit Description
IPU 29 Stop IPU clock

CPM_VPU_SWRST (soft reset and bus control) is at offset 0xC4.

Flag Bit Description
SR 31 Soft reset
STP 30 Stop request
ACK 29 Stop acknowledgement

CPM_CPSPR (scratch pad) is at offset 0x34.

CPM_LPCR (low power control) is at offset 0x04.

CPM_OPCR (oscillator and power control) is at offset 0x24.

Reset Operation

  1. Set CPM_VPU_STP.
  2. Wait for CPM_VPU_ACK to become set.
  3. Set CPM_VPU_SR and clear CPM_VPU_STP.
  4. Clear CPM_VPU_SR and CPM_VPU_STP.
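The four steps above can be sketched in C as follows. This is a minimal, host-testable sketch assuming `swrst` points at a mapping of CPM_VPU_SWRST; the function name, the bit macros and the bounded poll (`max_polls`) are illustrative additions and are not taken from the MPlayer sources, which simply poll until ACK appears.

```c
#include <stdint.h>

/* Bits of CPM_VPU_SWRST (CPM base + 0xC4), as listed above. */
#define CPM_VPU_SR   (1u << 31) /* soft reset */
#define CPM_VPU_STP  (1u << 30) /* stop request */
#define CPM_VPU_ACK  (1u << 29) /* stop acknowledgement */

/*
 * Hypothetical helper performing the reset sequence.
 * Returns 0 on success, or -1 if ACK was not seen within max_polls reads,
 * in which case the stop request is withdrawn again.
 */
static int cpm_vpu_soft_reset(volatile uint32_t *swrst, unsigned max_polls)
{
    unsigned polls = 0;

    *swrst |= CPM_VPU_STP;                          /* 1. request a stop */

    while (!(*swrst & CPM_VPU_ACK)) {               /* 2. wait for ACK */
        if (++polls > max_polls) {
            *swrst &= ~CPM_VPU_STP;                 /* timeout: drop request */
            return -1;
        }
    }

    *swrst = (*swrst | CPM_VPU_SR) & ~CPM_VPU_STP;  /* 3. set SR, clear STP */
    *swrst &= ~(CPM_VPU_SR | CPM_VPU_STP);          /* 4. clear SR and STP */
    return 0;
}
```

On hardware, `swrst` would presumably come from an uncached mapping of physical address 0x100000C4 (0xB00000C4 in kseg1). On a host, pointing it at an ordinary variable exercises both the success path and the timeout path.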

VPU/SCH Registers

Register Offset Description
SCH_GLBC 0x00000
VPU_DCNT 0x00008
VPU_CCNT 0x0000C
VPU_DBGC 0x00010
VPU_DWD 0x00014
VPU_CWD 0x00018
VPU_DWA 0x0001C
VPU_CWA 0x00020
SCH_TLBA 0x00030 TLB
SCH_STAT 0x00034 VPU/scheduler status
SCH_SLDE0 0x00040
SCH_SLDE1 0x00044
SCH_SLDE2 0x00048
SCH_SLDE3 0x0004C
SCH_TLBC 0x00050 TLB virtual address match
SCH_TLBV 0x00054 TLB
SCH_SCHC 0x00060 Schedule control?
SCH_BND/SCH_SCHCS 0x00064
SCH_SCHG0 0x00068
SCH_SCHG1/E0 0x0006C
SCH_SCHE1 0x00070 SCH1_DSA
SCH_SCHE2 0x00074 SCH2_DSA
SCH_SCHE3 0x00078 SCH3_DSA
SCH_SCHE4 0x0007C SCH4_DSA

Some registers in this region are defined by libjzcommon/t_vputlb.h. They are given with a VPU prefix above.

Some additional registers appear to exist, defined by macros in jzm_vpu.h:

Register Offset Description
MSCOPE_START 0x00024 Start involves "mbnum"
MSCOPE_STOP 0x00028 Stop involves writing zero

SCH_GLBC

Field Bits Description
GLBC_SLDE 31
GLBC_TLBE 30
GLBC_TLBINV 29
... ...
TLBE_JPGC 26 TLB
TLBE_DBLK 25 TLB
TLBE_SDE 24 TLB
TLBE_EFE 23 TLB
TLBE_VDMA 22 TLB
TLBE_MCE 21 TLB
INTE_ACFGERR 20 Interrupt
... ...
INTE_TLBERR 18 Interrupt
INTE_BSERR 17 Interrupt
INTE_ENDF 16 Interrupt
GLBC_HIMAP 15 Interrupt
... ...
GLBC_HIAXI 9
GLBC_EPRI 8..7 Priority?
... ...

The register is set to GLBC_HIAXI before the ACFG fields are set in VDMA_TASKRG.

The definitions in libjzcommon/t_vputlb.h do not agree with the above. Instead, the following fields are defined:

Field Bits Description
GLBC_TLBE 31 Enable?
GLBC_ENGM 11..10
GLBC_EPRI 9..8
GLBC_DPRI 6..4
GLBC_CPRI 2..0

The SET_VPU_GLBC macro uses these definitions and is itself used in the code, whereas the only definition from the other set that is actually used is GLBC_HIAXI (bit 9). Under the t_vputlb.h layout, that bit corresponds to GLBC_EPRI having the value 0b10.

SCH_SLDx

Field Bits Description
VTAG 31..20
MASK 19..8
... ...
VLD 0

The definitions in libjzcommon/t_vputlb.h do not agree completely with the above. Instead, the following fields are defined:

Field Bits Description
VTAG 31..22 Virtual
PTAG 21..12 Physical
PSIZE 2..1 Page size
VALID 0 Valid

Page sizes (PSIZE) are 4M (0), 8M (1), 16M (2), 32M (3).

SCH_TLBC

TLB virtual address match register, possibly comparable to EntryHi in the MIPS architecture TLB.

Field Bits Description
VPN 31..12 Virtual page number
RIDX 11..4 Index
... ...
INVLD 1 Invalid
RETRY 0 Retry

SCH_TLBV

Field Bits Description
... ...
CNM 27..16
... ...
GCN 11..0

SCH_SCHC

Scheduler channel control.

Field Bits Description
CH4_ACT 31 From t_vputlb.h
CH4_PE1 30 From t_vputlb.h
CH4_PCH1 29..28 From t_vputlb.h
CH4_GS 27
CH4_PE0 26 Enable?
CH4_PCH0 25..24
CH3_ACT 23 From t_vputlb.h
... ...
CH3_GS 19
CH3_PE0 18 Enable?
CH3_PCH0 17..16
CH2_ACT 15 From t_vputlb.h
... ...
CH2_GS 11
CH2_PE 10 Enable?
CH2_PCH 9..8
CH1_ACT 7 From t_vputlb.h
... ...
CH1_GS 3
CH1_PE 2 Enable?
CH1_PCH 1..0

SCH_SCHEx

From t_vputlb.h:

Field Bits Description
SN 4
ID 3..0

SCH_BND/SCH_SCHCS

Scheduler group control?

Field Bits Description
CH4_HID 31..28
CH3_HID 27..24
CH2_HID 23..20
CH1_HID 19..16
... ...
DEPTH 11..8
G1F4 7
G1F3 6
G1F2 5
G1F1 4
G0F4 3
G0F3 2
G0F2 1
G0F1 0

The HID fields employ values corresponding to hardware units. Files such as libh264/jzm_vpu.h provide definitions of the appropriate values as follows:

Symbol Value Alternative Symbol
HID_SCH 0x0 HID_CFGC
HID_VDMA 0x1 HID_GP0
- 0x2 HID_GP1
- 0x3 HID_GP2
HID_EFE 0x4
HID_MCE 0x5
HID_DBLK 0x7
HID_VMAU 0x8
HID_SDE 0x9
HID_AUX 0xA
HID_TCSM 0xB HID_TCSM0
- 0xC HID_TCSM1
- 0xD HID_DBLK2
HID_JPGC 0xE
HID_SRAM 0xF

Alternative symbols are defined in libjzcommon/t_vputlb.h.

These values are passed to macros that populate the appropriate channel fields in SCH_BND. The GEN_VDMA_ACFG macro, used by files such as libmpeg2/soc/jzm_mpeg2_dec.c, may be given such values, as in the following example:

GEN_VDMA_ACFG(chn, REG_SCH_BND, 0, (SCH_CH3_HID(HID_DBLK) |
                                    SCH_CH2_HID(HID_VMAU) |
                                    SCH_CH1_HID(HID_MCE) |
                                    SCH_DEPTH(MPEG2_FIFO_DEP)));
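The bit layout behind that expression can be reconstructed from the SCH_BND table. The macros below are rebuilt from the field table for illustration and may differ in detail from the real jzm_vpu.h definitions; DEPTH is set to 0 in the check because the value of MPEG2_FIFO_DEP is not given here. The `hid_from_base` helper is hypothetical and records an observation from the two tables, not a statement in the sources: each HID value appears to equal the index of the unit's 64K slot above 0x13200000.

```c
#include <stdint.h>

/* Hardware IDs from libh264/jzm_vpu.h (subset). */
enum { HID_VDMA = 0x1, HID_MCE = 0x5, HID_DBLK = 0x7, HID_VMAU = 0x8 };

/* SCH_BND fields reconstructed from the table above. */
#define SCH_CH4_HID(h)  (((uint32_t)(h) & 0xFu) << 28)
#define SCH_CH3_HID(h)  (((uint32_t)(h) & 0xFu) << 24)
#define SCH_CH2_HID(h)  (((uint32_t)(h) & 0xFu) << 20)
#define SCH_CH1_HID(h)  (((uint32_t)(h) & 0xFu) << 16)
#define SCH_DEPTH(d)    (((uint32_t)(d) & 0xFu) << 8)

/* Observation only: HID seems to be the 64K slot index within the VPU window. */
static unsigned hid_from_base(uint32_t phys_base)
{
    return (unsigned)((phys_base - 0x13200000u) >> 16) & 0xFu;
}
```

With DEPTH left at 0, the MPEG2 binding of DBLK, VMAU and MCE to channels 3, 2 and 1 packs to 0x07850000.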

SCH_SCHCS

The operation in t_vputlb.h, employing the name SCH_SCHCS, defines the register differently:

Field Bits Description
CS4 27..24
CS3 19..16
CS2 11..8
CS1 3..0

However, this operation does not appear to be used.

VDMA Registers

These are defined in jzm_vpu.h as follows:

Register Offset Description
VDMA_LOCK 0x10000 (Obsolete - see below)
VDMA_UNLK 0x10004 (Obsolete - see below)
VDMA_TASKRG 0x10008 VDMA DHA
VDMA_TASKST 0x1000C VDMA status

However, these definitions are not referenced elsewhere, and they might therefore not be valid or relevant. Nevertheless, they are provided here for reference.

Code such as libmpeg2/slice.c does appear to use VDMA_TASKRG and VDMA_TASKST. Meanwhile, the 2D DMA mechanism employs offsets 0x10000 and 0x10004 differently for GP0, along with the corresponding offsets for GP1 and GP2.

VDMA_TASKRG

Field Bits Description
ACFG_DHA 31..7 Descriptor physical address
DESC_DHA 31..16 (Alternative definition)
... ...
ACFG_CLR 3 Clear error?
ACFG_SAFE 2
DESC? 1 (Combined with ACFG_RUN)
ACFG_RUN 0 Run?

VDMA_TASKST

Scheduled task status? The bits appear to correspond to those in VDMA_TASKRG.

Field Bits Description
... ...
ACFG_ERR 3
ACFG_END 2
DESC_END 1
VPU_BUSY 0

EFE Registers

Register Offset Description
EFE_CTRL 0x40000
EFE_GEOM 0x40004
EFE_COEF_BA 0x4000C
EFE_RAWY_SBA 0x40010
EFE_RAWC_SBA 0x40014
EFE_RAWU_SBA 0x40014
EFE_TOPMV_BA 0x40018
EFE_TOPPA_BA 0x4001C
EFE_MECHN_BA 0x40020
EFE_MAUCHN_BA 0x40024
EFE_DBLKCHN_BA 0x40028
EFE_SDECHN_BA 0x4002C
EFE_RAW_DBA 0x40030
EFE_RAWV_SBA 0x40034
EFE_RAW_STRD 0x40038
EFE_DBG_INFO 0x4003C
EFE_MVRP 0x40100
EFE_STAT 0x40110

MCE Registers

MCE stands for Motion Compensation and Estimation.

Register Offset Description
MCE_CTRL 0x50000 Control
MCE_CH1_STAT 0x50004
MCE_CH2_STAT 0x50804
MCE_MVPA 0x5000C
MCE_IWTA 0x5000C
MCE_CH1_PINFO 0x50020
MCE_CH2_PINFO 0x50820
MCE_CH1_WINFO 0x50024
MCE_CH2_WINFO1 0x50824
MCE_CH2_WINFO2 0x50828
MCE_CH1_WTRND 0x5002C
MCE_CH2_WTRND 0x5082C
MCE_CH1_BINFO 0x50030
MCE_CH2_BINFO 0x50830
MCE_CH1_IINFO1 0x50034
MCE_CH1_IINFO2 0x50038
MCE_CH2_IINFO1 0x50834
MCE_CH2_IINFO2 0x50838
MCE_CH1_TAP1L 0x5003C
MCE_CH1_TAP2L 0x50040
MCE_CH1_TAP1M 0x50044
MCE_CH1_TAP2M 0x50048
MCE_CH1_STRD 0x5004C Stride
MCE_CH2_STRD 0x5084C Stride
MCE_GEOM 0x50050 Geometry
MCE_DDC 0x50054
MCE_DSA 0x50058
MCE_ESTIC 0x5005C
MCE_CH1_RLUT 0x50300
MCE_CH2_RLUT 0x50B00
MCE_CH1_CLUT 0x50400
MCE_CH1_ILUT 0x50500
MCE_CH2_ILUT 0x50D00

MCE_CTRL

Defined in...

The librv9 definitions differ from the libvp8 definitions.

Field Bits Description
EBMS 31..28 ESMS in the librv9 files
ESMS 27..24 ERMS in the librv9 files
EARM 23
EPMV 22 PMVE in the librv9 files
ESA 21..20
EBME 19 EMET in the librv9 files
... 18
CAE 17
CSF 16 Defined in librv9 files
PGC 15..12
CH2EN 11
PRI 10..9
CKGE 8
OFA 7
ROT 6 ROTE in the librv9 files
ROTADIR 5
WM 4
CCF 3
IRQE 2 IRQ enable?
RST 1 Reset?
EN 0 Enable?

In vp8.c, all fields are set to zero apart from...

PGC = 0xF, CH2EN = 1, PRI = 3, CKGE = 1, CCF = 1, EN = 1

When decoding a frame, the following additional fields appear to be set:

CAE = 1, OFA = 1

In rv9_p0_mc.c, all fields are set to zero apart from...

CAE = 1, PGC = 0xF, CH2EN = 1, PRI = 3, CKGE = 1, OFA = 1, CCF = 1, EN = 1
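The register values implied by these settings can be checked with a few macros built from the MCE_CTRL table (libvp8 field names). The `MCE_*` macro names are hypothetical. Worked through, the vp8.c initial value comes to 0xFF09, and the vp8.c frame-decoding value and the rv9_p0_mc.c value coincide at 0x2FF89.

```c
#include <stdint.h>

/* MCE_CTRL fields (libvp8 naming), per the table above. */
#define MCE_CAE      (1u << 17)
#define MCE_PGC(x)   (((uint32_t)(x) & 0xFu) << 12)
#define MCE_CH2EN    (1u << 11)
#define MCE_PRI(x)   (((uint32_t)(x) & 0x3u) << 9)
#define MCE_CKGE     (1u << 8)
#define MCE_OFA      (1u << 7)
#define MCE_CCF      (1u << 3)
#define MCE_EN       (1u << 0)

/* Fields set by vp8.c at initialisation (all others zero). */
#define MCE_CTRL_VP8_INIT \
    (MCE_PGC(0xF) | MCE_CH2EN | MCE_PRI(3) | MCE_CKGE | MCE_CCF | MCE_EN)

/* vp8.c additionally sets CAE and OFA when decoding a frame. */
#define MCE_CTRL_FRAME (MCE_CTRL_VP8_INIT | MCE_CAE | MCE_OFA)
```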

MCE_CHx_STRD

Defined in libvp8/t_motion_p0.h and other files.

Field Bits Description
REF 27..16
RAW 15..8
DST 7..0 Stride

MCE_GEOM

Defined in libvp8/t_motion_p0.h and other files.

Field Bits Description
FH 27..16 Frame height?
FW 11..0 Frame width?
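Packing MCE_GEOM and MCE_CHx_STRD is a matter of shifting each value into its field. In this sketch the function names are hypothetical and the units (bytes or pixels) are not known. Values are simply masked to the field width, whereas a careful driver might reject out-of-range values instead.

```c
#include <stdint.h>

/* MCE_GEOM: FH in bits 27..16, FW in bits 11..0 (12 bits each). */
static uint32_t mce_geom(unsigned width, unsigned height)
{
    return ((uint32_t)(height & 0xFFFu) << 16) | (width & 0xFFFu);
}

/* MCE_CHx_STRD: REF in bits 27..16, RAW in 15..8, DST in 7..0. */
static uint32_t mce_strd(unsigned ref, unsigned raw, unsigned dst)
{
    return ((uint32_t)(ref & 0xFFFu) << 16) |
           ((uint32_t)(raw & 0xFFu) << 8) |
           (dst & 0xFFu);
}
```

For example, a 1920x1088 frame gives an MCE_GEOM value of 0x04400780.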

VMAU Registers

VMAU apparently stands for Vector Matrix Arithmetic Unit. The JZ4780 manual indicates that it is used for pixel recovery.

Register Offset Description
VMAU_MCBP 0x80000
VMAU_QTPARA 0x80004
VMAU_MAIN_ADDR 0x80008
VMAU_NCCHN_ADDR 0x8000C
VMAU_CHN_LEN 0x80010
VMAU_ACBP 0x80014
VMAU_CPREDM_TLV 0x80018
VMAU_YPREDM0 0x8001C
VMAU_YPREDM1 0x80020
VMAU_GBL_RUN 0x80040
VMAU_GBL_CTR 0x80044
VMAU_STATUS 0x80048
VMAU_CCHN_ADDR 0x8004C
VMAU_VIDEO_TYPE 0x80050
VMAU_Y_GS 0x80054
VMAU_DEC_DONE 0x80058
VMAU_ENC_DONE 0x8005C
VMAU_POS 0x80060
VMAU_MCF_STA 0x80064
VMAU_DEC_YADDR 0x80068
VMAU_DEC_UADDR 0x8006C
VMAU_DEC_VADDR 0x80070
VMAU_DEC_STR 0x80074
VMAU_MEML 0x84000
VMAU_QT 0x88000 Quantisation table (256 bytes)

DBLK Registers

Register Offset Description
DBLK_DHA 0x70000
DBLK_TRIG 0x70060
DBLK_CTRL 0x70064
DBLK_VTR 0x70068
DBLK_FSTA 0x7006C
DBLK_GSTA 0x70070
DBLK_GSIZE 0x70074
DBLK_GENDA 0x70078
DBLK_GPOS 0x7007C
DBLK_GPIC_STR 0x70080
DBLK_GPIC_YA 0x70084
DBLK_GPIC_CA 0x70088
DBLK_GP_ENDA 0x7008C
DBLK_SLICE_ENDA 0x70090
DBLK_BLK_CTRL 0x70094
DBLK_BLK_FIFO 0x70098

SDE Registers

SDE might conceivably stand for "stream decoder".

The register list below is also informed by libmpeg2/slice.c.

Register Offset Description
SDE_STAT 0x90000
SDE_SL_CTRL 0x90004
SDE_SL_GEOM 0x90008
SDE_GL_CTRL 0x9000C
SDE_CODEC_ID 0x90010 SDE identifier
SDE_CFG0 0x90014 SDE configuration
SDE_CFG1 0x90018
SDE_CFG2 0x9001C Bitstream buffer address (bsaddr)
SDE_CFG3 0x90020
SDE_CFG4 0x90024
SDE_CFG5 0x90028
SDE_CFG6 0x9002C
SDE_CFG7 0x90030
SDE_CFG8 0x90034
SDE_CFG9 0x90038
SDE_CFG10 0x9003C
SDE_CFG11 0x90040
SDE_CFG12 0x90044
SDE_CFG13 0x90048
SDE_CFG14 0x9004C
SDE_CFG15 0x90050
SDE_CTX_TBL 0x92000
SDE_CQP_TBL 0x93800

AUX Registers

Register Offset Description
AUX_CTRL 0xA0000 Control the AUX core
AUX_SPINLK 0xA0004 Spinlock
AUX_SPIN1 0xA0008 Spinlock access
AUX_SPIN2 0xA000C Spinlock access
AUX_MIRQP 0xA0010 Message IRQ pending for main core
AUX_MSG 0xA0014 Message word initiating IRQ
CORE_MIRQP 0xA0018 Message IRQ pending for AUX core
CORE_MSG 0xA001C Message word initiating IRQ

Message IRQs are cleared by clearing the appropriate MIRQP register.

AUX_CTRL

Field Bits Description
SLEEP 31 Sleep status
... 30..9 ...
BTB_INV 8 Invalidate BTB
... 7..4 ...
MIRQ_EN 3 Enable message IRQ
NMI_DIS 2 Only wake AUX with NMI
SW_NMI 1 Issue NMI to AUX
SW_RST 0 Hold AUX in reset state

If NMI_DIS is clear, an NMI resets AUX and starts execution at 0xF4000000. If NMI_DIS is set, AUX instead continues from the instruction following a WAIT instruction.

AUX_SPINLK

Field Bits Description
... 31..2 ...
LOCK 1..0 Lock status

The LOCK field is written via a mechanism connected to the AUX_SPIN1 and AUX_SPIN2 registers. Each of these registers retains a value that is committed to LOCK when the register is read, provided that LOCK is zero.

Despite the presence of these spinlock registers, it seems as if some coprocessor #0 registers are used instead. See, for example, libmpeg4/jzsoc/jz4760_dcsc.h. The pertinent registers are as follows:

Register CP0 Register Description
SPINLOCK 12 select 5 Spinlock
SPINATOMIC 12 select 6 Spinlock access

Here, software can only clear the SPINLOCK register's LOCK field (3..0), which releases the lock so that it can be taken again. Writing a value to the corresponding VAL field (3..0) in SPINATOMIC updates LOCK with the written value, but only if LOCK was already zero.
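The take/release protocol described for SPINLOCK and SPINATOMIC can be modelled in software with C11 atomics. This is a behavioural model for illustration only, not code that touches the CP0 registers. The conditional update stands in for the SPINATOMIC write, and the read-back mirrors how a core would check whether its value was accepted. The type and function names are hypothetical, and each contender is assumed to use a distinct nonzero value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Models the 4-bit LOCK field; zero means unlocked. */
typedef struct {
    atomic_uint lock;
} spin_model;

/* "Write VAL to SPINATOMIC", then read LOCK back to see who owns the lock. */
static bool spin_try_take(spin_model *s, unsigned id)
{
    unsigned want = id & 0xFu;
    unsigned expected = 0;

    if (want == 0)
        return false; /* zero is indistinguishable from "unlocked" */
    /* The update only takes effect if LOCK is currently zero. */
    atomic_compare_exchange_strong(&s->lock, &expected, want);
    return atomic_load(&s->lock) == want;
}

/* Clearing LOCK is the only direct modification allowed. */
static void spin_release(spin_model *s)
{
    atomic_store(&s->lock, 0u);
}
```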

Despite provision of the above registers, the following registers appear to be used by libmpeg4/jzsoc/jz4760_dcsc.h for spinlock purposes:

Register CP0 Register Description
DCSC_SPINLK 20 select 2 Spinlock
DCSC_SPIN0 20 select 3 Spinlock access
DCSC_SPIN1 20 select 4 Spinlock access

Some documented registers are also available for similar purposes:

Register CP0 Register Description
Cores_Status 12 select 3 Sleep and IRQ status
CORE_MBOX0 20 select 0 Initiate IRQ to core 0
CORE_MBOX1 20 select 1 Initiate IRQ to core 1

AUX_SPIN1

Field Bits Description
... 31..2 ...
SPIN1 1..0 Lock status

AUX_SPIN2

Field Bits Description
... 31..2 ...
SPIN2 1..0 Lock status

AUX_MIRQP

Field Bits Description
... 31..1 ...
MIRQP 0 Message IRQ pending

AUX_MSG

Field Bits Description
MESG 31..0 Message word

CORE_MIRQP

Field Bits Description
... 31..1 ...
MIRQP 0 Message IRQ pending

CORE_MSG

Field Bits Description
MESG 31..0 Message word

JPGC Registers

Register Offset Description
JPGC_TRIG 0xE0000
JPGC_GLBI 0xE0004
JPGC_STAT 0xE0008
JPGC_BSA 0xE000C
JPGC_P0A 0xE0010
JPGC_P1A 0xE0014
JPGC_P2A 0xE0018
JPGC_P3A 0xE001C
JPGC_NMCU 0xE0028
JPGC_NRSM 0xE002C
JPGC_P0C 0xE0030
JPGC_P1C 0xE0034
JPGC_P2C 0xE0038
JPGC_P3C 0xE003C
JPGC_MCUS 0xE0064
JPGC_ZIGM0 0xE1000
JPGC_ZIGM1 0xE1100
JPGC_HUFB 0xE1200
JPGC_HUFM 0xE1300
JPGC_QMEM 0xE1400
JPGC_HUFE 0xE1800
JPGC_HUFS 0xE1800

VPU Operations

According to jz47_vae_map.c, reset is performed as follows:

Soft Reset

  1. CPM_VPU_SWRST has bit 30 (STP) set.
  2. CPM_VPU_SWRST bit 29 (ACK) tested repeatedly until it is set.
  3. CPM_VPU_SWRST bit 31 (SR) is set and bit 30 (STP) cleared.
  4. CPM_VPU_SWRST bit 31 (SR) is cleared and bit 30 (STP) cleared.

According to libjzcommon/t_vputlb.h, the following operations are defined.

Reset

  1. Save the first word of each 2048-word page for the six pages of TCSM.
  2. CPM_CPSPR is set to zero.
  3. CPM_LPCR has bit 30 (PD_VPU, power down module VPU) cleared.
  4. CPM_OPCR has bit 29 (MASK_VPU, debugging bit) set. This may be optional.
  5. CPM_OPCR bit 28 is tested repeatedly until it becomes set.
  6. CPM_OPCR bit 30 (PD_VPU) is set and bit 29 (MASK_VPU) is cleared.
  7. CPM_OPCR bit 30 (PD_VPU) is cleared, twice!
  8. The first word of each 2048-word page is restored.
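Steps 1 and 8 touch one word every 2048 words (8K). A minimal sketch, assuming `tcsm` points at the start of the TCSM area concerned. That the six 8K pages are TCSM1 is only an inference from its 48K size, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TCSM_PAGE_WORDS 2048u /* 8K per page */
#define TCSM_PAGES      6u    /* 6 x 8K = 48K */

/* Step 1: save the first word of each page. */
static void tcsm_save_heads(const volatile uint32_t *tcsm, uint32_t saved[TCSM_PAGES])
{
    for (size_t i = 0; i < TCSM_PAGES; i++)
        saved[i] = tcsm[i * TCSM_PAGE_WORDS];
}

/* Step 8: restore those words after the reset. */
static void tcsm_restore_heads(volatile uint32_t *tcsm, const uint32_t saved[TCSM_PAGES])
{
    for (size_t i = 0; i < TCSM_PAGES; i++)
        tcsm[i * TCSM_PAGE_WORDS] = saved[i];
}
```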

GLBC

This only seems to be used in libvp8/vp8.c:

SET_VPU_GLBC(1, 0, 0, 4, 4);

This sets GLBC_TLBE as 1, GLBC_ENGM as 0, GLBC_EPRI as 0, GLBC_DPRI as 4, GLBC_CPRI as 4.
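Using the t_vputlb.h layout, this call can be worked through bit by bit. The field macros and the `glbc_value` function below are written for this sketch and mirror the argument order of SET_VPU_GLBC; they are not copied from the header. The result for (1, 0, 0, 4, 4) is 0x80000044. The sketch also confirms that GLBC_HIAXI (bit 9) equals GLBC_EPRI set to 0b10.

```c
#include <stdint.h>

/* SCH_GLBC layout according to libjzcommon/t_vputlb.h. */
#define GLBC_TLBE     (1u << 31)
#define GLBC_ENGM(x)  (((uint32_t)(x) & 0x3u) << 10)
#define GLBC_EPRI(x)  (((uint32_t)(x) & 0x3u) << 8)
#define GLBC_DPRI(x)  (((uint32_t)(x) & 0x7u) << 4)
#define GLBC_CPRI(x)  (((uint32_t)(x) & 0x7u) << 0)

/* Bit 9, from the other (jzm_vpu.h-style) set of definitions. */
#define GLBC_HIAXI    (1u << 9)

/* Hypothetical helper with the same argument order as SET_VPU_GLBC. */
static uint32_t glbc_value(unsigned tlbe, unsigned engm, unsigned epri,
                           unsigned dpri, unsigned cpri)
{
    return (tlbe ? GLBC_TLBE : 0u) | GLBC_ENGM(engm) | GLBC_EPRI(epri) |
           GLBC_DPRI(dpri) | GLBC_CPRI(cpri);
}
```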

TLB

SET_VPU_TLB(entry, valid, psize, vtag, ptag)

This appears to set the SCH_SLDE0..SCH_SLDE3 registers with validity, page size, virtual tag, physical tag. The tags appear to be the uppermost ten bits of addresses, but in the MPlayer code an identity mapping appears to be used, employing physical addresses as virtual addresses.
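An identity-mapped SCH_SLDEx entry can be built from the t_vputlb.h layout. Both tags are then the top ten bits (31..22) of the same address. The macro and function names are hypothetical. Two points are assumptions of this sketch and are not confirmed by the sources: that an entry's address must be aligned to its page size, and that larger pages still use the full ten-bit tags.

```c
#include <stdint.h>

/* SCH_SLDEx layout according to libjzcommon/t_vputlb.h. */
#define SLD_VTAG(t)   (((uint32_t)(t) & 0x3FFu) << 22)
#define SLD_PTAG(t)   (((uint32_t)(t) & 0x3FFu) << 12)
#define SLD_PSIZE(s)  (((uint32_t)(s) & 0x3u) << 1)
#define SLD_VALID     1u

/* PSIZE codes: 0 = 4M, 1 = 8M, 2 = 16M, 3 = 32M; -1 if unsupported. */
static int sld_psize_code(uint32_t bytes)
{
    for (int code = 0; code < 4; code++)
        if (bytes == ((4u << 20) << code))
            return code;
    return -1;
}

/* Identity mapping: virtual tag == physical tag. Returns 0 (not valid)
 * for an unsupported page size or an address not aligned to the page. */
static uint32_t sld_identity_entry(uint32_t addr, uint32_t page_bytes)
{
    int code = sld_psize_code(page_bytes);
    if (code < 0 || (addr & (page_bytes - 1u)) != 0)
        return 0;
    uint32_t tag = addr >> 22;
    return SLD_VTAG(tag) | SLD_PTAG(tag) | SLD_PSIZE(code) | SLD_VALID;
}
```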

Scheduling

SET_VPU_SCHC(sch4_act, sch4_pe1, sch4_pch1,
                       sch4_pe0, sch4_pch0,
             sch3_act, sch3_pe0, sch3_pch0,
             sch2_act, sch2_pe0, sch2_pch0,
             sch1_act, sch1_pe0, sch1_pch0)

This is used in libh264/jzsoc/h264_p1.c and libh264/jzsoc/h264_cavlc_p1.c. Initialisation sets the ACT fields to 0, the PE fields to 1, and the PCH fields (in parameter order: sch4_pch1, sch4_pch0, sch3_pch0, sch2_pch0, sch1_pch0) to 3, 2, 1, 1 and 0 respectively. All fields are set to 0 at the end of processing or if an error occurs.
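Given the SCH_SCHC table, each channel occupies one byte: CH1 is in bits 7..0 and CH4 in bits 31..24. The following sketch packs the arguments accordingly, with GS left at 0 because SET_VPU_SCHC takes no GS argument. The function names are hypothetical. Worked through, the H.264 initialisation values described above give 0x76050504.

```c
#include <stdint.h>

/* One channel byte: ACT = bit 7, PE1 = bit 6, PCH1 = bits 5..4,
 * GS = bit 3 (left 0), PE0 = bit 2, PCH0 = bits 1..0.
 * PE1/PCH1 are only listed for CH4. */
static uint32_t schc_channel(unsigned act, unsigned pe1, unsigned pch1,
                             unsigned pe0, unsigned pch0)
{
    return ((act & 1u) << 7) | ((pe1 & 1u) << 6) | ((pch1 & 3u) << 4) |
           ((pe0 & 1u) << 2) | (pch0 & 3u);
}

/* Hypothetical function mirroring the SET_VPU_SCHC argument order. */
static uint32_t schc_value(unsigned ch4_act, unsigned ch4_pe1, unsigned ch4_pch1,
                           unsigned ch4_pe0, unsigned ch4_pch0,
                           unsigned ch3_act, unsigned ch3_pe0, unsigned ch3_pch0,
                           unsigned ch2_act, unsigned ch2_pe0, unsigned ch2_pch0,
                           unsigned ch1_act, unsigned ch1_pe0, unsigned ch1_pch0)
{
    return (schc_channel(ch4_act, ch4_pe1, ch4_pch1, ch4_pe0, ch4_pch0) << 24) |
           (schc_channel(ch3_act, 0, 0, ch3_pe0, ch3_pch0) << 16) |
           (schc_channel(ch2_act, 0, 0, ch2_pe0, ch2_pch0) << 8) |
           schc_channel(ch1_act, 0, 0, ch1_pe0, ch1_pch0);
}
```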

Clocks

The "A Hybrid Scheme..." paper indicates that the JZ4770 has its main "J1" core running at 1000 MHz with the AUX core running at 500 MHz.

Memory Layout

The different JZ-series manuals provide the following details about memory layout for the VPU facilities:

Region Main AUX Accelerator
TCSM0 0xF4000000 0x132B0000 0x132B0000
TCSM1 0x132C0000 0xF4000000 0x132C0000
SRAM 0x132F0000 0x132F0000 0x132F0000

SRAM is scratch RAM. Its size appears to differ between the JZ4760 and JZ4780: the former has 32K at 0x132D0000, the latter only 16K. The MPlayer code is not consistent either. Some files (for example, libmpeg4/jzsoc/jz4760_tcsm_init.c) suggest an SRAM size of only 7K, whereas others (for example, librv9/jzsoc/rv9_sram.h) imply at least 16K. In jz47_vae_map.c the size is given as 0x7000 bytes (28K), which is consistent with the "A Hybrid Scheme..." paper:

"There are three types of on-chip memories including two tightly coupled shared memories TCSM0 (16 KB) and TCSM1 (48 KB) along with SRAM (scratch RAM 28 KB)."

This is reflected in the jz47_vae_map.c definitions and is presumably definitive. Other files also seem to employ compatible definitions. For example:

Each region has its own DMA channel:

The AUX core, at least according to the more helpful JZ4760 programming manual, "can only access physical address space". However, it seems that it employs the following fixed memory mappings within TCSM1:

Virtual Region Physical Region
0xF4000000..0xF4001FFF 0x132C0000..0x132C1FFF
0xF4002000..0xF4003FFF 0x132C2000..0x132C3FFF
0xF4004000..0xF4005FFF 0x132C4000..0x132C5FFF
0xF4006000..0xF4007FFF 0x132C6000..0x132C7FFF
0xF4008000..0xF4009FFF 0x132C8000..0x132C9FFF
0xF400A000..0xF400BFFF 0x132CA000..0x132CBFFF

Thus, the main core and the AUX core each address "their own" TCSM via the 0xF4000000 region, with the DMA channels being employed to move data around.
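Because the six 8K windows are contiguous on both sides, the AUX core's TCSM1 mapping reduces to a single offset. A minimal sketch with hypothetical names, returning 0 for addresses outside the 48K window. For the main core, the same 0xF4000000 window would instead correspond to TCSM0 at 0x132B0000.

```c
#include <stdint.h>

#define AUX_TCSM_VBASE 0xF4000000u
#define TCSM1_PBASE    0x132C0000u
#define TCSM1_SIZE     0x0000C000u /* 48K: six 8K windows */

/* Translate an AUX-core address in the 0xF4000000 window to physical. */
static uint32_t aux_tcsm_to_phys(uint32_t vaddr)
{
    if (vaddr < AUX_TCSM_VBASE || vaddr - AUX_TCSM_VBASE >= TCSM1_SIZE)
        return 0; /* outside the fixed mapping */
    return TCSM1_PBASE + (vaddr - AUX_TCSM_VBASE);
}
```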

Physical addresses outside the above mappings are accessible via the unmapped kernel mode memory regions:

Physical Region Cached Region Uncached Region
0x132B0000 0x932B0000 0xB32B0000
0x132C0000 0x932C0000 0xB32C0000
0x132F0000 0x932F0000 0xB32F0000

This is featured in libmpeg4/jzsoc/mpeg4_tcsm0.h.
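The cached and uncached columns follow the standard MIPS32 unmapped kernel segments. kseg0 (cached) starts at 0x80000000 and kseg1 (uncached) at 0xA0000000, and both alias the low 512M of the physical address space. The helper names are hypothetical.

```c
#include <stdint.h>

/* kseg0: cached alias of physical 0x00000000..0x1FFFFFFF. */
static uint32_t phys_to_kseg0(uint32_t phys) { return (phys & 0x1FFFFFFFu) | 0x80000000u; }

/* kseg1: uncached alias of the same physical range. */
static uint32_t phys_to_kseg1(uint32_t phys) { return (phys & 0x1FFFFFFFu) | 0xA0000000u; }

/* Either alias maps back by dropping the top three bits. */
static uint32_t kseg_to_phys(uint32_t vaddr) { return vaddr & 0x1FFFFFFFu; }
```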

Code Positioning

Some files define code for the p1_main section, such as libh264/jzsoc/h264_p1.c. A common linker script appears to be used to define the positioning of such code, found in the following locations:

The p1_main section appears at 0xF4000000, which for the AUX core appears to address TCSM1. So, code in this section may be intended for the AUX core and define a dedicated process running on that core.

AUX Operations

Reset

According to libmpeg4/jzsoc/jz4760_dcsc.h, AUX is reset by setting bit 0 (SW_RST) of AUX_CTRL.

Start

According to libmpeg4/jzsoc/jz4760_dcsc.h, AUX is started as follows:

  1. AUX_CTRL is set to 1, initiating a software reset (SW_RST)
  2. AUX_CTRL is set to 2, initiating a non-maskable interrupt (SW_NMI)

Since the NMI_DIS field will be clear in both of the above operations, the NMI initiation will cause the AUX core to start executing code at 0xF4000000.
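The start sequence is two plain writes to AUX_CTRL. In the sketch below, writes go through a hypothetical callback so that the sequence can be checked on a host with a small recorder. On hardware, the callback would store to AUX_CTRL at VPU base + 0xA0000. All names here are illustrative.

```c
#include <stdint.h>

/* AUX_CTRL bits, per the table above. */
#define AUX_CTRL_SW_RST  (1u << 0) /* hold AUX in reset */
#define AUX_CTRL_SW_NMI  (1u << 1) /* issue NMI to AUX */
#define AUX_CTRL_NMI_DIS (1u << 2) /* only wake AUX with NMI */

typedef void (*reg_write_fn)(uint32_t value, void *ctx);

/* Write 1, then 2. NMI_DIS is clear in both values, so the NMI resets AUX
 * and it begins executing at 0xF4000000 (TCSM1 from the AUX side). */
static void aux_start(reg_write_fn wr, void *ctx)
{
    wr(AUX_CTRL_SW_RST, ctx);
    wr(AUX_CTRL_SW_NMI, ctx);
}

/* Host-side recorder for checking the sequence without hardware. */
struct write_log {
    uint32_t values[8];
    unsigned count;
};

static void log_write(uint32_t value, void *ctx)
{
    struct write_log *wl = ctx;
    if (wl->count < sizeof wl->values / sizeof wl->values[0])
        wl->values[wl->count++] = value;
}
```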

Codecs

H.264

Files: libh264/jzsoc/h264_p1.c, libh264/jzsoc/h264_cavlc_p1.c

P1_START_BASE is defined in libh264/jzsoc/mem_def_falcon.h and the files it includes. Expanding the macros step by step gives:

P1_START_BASE = GP0_DHA_BASE + GP0_DHA_SIZE
              = (TCSM0_1VALUE + 4) + (4 * 4)
              = ((C_TCSM0_BASE + 16) + 4) + (4 * 4)

This is C_TCSM0_BASE + 36, which corresponds to 0x132B0000 + 36 = 0x132B0024.

The word at P1_START_BASE is set to zero at the start of processing. At the end of processing, after a message word of 1 has been posted to AUX_MSG, it is set to AUX_END_VALUE (0x5A5A).
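Two completion signals are described: the word at P1_START_BASE and the AUX_MSG message IRQ. The excerpts above do not show the waiting side, so the following is only one plausible way for the main core to use the flag word. The function name and the bounded poll are hypothetical, and `p1_flag` is assumed to point at 0x132B0024 through a suitable (for example, uncached) mapping.

```c
#include <stdint.h>

#define AUX_END_VALUE 0x5A5Au

/* Poll the P1_START_BASE word until AUX stores AUX_END_VALUE.
 * Returns 0 when seen, -1 after max_polls reads. */
static int wait_aux_end(const volatile uint32_t *p1_flag, unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++)
        if (*p1_flag == AUX_END_VALUE)
            return 0;
    return -1;
}
```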

MPEG2

mpeg2_parse

File: libmpeg2/decode.c

Resets the VPU using the CPM_VPU_SWRST register. Calls mpeg2_slice.

mpeg2_slice

File: libmpeg2/slice.c

Calls (depending on the code parameter) M2D_SliceInit and M2D_SliceInit_ext.

Initialisation:

  1. VDMA_TASKST is set to zero.
  2. SDE_STAT is set to zero.
  3. SCH_STAT is set to zero.
  4. SCH_GLBC is cleared apart from HIAXI which is set.
  5. VDMA_TASKRG is set to (VDMA_ACFG_DHA(s->des_pa) | VDMA_ACFG_RUN).

In the above, s->des_pa is the physical address of the allocated frame, defined in decode.c as follows:

mpeg2dec->decoder.vdma_base = (uint8_t *) jz4740_alloc_frame(128, 0x2000);

The function then loops for as long as SCH_STAT bit 0 is clear, so the loop ends once SCH_STAT bit 0 becomes set. Presumably, SCH_STAT bit 0 has a role similar to VDMA_TASKST bit 0 (VPU_BUSY), although waiting for the bit to become set suggests that it indicates completion rather than activity.
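The initialisation steps and the wait loop can be sketched against a register window indexed by the offsets tabulated earlier. This sketch assumes that VDMA_ACFG_DHA places the descriptor address itself in bits 31..7, with its low seven bits cleared rather than shifted, so the chain must be 128-byte aligned. That would fit the 128 passed to jz4740_alloc_frame if that argument is an alignment, which is itself an assumption. Function names and the bounded poll are hypothetical.

```c
#include <stdint.h>

/* Offsets from the VPU base (0x13200000), as tabulated above. */
#define REG_SCH_GLBC     0x00000u
#define REG_SCH_STAT     0x00034u
#define REG_VDMA_TASKRG  0x10008u
#define REG_VDMA_TASKST  0x1000Cu
#define REG_SDE_STAT     0x90000u

#define GLBC_HIAXI       (1u << 9)
#define ACFG_RUN         (1u << 0)
#define ACFG_DHA_MASK    0xFFFFFF80u /* bits 31..7 */

static void vpu_wr(volatile uint32_t *vpu, uint32_t off, uint32_t v) { vpu[off / 4] = v; }
static uint32_t vpu_rd(volatile uint32_t *vpu, uint32_t off) { return vpu[off / 4]; }

/* Steps 1-5 of the mpeg2_slice initialisation. Returns -1 if des_pa
 * is not 128-byte aligned, in which case nothing is written. */
static int vpu_start_chain(volatile uint32_t *vpu, uint32_t des_pa)
{
    if (des_pa & ~ACFG_DHA_MASK)
        return -1;
    vpu_wr(vpu, REG_VDMA_TASKST, 0);
    vpu_wr(vpu, REG_SDE_STAT, 0);
    vpu_wr(vpu, REG_SCH_STAT, 0);
    vpu_wr(vpu, REG_SCH_GLBC, GLBC_HIAXI);
    vpu_wr(vpu, REG_VDMA_TASKRG, (des_pa & ACFG_DHA_MASK) | ACFG_RUN);
    return 0;
}

/* The following loop: wait until SCH_STAT bit 0 is set (bounded here). */
static int vpu_wait_sch_done(volatile uint32_t *vpu, unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++)
        if (vpu_rd(vpu, REG_SCH_STAT) & 1u)
            return 0;
    return -1;
}
```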

M2D_SliceInit

File: libmpeg2/soc/jzm_mpeg2_dec.c

Sets up the DMA configuration using GEN_VDMA_ACFG.

M2D_SliceInit_ext

File: libmpeg2/soc/jzm_mpeg2_dec.c

Sets up the DMA configuration using GEN_VDMA_ACFG.

2D DMA

The jz4760e_2ddma_hw.h file appears to describe DMA for transferring data between the TCSM and SRAM areas. However, its definitions appear to employ the VDMA region, together with other neighbouring regions, differently to that described in jzm_vpu.h. The JZ4770 manual does, however, provide a description of the descriptor formats.

Register Offset Description
GPx_DHA 0x*0000 Descriptor head address
GPx_DCS 0x*0004 DMA command/status

GPx_DHA

Field Bits Description
DHA 31..2 Descriptor head (physical) address
... 1..0 ...

Technically, the whole register represents the address at word (4-byte) resolution.

GPx_DCS

Field Bits Description
BTN 31..16 Transfer number byte
NDN 15..8 Transfer node number
... 7..3 ...
END 2 End of transmission
RST 1 DMA software reset
SUP 0 DMA start up

The DCS register employs bit 2 as an apparent completion indicator.
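The DHA/DCS pair suggests a simple program-start-poll pattern. In this sketch, the register pointers would address GPx_DHA and GPx_DCS (for GP0, offsets 0x10000 and 0x10004 from the VPU base). Two points are assumptions: that writing SUP alone starts the transfer, and that END is observed in the same register. The helper names are hypothetical, and the real set_gp0_dcs and poll_gp0_end macros may differ.

```c
#include <stdint.h>

#define GP_DCS_SUP (1u << 0) /* DMA start up */
#define GP_DCS_RST (1u << 1) /* DMA software reset */
#define GP_DCS_END (1u << 2) /* end of transmission */

/* GPx_DHA takes the descriptor's physical address at word resolution. */
static int gp_set_dha(volatile uint32_t *dha, uint32_t desc_pa)
{
    if (desc_pa & 3u)
        return -1; /* not word aligned */
    *dha = desc_pa;
    return 0;
}

static void gp_start(volatile uint32_t *dcs) { *dcs = GP_DCS_SUP; }

/* Wait for END, giving up after max_polls reads. */
static int gp_poll_end(const volatile uint32_t *dcs, unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++)
        if (*dcs & GP_DCS_END)
            return 0;
    return -1;
}
```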

Descriptors

The descriptors employed by the GP0, GP1 and GP2 channels also appear to be different to those employed in the VDMA initialisation:

Offset Bits Description
0 31..0 TSA - transfer source address
4 31..0 TDA - transfer destination address
8 31..30 TYP - transfer size type
: 29..16 TST - transfer source stride
: 15..14 FRM - DDR page size optimisation
: 13..0 TDT - transfer destination stride
12 31 TAG - "current node link"
: 30..16 TRN - row width in bytes
: 15..0 NUM - transfer total in bytes

The transfer size types (TYP) can be...

The page size optimisation (FRM) can be...

The "current node link" (TAG) appears to be 1 at the final descriptor, 0 otherwise.

The file libffmpeg2/jzsoc/ffmpeg2_p1.c provides an example of a transfer from TCSM0 to TCSM1:

set_gp0_dha(TCSM1_PADDR(DDMA_GP0_SET));
*((volatile int *)(DDMA_GP0_SET + TSA))  = TCSM0_PADDR(*tcsm1_fifo_rp);
*((volatile int *)(DDMA_GP0_SET + TDA))  = TCSM1_PADDR(dMB);
*((volatile int *)(DDMA_GP0_SET + STRD)) = GP_STRD(TASK_BUF_LEN, 0,
                                                   TASK_BUF_LEN);
*((volatile int *)(DDMA_GP0_SET + UNIT)) = GP_UNIT(1, TASK_BUF_LEN,
                                                   sizeof(struct MPEG2_MB_DecARGs));
set_gp0_dcs();
poll_gp0_end();

This uses physical addresses for the descriptor, the source address and the destination address. GP_STRD defines the source and destination strides with no DDR optimisation. GP_UNIT marks the descriptor as the final one, sets the row width equal to the stride and defines the total transfer size as the size of the referenced structure.
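The descriptor words in that example can be reconstructed from the descriptor table. The struct and function names below are hypothetical stand-ins for GP_STRD and GP_UNIT, and TYP is assumed to be 0, since GP_STRD is given only three arguments. For a 64-byte row and a 200-byte total, this gives 0x00400040 for the stride word and 0x804000C8 for the final-node unit word.

```c
#include <stdint.h>

/* One 2D DMA descriptor node, per the table above. */
struct gp_desc {
    uint32_t tsa;  /* +0  source physical address */
    uint32_t tda;  /* +4  destination physical address */
    uint32_t strd; /* +8  TYP | TST | FRM | TDT */
    uint32_t unit; /* +12 TAG | TRN | NUM */
};

/* Word at +8: TYP 31..30, TST 29..16, FRM 15..14, TDT 13..0. */
static uint32_t gp_strd_word(unsigned typ, unsigned tst, unsigned frm, unsigned tdt)
{
    return ((uint32_t)(typ & 0x3u) << 30) | ((uint32_t)(tst & 0x3FFFu) << 16) |
           ((uint32_t)(frm & 0x3u) << 14) | (tdt & 0x3FFFu);
}

/* Word at +12: TAG 31 (1 on the last node), TRN 30..16, NUM 15..0. */
static uint32_t gp_unit_word(unsigned last, unsigned trn, unsigned num)
{
    return ((uint32_t)(last & 1u) << 31) | ((uint32_t)(trn & 0x7FFFu) << 16) |
           (num & 0xFFFFu);
}

/* Single-node copy like the ffmpeg2_p1.c example: equal strides,
 * no DDR page optimisation, row width equal to the stride. */
static struct gp_desc gp_single_copy(uint32_t src_pa, uint32_t dst_pa,
                                     unsigned row, unsigned total)
{
    struct gp_desc d = {
        .tsa = src_pa,
        .tda = dst_pa,
        .strd = gp_strd_word(0, row, 0, row),
        .unit = gp_unit_word(1, row, total),
    };
    return d;
}
```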

VDMA

The macro GEN_VDMA_ACFG in jzm_vpu.h appears to be involved in writing descriptor fields in the following format:

Offset Bits Description
0 31..0 Value
4 31 ACFG_VLD
: 30 ACFG_TERM (terminate)
: ...
: 19..0 ACFG_IDX - register offset (bits 1..0 ignored)
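Read this way, a VDMA descriptor chain is a list of (value, control) word pairs, each describing one register write, with the last pair flagged by ACFG_TERM. The 20-bit ACFG_IDX field exactly spans the 1M VPU window starting at 0x13200000. The sketch below appends such pairs. The function name and the capacity handling are hypothetical, and it is an assumption that GEN_VDMA_ACFG emits exactly this pair per call.

```c
#include <stddef.h>
#include <stdint.h>

#define ACFG_VLD      (1u << 31)
#define ACFG_TERM     (1u << 30)
#define ACFG_IDX(off) ((uint32_t)(off) & 0xFFFFFu) /* bits 1..0 ignored */

/* Append one register write (value word, then control word).
 * Returns the new length in words, or 0 if the chain is full. */
static size_t acfg_append(uint32_t *chain, size_t len, size_t cap,
                          uint32_t reg_off, uint32_t value, int terminate)
{
    if (len + 2 > cap)
        return 0;
    chain[len] = value;
    chain[len + 1] = ACFG_VLD | (terminate ? ACFG_TERM : 0u) | ACFG_IDX(reg_off);
    return len + 2;
}
```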

Decoder initialisation such as in the following files employs this macro to set up DMA descriptors:

Files

Some files in the MPlayer sources that provide information.

jz47_vae_map.c

Mapping of memory regions. This appears to contain the definitive regions for the JZ4780.

jz47_soc_mem.c

IPU-related memory mapping and allocation.

jzm_vpu.h

VPU register definitions and codec data. libh264/jzm_vpu.h appears to be the most complete of the definitions, with an older version appearing in libmpeg2/soc/jzm_vpu.h and libvc1/soc/jzm_vpu.h.

Note that libjzcommon/t_vputlb.h provides differing definitions, but these appear actually to be used, whereas those in jzm_vpu.h may not be. Another variant of t_vputlb.h exists in libmpeg4/jzsoc alongside files such as t_motion.h and t_intpid.h.