LAUGHTON ELECTRONICS

The KimKlone: Bride of Son of Cheap Video

X-Indirect-Y Addressing Using the W Register

Cycle  Addr Bus     Data Bus   Comment
1      PC           CBh/A1h    Substitution results in LDA (ind,X) OpCode
2      +1           0          inst. operand
3                              (65C02 dead cycle)
4      X            ptr low    ptr low --> W low
5      +1           ptr high   ptr high --> W high
6      ptr=W=TOS    ?          result low
Timing for LDAW (0,X). In the KimKlone @ code example further down, this special instruction fetches the first byte using X-Indirect mode. It also performs the copy operation that sets up the subsequent fetch of the other byte using Indirect-Y mode (next chart).


Cycle  Addr Bus     Data Bus   Comment
1      PC           B1h        LDA (ind),Y OpCode
2      +1           Wreg       inst. operand
3      Wreg         W low
4      +1           W high
5      W+Y=TOS+1    ?          result high
To get the other byte(s) simply use Indirect-Y. This is a perfectly ordinary 65xx mode. Wreg is a text equate for the address in Z-Pg where W can be read.

Another use for W, besides its role in the NEXT instruction, is to give the 65C02 an extra address mode (more or less). Of course 65xx chips already have Indexed Indirect address mode and Indirect Indexed address mode, but the W register lets us emulate Indexed Indirect Indexed mode, i.e., addressing which is indexed both before and after the fetch of the indirect pointer.
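As a rough summary, the effective address (EA) of each mode can be written out as follows. The notation is not from the original page: M[a] is taken to mean the byte at address a, zp the instruction's zero-page operand, and ptr(a) the 16-bit pointer stored at a. Zero-page wraparound is ignored for brevity.

```latex
\begin{align*}
\text{ptr}(a) &= M[a] + 256 \cdot M[a+1] \\
\text{(zp,X)}:\quad EA &= \text{ptr}(zp + X) \\
\text{(zp),Y}:\quad EA &= \text{ptr}(zp) + Y \\
\text{LDAW (zp,X), then (Wreg),Y}:\quad EA &= \text{ptr}(zp + X) + Y
\end{align*}
```

The last line is the emulated mode: LDAW leaves ptr(zp + X) in W, so a following (Wreg),Y access adds the Y index after the pointer fetch.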

This oh-so-esoteric capability is actually startlingly useful, especially in the Forth context, where it's very common for an address on the stack to point to a multi-byte structure. Accessing such a structure takes two steps. First we index via X into the Forth data stack to find the pointer to the structure. Then, since the pointer only indicates the base of the structure, we index from there to access the other byte(s).

Here is a simple example. The Forth word @ (pronounced "fetch") treats the top-of-stack value TOS as an address, returning the value "at" that address. The value at the address is simply a 16-bit number — a basic instance of a multi-byte structure.

Classic version of @
(36 cycles typical)

LDA (0,X)      ;get byte at base of structure
PHA            ;stash byte
INC 0,X        ;overwrite the pointer itself!
BNE IncDone
INC 1,X
IncDone:
LDA (0,X)      ;get byte at base+1
STA 1,X        ;return result hi-byte
PLA            ;un-stash
STA 0,X        ;return result lo-byte

KimKlone @
  (21 cycles; 19 cycles if Y is already = 1)

LDAW (0,X)       ;get byte at base. Copy ptr to W.
STA 0,X          ;return result low byte
LDY #1           ;omit this if Y=1 by convention
LDA (Wreg),Y     ;Get byte at base+1
STA 1,X          ;return result high byte

Here's what happened. The instruction LDAW (0,X) is the same as LDA (0,X) but with the added feature that, on cycles 4 and 5 when the Zero Page pointer is accessed, the two pointer bytes are copied to the W register. This copy operation is equivalent to...

LDA 0,X
STA Wreg
LDA 1,X
STA Wreg+1

... except no extra cycles are consumed, and A is not involved in the copy. (See the LDAW (0,X) timing chart above if the underlying hardware behavior interests you.)

Thereafter the top-of-stack byte pair is accessible without any need for X indexing, because KK address decoding provides access to W at Wreg, a fixed pair of addresses in Zero Page. Of course a fixed pair of addresses in Zero Page can be used for indirect-Y addressing, and that's how the example concludes. (Incidentally, KK Forth sets Y=1 on startup and, by convention, restores Y=1 if it has ever gotten set otherwise.)

To match LDAW, the KK also has a STAW instruction, which for example is handy for accelerating Forth's ! ("store") operation. But LDA and STA aren't the only ops that can use the new address mode. There's always the option of using a "dummy" LDAW or STAW for the sole purpose of rapidly copying a pointer to W. Then subsequent code can use ADC or EOR or whatever Indirect-Y instructions it needs to get the job done.
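Here is a sketch of how ! might look with STAW. It is not taken from KK Forth and rests on several assumptions: that STAW (0,X) stores A through the X-indexed pointer and copies that pointer to W just as LDAW does, that the value to be stored sits beneath the address on the data stack (at 2,X and 3,X), that Y=1 by convention, and that dropping both stack items means adding 4 to X.

```asm
; Hypothetical KimKlone !  ( value addr -- )
LDA 2,X          ;low byte of the value to store
STAW (0,X)       ;store at base. Copy ptr to W (assumed).
LDA 3,X          ;high byte of the value
STA (Wreg),Y     ;store at base+1 (assumes Y=1)
INX              ;drop addr and value: 4 bytes in all,
INX              ;assuming the next stack item lies
INX              ;at 2,X and 3,X as above
INX
```

As with @, the pointer on the stack is never incremented in place; the second byte is reached through W instead.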

X-Indirect-Y addressing is a significant asset for applications (including Forth) which need to index both before and after the fetch of an indirect pointer. In our example the performance boost is 89% (36 cycles versus 19, assuming Y is already 1). The main limitation is that W can only address one item at a time. But the impact of that is slight, given that W loads (and can be reloaded) in just 6 cycles. In many cases the one-item-at-a-time limitation has no impact at all.
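The cycle figures for the two versions of @ can be checked with a quick tally. This assumes standard 65C02 timings with no page crossings, the BNE taken (so INC 1,X is skipped), and 6 cycles for LDAW as shown in its timing chart.

```latex
\begin{align*}
\text{Classic @} &: 6 + 3 + 6 + 3 + 6 + 4 + 4 + 4 = 36 \\
\text{KimKlone @} &: 6 + 4 + 2 + 5 + 4 = 21 \quad (19 \text{ without } \texttt{LDY \#1}) \\
\text{Speedup} &: 36 / 19 \approx 1.89 \quad (36 / 21 \approx 1.71 \text{ when } \texttt{LDY \#1} \text{ is needed})
\end{align*}
```

So the 89% figure corresponds to the 19-cycle case; with LDY #1 included the gain works out to about 71%.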



copyright notice (Jeff Laughton)