What Can It Do?
The main feature is fast access to a flat, 24-bit address space. Although there are many ways to retrofit a 'C02 with expanded memory, AFAIK none can touch this implementation for speed. Given any random address in the 16-megabyte space, KK can fetch that byte in 6-8 cycles.
It's fast because:
- the extended addressing is part of the instruction set. There's no MMU to spoon-feed.
- memory is organized as blocks of 64K (not a fraction thereof, such as 16K).
Units of 64K make it easy to treat the entire memory as a linear, 16-megabyte space. That's because the result of address arithmetic (a 24-bit addition, for example) is directly usable by the hardware: the most-significant byte is the block address. This is far more efficient than a scheme using, say, 16K blocks, because a 16K-block system needs to mask and shift the addition result to isolate the block address before it can be passed to the hardware.
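The arithmetic argument can be made concrete with a small model. Python is used here only for illustration; KimKlone programs are 65C02 assembly. The sketch adds two 24-bit addresses one byte at a time, the way an 8-bit CPU would with CLC followed by three ADCs. It then splits the result under the 64K-bank scheme and under a hypothetical 16K-block scheme. The function names and the little-endian three-byte layout are assumptions made for the example.

```python
def add24(a: list[int], b: list[int]) -> list[int]:
    """Add two 24-bit values held little-endian as three bytes, one byte at a
    time with carry, the way an 8-bit CPU would (CLC followed by three ADCs)."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & 0xFF)
        carry = s >> 8
    return out  # a carry out of bit 23 is dropped (16 MB wrap-around)


def split_64k(addr: list[int]) -> tuple[int, int]:
    """64K banks: the top byte already is the bank (A23-A16) and the low two
    bytes already are the 16-bit CPU address. Nothing to shift or mask."""
    lo, mid, hi = addr
    return hi, (mid << 8) | lo


def split_16k(addr: list[int]) -> tuple[int, int]:
    """Hypothetical 16K-block scheme for comparison: the block number
    (bits 23-14) straddles two bytes and must be shifted and masked out."""
    lo, mid, hi = addr
    block = (hi << 2) | (mid >> 6)
    offset = ((mid & 0x3F) << 8) | lo
    return block, offset


# Step across a bank boundary: $01FFF0 + $000020 = $020010
total = add24([0xF0, 0xFF, 0x01], [0x20, 0x00, 0x00])
print([hex(b) for b in total])   # ['0x10', '0x0', '0x2']
print(split_64k(total))          # (2, 16): bank $02, CPU address $0010
print(split_16k(total))          # (8, 16): block 8, offset $0010
```

In the 64K case the three sum bytes are used exactly as the additions left them. The 16K case needs extra shift and mask work on every computed address. On an 8-bit CPU, that work costs several instructions.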
Efficient linear addressing opens the door to a modest range of "big data" applications: tasks which would cause most expanded-memory 8-bit machines to hit the wall, since those machines suffer an order-of-magnitude speed disadvantage when simulating a linear space.
How Does It Work?
Most of the control signals originate as microcode fetched from an EPROM array. The microcode runs on a state machine that's clocked in lock-step with the CPU. (One state machine cycle = one CPU/bus cycle.) The CPU directs instruction fetching. Simultaneous execution by the CPU and the state machine determines the result.
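The lock-step idea can be sketched as a sequencer that produces one control word per bus cycle. The word is looked up by the current opcode and the cycle number within the instruction. The sketch assumes the state machine learns of opcode fetches from something like the 65C02's SYNC output. The control-word bit layout, the `Sequencer` class and the opcode value used are all hypothetical placeholders, not KimKlone's actual EPROM contents or encodings.

```python
from dataclasses import dataclass

# Hypothetical control-word bits; KimKlone's real EPROM layout is not given here.
SEL_K0 = 0b000      # bits 0-1: which bank register drives A23-A16 (0 = K0)
SEL_K1 = 0b001
LATCH_K1 = 0b100    # tell K1 to latch the data bus at the end of the cycle


@dataclass
class Sequencer:
    microcode: dict[tuple[int, int], int]   # (opcode, step) -> control word
    opcode: int = 0
    step: int = 0

    def clock(self, sync: bool, data_bus: int) -> int:
        """Advance one state-machine cycle, i.e. exactly one CPU bus cycle.

        `sync` flags an opcode-fetch cycle. Code always comes from K0, so that
        cycle's control word is fixed; the fetched byte is latched as the opcode
        that addresses the microcode for the following cycles."""
        if sync:
            self.opcode, self.step = data_bus, 0
            return SEL_K0
        self.step += 1
        # Cycles with no microcode entry default to "drive K0, do nothing else".
        return self.microcode.get((self.opcode, self.step), SEL_K0)


PLACEHOLDER_OP = 0x42   # placeholder value, not KimKlone's actual encoding
seq = Sequencer({(PLACEHOLDER_OP, 1): LATCH_K1 | SEL_K0})  # "load K1 #imm"
print(seq.clock(True, PLACEHOLDER_OP))   # 0: opcode fetch from K0
print(seq.clock(False, 0x05))            # 4: operand cycle, control word says latch $05 into K1
```

Because the sequencer advances exactly once per bus cycle, the control word and the CPU's activity always refer to the same cycle. That is the "simultaneous execution" described above.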
Block (aka bank) addresses are stored in Register File A, whose read section (at the top of the diagram) feeds A23-A16. These new address lines, together with A15-A0 directly from the 65C02, are what address the 16-megabyte space. (You'll see there's a second register file which shadows the first. This makes it possible to read back the stored bank addresses when necessary.) A single instruction is all that's required to load a new bank address. That's considerably simpler than I/O operations on an MMU. And, in contrast to an MMU manipulation, there's no need to save and later restore A, X, Y or P. These registers remain undisturbed.
The coprocessor acts as an exoskeleton for the 65C02. Devices such as the register files connect to the data bus and receive microcoded cues to update themselves from it. Sometimes they drive the bus when the CPU thinks it's reading data, instruction opcodes or instruction operands from memory. Some of the 46 undefined (aka illegal) 65C02 opcodes get aliased by the 32 x 8 PROM before they reach the CPU. Others are used "as is." In fact, some of the 65C02's so-called NOPs actually generate an address and use the bus, even though the data is discarded. This odd behavior turns out to present important opportunities. But all these details are invisible; the programmer simply has 44 new instructions available (for a total of 254).
What Are The New Instructions and Registers?
- instructions that load and save the bank addresses cued up in the register file (K0, K1, K2 & K3)
- instructions that actually output a bank address onto A23-A16 (usually on a transient basis)
- miscellaneous
K0 is presented almost continuously on address lines A23-A16, as it's used for all code fetches. K0 is also the default for data accesses. In the absence of a specification for K1, K2 or K3, it will be K0 that's used, which means the data access will occur in the same bank as the currently-executing code. (An exception applies when stack and zero-page address modes are used. Such accesses always use bank $00, not to be confused with K0.)
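The bank-selection rules above fit in a small function. It returns the byte that drives A23-A16 for a bus cycle, which is then combined with A15-A0 from the 65C02 to form the 24-bit address. The `Access` categories and function names are illustrative assumptions. The sketch also assumes that zero-page and stack accesses ignore any Far prefix, following the statement that they "always use bank $00".

```python
from enum import Enum
from typing import Optional


class Access(Enum):
    CODE = "code"            # opcode and operand fetches
    DATA = "data"            # ordinary data accesses (absolute, indirect, ...)
    ZERO_PAGE = "zero page"  # zero-page operands and pointer fetches
    STACK = "stack"          # pushes, pulls, return addresses


def bank_for(kind: Access, k: list[int], prefix: Optional[int] = None) -> int:
    """Return the byte driven onto A23-A16 for one bus cycle.

    k holds K0..K3; prefix is 1, 2 or 3 when a Far prefix is in effect."""
    if kind in (Access.ZERO_PAGE, Access.STACK):
        return 0x00              # always bank $00, whatever K0 contains
    if kind is Access.CODE or prefix is None:
        return k[0]              # code, and un-prefixed data, use K0
    return k[prefix]             # Far data access via K1, K2 or K3


def physical(kind: Access, k: list[int], cpu_addr: int,
             prefix: Optional[int] = None) -> int:
    """Combine A23-A16 from the selected bank with A15-A0 from the 65C02."""
    return (bank_for(kind, k, prefix) << 16) | (cpu_addr & 0xFFFF)


k = [0x05, 0x10, 0x20, 0x30]                                # K0..K3
print(hex(physical(Access.CODE, k, 0x1234)))                # 0x51234
print(hex(physical(Access.DATA, k, 0x1234)))                # 0x51234
print(hex(physical(Access.DATA, k, 0x1234, prefix=2)))      # 0x201234
print(hex(physical(Access.STACK, k, 0x01FF)))               # 0x1ff
```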
Three single-byte prefix instructions are associated with registers K1, K2 and K3. Use of a prefix is one way to specify a "Far" data access, that is, one whose bank address is independent of where the currently-executing code resides. The prefix is followed by, and acts upon, any typical 65C02 instruction such as INC Absolute, CMP Indirect,Y, etc.
Here's the sequence. At run-time the CPU fetches the prefix byte but ignores it. Even the coprocessor takes no immediate action. Next comes the target instruction, and, as usual, off-chip logic co-executes every cycle. This includes the extra cycles for zero-page indirection and other variations. Then K1, K2 or K3 is read out "on cue" for the data transfer, which is the final cycle of the instruction (the final three cycles for Read-Modify-Write). All of the CPU's 64K possible addresses are re-mapped by the bank switch. Then K0 is re-selected for A23-A16, an opcode fetch occurs, and the program proceeds without missing a beat. (The only added delay was one cycle for the prefix.) Many combinations of instructions and address modes can use the prefixes and thus become Far. Considering all the combinations, you could say there are hundreds of new instructions, not just 44.
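The sequence can be traced cycle by cycle for two absolute-mode instructions, listing which register drives A23-A16 on each bus cycle. The base cycle counts are the standard 65C02 ones: 4 for LDA Absolute and 6 for INC Absolute, where the last three cycles are read, dummy read and write. This minimal sketch covers only these two modes. It does not model zero-page pointer fetches, and the table and function names are made up for the example.

```python
from typing import Optional

# (total 65C02 cycles, how many final cycles put the data address on the bus)
BASE_CYCLES = {
    "LDA abs": (4, 1),   # opcode, addr lo, addr hi, data read
    "INC abs": (6, 3),   # opcode, addr lo, addr hi, read, dummy read, write
}


def trace(instr: str, prefix: Optional[int] = None) -> list[tuple[str, str]]:
    """List (cycle kind, register driving A23-A16) for each bus cycle."""
    if prefix not in (None, 1, 2, 3):
        raise ValueError("prefix must be 1, 2 or 3 (K1, K2 or K3)")
    total, data_cycles = BASE_CYCLES[instr]
    far = "K0" if prefix is None else f"K{prefix}"
    cycles = []
    if prefix is not None:
        cycles.append(("prefix fetch", "K0"))   # the one added cycle
    for i in range(total):
        if i == 0:
            cycles.append(("opcode fetch", "K0"))
        elif i < total - data_cycles:
            cycles.append(("operand fetch", "K0"))
        else:
            cycles.append(("data", far))        # bank switch happens only here
    return cycles


for kind, reg in trace("INC abs", prefix=1):
    print(f"{kind:13} {reg}")
```

The trace shows the point made above. Every cycle except the data cycles comes from K0, and the only cost of going Far is the single prefix-fetch cycle.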
Because they open a "Far" dimension for so many 65C02 instructions, the prefixes are very general in their applicability. But, as noted, a one-cycle penalty applies. Even this can be avoided, albeit with a loss of generality. The separate prefix may be omitted for six specific cases involving LDA and STA, since six specific, "all-in-one" opcodes are provided. Specific opcodes are also provided for JMP_K3, JSR_K3 and RTS_K3. These instructions include an operation that exchanges K3 and K0, and, because K0 is updated, a new 64K bank becomes the default. (Happily, this does not imply alternative zero-pages and stacks! Recall that accesses using stack and zero-page address modes always use bank zero, regardless of what K0 may contain.)
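An exchange, unlike a plain load, also preserves the caller's bank. After the swap, K3 holds the bank that K0 held before. This sketch models only the register swap. It assumes, hypothetically, that JSR_K3 and RTS_K3 each perform exactly one K0/K3 exchange and that the called routine leaves K3 alone. The return-address handling on the bank-$00 stack is not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BankRegs:
    k0: int = 0   # default bank: code fetches and un-prefixed data
    k3: int = 0

    def exchange_k0_k3(self) -> None:
        self.k0, self.k3 = self.k3, self.k0


def far_call(regs: BankRegs, routine: Callable[[BankRegs], None]) -> None:
    """Models JSR_K3 ... RTS_K3, assuming each performs one K0/K3 exchange
    and the routine leaves K3 alone. Zero page and stack stay in bank $00."""
    regs.exchange_k0_k3()   # JSR_K3: the callee's bank becomes K0
    routine(regs)
    regs.exchange_k0_k3()   # RTS_K3: K3 still holds the caller's bank


regs = BankRegs(k0=0x01, k3=0x07)
seen = []
far_call(regs, lambda r: seen.append(r.k0))
print(seen, hex(regs.k0), hex(regs.k3))   # [7] 0x1 0x7
```

Because an exchange undoes itself, a matched pair of exchanges returns K0 to the caller's bank without any save or restore code. This holds under the stated assumption.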
Bank-address registers K1, K2 and K3 are loaded in the same manner as X, Y and A, that is, with specific opcodes provided for the purpose. The available address modes are Immediate, Absolute, Zero-pg and Zero-pg,X. These three registers can also be pushed onto and pulled from the stack, and K0 can be pushed.
Altogether there are 34 instructions for performing Far jumps and Far data accesses, and for loading and saving bank addresses.
Miscellaneous instructions and registers
A few highlights from this disparate group are as follows. The SCAN_K3 instruction forces the CPU to rapidly read a long string of bytes from memory, as part of a program that outputs video.
W is a 16-bit register, readable in zero-page, one of whose functions is double-indexed addressing. This is a two-step process that starts with an instruction coded to use (Z-pg,X) mode. The CPU does the index addition, then fetches a two-byte pointer from zero-page as it completes the instruction. KK copies the two-byte pointer on the fly, making it subsequently available in Z-pg at W (with no need to repeat the index addition). If a subsequent instruction is coded to use (W),Y, then the result of the two-instruction sequence is (Z-pg,X),Y mode. It's equivalent to using X to index to a pointer, fetching the pointer, then indexing again into a data array. This is not so unusual a need, given that even a 16-bit word (two bytes) constitutes an array.
Double-indexed addressing accelerates Forth mainstays such as @ and ! (fetch and store).
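Here is how the two-step (Z-pg,X) then (W),Y sequence can serve Forth's @. The top of a zero-page data stack (indexed by X) holds an address, and @ returns the 16-bit cell stored there. The first instruction reads the low byte and, as a side effect, leaves the pointer in W. The second reaches the high byte at W+1 without redoing the X indexing. This is a minimal sketch that ignores banks. W is modeled as a Python attribute because its actual zero-page location isn't given here, and the class and function names are hypothetical.

```python
class DoubleIndexModel:
    """16-bit CPU view of memory plus the W register (banks ignored)."""

    def __init__(self) -> None:
        self.mem = bytearray(0x10000)  # 64K CPU-visible memory
        self.w = 0                     # copy of the last (zp,X) pointer

    def load_zp_x_indirect(self, zp: int, x: int) -> int:
        """LDA (zp,X): index within zero page, fetch pointer, read the byte.
        As a side effect the fetched pointer is captured into W."""
        lo_addr = (zp + x) & 0xFF          # zero-page indexing wraps at $FF
        hi_addr = (lo_addr + 1) & 0xFF
        pointer = self.mem[lo_addr] | (self.mem[hi_addr] << 8)
        self.w = pointer                   # KK copies the pointer on the fly
        return self.mem[pointer]

    def load_w_indirect_y(self, y: int) -> int:
        """LDA (W),Y: reuse the captured pointer and add Y (16-bit sum)."""
        return self.mem[(self.w + y) & 0xFFFF]


def forth_fetch(m: DoubleIndexModel, x: int) -> int:
    """Forth @ with a zero-page data stack: the cell at (X) holds an address;
    return the little-endian 16-bit value stored at that address."""
    lo = m.load_zp_x_indirect(0, x)   # LDA (0,X): low byte, and W := address
    hi = m.load_w_indirect_y(1)       # LDY #1 / LDA (W),Y: high byte
    return lo | (hi << 8)


m = DoubleIndexModel()
m.mem[0x10], m.mem[0x11] = 0x00, 0x30   # stack cell at $10 points to $3000
m.mem[0x3000], m.mem[0x3001] = 0x34, 0x12
print(hex(forth_fetch(m, 0x10)))        # 0x1234
```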
IP is a 16-bit register whose most notable function is as a pointer for the JMP((IP++)) instruction. This double-indirect jump with post-increment is a hardware realization of Forth's ubiquitous NEXT operation. Hardware NEXT has quadruple the speed of the code sequence it replaces. Overall Forth program speed increases by about 90%.
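The double indirection in JMP((IP++)) can be spelled out as a model of indirect-threaded NEXT. The word at IP is the code-field address of the next Forth word, IP then advances past that cell, and the word stored at the code-field address is where execution continues. This sketch assumes indirect threading and that IP advances by one 16-bit cell (2 bytes) per NEXT. It does not model how the hardware sequences the individual bus cycles, and the function names are hypothetical.

```python
def read16(mem: bytearray, addr: int) -> int:
    """Little-endian 16-bit read, as the 65C02 stores addresses."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


def jmp_ip_postinc(mem: bytearray, ip: int) -> tuple[int, int]:
    """Model of JMP((IP++)) as indirect-threaded Forth NEXT.
    Returns (new PC, new IP)."""
    w = read16(mem, ip)          # 1st indirection: the word's code-field address
    ip = (ip + 2) & 0xFFFF       # post-increment past one 16-bit cell
    pc = read16(mem, w)          # 2nd indirection: address of machine code
    return pc, ip


mem = bytearray(0x10000)
mem[0x2000:0x2002] = bytes([0x00, 0x40])   # threaded list: word whose CFA is $4000
mem[0x4000:0x4002] = bytes([0x00, 0x50])   # code field: machine code at $5000
pc, ip = jmp_ip_postinc(mem, 0x2000)
print(hex(pc), hex(ip))                    # 0x5000 0x2002
```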
Conclusion
KimKlone's CPU and the off-chip accessories function as a unified whole, much the same as a monolithic device created in a wafer fab. The design is notable for introducing major features despite the limitations of a legacy programming model and even legacy silicon.
The linear memory organization allows efficient manipulation of objects larger than 64K, a capability which is absent from commercial 6502 microcomputers and from microprocessors such as the MOS 6509 and the Hudson Soft 6280, and which, in retrospect, is more suggestive of the WDC 65816. (The KK was created shortly after the 65816, but without any influence from it.)