What Can It Do?
feature is fast access to a flat, 24-bit
address space. Although there are many ways to retrofit a 'C02 with
expanded memory, AFAIK none can touch this implementation for speed. Given
any random address in the 16 megabyte space, KK can fetch that byte in
Units of 64K make it easy to treat the entire memory as a linear, 16
megabyte space. That's because the results of address arithmetic
(24-bit addition, for example) are directly acceptable by the
machine. The most-significant byte is
the block address. This is far more efficient than a scheme using, say,
16K blocks, because a 16K-block system needs
to mask and shift the addition result to isolate the block
address before it
can be passed to the hardware.
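To make the comparison concrete, here's a minimal Python sketch (a model, not KK code). The function names are made up for illustration, and the 16K layout assumes the block number is simply the linear address divided by 16K. The address is kept as three bytes, the way an 8-bit program would hold it.

```python
def add24(addr_bytes, delta):
    """Add delta to a 24-bit address held as [low, high, bank] bytes."""
    total = (addr_bytes[0] | addr_bytes[1] << 8 | addr_bytes[2] << 16) + delta
    total &= 0xFFFFFF  # wrap within the 16 megabyte space
    return [total & 0xFF, (total >> 8) & 0xFF, (total >> 16) & 0xFF]


def bank_64k(addr_bytes):
    """64K blocks: the most-significant byte already is the block address."""
    return addr_bytes[2]


def block_16k(addr_bytes):
    """16K blocks (hypothetical scheme): the block number straddles two
    bytes, so it must be masked and shifted out of the addition result."""
    return ((addr_bytes[2] << 2) | (addr_bytes[1] >> 6)) & 0x3FF


def offset_16k(addr_bytes):
    """16K blocks: the offset is the low 14 bits of the address."""
    return ((addr_bytes[1] & 0x3F) << 8) | addr_bytes[0]


# Crossing a 64K boundary: the carry lands in the bank byte, ready to use.
next_addr = add24([0xFF, 0xFF, 0x01], 1)
print(next_addr, bank_64k(next_addr))
```

With 64K units the third byte goes to the hardware untouched; the 16K version needs the extra bit-twiddling on every access that follows address arithmetic.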
Efficient linear addressing
opens the door to a modest range of "big data" applications:
tasks which would cause
most expanded-memory 8-bit machines to hit the wall. They suffer a
speed disadvantage simulating a linear space.
- the extended addressing is part of the instruction set. There's no MMU to spoon-feed.
- memory is organized as blocks of 64K (not a fraction thereof, such as 16K)
How Does It Work?
of the control
signals originate as microcode
fetched from an EPROM array.
The microcode runs on a state machine that's clocked in lock-step with
the CPU. (One state machine cycle = one CPU/bus cycle.)
The CPU directs instruction fetching. Simultaneous execution by it and
the state machine determines the result.
Block (aka bank) addresses are stored in Register File
A, whose read section (at the top of the diagram) feeds A23-A16. These
address lines, together with A15-A0 directly from the 65C02, are
what address the 16 megabyte space. (You'll see there's a second
register file which shadows the first. This makes it possible to read
back the stored bank addresses when necessary.) A single instruction is
all that's required to load a new bank address.
That's considerably simpler than
I/O operations on an MMU. And, in contrast to an MMU manipulation, there's no need to
save then later restore A, X, Y or P. These registers remain undisturbed.
coprocessor acts as an
Devices such as the register files connect to the data bus and receive
microcoded cues to update themselves from it.
Sometimes they drive
the bus when the CPU thinks it's
reading data, instruction opcodes or instruction operands from
memory. Some of
the 46 undefined (aka illegal) 65C02
opcodes get aliased by the 32 x 8 PROM before they reach
the CPU. Others are used "as is." In fact, some of the 65C02's
so-called NOPs actually generate
an address and use the bus, even though the data is discarded. This odd
behavior turns out to present
important opportunities. But all these details are invisible; the
programmer simply has 44 new instructions available (for a total of 254).
The New Instructions and Registers?
that load and save the bank
cued up in the register file (K0, K1, K2 & K3)
that actually output
A23-A16 (usually on a transient basis)
K0 is presented
almost continuously on address lines A23-A16, as it's used
for all code fetches. K0 is also the default
for data accesses. In the absence of a specification
for K1, K2 or K3, it will be K0 that's used, which means the
data access will
occur in the same bank as the currently-executing code. (An exception
applies when stack and zero-page address modes are used. Such accesses
always use bank $00, not to be confused with K0.)
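The bank-selection rule just described can be modelled in a few lines of Python. This is only a sketch of the rule as stated here, not of the hardware: the function name, the mode strings and the register values are invented for illustration, and treating stack and zero-page accesses as bank $00 even when another register is named is an assumption of the sketch.

```python
def effective_address(addr16, mode, bank_regs, selected=None):
    """Return the 24-bit address driven onto A23-A0 for a data access.

    addr16    -- the 16-bit address produced by the 65C02
    mode      -- "zero_page", "stack" or anything else (hypothetical labels)
    bank_regs -- dict holding the contents of K0..K3
    selected  -- "K1", "K2" or "K3" if the access names one, else None
    """
    if mode in ("zero_page", "stack"):
        bank = 0x00                   # always bank $00, whatever K0 holds
    elif selected is not None:
        bank = bank_regs[selected]    # explicitly specified bank register
    else:
        bank = bank_regs["K0"]        # default: same bank as the running code
    return (bank << 16) | (addr16 & 0xFFFF)


# Arbitrary example contents for the four bank registers.
regs = {"K0": 0x05, "K1": 0x10, "K2": 0x20, "K3": 0x30}

print(hex(effective_address(0x1234, "absolute", regs)))         # K0 default
print(hex(effective_address(0x1234, "absolute", regs, "K2")))   # explicit K2
print(hex(effective_address(0x0080, "zero_page", regs)))        # bank $00
```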
Three single-byte prefixes are
associated with registers K1, K2 and K3. Use of a prefix is one way to
specify a "Far" data access, that is, one whose bank address is
independent of where the currently-executing code resides. The
prefix is followed by and acts upon any typical 65C02
instruction such as INC
Absolute, CMP Indirect,Y etc.
Here's the sequence. At run-time the CPU fetches the prefix byte but
ignores it. Even the
coprocessor takes no immediate action. Next comes the target
instruction, and, as usual, off-chip logic co-executes every cycle.
the extra cycles for zero-page indirection and other variations.
K1, K2 or K3 is read out "on cue" for the data transfer which is the final
instruction (final three
of the CPU's 64K possible addresses are re-mapped by the bank switch.
Then K0 is
re-selected for A23-A16, an opcode fetch occurs, and the program
proceeds without missing a beat. (The only added delay was one cycle
for the prefix.) Many combinations
of instructions and address modes can use the prefixes and
Far. Considering all the combinations, you
could say there are hundreds
of new instructions,
Because they open a
"Far" dimension for so many 65C02 instructions, the
prefixes are very general in their applicability. But, as noted, a one-cycle
penalty applies. Even this can be avoided, albeit with a loss
prefix may be omitted for six specific cases
involving LDA and STA, since six specific, "all-in-one"
opcodes are also provided for JMP_K3,
instructions include an operation that exchanges
K3 and K0
and, because K0 is updated, a new 64K bank becomes the default.
(Happily, this does not imply alternative zero-pages and stacks!
Accesses using stack and zero-page address modes
always use bank zero, regardless of what K0 may contain.)
Registers K1, K2 and K3 load
themselves in the same
manner that X,
Y and A load themselves: that is, with
specific opcodes provided for the purpose. The available
address modes are Immediate, Absolute, Zero-pg and Zero-pg,X. These
registers can also be pushed and pulled from stack, and K0
Altogether there are 34 instructions for performing Far jumps, Far data
accesses, and for loading and saving bank
instructions and registers
A few highlights
from this disparate group are as
follows. The SCAN_K3
instruction forces the
CPU to rapidly read a long string of bytes from memory, as part of a
W is a 16-bit
register readable in zero-page, one of
whose functions is double-indexed
addressing. This is a
process that starts with
an instruction coded to use (Z-pg,X) mode. The CPU does the index
addition then fetches a two-byte pointer from zero-page as
it completes the instruction. KK copies the two-byte
pointer on the fly, making it subsequently
available in Z-pg at W (with no need to repeat the index
addition). If a
is coded to use (W),Y then the result of the two-instruction
(Z-pg,X),Y mode. It's
equivalent to using X to index to a
fetching the pointer, then indexing again into a data array. This is
not so unusual, given that even a 16-bit word (two bytes) constitutes
Double-indexed addressing accelerates Forth mainstays such as @ and !
(fetch and store).
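Here's a small Python model of the two-step process, as a sketch only. It ignores banking and timing, the class and method names are invented for illustration, and the first instruction is modelled as a plain load (any instruction using (Z-pg,X) mode would do). The pointer fetch wraps within zero-page, as it does on the 65C02.

```python
class DoubleIndexModel:
    """Toy model of (Z-pg,X) followed by (W),Y. Not cycle-accurate."""

    def __init__(self):
        self.mem = bytearray(0x10000)  # one 64K bank is enough here
        self.w = 0                     # the 16-bit W register

    def load_zpx_indirect(self, zp, x):
        """Step 1: an instruction using (Z-pg,X). The CPU adds X to the
        zero-page operand and fetches a two-byte pointer; the pointer is
        copied into W on the fly."""
        ptr_addr = (zp + x) & 0xFF
        ptr = self.mem[ptr_addr] | (self.mem[(ptr_addr + 1) & 0xFF] << 8)
        self.w = ptr                   # captured, no repeat of the index add
        return self.mem[ptr]

    def load_w_indirect_y(self, y):
        """Step 2: an instruction using (W),Y indexes from the saved pointer."""
        return self.mem[(self.w + y) & 0xFFFF]


m = DoubleIndexModel()
m.mem[0x24:0x26] = bytes([0x00, 0x30])   # pointer at $24/$25 -> $3000
m.mem[0x3000] = 0xAA
m.mem[0x3005] = 0xBB

first = m.load_zpx_indirect(0x20, 4)     # X indexes to the pointer
second = m.load_w_indirect_y(5)          # Y indexes into the data
print(hex(m.w), hex(first), hex(second))
```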
IP is a
16-bit register whose most notable function
is as a pointer
for the JMP((IP++)) instruction. This double-indirect jump
is a hardware realization of Forth's ubiquitous NEXT
NEXT has quadruple
the speed of the code sequence it replaces.
Overall Forth program speed increases
by about 90%.
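For readers unfamiliar with NEXT, the double-indirect jump can be sketched in Python as below. This is a conceptual model under stated assumptions, not a description of the microcode: it assumes two-byte little-endian cells, assumes the post-increment advances IP by 2, and ignores banking. The function names are hypothetical.

```python
def read16(mem, addr):
    """Read a little-endian 16-bit cell, wrapping at 64K."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


def jmp_ip_postinc(mem, ip):
    """Model of JMP((IP++)): follow two levels of indirection from IP.

    Returns (jump_target, new_ip).
    """
    cell = read16(mem, ip)         # first indirection: the cell IP points at
    target = read16(mem, cell)     # second indirection: where to jump
    return target, (ip + 2) & 0xFFFF


mem = bytearray(0x10000)
mem[0x2000:0x2002] = bytes([0x00, 0x40])   # threaded code: a cell holding $4000
mem[0x4000:0x4002] = bytes([0x34, 0x12])   # $4000 holds the code address $1234

target, ip = jmp_ip_postinc(mem, 0x2000)
print(hex(target), hex(ip))
```

In software on a stock 65C02 each of those steps costs several instructions, which is where the speed-up comes from.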
KimKlone's CPU and
the off-chip accessories function as a unified
whole, much the same as a monolithic device created in a
wafer fab. The design is notable for introducing major features
despite the limitations of a legacy Programming Model and even legacy
The linear memory
organization allows efficient
manipulation of objects larger than 64K — a
capability which is absent
from commercial 6502 microcomputers and from microprocessors such as
the MOS 6509 and the Hudson Soft 6280
and which, in retrospect, is more suggestive
of the WDC 65816.
(The KK was created shortly after but without any influence from the