Laughton Electronics | the KimKlone: a radical 6502 redesign

the KimKlone: a Radical 6502 Redesign

photo of the entire KimKlone circuit board

What Is It?

It's a microcomputer with a CMOS 6502 and a coprocessor, i.e., off-chip logic capable of interpreting the instruction stream.

What Does The Coprocessor Do?

It assimilated the 6502! The collective Programming Model has 6 new registers and 44 new instructions, including NEXT (as used by Forth). The new instructions run at full speed, and it's all transparent from an assembly programmer's point of view. In other words, there's no difference between native and newly minted instructions. One main feature is KimKlone's 24-bit address space. That's been done before, but this implementation preserves most of the speed of the processor. It's fast because:

the extended addressing is part of the instruction set. There's no MMU to spoon-feed.
memory is organized as banks of 64K (not a fraction thereof, such as 16K)

Units of 64K make it easy to treat the entire memory as a linear, 16 MByte space, and to walk through it with simple pointer arithmetic. This opens the door to a modest range of "big data" applications: tasks which would cause most expanded-memory 8-bit machines to hit the wall. Those machines suffer an order-of-magnitude speed disadvantage when simulating a linear space.
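To make the pointer-arithmetic point concrete, here is a minimal Python sketch (an illustration only, not KimKlone code). It assumes a 24-bit linear address whose top byte is the bank and whose low 16 bits are the offset; the names `split_linear` and `advance` are hypothetical.

```python
BANK_BITS = 16                 # 64K banks, as described in the text
ADDRESS_MASK = 0xFFFFFF        # 24-bit address space (16 MByte)


def split_linear(addr):
    """Split a 24-bit linear address into (bank, offset)."""
    if not 0 <= addr <= ADDRESS_MASK:
        raise ValueError("address outside the 24-bit space")
    return addr >> BANK_BITS, addr & 0xFFFF


def advance(bank, offset, step):
    """Move a (bank, offset) pointer by `step` bytes.

    Because a bank is exactly 64K, crossing a bank boundary is just the
    carry out of the low 16 bits; no window bookkeeping is needed.
    """
    linear = ((bank << BANK_BITS) | offset) + step
    return split_linear(linear & ADDRESS_MASK)


print(advance(0x12, 0xFFFF, 1))   # steps from the end of bank $12 into bank $13
```

With a smaller window such as 16K, the same step would also need a check on whether the offset had left the window, followed by a remap, which is the kind of overhead the paragraph above refers to.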

How Does It Work?

Most of the control signals originate as microcode fetched from an EPROM array. The microcode runs on a state machine that's clocked in lock-step with the CPU. (One state machine cycle = one CPU cycle = one bus cycle.) The CPU directs instruction fetching. Simultaneous execution by it and the state machine determines the result.

Bank addresses are stored in and retrieved from a group of 74HC670s which form a small, multi-port register file. One of the read ports connects to the address bus, driving A23-A16. These new address lines, together with A15-A0 directly from the 65C02, are what address the 16 MByte space. The register file also connects to the data bus, allowing bank addresses to be loaded and recalled. Specific instructions are provided for this, an arrangement that's more direct than I/O operations on an MMU. It also allows registers A, X and Y to remain undisturbed.

Although the coprocessor has its own registers, it doesn't actually compute anything. It's more like an exoskeleton for the 65C02. Devices such as the register file connect to the data bus and can be instructed to update themselves from it. Sometimes they drive the bus when the CPU thinks it's reading data (or even an instruction operand) from memory! Opcodes get edited, too. Some of the 46 undefined (aka illegal) 65C02 opcodes get replaced with substitutes before they reach the CPU. Others are used "as is": they cause the CPU to generate addresses, which makes them vitally useful. When an illegal opcode is replaced, the substitute may be another illegal opcode, a normal instruction or a NOP. But these details are invisible; the programmer simply has 44 new instructions available (for a total of 254).

What Are The New Instructions and Registers?

instructions that load and save the four bank addresses cued up in the register file
instructions that actually output a bank address onto A23-A16 (often on a transient basis)
miscellaneous

The Current Code Pointer, aka CCP, tends to dominate. It is the default for readout to the high address lines A23-A16, and it prevails for the majority of bus cycles, including those for code access and those for Near data accesses, i.e., accesses within the Current 64K bank. The other three words in the '670 register file appear to the programmer as Data Pointer registers DP0, DP1 and DP2.
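As a rough software picture of that arrangement, the sketch below models the register file as four 8-bit words and forms a 24-bit address from the selected word plus the CPU's 16-bit address. It is an illustration under assumptions: the class `BankRegisterFile` and its methods are hypothetical, and nothing here reflects the actual '670 wiring or timing.

```python
class BankRegisterFile:
    """Four bank-address words: CCP plus the three Data Pointers."""

    NAMES = ("CCP", "DP0", "DP1", "DP2")

    def __init__(self):
        self.words = dict.fromkeys(self.NAMES, 0)

    def load(self, name, value):
        """Store an 8-bit bank address, as a load instruction would."""
        if name not in self.words:
            raise KeyError(name)
        self.words[name] = value & 0xFF

    def address(self, cpu_addr, select="CCP"):
        """Combine the selected word (A23-A16) with the CPU's A15-A0.

        CCP is the default selection, matching its role for code fetches
        and Near data accesses.
        """
        return (self.words[select] << 16) | (cpu_addr & 0xFFFF)


rf = BankRegisterFile()
rf.load("CCP", 0x02)
rf.load("DP1", 0x7F)
print(hex(rf.address(0x1234)))          # Near access in the current bank
print(hex(rf.address(0x1234, "DP1")))   # Far access through DP1
```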

Three single-byte prefix instructions are associated with DP0, DP1 and DP2. A Far data access usually takes the form of a prefix followed by an ordinary 65C02 instruction (e.g., INC Absolute, CMP Indirect,Y etc). At run-time the coprocessor makes note of the prefix and the CPU just steps over it. Next comes the target instruction, and, as usual, off-chip logic co-executes every cycle. This includes the extra cycles for zero-page indirection and other variations. Ultimately DP0, DP1 or DP2 is read out "on cue" for the data transfer which is the final cycle of the instruction (final three cycles for Read-Modify-Write). All of the CPU's 64K possible addresses are re-mapped by the bank switch. Then CCP is re-selected for A23-A16, an opcode fetch occurs, and the program proceeds without missing a beat. Almost all instructions and address modes can use the prefixes and thus become Far. Considering all the combinations, you could say there are hundreds of new instructions, not just 44.
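The cycle-by-cycle selection can be sketched as follows. This is a simplified model, not the real microcode: the cycle labels, the two cycle lists and the function `banks_per_cycle` are assumptions made for illustration, and zero-page and stack cycles are deliberately left out.

```python
# Simplified bus-cycle patterns for two absolute-mode instructions.
LDA_ABS = ["opcode", "operand", "operand", "data"]
INC_ABS = ["opcode", "operand", "operand", "data", "data", "data"]  # RMW


def banks_per_cycle(cycle_kinds, banks, prefix=None):
    """Return the value driven on A23-A16 for each bus cycle.

    Without a prefix every cycle uses CCP. With a prefix, one extra
    cycle fetches the prefix byte (as code, under CCP), and only the
    data-transfer cycles of the target instruction use the Data Pointer.
    """
    selected = []
    if prefix is not None:
        selected.append(banks["CCP"])          # the prefix fetch itself
    for kind in cycle_kinds:
        far = prefix is not None and kind == "data"
        selected.append(banks[prefix] if far else banks["CCP"])
    return selected


banks = {"CCP": 0x01, "DP0": 0x10, "DP1": 0x20, "DP2": 0x30}
print(banks_per_cycle(LDA_ABS, banks, "DP1"))
print(banks_per_cycle(INC_ABS, banks, "DP0"))
```

In this model the Far LDA switches banks for a single cycle and the Far INC for its last three, after which the next opcode fetch is back under CCP.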

The prefixes provide a flexible overall mechanism, but there are some specialized provisions as well. Six of the most frequently used forms of Far LDA and Far STA have their own unique opcodes, thus eliminating the one-byte, one-cycle overhead of a prefix. Specific opcodes are also provided for Far JMP, Far JSR and Far RTS. These instructions include an operation that exchanges CCP and DP2 and, because CCP is updated, a new 64K bank becomes the default. Happily, this does not imply alternative zero-pages and stacks! The default doesn't apply to stack and zero-page cycles; microcode always directs these accesses to bank zero.
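A small sketch of the two rules in this paragraph, under stated assumptions: `far_exchange` and `bank_for` are hypothetical names, and the idea that a second exchange returns to the caller's bank holds only if DP2 is left untouched (or saved and restored) in between, which the text does not spell out.

```python
def far_exchange(banks):
    """Swap CCP and DP2, the operation shared by the Far jump opcodes."""
    banks["CCP"], banks["DP2"] = banks["DP2"], banks["CCP"]


def bank_for(kind, banks):
    """Pick A23-A16 for a non-Far cycle.

    Zero-page and stack cycles always go to bank zero; everything else
    follows the current CCP.
    """
    if kind in ("zero_page", "stack"):
        return 0
    return banks["CCP"]


banks = {"CCP": 0x01, "DP2": 0x05}
far_exchange(banks)                      # e.g. a Far JSR into bank $05
print(bank_for("opcode", banks), bank_for("stack", banks))
far_exchange(banks)                      # a matching exchange on the way back
print(banks)
```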

Data Pointer registers DP0, DP1 and DP2 load themselves, responding to specific opcodes that can select Immediate, Absolute, Z-pg or Z-pg,X address mode. These three registers can also be pushed onto and pulled from the stack. CCP can only be pushed. Altogether there are 34 instructions for performing Far jumps, Far data accesses, and for loading and saving bank addresses.

Miscellaneous instructions and registers

A few highlights from this disparate group are as follows. The SCAN instruction forces the CPU to rapidly read a long string of bytes from memory, as part of a program that outputs video.

W is a 16-bit register readable in zero-page, one of whose functions is double-indexed addressing. This involves W capturing the Effective Address of certain instructions coded with the 6502's (Z-pg,X) mode and making the EA available in Z-pg. Therefore it's possible to follow the (Z-pg,X) instruction with one using (W),Y and the result of the two-instruction sequence is (Z-pg,X),Y mode. It's equivalent to using X to index to a pointer, fetching the pointer, then indexing again into a data array. This is not so unusual, given that even a 16-bit word (2 bytes) constitutes an array. Double-indexed addressing accelerates Forth mainstays such as @ and ! (fetch and store).
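The two-step address calculation can be sketched in Python as below. This models only the arithmetic, not how W is wired into zero-page; the function names and the flat `bytearray` memory are assumptions for illustration.

```python
def zp_x_pointer(mem, zp, x):
    """Effective address of a (Z-pg,X) access.

    The pointer is read from zero-page at zp + X, wrapping within the
    page as the 6502 does. This is the value W would capture.
    """
    base = (zp + x) & 0xFF
    return mem[base] | (mem[(base + 1) & 0xFF] << 8)


def double_indexed(mem, zp, x, y):
    """Read a byte using the combined (Z-pg,X),Y effect."""
    w = zp_x_pointer(mem, zp, x)        # step 1: the (Z-pg,X) instruction
    return mem[(w + y) & 0xFFFF]        # step 2: the (W),Y instruction


mem = bytearray(0x10000)
mem[0x24], mem[0x25] = 0x00, 0x30       # pointer $3000 stored at $20 + X, X = 4
mem[0x3005] = 0xAB                      # element 5 of the array at $3000
print(hex(double_indexed(mem, 0x20, 4, 5)))
```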

IP is a 16-bit register whose most notable function is as a pointer for the JMP((IP++)) instruction. This double-indirect jump with post-increment is a hardware realization of Forth's ubiquitous NEXT operation. Hardware NEXT has quadruple the speed of the code sequence it replaces. Overall Forth program speed increases by about 90%.
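For readers unfamiliar with NEXT, here is a minimal sketch of what a double-indirect jump with post-increment computes. It assumes 16-bit little-endian cells in a flat 64K memory and an indirect-threaded layout; the text does not state the threading model, and the helper names are hypothetical.

```python
def read_word(mem, addr):
    """Read a 16-bit little-endian word."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


def write_word(mem, addr, value):
    """Write a 16-bit little-endian word."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF


def next_op(mem, ip):
    """Model JMP((IP++)): return (jump target, updated IP)."""
    cell = read_word(mem, ip)        # first indirection: the cell IP points at
    target = read_word(mem, cell)    # second indirection: the address stored there
    return target, (ip + 2) & 0xFFFF # post-increment past the 2-byte cell


mem = bytearray(0x10000)
write_word(mem, 0x2000, 0x3000)      # thread: next word's code field is at $3000
write_word(mem, 0x3000, 0x4000)      # code field points at machine code at $4000
target, ip = next_op(mem, 0x2000)
print(hex(target), hex(ip))
```

In software, each of those steps costs several 6502 instructions, which is consistent with the speed-up the hardware version is credited with.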

Conclusion

KimKlone's CPU and the off-chip accessories function as a unified whole, much the same as a monolithic device created in a wafer fab. The design is notable for introducing major features despite the limitations of a legacy Programming Model and even legacy silicon.

The linear memory organization allows efficient manipulation of objects larger than 64K — a capability which is absent from commercial 6502 microcomputers and from microprocessors such as the MOS 6509 and the Hudson Soft 6280 and which, in retrospect, is more suggestive of the WDC 65816. (The KK was created shortly after but without any influence from the 65816.)

