LAUGHTON ELECTRONICS

Cheap Video à la Lancaster and the back story re: my KIM-1

Lancaster book covers

At the dawn of the microcomputer era, back in the days of the 8080 and the Motorola 6800, a certain Pioneer of the Art published plans for a remarkable microprocessor interface. Don Lancaster published a series of books, including The Cheap Video Cookbook and Son of Cheap Video. These explain an unorthodox technique that allows rudimentary microcomputers like the Altair and the KIM-1 to generate video output.

The novelty of Lancaster's approach impressed me deeply, and I took to heart an important lesson: sometimes the most expedient way to solve a problem is to "lie" to the machine! (See below.)

Incidentally, the title of my KimKlone article is a grateful acknowledgment to Mr Lancaster and his humorous style:
  ( Lancaster's book: )    Cheap Video Cookbook
  ( Lancaster's book: )    Son of Cheap Video
  (my spin-off article:)  Bride of Son of Cheap Video

Cheap Video and Lying To the Machine

Cheap Video is a means of outputting video without the need for DMA hardware or a Video Controller chip with dedicated video memory. Instead, a portion of the existing system RAM serves as video memory, and, instead of DMA, what's used is programmed I/O. In other words, video is generated as the output of an actual program running on the computer. This would ordinarily be impossible due to the very high data rate required, but Cheap Video slips a joker in the deck — a simple hardware trick (described later) which fools the CPU.

At the heart of the video program is a loop, and each iteration of this inner loop outputs one row of pixels, corresponding to one horizontal sweep (or "scan") of the CRT/LCD monitor. Each loop iteration begins with the CPU making a Jump To Subroutine (JSR) to some address within a portion of memory you've chosen to use as the video buffer. That's right — it jumps to an address where data is stored!

Thanks to the unusual hardware I mentioned, what the CPU "sees" in the buffer is not pixel data. Instead, those addresses appear to contain a dubious subroutine composed of dozens of ORA # $00 instructions. (Alternatively, something like AND # $FF or CMP # $C9 instructions could be used. The effect is the same: namely, a two-byte, two-cycle NOP.) Naturally the CPU follows orders and executes these virtual NOPs. Then the do-nothing subroutine terminates with an RTS.

What's noteworthy is that the address bus rapidly and steadily increments, once per cycle, as the NOPs and the first two cycles of the RTS execute. In other words, the CPU's PC register spends a few dozen cycles behaving like an ordinary 16-bit counter... and, it is counting its way through a selected portion of the video buffer. This is the "action" part of the sequence, and the CPU is doing what we would want a DMA controller to do — quickly read a series of bytes from memory. Each byte is immediately sent to a shift register that serializes the bits in order to output the video bit stream.
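The serializing step can be sketched in a few lines of Python. This is only a model of the idea, not Lancaster's circuit: the names serialize and fetch_scan, the buffer address $2000 and the MSB-first bit order are all assumptions made for illustration.

```python
def fetch_scan(memory, start, count):
    """Model the PC acting as a counter: read `count` consecutive bytes."""
    return [memory[(start + i) & 0xFFFF] for i in range(count)]

def serialize(video_bytes, msb_first=True):
    """Model a parallel-in, serial-out shift register.

    Each byte is loaded whole and then shifted out one bit at a time.
    MSB-first order is an assumption; real hardware could go either way.
    """
    bits = []
    for byte in video_bytes:
        order = range(7, -1, -1) if msb_first else range(8)
        for i in order:
            bits.append((byte >> i) & 1)
    return bits

# Hypothetical buffer location and pixel data, for demonstration only.
memory = bytearray(65536)
memory[0x2000:0x2004] = bytes([0xF0, 0x0F, 0xAA, 0x01])

line = serialize(fetch_scan(memory, 0x2000, 4))
print("".join("#" if b else "." for b in line))
```

Four bytes fetched in four consecutive cycles become 32 pixels on one scan line, which is the whole point of letting the address bus count by itself.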

After the RTS occurs the spell is broken. We stop fetching from the video buffer, and the RTS's return address takes us back to finish the rest of the loop. We output a horizontal sync pulse (typically via a parallel port bit), we compute a new address to be used by the next JSR, then usually the loop reiterates. There's no loop exit until there have been enough scans (horizontal lines) to refresh the entire screen from top to bottom (i.e., one frame). To produce a continuous succession of frames, the inner loop is wrapped in an outer loop that ultimately outputs the Vertical Sync pulse and rolls the JSR address back to its top-of-the-screen value.
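The shape of the two nested loops can be sketched as a Python generator that lists the events of one frame in order. The real thing would be cycle-counted 6502 assembly; the function name frame_events, the buffer base $2000, the 32-byte row and the 4-line frame are assumptions chosen to keep the example small.

```python
def frame_events(buffer_base, bytes_per_line, lines_per_frame):
    """Yield the events of one frame, in the order described in the text."""
    jsr_address = buffer_base              # top-of-the-screen value
    for _ in range(lines_per_frame):       # inner loop: one pass per scan line
        yield ("scan", jsr_address)        # JSR into the buffer, RTS comes back
        yield ("hsync", None)              # pulse a parallel port bit
        jsr_address += bytes_per_line      # compute the address for the next JSR
    yield ("vsync", None)                  # the outer loop ends the frame here
    # On the next frame the outer loop starts again from buffer_base.

for event in frame_events(0x2000, 32, 4):
    print(event)
```

Each call to frame_events corresponds to one trip around the outer loop, so a continuous picture is just this sequence repeated forever.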

The scheme just described produces a bit-mapped display and no interlacing. (Interlacing can be had by altering the software.) Character-based displays are also readily possible. One option is to throw hardware at the problem and install a Character Generator ROM. But that approach may not be worthwhile, given that the same result can be obtained by software. You can simply use the bit-mapped display and have it updated by an assembly-language character-drawing routine.
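As a rough illustration of the software route, here is a character-drawing routine modelled in Python rather than assembly language. The 32-byte row width, the buffer base $2000 and the glyph data are all made up for the example; a real font table would hold one such 8-byte pattern per character.

```python
BYTES_PER_LINE = 32   # assumed row width of the bit-mapped display
GLYPH_A = [0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00]  # made-up 8x8 "A"

def draw_char(memory, base, col, row, glyph):
    """Copy an 8x8 glyph into the bitmap at character cell (col, row).

    Each glyph byte lands on its own scan line, so consecutive glyph
    bytes are BYTES_PER_LINE apart in memory.
    """
    for i, bits in enumerate(glyph):
        memory[base + (row * 8 + i) * BYTES_PER_LINE + col] = bits

memory = bytearray(65536)
draw_char(memory, 0x2000, 1, 0, GLYPH_A)

# Show the first two character cells of the top eight scan lines.
for line in range(8):
    row_bytes = memory[0x2000 + line * BYTES_PER_LINE:][:2]
    print("".join(format(b, "08b") for b in row_bytes).replace("0", ".").replace("1", "#"))
```

The scan loop never needs to know that characters exist; it simply displays whatever bit patterns the drawing routine left in the buffer.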

The sneaky trick mentioned earlier is what causes the CPU to see the buffer area as containing quasi-NOPs rather than what's really there (the video data). Here's how it's done:

Usually when a CPU sends out an address, memory will faithfully reply with the byte stored at that address. But with Cheap Video a major connection, that between the data buses, gets temporarily severed. This lets Cheap Video "lie" about what's in memory. (See the diagrams above, Business as Usual and Cheap Video.) During the "action" part of each scan, the bytes fetched onto the memory data bus don't get relayed back to the CPU's data bus. Instead, the bytes (i.e., the pixel data we needed to fetch) get shipped off to the video display. Meanwhile, some Cheap Video flimflam logic feeds the CPU bus a brazen fabrication, a persistent ORA # $00 (and eventual RTS) which appear to reside at the addresses actually containing data.
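The severed connection can be modelled as a read function with two results: what the CPU is told, and where the real byte goes. This is a simplified sketch, not the actual logic. The function name bus_read is hypothetical, the scan is assumed to start on an even address, and the terminating RTS is left out to keep the model small.

```python
ORA_IMMEDIATE = 0x09   # 6502 opcode for ORA #imm

def bus_read(memory, addr, scan_active):
    """Return (byte seen by the CPU, byte sent to the video shifter or None)."""
    real = memory[addr]
    if scan_active:
        # The lie: opcode on even addresses, $00 operand on odd ones.
        fake = ORA_IMMEDIATE if addr % 2 == 0 else 0x00
        return fake, real      # CPU gets the fabrication, video gets the truth
    return real, None          # business as usual

memory = bytearray(65536)
memory[0x2000:0x2004] = bytes([0xDE, 0xAD, 0xBE, 0xEF])   # pixel data

for addr in range(0x2000, 0x2004):
    cpu_byte, video_byte = bus_read(memory, addr, scan_active=True)
    print(f"${addr:04X}: CPU sees ${cpu_byte:02X}, video gets ${video_byte:02X}")
```

With scan_active false the same addresses read back as ordinary RAM, which is how the rest of the program can still write pixel data into the buffer.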

Obviously there needs to be a mechanism that cues hardware regarding when to suspend reality and produce dummy op-codes. Lancaster's version takes its cues from the values appearing on the address bus. A portion of the 64K map (perhaps 4K or 8 Kbytes in size) is recognized by the decode hardware as the video buffer. When scanning is enabled, from the CPU's point of view the entire buffer region is filled with repeated images from a 32-byte PROM containing mostly ORA # $00 instructions plus an RTS. From this, clever wiring and coding can yield 40- and 80-byte-wide displays, although of course power-of-two widths such as 64 are easier. With all cheap-video schemes, proper scan timing depends hugely on how the code is written; in particular, the execution time mustn't vary from one line to the next.
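To make the repeated-image idea concrete, here is a small Python model. The PROM contents, the 4K buffer at $2000 and the use of the low five address bits to index the PROM are assumptions for illustration; they are not taken from Lancaster's schematics.

```python
ORA_IMMEDIATE, RTS = 0x09, 0x60

# Hypothetical PROM image: fifteen ORA #$00 pairs, then RTS and a filler byte.
SCAN_PROM = bytes([ORA_IMMEDIATE, 0x00] * 15 + [RTS, 0x00])

BUFFER_START, BUFFER_SIZE = 0x2000, 0x1000   # assumed 4K video buffer

def in_buffer(addr):
    return BUFFER_START <= addr < BUFFER_START + BUFFER_SIZE

def cpu_view(addr):
    """Byte the CPU sees in the buffer while scanning is enabled."""
    return SCAN_PROM[addr & 0x1F]            # the image repeats every 32 bytes

def addresses_fetched(start):
    """List every address read from `start` through the RTS's second cycle."""
    pc, fetched = start, []
    while in_buffer(pc):
        opcode = cpu_view(pc)
        fetched += [pc, pc + 1]              # opcode fetch, then the next byte
        if opcode == RTS:
            break                            # the remaining RTS cycles hit the stack
        pc += 2
    return fetched

print(len(addresses_fetched(0x2000)))        # one full 32-byte row
print(len(addresses_fetched(0x2010)))        # entering midway gives a shorter run
```

In this model a JSR to any 32-byte boundary displays one 32-byte row, and a 4K buffer holds 128 such rows. Entering the image partway through shortens the run, which hints at why line width depends on both the wiring and the code.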

The KimKlone is not cued by addresses. Instead of JSR, the inner loop uses a KimKlone JSR variant (coded as opcode $33) to initiate the scan. Scans terminate according to a VIA timer cycling at the horizontal frequency. Under this system the video buffer — or an array of them — can reside anywhere in KK's 16 MByte space.

Lancaster realized that a microprocessor is capable of burst-reads of memory, sustained at a rate of one byte every cycle, even though conventional processing uses only sporadic accesses to small chunks of data. But prolonged sequences of memory reads do occur as the chip fetches the bytes of its program. The CPU unwittingly mimics a 16-bit counter or a DMA controller, with its address bus outputting an ascending 16-bit count.
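A little arithmetic shows why one byte per cycle is enough. The figures below are assumptions picked for illustration: a 1 MHz clock, a 32-byte row, and a 64-cycle scan loop (NTSC's horizontal rate is roughly 15.7 kHz, so 64 cycles at 1 MHz lands close to it).

```python
def scan_budget(cpu_hz, bytes_per_line, cycles_per_line):
    """Work out the data rate and the cycle budget for one scan line."""
    return {
        "pixel_rate_hz": cpu_hz * 8,                  # 8 pixels per byte, 1 byte per cycle
        "line_rate_hz": cpu_hz / cycles_per_line,     # horizontal frequency
        "active_cycles": bytes_per_line,              # cycles spent fetching pixels
        "overhead_cycles": cycles_per_line - bytes_per_line,
    }

budget = scan_budget(cpu_hz=1_000_000, bytes_per_line=32, cycles_per_line=64)
for name, value in budget.items():
    print(f"{name}: {value}")
```

Under these assumptions the shift register is fed 8 million pixels per second during the burst, and the loop has 32 cycles left on each line for the JSR, the rest of the RTS, the sync pulse and the address arithmetic.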

I am indebted to Mr Lancaster for the lesson I learned from Cheap Video, namely that a microprocessor can readily be manipulated by hardware tricks in order to produce unusual behaviors that are useful. The KimKlone, of course, relies very heavily on this principle.

My KIM (the original mashup) and its mutant spawn, the KimKlone

My very first computer was a KIM-1 — the classic, 1-MHz 6502 board from MOS Technology. I hadn't had it long before I added some extra RAM (2114's), a pair of 6522's, an ASCII keyboard & a paper tape reader and, of course, Cheap Video. But around 1980 I switched the focus from video to memory-space expansion. The reason? On the surplus market I'd acquired a DRAM board of 128K capacity! I was agog; I felt hypoxemic. This utterly outclassed my previous expansion of 8K! And of course it was twice as much as the processor could address.

I decided to down-rate the new board to 112K, which allowed the new memory, the pre-existing memory and the I/O space all to reside within 128K. Then I devised a circuit which recognized some of the undefined aka "illegal" 65c02 opcodes and used them as cues to direct access between "this" bank and "the other" bank. As with the KimKlone (which came later), the banks were a full 64K in size. This contrasts sharply with conventional expansion schemes, which are restricted to a comparatively small "window" (e.g., 16K) into the expanded space. Bank switches were implemented as transient events lasting less than one instruction cycle; my new circuitry had to manage its task on a bus cycle by bus cycle basis.

If I recall correctly, the deal with my KIM was that each illegal op-code of the pattern xxxxx011 would cause the upper, don't-care bits (the xxxxx) to select one of thirty-two 8-bit patterns held in a TTL PROM, and the selected pattern was parallel-loaded into a shift register and regurgitated serially. The xxxxx011 op-code acted as a prefix instruction, and the shift register would trot out the corresponding pattern, one bit per cycle, while the following instruction (the target of the prefix) executed. The target would be a normal 65xx memory reference instruction such as INC Absolute, STA Indirect, CMP Indirect-Y or whatever. The shift register's serial output toggled a flip-flop feeding A16, the most-significant address line. Typical timing patterns caused A16 to flip from one 64K bank to the other for a single bus cycle only, exactly during the time the target instruction performed its fetch or store. (Read-Modify-Write instructions used patterns that produced a three-cycle bank switch.) There were other capabilities as well: for instance you could JMP to the alternate bank and stay there, or do a Far JSR and later a Far RTS. The exact details escape me. But the 65c02's 64K address limit was transcended by using undefined op-codes as prefixes to specify Far addressing for legacy instructions.
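The prefix mechanism can be modelled in Python as follows. Since the exact details are not recorded, everything specific here is hypothetical: the bit patterns, the MSB-first shift order, and the assumption that a toggle takes effect in the same cycle as the bit that causes it.

```python
def is_prefix(opcode):
    """True for op-codes of the form xxxxx011."""
    return opcode & 0x07 == 0x03

def pattern_index(opcode):
    """The upper five bits select one of 32 PROM patterns."""
    return opcode >> 3

def a16_during_target(pattern, a16=0):
    """Shift the pattern out one bit per cycle; each 1 toggles the A16 flip-flop."""
    states = []
    for i in range(7, -1, -1):             # MSB first (assumed)
        if (pattern >> i) & 1:
            a16 ^= 1
        states.append(a16)
    return states

def physical_address(a16, addr16):
    """Combine the bank bit with the CPU's 16-bit address."""
    return (a16 << 16) | (addr16 & 0xFFFF)

# Made-up patterns showing the three behaviours described in the text.
ONE_CYCLE   = 0b00011000   # flip on, then straight back: a single-cycle access
THREE_CYCLE = 0b00010010   # flip on, hold, flip back: read-modify-write
STAY        = 0b00010000   # flip on and never back: a far JMP

for name, pattern in [("one", ONE_CYCLE), ("three", THREE_CYCLE), ("stay", STAY)]:
    print(name, a16_during_target(pattern))
```

Because the pattern alone fixes the timing, the position of the 1 bits has to match the cycle in which the target instruction touches memory. That is why a different pattern is needed for each combination of instruction type and address mode.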

The arrangement I've described was perfectly functional, but a more elegant solution would be to infer timing information directly from the target op-code. The KIM circuit didn't even sample the target instruction; its behavior depended solely on the prefix. So, instead of just a few prefixes, a few sets of prefixes had to be made available, with members of each set identical except in regard to timing. That's how I was able to match the timing of the CPU as it executes different target instructions using different address modes. It seemed a shame to use all those undefined op-codes so inefficiently, but with the KIM it didn't really matter because there was nothing else that needed to be controlled. Later the KimKlone, a "clean sheet of paper" design, pushed the envelope a great deal further. See KimKlone Short Summary.


copyright notice (Jeff Laughton)