######## ################## ###### ###### ##### ##### #### #### ## ##### #### #### #### #### #### ##### ##### ## ## #### ## ## ## ### ## #### ## ## ## ##### ######## ## ## ## ##### ## ## ## ## ## ##### ## ## ######## ## ## ## ### ## ## #### ## ## ##### #### #### #### #### ##### #### #### #### #### #### ###### ##### ## ###### ###### Issue #17 ################## November 15, 1998 ######## ............................................................................... "The words of the wise are as goads, and as nails fastened by the masters of assemblies." Ecclesiastes 12 "Before criticizing a man I try to walk a mile in his shoes. That way, if he gets mad he's a mile away and barefoot." John Ianetta ............................................................................... BSOUT For me, fall is a time for reflection. The trees descend into their golden time and then seem to die. And yet, under the surface they are quite alive, and teeming with activity at a smaller, less-visible scale, waiting to burst forwards again in full bloom. I think there's a great metaphor for C= in this. But I have no idea what it is. In fact things are totally hectic around here, and I haven't given more than a few moments thought towards the 64, so this will be a mighty short editorial. Between a PhD thesis and begging for jobs there hasn't been much 64-time, but with a little here and a little there this issue is finally squeaking out. Everybody worked hard over the summer, and my goal was to get it out in September. Well, you know, these days, if something in the 64 world is only two months late it's doing pretty good, so no big whoop. C=Hacking ought to appear reguarly after December, though. On the down side, some things, such as the challenge problem from last time, will have to wait until the next issue. I also stayed pretty low-key while putting this issue together, but future issues will be more public in soliciting articles (e.g. on comp.sys.cbm). In other news, a C64 C-compiler has finally appeared! This outstanding effort is the work of Ullrich von Bassewitz, uz@musoftware.de, the force behind Elite128 among other things. The cc65 webpage is at http://www.von-bassewitz.de/uz/cc65/ so have a look, and tell Ullrich what a stellar guy he is :). Also, as most of you know, the Chicago Expo took place on October 24, and was a real hoot! Check out http://driven.c64.org/ for some nice pictures from the Expo, taken by Mark Seelye. Meanwhile, sit back, relax, and enjoy these latest musings from these, our masters of assembly. ....... .... .. . C=H #17 ::::::::::::::::::::::::::::::::::: Contents :::::::::::::::::::::::::::::::::: BSOUT o Voluminous ruminations from your unfettered editor. Jiffies o Is it a bug or a feature? The C=Hallenge o To be continued... Side Hacking o "SuperCPU Software Repair", by S. Judd . An amateur's excursion into correcting errant wares. Main Articles o "An Optimizing Hybrid LZ77 RLE Data Compression Program, aka Improving Compression Ratio for Low-Resource Decompression", by Pasi 'Albert' Ojala Part two of a two-part article on data compression, giving a detailed description of the compression algorithms used in pucrunch, not to mention the decompression code. o "VIC-20 Kernel ROM Disassembly Project", by Richard Cini This is the first in a series of articles which aims to present a complete, commented disassembly of the VIC-20 ROMs. o Masters Class: "NTSC/PAL fixing, part I", by Russell Reed , Robin Harbron , and S. Judd. Sit up straight and pay attention. In the Masters Class, a Commodore luminary attempts to instruct a couple of ignorant plebians in his art. In this case, Robin and I set out to learn NTSC/PAL fixing from one of the greats, Decomp/Style. Our first fix, a demo from the obscure Finnish group Pu-239, is included, along with detailed descriptions of our experiences. o "The Herd Mentality", by Bil Herd This is a collection of entertaining musings on Commodore and the development of the C128, as provided by Bil Herd (and that's no bull). If you don't know who Bil Herd _is_, why not type SYS 32800,123,45,6 on a 128 sometime... .................................. Credits ................................... Editor, The Big Kahuna, The Car'a'carn..... Stephen L. Judd C=Hacking logo by.......................... Mark Lawrence Special thanks to Marko Makela, Olaf Seibert, and the rest of the cbm-hackers for their many otherwise unacknowledged contributions. Legal disclaimer: 1) If you screw it up it's your own damn fault! 2) If you use someone's stuff without permission you're a dork! For information on the mailing list, ftp and web sites, send some email to chacking-info@jbrain.com. About the authors: Pasi 'Albert' Ojala is a 29 year old software engineer, currently working at a VLSI design company on a RISC DSP core C compiler. Around 1984 a friend introduced him to the VIC-20, and a couple of years later he bought a 64+1541 to replace a broken Spectrum48K. He began writing his own BBS, using ML routines for speed, and later wrote a series of demos under the Pu-239 label. In addition to pucrunch and his many C=Hacking articles, Pasi was written an Amiga 1581 filesystem, a graphics conversion package, a C64 burstloader, and a number of demos. He is currently uses his 64 for hobbyist pursuits, and is contemplating multipart demos for the 64 and the VIC-20, in addition to future C=Hacking articles. Pasi is also a huge Babylon-5 fan, and has a B5 quote page at http://www.cs.tut.fi/~albert/Quotes/B5-quotes.html Richard Cini is a 31 year old senior loan underwriter who first became involved with Commodore 8-bits in 1981, when his parents bought him a VIC-20 as a birthday present. Mostly he used it for general BASIC programming, with some ML later on, for projects such as controlling the lawn sprinkler system, and for a text-to-speech synthesyzer. All his CBM stuff is packed up right now, along with his other "classic" computers, including a PDP11/34 and a KIM-1. In addition to collecting old computers Richard enjoys gardening, golf, and recently has gotten interested in robotics. As to the C= community, he feels that it is unique in being fiercely loyal without being evangelical, unlike some other communities, while being extremely creative in making the best use out of the 64. Robin Harbron is a 26 year old internet tech support at a local independent phone company. He first got involved with C= 8-bits in 1980, playing with school PETs, and in 1983 his parents convinced him to spend the extra money on a C64 instead of getting a VIC-20. Like most of us he played a lot of games, typed in games out of magazines, and tried to write his own games. Now he writes demos, dabbles with Internet stuff, writes C= magazine articles, and, yes, plays games. He is currently working on a few demos and a few games, as well as the "in-progress-but-sometimes-stalled-for-a-real-long-time- until-inspiration-hits-again Internet stuff". He is also working on raising a family, and enjoys music (particularly playing bass and guitar), church, ice hockey and cricket, and classic video games. ................................... Jiffies .................................. 0 REM SHIFT-L by the cbm.hackers Everybody knows the old REM shift-L trick in BASIC 2.0, which generates a syntax error upon listing. But why does it work? The answer turns out to be quite interesting. Normally, when the BASIC interpreter tokenizes a line it strips out characters with the high bit set. One exception is characters within quotes; the other is characters after REM. In those cases, characters are embedded literally into the program line. Now, BASIC tokens all have the high bit set. When LIST encounters a character with the high bit set, it checks whether it is in quote mode. If it is, the character is outputted as normal. If not, the character is treated as a token, which is expanded by QPLOP (located at $A717). The part of QPLOP which prints keywords looks like this: :LOOP1 DEX ;Traverse the keyword table BEQ :PLOOP :LOOP2 INY ;read a keyword LDA RESLST,Y BPL :LOOP2 BMI :LOOP1 :PLOOP INY ;Print out the keyword LDA RESLST,Y BMI LISTENT1 ;Exit if on last char JSR $AB47 ;Print char, AND #$FF BNE :PLOOP The keyword strings in RESLST all stored dextral character inverted (the last char has the high bit set), and the above code just moves forward through the table until it has counted out .X keywords. At that point, :PLOOP prints out the next keyword to the screen, and hops back into LIST. Shift-L is character code 204, or $CC. When LIST sees this inside of a REM, it sees the high bit set and de-tokenizes it. As it turns out, though, the last token is $CB, which is keyword GO (so that GO TO works). It also turns out that RESLST, the list of BASIC keywords, uses 255 characters. The 256th character is zero (value zero, not character zero). So, the above code goes through the list, and then prints out token $CC, the first character of which is a null. This zero is sent to $AB47. $AB47 sends it to JSR $FFD2 (which does nothing with the character), performs an AND #$FF, and exits. But that makes the BNE :PLOOP branch _not_ get taken, and the code erroneously moves forwards into... the code which executes the FOR statement! And the first thing FOR does is look for a valid variable. When you LIST a program, the character immediately following the LIST is a statement terminator (a colon : or else an end of line null). LET (which is called by FOR) reads this character, decides it's an invalid variable name, and generates a ?SYNTAX ERROR. Because QPLOP uses absolute addressing (LDA RESLST,Y), .Y can wrap around through 255 to traverse the list again. This is why shift-M shift-N etc. all generate valid keywords. Only shift-L will force an error, and it is all due to the zero in the keyword table. Similar things can happen in other BASICs. In BASIC 4.0, token $DB does the trick. In BASIC 1.0, $CB ought to do it. The problem seems to have been fixed in BASIC 7.0; at least the trick doesn't seem to work on a 128. Finally, like most things on the 64, embedding tokens into REM statements can naturally be put to some creative use. For example, RUN once ran a contest where readers submitted stories and riddles and such, which were read by LISTing the program. They were very clever and entertaining, and I close this summary with the one I've remembered since high school: 10 REM WHAT'S AN APPLECOSTA? 20 REM {C=-V}T A {INST CTRL-0}E ............................... The C=Hallenge ............................... Wait until next time! ................................ Side Hacking ................................ SuperCPU software repair ---------------------------> by S. L. Judd One of the feature articles in this issue deals with NTSC/PAL fixing. But have you ever thought about SCPU fixing? You know how it goes: you have that program that could really benefit from the speed boost, but doesn't work, and usually because of some silly little thing. Well, it really bugs me to have programs not like my nice hardware for dumb reasons, so I decided I would try my hand at fixing up some programs. The one that really did it for me was the game "Stunt Car Racer" -- I had never played it before, but after getting ahold of it it was clear that here was a game that would be just great with a SuperCPU. I had never done something like this before, but it seemed a doable problem and so I jumped in head first, and this article sums up my inexpert experience to date. By the way, stuntcar-scpu is totally cool :). To date I have fixed up just three games: Stunt Car Racer, Rescue on Fractalus, and Stellar 7. My goal was really to "CMD-fix" these programs, to make them run off of my FD-2000 as well as my SCPU. Although these are all games, the techniques should apply equally well to application programs with a bad attitude. Before discussing the fixes, it is probably worthwhile to discuss a few generalities. I also note that programmers who don't have a SuperCPU might find some of this information helpful in designing their programs to work with SCPUs. Finally, my fixes are available in the Fridge. Tools and Process ----------------- The tools I used were: o Action Replay o Wits o Paper for taking notes (backs of receipts/envelopes work) I think this is all that is necessary, although a good sector editor can come in handy for certain things. After trying a number of different approaches to the problem, the process I've settled on goes roughly like the following: - Have an idea of what will need fixing - Familiarize yourself with the program - Track down the things that need fixing - Figure out free areas of memory - Apply patches, and test Most programs work in more or less the same way: there are some initialization routines, there's a main loop, and there's an interrupt routine or series of routines. The interrupts are easy to find, via the vectors at either $FFFA or at $0314 and friends. The initialization routine can be tougher, but can be deduced from the loader or decompressor; also, some programs point the NMI vector to the reset code, so that RESTORE restarts the program. Finding the things that need fixing usually involves freezing the program at the appropriate time, and doing a little disassembly. Sometimes a hunt for things like LDA $DC01 is helpful, too. Figuring out free areas of memory is easy, by either getting a good feel for the program, or filling some target memory with a fill byte and then checking it later, to see if it was overwritten. Once the patch works on the 64, all that remains is to test it on the SCPU, and it's all done! Diagnosis --------- It seems to me that, at the fundamental level, the SCPU is different from a stock machine in three basic ways: it is a 65816, it runs at 20MHz, and it has hardware registers/different configurations. There are also some strange and mysterious problems that can arise. All possible opcodes are defined on the 65816, which means that "illegal" or quasi-opcodes will not work correctly. On the 65xx chips, the quasi-opcodes aren't real opcodes -- they are like little holes in the cpu, and things going through those holes fall through different parts of the normal opcode circuitry. Although used by very few programs, a number of copy protection schemes make use of them, so sometimes the program works fine with a SCPU but the copy protection makes it choke -- how very annoying (example: EA's Lords of Conquest). Naturally, disk-based protection methods mean it won't work on an FD-2000, either. Running at 20Mhz makes all sorts of problems. Any kind of software loop will run too fast -- delay loops, countdown loops, input busy-loops, etc. Also main program loops, so that the game runs unplayably fast (most 3D games never had to worry about being too fast). It can also lead to flickering screens, as we shall see later, and the "play" of some games is designed with 1Mhz in mind -- velocities, accelerations, etc. What looks smooth at the low frame rate might look poor at the high, as we shall also see later. Finally, fastloaders invariably fail at 20Mhz, like any other code using software-based timing. The SuperCPU also has a series of configuration registers located at $D07x and $D0Bx, which determine things like software speed and VIC optimization modes (which areas of memory are mirrored/copied to the C64's RAM). Note also that enabling hardware registers rearranges $E000 ROM routines. Although it is possible for programs to accidentally reconfigure the SCPU, it is awfully unlikely, since the enable register, which switches the hardware registers in, is sandwiched between disable registers: $D07D Hardware register disable $D07E Enable hardware registers $D07F Hardware register disable Strangely enough, though, different hardware configurations can sometimes cause problems. For example, newer (v2) SCPUs allow selective mirroring of the stack and zero page, and by default have that mirroring turned OFF. For some totally unknown reason, this caused major problems with an early attempt of mine to fix Stunt Car Racer -- I am told that the old version would slow down to just double-speed, flicker terribly, and more. Turning mirroring back on apparently fixes up the problem. (I have an older SCPU, and hence did not have this problem). So before going after a big fix, it is worthwhile to invest a few minutes in trying different configurations. Finally, there are other strange problems that can arise. For example, I have two 128s: one is a flat 128, one a 128D. With my 128D, if $D030 is set then the SCPU sometimes -- but not always -- freaks out and locks up. The flat 128 does not have this problem. One reason this is important is that many decompressors INC $D030 to enable 2MHz mode. A simple BIT ($2C) fixes this problem up, but the point is that the SCPU has to interact with the computer, so perhaps that interaction can lead to problems in obscure cases. Now, if the goal is to CMD-fix the program, there may be a few disk-related things that may need fixing. In addition to stripping out (or possibly fixing up) any fastloaders, most programs annoyingly assume drive #8 is the only drive in town. Also, if the program uses a track-based loader (instead of a file-based loader), then that will need to fixed up as well, and any disk-based copy protection will have to be removed. There's one other thing to consider, before you fix: is the program really busted? For example, if you've tried a chess program with the SCPU, chances are that you saw no speed improvement. Why not? It turns out that most chess programs use a timer-based search algorithm -- changing the playing strength changes the amount of time the program spends searching, and not the depth of the search. (The reason is to make the gameplay flow a little better -- otherwise you have very slow play at the beginning, when there are many more moves to consider). So although it might look like it isn't working right with the SCPU, it is actually working quite well. And that pretty much covers the basic ideas. The first program I fixed up was Stunt Car Racer. Stunt Car Racer --------------- Stunt Car Racer, in case you don't know, is a 3D driving game, and quite fun. It is also too fast, unplayably fast, at 20MHz. Therefore, it needs to be slowed down! My first plan... well, suffice to say that most of my original plans were doomed to failure, either from being a bad idea, or from poor implementation. It is clear enough that some sort of delay is needed, though, in the main loop, or perhaps by intercepting the joystick reading routine. The program has a main loop and an interrupt loop as well. The interrupt handles the display and other things, and all of the game calculations are done in the main loop, which flows like Do some calculations Draw buffer 1 Swap buffers Do some calculations Draw buffer 2 Swap buffers JMP loop One of my first thoughts was to intercept the joystick I/O, which is easy to find by hunting for LDA $DC01 (or DC00, whichever joystick is used). The patch failed, and possibly because I didn't check that the memory was safe, and possibly because it was in the interrupt routine (I simply don't remember). Before patching, it is very important to make sure that the patch will survive, and not interfere with the program, so it is very important to find an area of memory that is not used by the program. It took me a little while to figure this out! Finding unused memory was pretty easy -- I just filled the suspect areas with a fill byte, ran the program, and checked that memory. Mapping out the memory areas also aids in saving the file, as un-needed areas don't need to be saved, or can be cleared out to aid compression. The first free area of memory I found was at $C000. It turns out that this is a sprite, though, and so put some garbage on the screen. The second I tried was $8000, which worked great in practice mode but got overwritten in competition mode -- always test your patches thoroughly! (I had only tested in practice mode). Finally, I found a few little spots in low memory that survived, and placed the patch there. The program does a whole lot of memory moving, and uses nearly all memory. I also left some initialization code at $8000, since it only needed to be run once, at the beginning (to turn on mirroring in v2 SCPUs). Recall that the main loop has two parts -- one for buffer 1, and one for buffer 2. The trick is to find some code that is common to both sections, like a subroutine call: JSR SomeRoutine Draw buffer 1 JSR SomeRoutine Draw buffer 2 The patch routine I used was a simple delay loop, redirected from those two JSRs: LDX #$FF CPX $D012 BNE *-5 DEX CPX #$FC BNE *-10 JMP SomeRoutine Of course, this will also slow the program down at 1Mhz; later on I became smarter about my patches, but this one works pretty well. To save the game and patches, I simply froze it from AR. Just saving from the monitor generally failed; the initialization routine doesn't initialize all I/O settings. Part of the freezing process involves RLE compression, so if you freeze it is a good idea to fill all unused portions of memory -- temporary areas, bitmaps, etc. Another thing to do is to set a freeze point at the init routine, and then JMP there from the monitor. By clearing the screen, you won't have to look at all the usual freezer ugliness, and at this point freezing isn't any different than saving from the ML monitor and RLE-packing the file. Once saved, I tested a few times from the 64 side, to make sure things worked right. Whether freezing or saving from the monitor, if the file size is larger than 202 blocks or so, it can't be loaded on the SCPU without a special loader -- unless you compress it first. I naturally recommend using pu-crunch for that purpose, but if you want to do it on the 64 then I recommend using ABCrunch, which works well with the SCPU and gives about as good compression as you can get without an REU. The result was stuntcar-scpu, which is *awfully* fun when fixed. Rescue on Fractalus ------------------- Next on my list was Rescue on Fractalus, an older (and quite cool) Lucasfilm game that just didn't cut it in the 64 conversion, for a number of reasons (that perhaps could have been avoided). There are at least two versions of the game, one of which doesn't even work on a 128 (good 'ol $D030), but I have the older version, which does work. With a SuperCPU, though, there are a number of problems. The display flickers terribly. The gameplay is smooth and not at all too fast -- in fact, it is too slow. Specifically, the velocities and turning rates and such do not give a convincing illusion of speed or excitement. The game is copy- protected and uses a track-based fastloader, loaded from disk via B-E, which also saves the high scores to disk. Clearly, this one is a bigger job: the display is too fast, the game constants need adjusting, and the highscore code needs to be replaced by some kernal calls. The structure of this code is a little different. The main loop handles the (double-buffered) display -- it does all the calculations and draws to the two buffers. The multi-part interrupt loop does the rest -- it swaps buffers, changes the display in different parts of the screen, reads the joystick, and performs the game updates which change your position and direction. It also handles enemies such as saucers, but doesn't handle the bunkers which fire at you from the mountains (the main loop takes care of those). What does all this mean? First, that the game can be a good ten steps ahead of the screen, which makes things like targeting very difficult. Second, that the bunkers almost never fire at you at 1MHz (they go crazy at 20). Third, that things like velocity and turning rate are rather low, because advancing or turning too quickly would get the game way out of sync (unfortunately, they are still too fast for 1MHz, making targeting difficult and movement clunky). On the other hand, having the movement in the interrupt is the reason that the game does not become unplayably fast at 20MHz, and means that something besides a delay loop is needed. The interrupt swaps buffers, but the main loop draws them, and because it draws so quickly it can start clearing and drawing to the visible buffer. To make sure this was what I was seeing, I reversed the buffer swap code in the interrupt, so that the drawing buffer was always on-screen. Sure enough, that's what the 20Mhz version looked like. It turned out to be pretty easy to force the main loop to wait on the interrupt. Although I messed around (unsuccessfully) with intercepting the interrupt loop, the buffer swap code actually modifies a zero-page variable upon swapping. So all the main loop has to do is wait on that variable before charging ahead. I may have made it wait for two frames, because it made the game play a little better. Now, how to find the velocity and turn code? Well it takes a keypress to change the velocity, so by hunting for LDA $DC01, and tracing back, the routine can be found; at the very least the affected variables may be found, and hunted for. For example, if the result is stored in $D0, then you can search for LDA $D0. The point is to locate the keypress processing code. From there, a little trial and error (setting freeze points and pressing the velocity key) locates the piece of code which deals with changing the velocity, and in particular which variable corresponds to velocity. Finally, from there it just takes another hunt for LDA velocity, ADC velocity, etc. to figure out where the code for updating position and direction is. In this case, I was pretty sure I had found it, as it went something like LDA velocity LSR LSR ADC #$20 and this was added to the position. To check that this was the code, I just changed the ADC, or removed an LSR, to see that the speed changed. The code for turning left and right and moving up and down was similar, and again after a little trial and error it was clear what code did what. Again, it wasn't necessary to totally understand how these routines worked exactly -- just the general idea of them, in this case to see that a multiple of the velocity was used to change the position and orientation of the player. So, to fix it up, I just changed that multiple -- probably I NOPed out an LSR above, to basically double the speed, and changed the turning rates similarly. This took a little experimentation, as it not only needed to be playably fast, but also couldn't overflow at high speeds, etc. But once that was working, all that remained was the highscore table. Finding the table location was pretty easy -- I just got a high score, and while entering my name froze the program, and figured out what got stored where. From there it was pretty easy to figure out what was saved to disk. From the original loader, I also knew where the highscores needed to be loaded to initially (the highscore table gets copied around a lot -- it doesn't just stay at that one location). Figuring out the exact number of bytes to save took a little bit of effort (saving either too many or too few bytes screws it up), but from there it was clear what memory needed to be saved. So all that remained was to add the usual SETLFS etc. kernal calls, right? Wrong. The program uses all the usual kernal variables (from $90-$C0) for its own purposes. Also recall that I wanted the program to work with device 9, etc. To get around this, I did two things. First, when the program first starts, I save some of the relevant variables to an unused part of memory -- in particular, I save the current drive number. Second, before saving the highscore file, I actually copy all zero page variables from $90-$C2 or so to a temporary location, and then copy them back after saving. That way there are no worries about altering important location