########
             ##################
         ######            ######
      #####
    #####  ####  ####      ##      #####   ####  ####  ####  ####  ####   #####
  #####    ##    ##      ####    ##   ##   ##  ###     ##    ####  ##   ##   ##
 #####    ########     ##  ##   ##        #####       ##    ## ## ##   ##
#####    ##    ##    ########  ##   ##   ##  ###     ##    ##  ####   ##   ##
#####  ####  ####  ####  ####  #####   ####  ####  ####  ####  ####   ######
#####                                                                    ##
 ######            ######           Issue #17
   ##################            November 15, 1998
       ########


...............................................................................

	"The words of the wise are as goads, and as nails fastened by
	 the masters of assemblies."  Ecclesiastes 12

	"Before criticizing a man I try to walk a mile in his shoes.  That
	 way, if he gets mad he's a mile away and barefoot."  John Ianetta

...............................................................................

BSOUT

	For me, fall is a time for reflection.  The trees descend into their 
golden time and then seem to die.  And yet, under the surface they are quite 
alive, and teeming with activity at a smaller, less-visible scale, waiting to 
burst forwards again in full bloom.  I think there's a great metaphor for C= 
in this.  But I have no idea what it is.
	In fact things are totally hectic around here, and I haven't given
more than a few moments thought towards the 64, so this will be a mighty
short editorial.  Between a PhD thesis and begging for jobs there hasn't been
much 64-time, but with a little here and a little there this issue is finally
squeaking out.  Everybody worked hard over the summer, and my goal was
to get it out in September.  Well, you know, these days, if something in
the 64 world is only two months late it's doing pretty good, so no big whoop.
C=Hacking ought to appear reguarly after December, though.
	On the down side, some things, such as the challenge problem from
last time, will have to wait until the next issue.  I also stayed pretty
low-key while putting this issue together, but future issues will be more 
public in soliciting articles (e.g. on comp.sys.cbm).
	In other news, a C64 C-compiler has finally appeared!  This outstanding
effort is the work of Ullrich von Bassewitz, uz@musoftware.de, the force 
behind Elite128 among other things.  The cc65 webpage is at

        http://www.von-bassewitz.de/uz/cc65/

so have a look, and tell Ullrich what a stellar guy he is :).  Also, as
most of you know, the Chicago Expo took place on October 24, and was a real
hoot!  Check out 

	http://driven.c64.org/

for some nice pictures from the Expo, taken by Mark Seelye.

	Meanwhile, sit back, relax, and enjoy these latest musings from 
these, our masters of assembly.

.......
....
..
.                                    C=H #17

::::::::::::::::::::::::::::::::::: Contents ::::::::::::::::::::::::::::::::::

BSOUT
	o Voluminous ruminations from your unfettered editor.


Jiffies
	o Is it a bug or a feature?


The C=Hallenge

	o To be continued...


Side Hacking

	o "SuperCPU Software Repair", by S. Judd <sjudd@nwu.edu>.
	   An amateur's excursion into correcting errant wares.


Main Articles

	o "An Optimizing Hybrid LZ77 RLE Data Compression Program, aka
	  Improving Compression Ratio for Low-Resource Decompression",
	  by Pasi 'Albert' Ojala <albert@cs.tut.fi>

	  Part two of a two-part article on data compression, giving a
	  detailed description of the compression algorithms used in 
	  pucrunch, not to mention the decompression code.

	o "VIC-20 Kernel ROM Disassembly Project", by Richard Cini
	  <rcini@email.msn.com>

	  This is the first in a series of articles which aims to
	  present a complete, commented disassembly of the VIC-20
	  ROMs.

	o Masters Class: "NTSC/PAL fixing, part I", by Russell Reed 
	  <rreed@egypt.org>, Robin Harbron <macbeth@tbaytel.net>, and S. Judd.

	  Sit up straight and pay attention.  In the Masters Class, a
	  Commodore luminary attempts to instruct a couple of ignorant
	  plebians in his art.  In this case, Robin and I set out to
	  learn NTSC/PAL fixing from one of the greats, Decomp/Style.
	  Our first fix, a demo from the obscure Finnish group Pu-239,
	  is included, along with detailed descriptions of our experiences.

	o "The Herd Mentality", by Bil Herd <bherd@zeus.jersey.net>

	  This is a collection of entertaining musings on Commodore and the 
	  development of the C128, as provided by Bil Herd (and that's no
	  bull).  If you don't know who Bil Herd _is_, why not type
	  SYS 32800,123,45,6 on a 128 sometime...

	  
.................................. Credits ...................................

Editor, The Big Kahuna, The Car'a'carn..... Stephen L. Judd
C=Hacking logo by.......................... Mark Lawrence

Special thanks to Marko Makela, Olaf Seibert, and the rest of the cbm-hackers
for their many otherwise unacknowledged contributions.

Legal disclaimer:
	1) If you screw it up it's your own damn fault!  
	2) If you use someone's stuff without permission you're a dork!

For information on the mailing list, ftp and web sites, send some email
to chacking-info@jbrain.com.

About the authors:

Pasi 'Albert' Ojala is a 29 year old software engineer, currently
working at a VLSI design company on a RISC DSP core C compiler.
Around 1984 a friend introduced him to the VIC-20, and a couple
of years later he bought a 64+1541 to replace a broken Spectrum48K.
He began writing his own BBS, using ML routines for speed, and
later wrote a series of demos under the Pu-239 label.  In addition
to pucrunch and his many C=Hacking articles, Pasi was written an Amiga
1581 filesystem, a graphics conversion package, a C64 burstloader, and a 
number of demos.  He is currently uses his 64 for hobbyist pursuits, and
is contemplating multipart demos for the 64 and the VIC-20, in addition
to future C=Hacking articles.  Pasi is also a huge Babylon-5 fan, and
has a B5 quote page at http://www.cs.tut.fi/~albert/Quotes/B5-quotes.html

Richard Cini is a 31 year old senior loan underwriter who first became
involved with Commodore 8-bits in 1981, when his parents bought him a
VIC-20 as a birthday present.  Mostly he used it for general BASIC
programming, with some ML later on, for projects such as controlling
the lawn sprinkler system, and for a text-to-speech synthesyzer.  All 
his CBM stuff is packed up right now, along with his other "classic" 
computers, including a PDP11/34 and a KIM-1.  In addition to collecting
old computers Richard enjoys gardening, golf, and recently has gotten
interested in robotics.  As to the C= community, he feels that it
is unique in being fiercely loyal without being evangelical, unlike
some other communities, while being extremely creative in making the 
best use out of the 64.

Robin Harbron is a 26 year old internet tech support at a local
independent phone company.  He first got involved with C= 8-bits
in 1980, playing with school PETs, and in 1983 his parents convinced
him to spend the extra money on a C64 instead of getting a VIC-20.
Like most of us he played a lot of games, typed in games out of
magazines, and tried to write his own games.  Now he writes demos,
dabbles with Internet stuff, writes C= magazine articles, and, yes,
plays games.  He is currently working on a few demos and a few games,
as well as the "in-progress-but-sometimes-stalled-for-a-real-long-time-
until-inspiration-hits-again Internet stuff".  He is also working on
raising a family, and enjoys music (particularly playing bass and guitar), 
church, ice hockey and cricket, and classic video games.


................................... Jiffies ..................................


0 REM SHIFT-L
by the cbm.hackers

	Everybody knows the old REM shift-L trick in BASIC 2.0, which
generates a syntax error upon listing.  But why does it work?  The
answer turns out to be quite interesting.
	Normally, when the BASIC interpreter tokenizes a line it strips 
out characters with the high bit set.  One exception is characters within
quotes; the other is characters after REM.  In those cases, characters
are embedded literally into the program line.
	Now, BASIC tokens all have the high bit set.  When LIST
encounters a character with the high bit set, it checks whether
it is in quote mode.  If it is, the character is outputted as normal.
If not, the character is treated as a token, which is expanded by
QPLOP (located at $A717).  The part of QPLOP which prints keywords looks 
like this:

:LOOP1   DEX              ;Traverse the keyword table
         BEQ :PLOOP
:LOOP2   INY              ;read a keyword
         LDA RESLST,Y
         BPL :LOOP2
         BMI :LOOP1
:PLOOP   INY              ;Print out the keyword
         LDA RESLST,Y
         BMI LISTENT1     ;Exit if on last char
         JSR $AB47        ;Print char, AND #$FF
         BNE :PLOOP

The keyword strings in RESLST all stored dextral character inverted
(the last char has the high bit set), and the above code just moves 
forward through the table until it has counted out .X keywords.  At
that point, :PLOOP prints out the next keyword to the screen, and
hops back into LIST.

	Shift-L is character code 204, or $CC.  When LIST sees this
inside of a REM, it sees the high bit set and de-tokenizes it.
As it turns out, though, the last token is $CB, which is keyword GO
(so that GO TO works).  It also turns out that RESLST, the list of
BASIC keywords, uses 255 characters.  The 256th character is zero
(value zero, not character zero).
	So, the above code goes through the list, and then prints
out token $CC, the first character of which is a null.  This zero
is sent to $AB47.  $AB47 sends it to JSR $FFD2 (which does nothing
with the character), performs an AND #$FF, and exits.  But that makes 
the BNE :PLOOP branch _not_ get taken, and the code erroneously moves 
forwards into...  the code which executes the FOR statement!
	And the first thing FOR does is look for a valid variable.
When you LIST a program, the character immediately following the
LIST is a statement terminator (a colon : or else an end of line null).
LET (which is called by FOR) reads this character, decides it's an
invalid variable name, and generates a ?SYNTAX  ERROR.
	Because QPLOP uses absolute addressing (LDA RESLST,Y), .Y
can wrap around through 255 to traverse the list again.  This is
why shift-M shift-N etc. all generate valid keywords.  Only shift-L
will force an error, and it is all due to the zero in the keyword
table.
	Similar things can happen in other BASICs.  In BASIC 4.0,
token $DB does the trick.  In BASIC 1.0, $CB ought to do it.
The problem seems to have been fixed in BASIC 7.0; at least the
trick doesn't seem to work on a 128.

	Finally, like most things on the 64, embedding tokens into
REM statements can naturally be put to some creative use.  For example,
RUN once ran a contest where readers submitted stories and riddles
and such, which were read by LISTing the program.  They were very clever
and entertaining, and I close this summary with the one I've remembered 
since high school:

10 REM WHAT'S AN APPLECOSTA?
20 REM {C=-V}T A {INST CTRL-0}E


............................... The C=Hallenge ...............................

Wait until next time!



................................ Side Hacking ................................

		SuperCPU software repair
		---------------------------> by S. L. Judd

	One of the feature articles in this issue deals with NTSC/PAL
fixing.  But have you ever thought about SCPU fixing?  You know how
it goes: you have that program that could really benefit from
the speed boost, but doesn't work, and usually because of some silly
little thing.
	Well, it really bugs me to have programs not like my nice
hardware for dumb reasons, so I decided I would try my hand at fixing 
up some programs.  The one that really did it for me was the game 
"Stunt Car Racer" -- I had never played it before, but after getting 
ahold of it it was clear that here was a game that would be just great 
with a SuperCPU.  I had never done something like this before, but it 
seemed a doable problem and so I jumped in head first, and this article
sums up my inexpert experience to date.

	By the way, stuntcar-scpu is totally cool :).

	To date I have fixed up just three games: Stunt Car Racer, Rescue
on Fractalus, and Stellar 7.  My goal was really to "CMD-fix" these programs,
to make them run off of my FD-2000 as well as my SCPU.  Although these are 
all games, the techniques should apply equally well to application programs
with a bad attitude.  Before discussing the fixes, it is probably worthwhile
to discuss a few generalities.

	I also note that programmers who don't have a SuperCPU might find
some of this information helpful in designing their programs to work with
SCPUs.

	Finally, my fixes are available in the Fridge.

Tools and Process
-----------------

The tools I used were:

  o Action Replay
  o Wits
  o Paper for taking notes (backs of receipts/envelopes work)

I think this is all that is necessary, although a good sector editor
can come in handy for certain things.

After trying a number of different approaches to the problem, the process
I've settled on goes roughly like the following:

	- Have an idea of what will need fixing
	- Familiarize yourself with the program
	- Track down the things that need fixing
	- Figure out free areas of memory
	- Apply patches, and test

	Most programs work in more or less the same way: there are
some initialization routines, there's a main loop, and there's an 
interrupt routine or series of routines.  The interrupts are easy to 
find, via the vectors at either $FFFA or at $0314 and friends.  The 
initialization routine can be tougher, but can be deduced from 
the loader or decompressor; also, some programs point the NMI vector to 
the reset code, so that RESTORE restarts the program.  Finding the
things that need fixing usually involves freezing the program at the
appropriate time, and doing a little disassembly.  Sometimes a hunt for
things like LDA $DC01 is helpful, too.  Figuring out free areas of
memory is easy, by either getting a good feel for the program, or
filling some target memory with a fill byte and then checking it
later, to see if it was overwritten.  Once the patch works on the 64,
all that remains is to test it on the SCPU, and it's all done!

Diagnosis
---------

	It seems to me that, at the fundamental level, the SCPU is different 
from a stock machine in three basic ways: it is a 65816, it runs at 20MHz, 
and it has hardware registers/different configurations.  There are also 
some strange and mysterious problems that can arise.

	All possible opcodes are defined on the 65816, which means that
"illegal" or quasi-opcodes will not work correctly.  On the 65xx chips, the 
quasi-opcodes aren't real opcodes -- they are like little holes in the cpu, 
and things going through those holes fall through different parts of the 
normal opcode circuitry.  Although used by very few programs, a number of 
copy protection schemes make use of them, so sometimes the program works fine
with a SCPU but the copy protection makes it choke -- how very annoying 
(example: EA's Lords of Conquest).  Naturally, disk-based protection methods 
mean it won't work on an FD-2000, either.

	Running at 20Mhz makes all sorts of problems.  Any kind of software
loop will run too fast -- delay loops, countdown loops, input busy-loops,
etc.  Also main program loops, so that the game runs unplayably fast
(most 3D games never had to worry about being too fast).  It can also
lead to flickering screens, as we shall see later, and the "play" of some 
games is designed with 1Mhz in mind -- velocities, accelerations, etc.  
What looks smooth at the low frame rate might look poor at the high, as we 
shall also see later.  Finally, fastloaders invariably fail at 20Mhz,
like any other code using software-based timing.

	The SuperCPU also has a series of configuration registers located
at $D07x and $D0Bx, which determine things like software speed and VIC
optimization modes (which areas of memory are mirrored/copied to the C64's 
RAM).  Note also that enabling hardware registers rearranges $E000 ROM 
routines.  Although it is possible for programs to accidentally reconfigure
the SCPU, it is awfully unlikely, since the enable register, which switches
the hardware registers in, is sandwiched between disable registers:

	$D07D	Hardware register disable
	$D07E	Enable hardware registers
	$D07F	Hardware register disable

Strangely enough, though, different hardware configurations can sometimes 
cause problems.  For example, newer (v2) SCPUs allow selective mirroring of 
the stack and zero page, and by default have that mirroring turned OFF.  
For some totally unknown reason, this caused major problems with an early 
attempt of mine to fix Stunt Car Racer -- I am told that the old version 
would slow down to just double-speed, flicker terribly, and more.  Turning 
mirroring back on apparently fixes up the problem.  (I have an older SCPU, 
and hence did not have this problem).  So before going after a big fix, it 
is worthwhile to invest a few minutes in trying different configurations.

	Finally, there are other strange problems that can arise.  For
example, I have two 128s: one is a flat 128, one a 128D.  With my 128D, 
if $D030 is set then the SCPU sometimes -- but not always -- freaks out 
and locks up.  The flat 128 does not have this problem.  One reason this 
is important is that many decompressors INC $D030 to enable 2MHz mode.
A simple BIT ($2C) fixes this problem up, but the point is that the SCPU has 
to interact with the computer, so perhaps that interaction can lead to 
problems in obscure cases.

	Now, if the goal is to CMD-fix the program, there may be a few
disk-related things that may need fixing.  In addition to stripping out
(or possibly fixing up) any fastloaders, most programs annoyingly assume
drive #8 is the only drive in town.  Also, if the program uses a track-based 
loader (instead of a file-based loader), then that will need to fixed up
as well, and any disk-based copy protection will have to be removed.

	There's one other thing to consider, before you fix: is the
program really busted?  For example, if you've tried a chess program
with the SCPU, chances are that you saw no speed improvement.  Why
not?  It turns out that most chess programs use a timer-based search
algorithm -- changing the playing strength changes the amount of
time the program spends searching, and not the depth of the search.
(The reason is to make the gameplay flow a little better -- otherwise
you have very slow play at the beginning, when there are many more
moves to consider).  So although it might look like it isn't working
right with the SCPU, it is actually working quite well.

	And that pretty much covers the basic ideas.  The first program
I fixed up was Stunt Car Racer.

Stunt Car Racer
---------------

	Stunt Car Racer, in case you don't know, is a 3D driving game,
and quite fun.  It is also too fast, unplayably fast, at 20MHz.  Therefore, 
it needs to be slowed down!  
	My first plan... well, suffice to say that most of my original
plans were doomed to failure, either from being a bad idea, or from
poor implementation.  It is clear enough that some sort of delay is
needed, though, in the main loop, or perhaps by intercepting the joystick
reading routine.
	The program has a main loop and an interrupt loop as well.
The interrupt handles the display and other things, and all of the
game calculations are done in the main loop, which flows like

	Do some calculations
	Draw buffer 1
	Swap buffers
	Do some calculations
	Draw buffer 2
	Swap buffers
	JMP loop

One of my first thoughts was to intercept the joystick I/O, which is
easy to find by hunting for LDA $DC01 (or DC00, whichever joystick
is used).  The patch failed, and possibly because I didn't check that 
the memory was safe, and possibly because it was in the interrupt routine
(I simply don't remember).
	Before patching, it is very important to make sure that the
patch will survive, and not interfere with the program, so it is
very important to find an area of memory that is not used by the
program.  It took me a little while to figure this out!  Finding
unused memory was pretty easy -- I just filled the suspect areas with
a fill byte, ran the program, and checked that memory.  Mapping out the 
memory areas also aids in saving the file, as un-needed areas don't
need to be saved, or can be cleared out to aid compression.
	The first free area of memory I found was at $C000.  It turns
out that this is a sprite, though, and so put some garbage on the
screen.  The second I tried was $8000, which worked great in practice
mode but got overwritten in competition mode -- always test your
patches thoroughly!  (I had only tested in practice mode).  Finally,
I found a few little spots in low memory that survived, and placed the 
patch there.  The program does a whole lot of memory moving, and uses 
nearly all memory.  I also left some initialization code at $8000, since
it only needed to be run once, at the beginning (to turn on mirroring 
in v2 SCPUs).
	Recall that the main loop has two parts -- one for buffer 1, and 
one for buffer 2.  The trick is to find some code that is common to both 
sections, like a subroutine call:

	JSR SomeRoutine
	Draw buffer 1
	JSR SomeRoutine
	Draw buffer 2

The patch routine I used was a simple delay loop, redirected from those
two JSRs:

	LDX #$FF
	CPX $D012
	BNE *-5
	DEX
	CPX #$FC
	BNE *-10
	JMP SomeRoutine

Of course, this will also slow the program down at 1Mhz; later on I became
smarter about my patches, but this one works pretty well.
	To save the game and patches, I simply froze it from AR.  Just
saving from the monitor generally failed; the initialization routine
doesn't initialize all I/O settings.  Part of the freezing process
involves RLE compression, so if you freeze it is a good idea to
fill all unused portions of memory -- temporary areas, bitmaps, etc.
Another thing to do is to set a freeze point at the init routine,
and then JMP there from the monitor.  By clearing the screen, you
won't have to look at all the usual freezer ugliness, and at this
point freezing isn't any different than saving from the ML monitor
and RLE-packing the file.  Once saved, I tested a few times from the
64 side, to make sure things worked right.

	Whether freezing or saving from the monitor, if the file size
is larger than 202 blocks or so, it can't be loaded on the SCPU without
a special loader -- unless you compress it first.  I naturally recommend
using pu-crunch for that purpose, but if you want to do it on the 64
then I recommend using ABCrunch, which works well with the SCPU and
gives about as good compression as you can get without an REU.

	The result was stuntcar-scpu, which is *awfully* fun when fixed.


Rescue on Fractalus
-------------------

	Next on my list was Rescue on Fractalus, an older (and quite cool) 
Lucasfilm game that just didn't cut it in the 64 conversion, for a number
of reasons (that perhaps could have been avoided).  There are at least two
versions of the game, one of which doesn't even work on a 128 (good 'ol $D030),
but I have the older version, which does work.

	With a SuperCPU, though, there are a number of problems.  The display
flickers terribly.  The gameplay is smooth and not at all too fast -- in fact,
it is too slow.  Specifically, the velocities and turning rates and such do
not give a convincing illusion of speed or excitement.  The game is copy-
protected and uses a track-based fastloader, loaded from disk via B-E, which
also saves the high scores to disk.  Clearly, this one is a bigger job: the 
display is too fast, the game constants need adjusting, and the highscore code
needs to be replaced by some kernal calls.

	The structure of this code is a little different.  The main loop
handles the (double-buffered) display -- it does all the calculations and 
draws to the two buffers.  The multi-part interrupt loop does the rest -- 
it swaps buffers, changes the display in different parts of the screen,
reads the joystick, and performs the game updates which change your
position and direction.  It also handles enemies such as saucers, but
doesn't handle the bunkers which fire at you from the mountains (the main
loop takes care of those).
	What does all this mean?  First, that the game can be a good ten
steps ahead of the screen, which makes things like targeting very
difficult.  Second, that the bunkers almost never fire at you at 1MHz
(they go crazy at 20).  Third, that things like velocity and turning
rate are rather low, because advancing or turning too quickly would
get the game way out of sync (unfortunately, they are still too fast
for 1MHz, making targeting difficult and movement clunky).  On the
other hand, having the movement in the interrupt is the reason that
the game does not become unplayably fast at 20MHz, and means that
something besides a delay loop is needed.
	The interrupt swaps buffers, but the main loop draws them,
and because it draws so quickly it can start clearing and drawing to 
the visible buffer.  To make sure this was what I was seeing, I reversed
the buffer swap code in the interrupt, so that the drawing buffer was
always on-screen.  Sure enough, that's what the 20Mhz version looked
like.
	It turned out to be pretty easy to force the main loop to wait
on the interrupt.  Although I messed around (unsuccessfully) with
intercepting the interrupt loop, the buffer swap code actually
modifies a zero-page variable upon swapping.  So all the main loop
has to do is wait on that variable before charging ahead.  I may have
made it wait for two frames, because it made the game play a little
better.

	Now, how to find the velocity and turn code?  Well it takes
a keypress to change the velocity, so by hunting for LDA $DC01, and
tracing back, the routine can be found; at the very least the
affected variables may be found, and hunted for.  For example, if
the result is stored in $D0, then you can search for LDA $D0.  The
point is to locate the keypress processing code.  From there, a little
trial and error (setting freeze points and pressing the velocity key)
locates the piece of code which deals with changing the velocity, and 
in particular which variable corresponds to velocity.  Finally, from 
there it just takes another hunt for LDA velocity, ADC velocity, etc.
to figure out where the code for updating position and direction is.
	In this case, I was pretty sure I had found it, as it went
something like

	LDA velocity
	LSR
	LSR
	ADC #$20

and this was added to the position.  To check that this was the code,
I just changed the ADC, or removed an LSR, to see that the speed changed.
The code for turning left and right and moving up and down was similar,
and again after a little trial and error it was clear what code did
what.  Again, it wasn't necessary to totally understand how these
routines worked exactly -- just the general idea of them, in this case
to see that a multiple of the velocity was used to change the position
and orientation of the player.
	So, to fix it up, I just changed that multiple -- probably I
NOPed out an LSR above, to basically double the speed, and changed the 
turning rates similarly.  This took a little experimentation, as it
not only needed to be playably fast, but also couldn't overflow at
high speeds, etc.

	But once that was working, all that remained was the highscore
table.  Finding the table location was pretty easy -- I just got a high
score, and while entering my name froze the program, and figured out
what got stored where.  From there it was pretty easy to figure out
what was saved to disk.  From the original loader, I also knew where
the highscores needed to be loaded to initially (the highscore table
gets copied around a lot -- it doesn't just stay at that one location).
Figuring out the exact number of bytes to save took a little bit of
effort (saving either too many or too few bytes screws it up), but
from there it was clear what memory needed to be saved.
	So all that remained was to add the usual SETLFS etc. kernal
calls, right?  Wrong.  The program uses all the usual kernal variables
(from $90-$C0) for its own purposes.  Also recall that I wanted the
program to work with device 9, etc.  To get around this, I did two
things.  First, when the program first starts, I save some of the
relevant variables to an unused part of memory -- in particular, I
save the current drive number.  Second, before saving the highscore
file, I actually copy all zero page variables from $90-$C2 or so
to a temporary location, and then copy them back after saving.
That way there are no worries about altering important location