*sigh* ….

What baffles me most about the reactions to our Farbrausch source code release is how many people are evidently stuck in the past. Most discussions in the forums that the link went through revolve around Second Reality or how people did stuff on 8bit machines or how much more skillful you needed to be in the DOS days compared to nowadays where “gigabytes of OS and drivers do everything for you anyway”.

Yeah, right. Frankly: The scene didn’t cease to exist after 1993, it’s alive and kicking. It’s only YOU who got old and boring. Deal with it.

Killer Loop O.S.T.

Continuing with the “digging out old stuff that I’ve made” theme (bear with me, my hidden treasure chamber isn’t too big :) )… did you know the PS3 plays most old PlayStation1 games flawlessly? We didn’t. Until we somehow wound up playing the first game that, among others, most of the other Farbrausch guys (Chaos, Fiver2 et al) and I ever created. And lo and behold, it still was fun to play :)

Anyway. A few emails to a few companies later I got permission to publish the soundtrack that I made for people’s listening pleasure. So, without further ado, here it is:

Killer Loop O.S.T.

All of the songs have been treated with some remastering; my skills plus 1999 homestudio tech just weren’t too good back then. Luckily some things DO get better with time. :)


At some point in 2007 somebody told me how the Amiga’s sound chip, Paula, was able to modify a sound’s volume digitally without using multiplication – dedicated circuits for that would have been prohibitively expensive for a home computer in 1984: It simply has a 6-bit counter per voice that’s incremented every cycle and if its value is above the set volume, the voice is silenced for that cycle. So effectively it’s PWM with a pulse frequency of about 50kHz.

“But wait, shouldn’t that color the sound, ring modulation artifacts and such?” I thought. The answer is of course a resounding no (also all artifacts introduced by the PWM are outside the audible range) but that didn’t stop me from trying to emulate a Paula voice at the full 3.5MHz and then filtering it down to find out how it sounds.
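
A minimal C sketch of the volume scheme described above, just to show why it behaves like a multiplication. The function name and the exact comparison (the sample passes while the counter is below the volume) are assumptions for illustration; they are not taken from hardware documentation or from the emulator mentioned here.

```c
#include <stdint.h>

/* Hypothetical model of one Paula voice's volume stage: a 6-bit counter
   runs through 0..63, and the sample only passes while the counter is
   below the volume setting (0..64). No multiplication is involved. */
int32_t paula_pwm_sum(int8_t sample, unsigned volume)
{
    int32_t sum = 0;
    for (unsigned counter = 0; counter < 64; ++counter) {
        /* the voice is silenced for this cycle once the counter
           has reached the volume setting */
        if (counter < volume)
            sum += sample;
    }
    /* Sum over one full counter period; low-pass filtering the real
       output yields this value divided by 64. */
    return sum;
}
```

Summed over one counter period the result equals sample times volume, which is what a filter behind the PWM stage effectively delivers.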

Yeah well, and while we’re at it, let’s see if we can hack up a simple .MOD player too without using anything a 68000 didn’t have to offer (multiplications and such). Because coding for a couple of hours and then only being able to play a single waveform is boring.

Another few hours later there was one additional never-to-be-published toy project on my HD that was able to play a few MOD files that I liked, and that was about to be abandoned… if it hadn’t been for a thread on pouet.net where somebody was asking for a module player source. And I had just come home from a party and was ever so slightly inebriated, so I just pasted the source code there. A discussion spawned, I cleaned up the code a bit and fixed some replay errors, and so here it is, released into the public domain for everyone to enjoy or laugh at:


Just add sound output. Have fun :)


Test post

Yes, this is certainly a test.


The Workings of FR-08′s Sound System, Part IV

Part 4: Let’s talk about synthesizers.

Ok, I know that this is, uh, LATE. Most of you will have been waiting for the continuation of this article series for more than a year, and most of those most of you may already have given up or coded your own synthesizer, without my help or with my help in one of the thousands of e-mail messages I got and tried to reply to.

For those who are still waiting, let’s begin…. no, NOT coding a synth. These are not the NeHe tutorials, these are tutorials for people knowing how to code and people knowing what a synthesizer might be. Sorry. Let’s instead begin with one of the most feared and hated things when coding: THINKING about what we’re going to do next.

Oh, and if you were really waiting for the last year, just reread the previous articles, there are some references to them below :)

4.1 – The synth interface

So, first, we should clarify the interface between the audio/MIDI system and the synth itself. I decided to have the following base function set:


  • void synthInit(void *patchdata)
    initialises the synthesizer, generates necessary tables, etc (if your synthesizer implementation is a class, you will make this the constructor)
  • void synthReset()
    resets the synthesizer to an initial state, with eg. all voices turned off.
  • void synthProcessMIDI(const unsigned char *buf, const int length)
    sends a buffer of MIDI data to the synthesizer for processing
  • void synthRender(float *destbuf, const int length)
    lets the synthesizer render a specified amount of samples into your destination buffer.

(As I didn’t do any dynamic memory allocations, a deinit/close function or a destructor wasn’t necessary.)

All controlling of the synthesizer functions is done via MIDI commands. This can go as far as sending the patch bank over via a SysEx stream, but I preferred giving a pointer to the patch bank memory in the synthInit() function. Keep in mind that if you do this, the synth should react to changes in the patch memory instantly, so your sound editing software can simply link a slider value to a memory address and the synth will follow if you twist your knobs while editing sounds. Or do it the „hard way“ and send all patch changes via MIDI controllers or SysEx commands – it’s up to you :)

The synchronisation of MIDI commands and sample output is done by my sound system’s render() loop in the following way:

While still samples to render

  • Check how many samples left until the next MIDI event occurs
  • Render as many samples as possible (to next event or end of buffer)
  • If MIDI events are due, process them
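
The loop above could be sketched in C roughly like this. The MidiEvent record, the renderBlock() name and the counting stand-ins for synthRender()/synthProcessMIDI() are assumptions made so that the sketch compiles on its own; it also assumes one float per sample and events sorted by time.

```c
/* Hypothetical event record: raw MIDI bytes plus the sample offset
   (within the current output buffer) at which they become due. */
typedef struct {
    int time;
    const unsigned char *data;
    int length;
} MidiEvent;

/* Stand-ins for the real synth entry points from 4.1. They only count
   what they are asked to do. */
static int renderedSamples = 0;
static int processedEvents = 0;

static void synthProcessMIDI(const unsigned char *buf, const int length)
{
    (void)buf; (void)length;
    ++processedEvents;
}

static void synthRender(float *destbuf, const int length)
{
    for (int i = 0; i < length; ++i) destbuf[i] = 0.0f;
    renderedSamples += length;
}

/* Render numSamples samples, interleaving synthRender() calls with the
   MIDI events (assumed sorted by time and lying inside the buffer). */
void renderBlock(float *dest, int numSamples,
                 const MidiEvent *events, int numEvents)
{
    int pos = 0, ev = 0;
    while (pos < numSamples) {
        /* how many samples until the next event or the end of the buffer? */
        int until = numSamples;
        if (ev < numEvents && events[ev].time < numSamples)
            until = events[ev].time;

        /* render as many samples as possible */
        if (until > pos) {
            synthRender(dest + pos, until - pos);
            pos = until;
        }

        /* process all events that are due now */
        while (ev < numEvents && events[ev].time <= pos) {
            synthProcessMIDI(events[ev].data, events[ev].length);
            ++ev;
        }
    }
}
```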

The synth editing app, though, just had a loop that rendered 256 samples and then sent whatever arrived at my MIDI in port in the meantime to the synth. This led to 6ms latency jittering and is generally considered „imprecise as shit“, but as said, I only had two days for the whole GUI app, and it was enough for me. As soon as I find out a better way for real-time processing, I’ll let you know.

Update (…”I’ll let you know”): VST2 instrument plugins rock ;)

4.2 – The synth’s structure

So, what happens now if synthRender() is called? Well, let’s have a look at the following ‘1337 45C11 4R7 picture:

/-------\
|       |
|       |    /-------\  /----\           /--------\   /----\
|       |add |Channel|  |Chan|    add    |Main mix|   |Post|   /---\
| VOICE |===>|Buffer |=>| FX |==========>| buffer |==>|FX  |==>|OUT|
| POOL  |    \-------/  \----/    ||     \--------/   \----/   \---/
|       |    \_________________/  ||         /\
|       |        16 Channels      \/         ||add
|       |                     /-------\  /-------\
\-------/                     |  AUX  |  |  AUX  |
 32 vces                      |buffers|=>|effects|
                              \-------/  \-------/
                             \_____________________/
                                       2x

First, we have the voice pool (a detailed description will come later). For each played note, one physical voice is allocated from the pool and assigned to the MIDI channel the note is played on. Now, we accumulate all voices for each channel in a buffer which we call „channel buffer“, and apply the per-channel effects (such as distortion or chorus) to it. This channel buffer is then added to the main mix buffer and an arbitrary number of AUX buffers (just like the AUX sends/returns of a mixing console), which then are processed with other effects (such as delay and reverb) and mixed to the main mix buffer again. And after we’ve processed this main mix buffer with optional post processing effects (such as compressor or EQs), we have our final signal which we can give back to the sound system.

I’ve just mentioned the word mixing. This may be the wrong place, but I will most likely repeat the following thoroughly capitalized sentence anyway:


This means, if you’ve got three voices, never ever try silly things like dividing the resulting signal by three, or advanced versions like „I heard that the volume increases logarithmically, so I’ll do something like dividing by ln(voices).“ or similar sh*t. “But”, you’ll say, “if you mix two voices at full volume, your resulting signal will be too loud” – this is of course true, but avoiding signal clipping should be completely in the hand of the person doing the mix (read: the musician), not of a program changing volumes around in a completely unreasonable way. So, DON’T make all voices full volume, but a reasonable volume.

Imagine a string quartet. The violin player starts to play his theme, and after a few bars, the cello player also starts. Does the violin get more silent because there’s a cello sound now? Or does the violin get louder as soon as the other instrumentalists leave the stage? Think.

Imagine a mixing desk with only one channel turned on. If you turn a second channel up, does the first one get softer or does it stay at its current volume?

If you’re afraid of clipping, and you know you have several voices, keep the master volume down to a reasonable fixed maximum which won’t produce clipping too early. And then, leave the rest to the musician. But DON’T decrease the gain of your channels as soon as they get more. No. No no no no no no no. Seriously. No.

Ok. Back to some more constructive things: a pseudo code version of the synthRender() function:

  • clear main mix and AUX buffers
  • for all 16 channels do
    • clear channel buffer
    • for all voices assigned to the channel do
      • render voice and add it to the channel buffer
    • apply channel effects to channel buffer
    • add channel buffer to main mix and AUX buffers with the according volumes
  • process AUX buffers with effects and add them to the mix buffer
  • apply post processing effects to the mix buffer
  • copy/clip the mix buffer to the output buffer (or don’t, if your buffer architecture allows it)

(Please note that though in the ASCII sketch above there’s a “16 channels” below the channel buffer, there’s actually only one channel buffer which gets reused for all 16 channels.)

The voices themselves are quite simple: There is one global “voice buffer” which first gets cleared and then filled with the sound of each voice and processed. It is then added to the channel buffer with additional volume adjustments (that is, how loud the voice is).
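
A compact C sketch of the pseudo code above. Everything in it is simplified and hypothetical: the struct layouts, the constant-level “oscillator”, the empty effect hooks and the name synthRenderSketch() are stand-ins, and the separate voice buffer is folded into the inner loop. The point is only the buffer flow, and that mixing is plain addition.

```c
#include <string.h>

#define NUM_CHANNELS 16
#define NUM_VOICES   32
#define NUM_AUX      2
#define MAX_FRAME    256

typedef struct {
    int   active;    /* voice currently allocated? */
    int   channel;   /* MIDI channel the voice belongs to */
    float volume;    /* how loud the voice is */
    float value;     /* placeholder "oscillator": a constant level */
} Voice;

typedef struct {
    float mainSend;          /* channel volume into the main mix */
    float auxSend[NUM_AUX];  /* channel volume into each AUX buffer */
} Channel;

static Voice   voices[NUM_VOICES];
static Channel channels[NUM_CHANNELS];

/* Empty effect hooks; distortion, delay, compressor etc. would go here. */
static void channelFx(int ch, float *buf, int n) { (void)ch; (void)buf; (void)n; }
static void auxFx(int aux, float *buf, int n)    { (void)aux; (void)buf; (void)n; }
static void postFx(float *buf, int n)            { (void)buf; (void)n; }

/* length must not exceed MAX_FRAME */
void synthRenderSketch(float *destbuf, const int length)
{
    static float chanBuf[MAX_FRAME];
    static float auxBuf[NUM_AUX][MAX_FRAME];
    float *mix = destbuf;  /* main mix is rendered straight into the output */

    /* clear main mix and AUX buffers */
    memset(mix, 0, sizeof(float) * (size_t)length);
    memset(auxBuf, 0, sizeof(auxBuf));

    for (int ch = 0; ch < NUM_CHANNELS; ++ch) {
        /* the single channel buffer is reused for every channel */
        memset(chanBuf, 0, sizeof(float) * (size_t)length);

        for (int v = 0; v < NUM_VOICES; ++v) {
            if (!voices[v].active || voices[v].channel != ch) continue;
            for (int i = 0; i < length; ++i)
                chanBuf[i] += voices[v].value * voices[v].volume;
        }

        channelFx(ch, chanBuf, length);

        /* add the channel to the main mix and the AUX buffers */
        for (int i = 0; i < length; ++i) {
            mix[i] += chanBuf[i] * channels[ch].mainSend;
            for (int a = 0; a < NUM_AUX; ++a)
                auxBuf[a][i] += chanBuf[i] * channels[ch].auxSend[a];
        }
    }

    /* process AUX buffers and add them to the mix */
    for (int a = 0; a < NUM_AUX; ++a) {
        auxFx(a, auxBuf[a], length);
        for (int i = 0; i < length; ++i)
            mix[i] += auxBuf[a][i];
    }

    postFx(mix, length);
}
```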

4.3 – Modulation, MIDI processing, fragmentation et al.

A central question which arises when designing a real-time synthesizer is how to process the modulation sources (such as Envelope Generators or LFOs), or better: how often to process them. The most common solutions are:

  • Per-sample:
    All modulation sources are processed with the full time resolution. This results in great accuracy and thus sound (and is able to deliver the most „punchy“ and analog sound you can get), but imposes heavy processing costs.
  • Per-frame:
    The modulation sources are processed at a rate much below the sample rate, eg. 200Hz. This is enough for most purposes, but tends to sound dirty, as the values will „jump“ from one value to the other, which will produce clicks (ever wondered about the crackling noise when you use envelopes or volume slides in most older trackers?). And really fast modulations will become impossible. On the good side though, this really speeds things up.
  • Interpolated:
    You process the modulation data per-frame, but do per-sample linear interpolation between the values. This won’t take too much CPU time and in most cases eliminate any clicks, but beware that linear interpolation is not too suitable for many parameters (such as filter coefficients or frequencies), and that fast modulations are still impossible.

As I really had to go for little CPU usage, I chose the per-frame approach with a frame size of 256 samples (yes, beat me for not following my own advice, next time I’ll do it better :) ) and did linear interpolation for all volume related parameters to reduce most of the clicks.
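
A minimal sketch of what “linear interpolation for volume related parameters” can look like: instead of jumping to the new volume at the frame boundary, the volume is ramped over the frame. The function name and signature are made up for illustration.

```c
/* Hypothetical helper: scale one frame of audio, ramping linearly from
   the previous frame's volume to the new one instead of jumping. */
void applyVolumeRamp(float *buf, int length, float oldVol, float newVol)
{
    float vol  = oldVol;
    float step = (newVol - oldVol) / (float)length;  /* per-sample increment */

    for (int i = 0; i < length; ++i) {
        vol += step;      /* reaches newVol at the last sample of the frame */
        buf[i] *= vol;
    }
}
```

This costs one addition per sample, which is about as cheap as click reduction gets. As said above, it is only a good fit for volume-like parameters.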

So, all modulation stuff (read: everything that doesn’t involve generating and processing audio data) is processed at a low rate, about 172Hz in my case (44100Hz divided by 256 samples). And soon the next question pops up: if everything runs at this low rate, wouldn’t it make sense to align the MIDI data processing to this rate?

Answer: Yes and no. I chose not to do it, and in a certain way, I regret it. Aligning MIDI processing to the frame rate will dramatically increase timing jitter, and a granularity of 256 samples is too much in my opinion, but remember that regardless of what you send to the synth, the result will most probably be heard no sooner than the beginning of the next frame anyway. My solution was to make note-ons and note-offs affect the envelope generators and the oscillator frequency instantly, so the voices COULD start the new notes in time, but sadly, the volume and all parameters only get updated with the next frame, so the sound will come later, but at least with somewhat more correct timing than with aligning. On the other hand, as soon as the synth runs out of voices (and playing voices have to be re-allocated), you may hear some artefacts resulting from half-set-up voices between the note-on and the next frame. Play around with it; my next try will most probably reduce the frame length to 128 or better 120 samples (256 are too long for good drum sounds) and align the MIDI processing (3ms timing jitter isn’t too bad, and hey, being slightly inaccurate gives a synth this nice bitchy „analogue“ feel).

Update (a few months after I’ve written this *g*): I’ve decreased the frame length to 120 samples and aligned the MIDI processing to the frame size – and it sounds nearly perfect. Only downside is that the voice on/off clicking became somewhat more audible, but you can compensate for that with standard mixer click removal techniques (the OpenCP source code is a good example :) .

4.4 – Voice allocation

On to the voice allocation. As said, whenever a note-on occurs, one voice from the voice pool will be assigned to a channel. And as soon as a voice has finished playing, it will be freed again. So far, so good. The problems arise as soon as all voices are busy playing and there’s a note-on event happening. Let’s assume that playing new notes is better than sustaining old ones. So – which voice do we throw away to make place for the new one?

Let’s first refine the term „no voice free“ – in the fr08 synth, I can specify the maximum polyphony for a channel, that is how many notes can be played simultaneously on one channel. This is great to keep the maximum polyphony down and reduce unnecessary CPU usage (be warned that voices will most likely continue to be allocated even if they’ve already faded out below anything audible), but complicates the case somewhat more. Now we have two different cases: Either the whole synth runs out of voices, or the maximum polyphony of the channel is reached. In the first case, the following algorithm applies to all voices, in the second, only to voices belonging to the channel we’re trying to play a new note on.

The scheme used by me was simply a check of the following rules in top-to-bottom order:

  • find a voice which is keyed off and playing the same note (on the same channel) as we want to play, and reallocate it
  • else: find all voices which are keyed off and reallocate the one with the lowest volume
  • else: find a voice which is playing the same note (on the same channel) as we want to play, and reallocate it
  • else: reallocate the voice with the lowest volume

If you’re clever, you can generate a magic number from the volume, the channel and the “same note” check, and simply take the voice with the lowest/highest number. This also allows easy refining of the rules if you find out that your scheme is likely to interrupt the “wrong” voices.
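
One way the “magic number” idea could look in C. The Voice fields, the weights and the function names are hypothetical; the only thing taken from the text is the rule order, which is encoded in the integer part of the score, with the volume breaking ties. The volume is assumed to lie in 0.0 .. 1.0.

```c
typedef struct {
    int   active;    /* 0 = voice is free */
    int   keyedOff;  /* note-off received, voice is in its release phase */
    int   channel;
    int   note;
    float volume;    /* current output level, assumed 0.0 .. 1.0 */
} Voice;

/* Higher score = better candidate for being thrown away. */
static float stealScore(const Voice *v, int channel, int note)
{
    int same = (v->channel == channel && v->note == note);
    int rank;

    if (v->keyedOff && same) rank = 3;  /* rule 1 */
    else if (v->keyedOff)    rank = 2;  /* rule 2 */
    else if (same)           rank = 1;  /* rule 3 */
    else                     rank = 0;  /* rule 4 */

    /* within the same rank, the quietest voice wins */
    return 2.0f * (float)rank + (1.0f - v->volume);
}

/* Returns the index of the voice to use for the new note, or -1 if there
   is no candidate. channelOnly != 0 means the channel's polyphony limit
   was hit, so only voices on that channel are considered. */
int pickVoice(const Voice *voices, int count, int channel, int note,
              int channelOnly)
{
    int   best = -1;
    float bestScore = -1.0f;

    for (int i = 0; i < count; ++i) {
        const Voice *v = &voices[i];
        if (channelOnly) {
            if (!v->active || v->channel != channel) continue;
        } else if (!v->active) {
            return i;  /* a free voice always wins */
        }
        float s = stealScore(v, channel, note);
        if (s > bestScore) { bestScore = s; best = i; }
    }
    return best;
}
```

Refining the rules then only means changing the weights in stealScore().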

Freeing the voices again is simple, at least with my chosen architecture: One of the envelope generators is hard-wired to the oscillator volume (more on this later), and every frame the voice allocation algorithm checks if this EG is in “finished” state (which came after the release state, when the output level was below -150dB). In this case the voice is simply freed and added to the pool again.

4.5 – Parameters, Controllers and the Modulation Matrix

Whatever you do, whatever capabilities your synthesizer will have, it will have a lot of sound parameters to play with. And you will also want to modify those parameters while playing, because twisting knobs and screwing around with the filter cutoff isn’t only mandatory in most newer electronic music styles, but it’s first of all helluva fun. And further than that, being able to tweak sound parameters makes the music sound much more alive and much less artificial (See also my short rant on “why trackers suck” in chapter one :) .

Another thing we know from chapter one (and I assume that most of you have forgotten about that at this point) is that we wanted to save memory. My goal was to fit the synth and the music into 16K, so I had to think carefully about how to store all those parameters.

If you have worked with real (that is: hardware) synths, you might have noticed that most of them only use a 7bit resolution for most parameters. And for most purposes, it’s perfectly sufficient. So, let’s first state: We use a range from 0 to 127 (or less) for any parameter and store all parameters as bytes. Should be small enough.

If you worry about the resolution: 7bit for the parameters isn’t too bad. We won’t process them in this form, and even small controller changes (which would require more resolution) won’t sound too bad. And if you’re of the opinion that they do, simply do something like eg. the Virus series (made by the German company Access) does (and I can only recommend getting this kickass synth if you’ve got the money): Smooth between the values. I can’t tell you any algorithm, as I haven’t done this so far, but any nice envelope follower or adaptive low pass filter scheme will most probably do. Just if you’ve found out, please tell me. :)
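
For what it’s worth, the simplest “low pass filter scheme” that comes to mind is a one-pole smoother, sketched below. This is an untested-by-ear illustration, not what the Virus does and not something used in the synth described here; the struct, the function name and the coefficient are assumptions.

```c
/* Hypothetical one-pole parameter smoother. coeff is between 0 and 1:
   small values smooth heavily, 1.0 follows the target instantly. */
typedef struct {
    float current;  /* smoothed value handed to the synth */
    float coeff;    /* fraction of the remaining distance covered per step */
} ParamSmoother;

/* Call once per frame (or per sample) with the raw 0..127 parameter. */
float smoothParam(ParamSmoother *s, float target)
{
    s->current += s->coeff * (target - s->current);
    return s->current;
}
```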

Keeping all parameters in the range between 0 and 127 has another big advantage: You can already bind them to MIDI controllers or update them via SysEx commands. So if you want to twist your cutoff knob now, feel free to do so.

Still, that’s quite clumsy. In most cases, you simply don’t want to modify your cutoff frequency over the whole range. And you want to have envelope generators and LFOs and all that neato stuff, and want them to modify your sound parameters, too. This is where the modulation matrix comes into play.

To keep it short, the modulation matrix cares about which modulation source (such as the velocity, or a MIDI controller, or an LFO) affects what parameter, and how much it does so. The two most often used schemes are:

  • Hardwired modulations
    This means that you think of a set of meaningful source/target pairs (such as “velocity -> voice volume” or “Envelope 2 -> cutoff frequency”) and present all of them to the user who can adjust the amount of those modulations. This has the advantage that it’s very easy and intuitive to use, but then, it’s not what I’d call “flexible”, and in most cases you don’t need many of the possibilities, which is (for my purposes) a waste of space.
  • A matrix
    Think about why the modulation matrix is called that way. You have a set of modulation sources and a set of targets, and every source can affect every target with a certain amount. Mathematically speaking, this is a normal two-dimensional MxN matrix, where M is the number of sources and N the number of destinations. And nothing prevents you from treating it that way. This is by far the most flexible way, as any source/target combination is allowed, and if you’ve got only a few sources and targets it’s even easy to use. But… well… the newest version of my synth has about 90 parameters which would be suitable for modulation, and if I’d used every MIDI controller plus all other sources I would have got about 140 modulation sources. And now imagine a GUI or better: a desk with 12600 knobs on it. Not too easy to use, eh? And more than that, 12K of modulation data (ok, most of them would be zeroes and get compressed away, I admit) would be quite, uh, even more waste of space.

So I had to come up with something different. And as I wanted to do a software synth (the following solution is in my opinion very impractical for hardware devices), I came up with this:

My modulation matrix is in fact a list of “source, target, amount” records. This means that I use only three bytes for each modulation (plus one byte telling the number of used modulations), and still have the full flexibility. The first byte is an index into a source table which looks like this:

0 - Note velocity
1 - Channel pressure
2 - Pitch bend
3 - Poly pressure
4 - Envelope Generator 1
5 - Envelope Generator 2
6 - LFO 1
7 - LFO 2
8..15 - reserved for future use
16..127 - MIDI controllers 0-111

Note that all modulation sources have been normalized to a float range of 0.0 .. 1.0 before, and that the EGs/LFOs run at full float resolution instead of 7bit. The one big exception is the Pitch Bend wheel, which covers a range from -1.0 to 1.0 , and has 14bit resolution instead of 7bit, according to the MIDI standard. The Poly Pressure can safely be neglected, as you’d need a poly aftertouch capable keyboard to make use of this, and I haven’t seen such a thing with a price tag below “much too high” in my whole life.

You could also say that the note number is a modulation source. I didn’t (yet :) , but if you want to do it, simply add a preset modulation of e.g. “note number * 1.0 to voice transpose”, and you’ve not only eliminated a few lines of code, but can also play around with ¼th tone scales and “keyboard tracking” for other parameters.

The second byte of a modulation definition is simply an index into my sound parameters in the order in which they’re defined. Making sure that no parameter is affected which can’t be modulated was the task of my GUI, which simply doesn’t allow setting non-modulatable parameters as destination. Easy as that.

The third byte is the amount, in the usual 0..127 range. Note that this is a signed value and that you’ve got to do a “realamount=2*(amount-64)” to get the real amount (-128.0 .. 126.0).

The modulation matrix now simply has to compile an array of all modulation sources, make another array with float representations of all parameters (to get rid of the 7bit resolution) and process the modulation list. For each modulation, value_of_source*amount is calculated and added to the target parameter (which is then clamped to the [0,127] range). And then, the parameters are sent to the synth modules to update their internal states or whatever. Once per frame.
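
A sketch of that per-frame step in C. The Modulation struct mirrors the three-byte “source, target, amount” record; the function name, the argument layout and the clamp to the 0..127 parameter range are assumptions for illustration.

```c
/* One entry of the modulation list, three bytes as described above. */
typedef struct {
    unsigned char source;  /* index into the modulation source table */
    unsigned char target;  /* index into the sound parameters */
    unsigned char amount;  /* 0..127, where 64 means "no modulation" */
} Modulation;

/* patch:   the raw 7bit parameters (one byte each)
   sources: current value of every modulation source, already normalized
            (0.0 .. 1.0, pitch bend -1.0 .. 1.0)
   out:     receives the modulated parameters as floats */
void applyModulations(const unsigned char *patch, int numParams,
                      const Modulation *mods, int numMods,
                      const float *sources, float *out)
{
    /* float copy of all parameters, to get rid of the 7bit resolution */
    for (int i = 0; i < numParams; ++i)
        out[i] = (float)patch[i];

    for (int m = 0; m < numMods; ++m) {
        if (mods[m].target >= numParams) continue;  /* ignore bogus entries */

        /* realamount = 2*(amount-64), giving -128.0 .. 126.0 */
        float realAmount = 2.0f * ((float)mods[m].amount - 64.0f);
        float *p = &out[mods[m].target];

        *p += sources[mods[m].source] * realAmount;
        if (*p < 0.0f)   *p = 0.0f;
        if (*p > 127.0f) *p = 127.0f;
    }
}
```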

4.6 – Make the parameters feel good

This may sound strange, but is really important for the quality of the music coming out of your synthesizer. What I’m talking about is the effective range and scale of your parameters. The easiest way would be linear scaling/translating of the 0..127 range to whatever range your signal processing algorithms need, and using this value.

But: This won’t work.

Ok, in theory it will, and of course the code won’t crash, and what the heck, maybe there’s even music coming from your speakers, but don’t think you’ll get ANYTHING really usable out of your synth if you handle your parameters that way. Let me clear up a few things: Our complete feeling of pitch, volume and sound is highly nonlinear. In fact, almost everything we perceive, we perceive on an exponential scale. Have you e.g. ever noticed that most volume knobs of whatever seem to have a drastic effect in their lower areas, while e.g. from half turned up to fully turned up, there’s not too much change? Well, that’s because we don’t hear “signal amplitude doubled”, but rather “volume turned up a certain bit” (which is about 6dB in this case). Try it – if you’ve got a volume knob on a cheap stereo, going from ¼ to ½ is about the same step as going from ½ to full volume, or from 1/8 to ¼ or whatever (assuming a linear potentiometer, of course).

Same with our perception of pitch. An average human is able to hear frequencies from 16Hz to something along the lines of 12 to 20 kHz, depending on his/her age. Let’s make this 16Hz .. 16384Hz. This is factor 1024, or let’s call it factor 2^10, or better: 10 octaves (which is 120 semitones; that fits our typical 0..127 range quite well, don’t you think?). And again, doubling the pitch doesn’t make us think “the pitch has doubled”, but rather “the sound is one octave higher than before”.

Even for time values, like envelope attack/release times etc, an exponential scale is often better. But this heavily depends.

In a nutshell, use an exponential function for EVERYTHING dealing with pitch or frequencies, be it the oscillator pitch, be it a filter cutoff or even the operating rate of an LFO. You will most definitely regret anything else. For volumes, it depends on what you want to achieve, and for times, use an exponential scale for everything that has to “feel” right, like envelope release times, and a linear scale for everything that has to be precise, like chorus/flanger delays or compressor lookahead times or whatever.

And if neither a linear nor an exponential scale fits your need, feel free to experiment with things like square or square root, or x^(constant), or cos²(x), or atan(x). The better it “feels”, the more you’re able to adjust the parameter to produce something that sounds good. And this is exactly what we want.

This also affects the way that you’re calculating things. If you e.g. do everything with exponential values, you’ll get a hell of a lot of pow() operations. This takes time, especially when you have short frame times or even calculate the modulations sample-wise. In this case, you might want to fake the pow() function by approximating it with linear interpolation or even strange tricks like handing values from the integer to the floating point domain without conversion, and so on. Just don’t do that with frequencies, these have to be as precise as possible. But for everything else (volumes/times/etc), a deviation of <5% will remain almost unnoticed and you’ll have saved lots of precious CPU time.

Just remember that “if it sounds good, it is good”, and the easier it is for the musician to produce good sounds, the better the music will get. Just don’t let the coder’s laziness get in the way of the musician’s creativity.

4.7 – Various things to consider

If you want to calculate the pitch from the note number, use this formula:

frequency = 440.0 * pow(2.0, (note_number - 45.0) / 12.0)

An octave consists of 12 semitones. An octave is the doubling of the frequency, so if you multiply the frequency by 2^(1/12), you get one semitone higher. The “chamber note” A-4 is located at exactly 440Hz (change this value to implement something like a “master tune” function), and is MIDI note #45, considering a sound playing in the 8′ octave, which is quite common. So, the above formula should be clear. “note_number” is of course a float value, and you can get “between” the semitones with it. This also means that this formula is the LAST thing to calculate. Every modification of the pitch, be it through modulations, pitch bending, transposing or whatever, must be applied to the note_number variable (by simply adding it) before. Thus, the pitch will always be correct. Hint: Do a range check of this value before using it. It can’t go below zero because of the pow() function, but it shouldn’t be allowed to go above the Nyquist frequency, which is samplerate/2 (so, 22050Hz in most cases).

Then, many amplitude values are given in dB (Decibels). Decibels are a relative unit, so you define a certain level as 0dB, and the actual dB value is a factor with an exponential scale.

Warning: the formula presented below isn’t really exact, but about 0.1% off. But as most audio devices and software I ran across treat +6dB as “double”, this fits IMO more than the “official” definition.

level = pow(2.0, dB / 6.0)

So, +6dB is twice the level, +12dB is four times the level, and -18dB is 1/8 of the level. Simple as that. And yes, you can’t reach zero with it, but there isn’t a zero anyway, just use “too little to be useful” for that.

Then, let me tell you something about panning.

Falsely quoting a commercial ad on German TV: “The history of proper panning is maybe one of the least understood chapters of mankind”. I could start ranting now how panning envelopes have killed music, and how unreal panning slides are, and that I don’t know any good “real world” song which plays around with panning more than setting it (apart from sound effects, of course)… anyway, different story.

But did you ever notice that in common tracker music most sounds seem to get louder if the panning is more at the sides and softer if it’s in the middle? Well, surprise, they ARE. And that’s because from the very beginning, tracker coders forgot about one simple thing: What we perceive as loudness is the energy coming from the speakers to our ears, not the voltage.

Energy is W=P*t, so it’s power multiplied by time. Let’s forget about time and say, what we hear is the power coming from the speakers.

The power P coming from a stereo speaker system is P=Uleft*Ileft+Uright*Iright, the voltage U multiplied by the current I, and both speakers added together. Speakers normally have a constant impedance or resistance (not completely, but let’s assume that), and the current I is I=U/R (voltage divided by the resistance). So, if we insert I=U/R into our term for P and assume that both speakers have the same impedance, we get P=(Uleft²+Uright²)/R. Let’s also forget about R (because it’s constant), so we can say: The perceived volume is dependent on the sum of the squares of the signal levels.

Ok, most people (and sadly all tracker coders) simply do something like left=sample*(1.0-panning); right= sample*panning; … which seems ok, because left+right is our original sample again. BUT: Let’s try our power formula for 100% left panning and then for middle panning.

For 100% left panning we get:
left = sample
right= 0
power= left²+right² = sample²

And for middle panning we get:
left = sample*0.5
right= sample*0.5
power= left²+right² = 0.25*sample²+0.25*sample² = 0.5*sample²

… Which is only half of the power we got for 100% left panning, or to apply something we learnt above, if we revert the square again, exactly 3dB softer.

The solution is simple: do so-called EQP panning, or “EQual Power” panning. Just take into consideration that the power is about the square of the level, and do something like this:

left = sample*sqrt(1.0-panning);
right= sample*sqrt(panning);

So, for middle panning we get the correct total power and the volume won’t be dependent on the pan position anymore. The problem about this now is that it won’t work with mono signals. As said above, with the “simple” way, the signal “left+right” (which is mono) isn’t dependent on the panning, but it is with EQP panning. So, EQP sucks for stereo signals which are likely to be converted to mono. This either means that you should take a “mono mode” into consideration when designing the synth architecture (completely leaving out any panning), or that you try to find a good middle way, eg. replacing the sqrt(x) by pow(x,0.75) (remember that the original solution is nothing more than pow(x,1.0) and EQP is pow(x, 0.5)).

EQP is also a good solution in other cases when you mix or split two signals with varying weights, first of all for cross-fades. If you ever tried the cross fade function of FastTracker 2′s sample editor, you may have noticed that in most cases the volume drops to about half the original volume in the middle of the fade. With EQP fades instead of linear ones, this would have worked. And most professional software supports EQP fading for cross-fading audio files into each other among other things.

So, conforming to the last paragraph, let’s rephrase all we’ve learnt in one three-word sentence:

Linear is bad.

The Workings of FR-08′s Sound System, Part III

Part 3: The basic system.

3.1: Introduction

Ok. Now we have chosen our weapons and our preferred file format and can start. And soon the next problem arrives: How do we get sound out of this grey box at all?

(You can skip this section if you’re an experienced “sound output system” coder. I’d recommend reading through it, tho, as you might get some ideas that you didn’t know before)

The first question that arises is which sound API to use. Assuming that we want to use Windows for sound output, there are two possibilities: waveOut (the “normal” way the PC does sound) and DirectSound.

My choice was DirectSound, for some simple reasons:

  • the API needs fewer calls and first of all only ONE import (good for file size)
  • Secondary buffers may suck performance-wise, but are the safest means of getting sound out of your computer, and they may even be hardware accelerated and fast in some rare cases.
  • DirectSound gives you an easy and reliable way of synchronizing your internal clock to the audio stream.

3.2: The DirectSound Player

The DirectSound init procedure is quite simple (look into the DirectSound SDK for further explanation): Get an IDirectSound interface with DirectSoundCreate, set your cooperative level to the “priority” setting (“exclusive” would be even better for demos, the only problem is that it’s unsupported and in fact the same as “priority”, at least with DirectX8), retrieve the primary and secondary buffer, set both to your preferred output format (I’d suggest 16bit 44.1KHz signed stereo PCM), lock the entire secondary buffer, clear it, unlock it again…

… and then play it.

Well. Play WHAT? We will need to fill the buffer with data somehow. Again, there are two ways:

  • use a second thread (similar to a timer IRQ back in DOS days):
    We create a thread responsible for rendering the sound. This thread will run independently of our main program, it won’t need any Windows messaging or whatever, so it’s really convenient. And if we set the thread priority to some higher value than normal, it will most probably not be disturbed by anything else going on in the PC. The only problem is that our sound thread may steal some CPU time in just about that very moment that would be needed to complete rendering the frame before the next video frame is due, and thus may affect our frame rate in a very ugly fashion.
  • put it into the main loop:
    We simply call the sound rendering routine in our main loop. That way, it won’t interfere with the rest of our routines, this only has another big problem: if one run of our main loop is longer than the sound buffer, the sound will simply be f***ed up. If you ever tried Quake1 on a 486 machine, you know what I mean.

The solution I finally used was kind of a hybrid between those two ways. First of all, I decided that I wanted to use a sound thread for output. To make things easy, this thread would be a simple loop which does the following things:

  • get the current buffer playing position
  • render and fill up the buffer from the last known position to the current playing position
  • wait a little
  • loop if the “exit sound thread request” variable isn’t set.

I know, the DirectSound SDK and many other sources will make it seem that things like double-buffering or DirectSound’s notorious Position Notifications are a necessity, but in fact they aren’t. The only thing that’s necessary is that you refill the buffer in time, and the way of determining what’s “in time” is completely your decision. Actually, my sleep command waited for about one quarter of the buffer size, so that there’s always plenty of headroom in the buffer.

Now for the CPU time interference problem. I wanted the synth renderer to be in sync with the video rendering engine without sacrificing any of the advantages of perfect background playing. I achieved this by defining a synchronisation event (look into the Win32 SDK for further details) which can “trigger” the sound thread loop: I replaced the Sleep() command with WaitForSingleObject(), which returns if either the specified time has run out or the event was set.

This way, I was able to trigger the event in the main loop via SetEvent(). Due to the inner workings of the Windows scheduler and the fact that my sound thread runs at a higher priority level, the main thread is suspended and the sound thread does one run of the loop. As soon as it comes to WaitForSingleObject() again, the main thread continues. So this is kinda like a direct call into the sound rendering routine – and as soon as your main loop would take too much time for the sound to run stable, the sound thread’s timeout value comes into play and “renders the sound in the background” again.
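The "wait for a trigger or a timeout, whichever comes first" loop can be sketched in portable C++. This is only an analogy and an assumption on my part: a std::condition_variable with wait_for() stands in for the Win32 event and WaitForSingleObject(), the class name SoundThread and its members are invented for the example, and thread priorities (which the scheme above relies on) are left out entirely.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: calls renderPass() once per loop run, then waits
// until trigger() is called or `timeout` has passed, whichever is first.
class SoundThread {
public:
    SoundThread(std::function<void()> renderPass, std::chrono::milliseconds timeout)
        : renderPass_(std::move(renderPass)), timeout_(timeout),
          worker_([this] { run(); }) {}

    ~SoundThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            exitRequested_ = true;   // the "exit sound thread request" variable
        }
        wake_.notify_one();
        worker_.join();
    }

    // Called from the main loop; plays the role of SetEvent().
    void trigger() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            triggered_ = true;
        }
        wake_.notify_one();
    }

    int passes() const { return passes_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!exitRequested_) {
            lock.unlock();
            renderPass_();           // get position, render, fill the buffer
            ++passes_;
            lock.lock();
            // Plays the role of WaitForSingleObject(event, timeout).
            wake_.wait_for(lock, timeout_,
                           [this] { return triggered_ || exitRequested_; });
            triggered_ = false;
        }
    }

    std::function<void()> renderPass_;
    std::chrono::milliseconds timeout_;
    std::mutex mutex_;
    std::condition_variable wake_;
    bool triggered_ = false;
    bool exitRequested_ = false;
    std::atomic<int> passes_{0};
    std::thread worker_;             // declared last so it starts after the rest exists
};
```

The main loop would call trigger() once per frame. If a frame takes too long, the timeout makes the thread render on its own.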

If you want to avoid that the sound thread gets called too often, simply put a “minimum time check” into the loop which skips rendering if not enough samples have been played since the last call.

3.3: Latency and synchronisation.

Let’s just recall a key property for what we’re just doing:

“The purpose of this sound system is playing back music.”

This may be trivial, but this sentence is the key to all latency problems, simply because there is no latency problem anymore. When you simply play back a musical piece, there’s nothing that will occur unexpectedly. You play a consistent stream of data which could come directly from a .WAV file and will never ever change throughout playing. That way, you can make the latency as high or low as you want, it doesn’t matter – it’s clear what will be played anyway, and no one cares if the sound comes out of the speakers a bit later.

No one cares? Well, I wanted to synchronize video to the sound, so I better SHOULD care when the sound will actually be played. And most people would try to make the latency as low as possible now, just to get the video as close to the audio as they can.

And they forget one thing: The actual latency is known. It’s exactly one buffer of sound in length (plus maybe the 20ms additional DirectSound mixing latency, but in most cases you can safely ignore that). So what stops us from just “turning back” our clock the length of one sound buffer? Nothing. And we’ll happily recognize that we’re in perfect sync then.

So, the demo’s main timing source looks like this:

  • We have a variable to count the already rendered samples, which gets initialized at minus the sound buffer size.
  • The sound thread will update this variable after having rendered a chunk of samples
  • Our GetTimer() routine will get the current playback position, subtract the last known playback position from it and add that value to the number of already rendered samples.

And voila, we have a timing source which is in perfect sync with the audio output and will never stop doing so. Just remember that it will start at minus buffersize upon playing, so better make your timer values signed and wait some time before you start the visuals :)
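The three points above can be sketched like this. It is a simplified illustration under my own assumptions: positions are counted in samples rather than bytes, the class and method names (AudioClock, onRendered, getTimer) are invented, and the current playback position is passed in instead of being queried from DirectSound.

```cpp
#include <cstdint>

// Hypothetical timing source; all positions are in samples.
class AudioClock {
public:
    explicit AudioClock(std::int32_t bufferSize)
        : bufferSize_(bufferSize),
          renderedSamples_(-static_cast<std::int64_t>(bufferSize)),  // start at minus one buffer
          lastPlayPos_(0) {}

    // Called by the sound thread after it has rendered `count` samples
    // up to ring buffer position `playPos`.
    void onRendered(std::int32_t count, std::int32_t playPos) {
        renderedSamples_ += count;
        lastPlayPos_ = playPos;
    }

    // currentPlayPos is whatever the audio API reports right now.
    // The result is signed and negative during the first buffer.
    std::int64_t getTimer(std::int32_t currentPlayPos) const {
        std::int32_t delta = currentPlayPos - lastPlayPos_;
        if (delta < 0) delta += bufferSize_;   // play cursor wrapped around
        return renderedSamples_ + delta;
    }

private:
    std::int32_t bufferSize_;
    std::int64_t renderedSamples_;
    std::int32_t lastPlayPos_;
};
```

In a real player both methods touch shared state from two threads, so they would need the locking described further down.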

As this would be faaaar too easy, there are of course some things you’ve got to consider: DirectSound’s GetPosition function may be inaccurate sometimes. You MUST specify DSBCAPS_GETCURRENTPOSITION2 for your secondary buffer, you MUST encapsulate all routines (the sound thread’s loop except the Sleep()/WaitForSingleObject() call and the whole GetTimer() routine) into critical sections or mutexes (look into the Win32 SDK again), as you will run into synchronisation problems otherwise…

… and even then, the timer value may skip a bit every few seconds, especially with badly written sound card drivers (can you spell creative?). The only workaround I found for this was checking if the timer delta from the last to the current call made sense. If it was bigger than e.g. half the sound buffer size, the current position was ignored and my routine returned the last known position instead. This is far from perfect, but as said – it happened only for one frame every 20 or 30 seconds, and nobody will notice a small timing jitter now and then.
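A sketch of such a sanity check, again with invented names (TimerFilter, filter) and with the threshold fixed at half the buffer size as an assumption taken from the paragraph above:

```cpp
#include <cstdint>

// Hypothetical filter: passes timer values through unless they jump
// by more than half a buffer relative to the last accepted value.
class TimerFilter {
public:
    explicit TimerFilter(std::int64_t bufferSize)
        : maxDelta_(bufferSize / 2), last_(0), hasLast_(false) {}

    std::int64_t filter(std::int64_t rawTimer) {
        if (hasLast_) {
            std::int64_t delta = rawTimer - last_;
            if (delta > maxDelta_ || delta < -maxDelta_) {
                return last_;        // implausible jump: reuse the last known value
            }
        }
        last_ = rawTimer;
        hasLast_ = true;
        return rawTimer;
    }

private:
    std::int64_t maxDelta_;
    std::int64_t last_;
    bool hasLast_;
};
```

Note the weakness: if the caller really does stall for more than half a buffer, every later value looks implausible too, so a real implementation would want some way to resynchronize.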

If you want to synchronize your demo events to certain notes/events in the song, don’t waste your time with trying to synchronize the song position counter to the clock (it’s possible with a small FIFO queue which receives position/rendered-number-of-samples correlations as the player comes across the position and will be read out by the GetSongPosition function up to the “real” timer value, but why bother) – just enhance your music player with routines which calculate the timer value from the song position and vice versa, use these in your authoring tool and store only timer values for the events in the actual demo. This makes things a whole lot easier (and the player code shorter again, without losing the possibility of ultra-tight syncing).

3.4: The rendering loop

Now to the rendering. It makes sense to use a certain granularity, as the synth will most probably have a “frame rate” and aligning the rendering blocks to that rate is in most cases a good idea. Just remember one thing:

A bad idea, however, is to make your buffer sizes a power of two.

The times when ASM coders used AND operations to mask out the buffer offsets are over. Those one or two cycles for a compare operation don’t hurt. So, there’s no reason for using power-of-two buffer sizes except that you may be used to it. And in fact, it’s even better if you don’t. I won’t go into too much detail here, but if you know how a cache tag RAM works, you might realize that the CPU can manage the cache better if the buffers start at “weird” addresses, especially if you use multiple buffers at a time (e.g. in the same loop). Just make the buffer addresses a multiple of 32, don’t make their sizes a power of two (or leave some space between the buffers, even one dword is enough) and you’re set.

Then, use at least a 32bit integer buffer or better a 32bit float buffer for your “final” output signal as it leaves the rendering stage. This also applies to every intermediate mixing buffer, as 16bit precision is much too low (processing them will produce a great amount of audible noise if done more than a few times) and you wouldn’t have ANY headroom if the signal was likely to clip. For integer buffers, treat them as 1:7:24 fixed point values (one sign bit, 7 integer bits, 24 fractional bits); for float buffers, normalizing the signal at 1.0 is quite a good idea.

So, the “render” part of the sound thread loop looks more like this:

  • subtract last position from current position (modulo the buffer size) (this will give you the amount of samples to render in this run)
  • optional: align it to your buffer granularity (newsize = size – (size MOD granularity) , if newsize becomes 0 then, well, it’s ok, just render nothing rather than 4 gigs of data :)
  • call the render function to render the specified amount of samples into an intermediate buffer
  • lock the DirectSound buffer from the last playing position
  • convert and clip (!!) the output buffer to 16bits signed and copy it into the DirectSound buffer
  • unlock the DirectSound buffer again
  • add the number of rendered samples to the last position (MOD buffersize !)
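The arithmetic in those steps can be sketched without any DirectSound calls. Everything here is illustrative: the function names are invented, samples are mono for brevity, and a plain std::vector stands in for the locked DirectSound buffer (the real Lock call can hand back two regions when the range wraps around, which the modulo below imitates).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// How many samples the play cursor moved since the last pass,
// rounded down to a multiple of the rendering granularity.
inline std::size_t samplesToRender(std::size_t lastPos, std::size_t currentPos,
                                   std::size_t bufferSize, std::size_t granularity) {
    std::size_t size = (currentPos + bufferSize - lastPos) % bufferSize;
    return size - (size % granularity);   // may be 0: then render nothing
}

// Convert a float sample normalized to 1.0 into 16 bit signed, with clipping.
inline std::int16_t toInt16(float sample) {
    float scaled = sample * 32767.0f;
    scaled = std::max(-32768.0f, std::min(32767.0f, scaled));
    return static_cast<std::int16_t>(scaled);
}

// Convert, clip and copy the rendered block into the ring buffer,
// starting at lastPos. Returns the new "last position" (MOD buffer size).
inline std::size_t writeToRing(const std::vector<float>& rendered,
                               std::vector<std::int16_t>& ring,
                               std::size_t lastPos) {
    for (float s : rendered) {
        ring[lastPos] = toInt16(s);
        lastPos = (lastPos + 1) % ring.size();
    }
    return lastPos;
}
```

Adding bufferSize before subtracting keeps the unsigned difference from wrapping to a huge number, which is the "4 gigs of data" trap mentioned in the list.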

And thus, your render function will just be called with a destination buffer and an amount of samples and you can write your synth or player or whatever completely OS/platform independent. If you want to port the system, just rewrite the sound output code. Same if you want to use waveOut or .wav writers or your favourite MP3 player output plugin or want to make your whole thing a VST2 plugin (use normalized and pre-clipped float buffers then :) or whatever.

At last, we have sound running in the background, not getting in the way of other CPU time critical routines, with perfect sync and in a nice modular fashion. And it’s even easy to code. Do we need more?

“Yes, indeed, we want to have a synthesizer now”

Well, sorry, but more on this later. You’ve got enough work to do if you followed me to this point, and from now on, things get tough. And second, I haven’t finished these parts, so you’ve got to wait.

Anyway, I hope that this helped you in any way, if you’ve got any questions, comments or suggestions, simply send a mail to kb@kebby.org or catch me on IRC :)

until then…

The Workings of FR-08’s Sound System, Part II

Part 2: Why are SMF files smaller?

2.1: MIDI files smaller? wtf…

If you know a bit about MIDI communication and .mid files (which are, as said, just a log of all events with timestamps), you might ask how I can think that those files are (after compression) actually smaller than modules.

Let’s first assume that we don’t need to compress the music data at all. The executable will be compressed anyway, most probably with an RLE or LZ variant (and perhaps with some entropy coding). So, there’s nothing against one megabyte of zeroes, they will become something like four bytes after compression. Things that repeat or look very similar to each other are also something good.

A module consists of patterns, which is in itself a real size advantage, as you can repeat them. Problem: under the above assumption, this doesn’t help at all. If you simply write all pattern data below each other, a good LZ algorithm will find out that the structures are repetitive and simply put a reference to the last occurrence of the pattern. So, the whole order table and all the code processing it are a plain waste of space.

The second problem is that all current module players (the only exception was Digitrakker, IMHO) store the data in a “per-row”, not “per-channel” fashion, which means that data that is likely to repeat (your bass drum on channels 1,2,3 e.g. :) is interleaved or “scattered” with data that will change (melody, chords, etc), so an LZ packer will find something, but will only be able to compress small chunks of data, which can be considered quite sub-optimal.

Ok, a standard MIDI file is much worse. SMF is a really “compact” format which focuses on not wasting any byte. That’s cool for the uncompressed file size, but is likely to tilt any standard compression algorithm. The fact that the content of such a file is a single stream of data which is not even separated in channels (which is true for format-0 files, format-1 files can have an arbitrary number of tracks, but that will become useless soon anyway) doesn’t help anything either.

2.2: How to please an LZ compressor

But we can do something about that. The trick is to re-sort the MIDI data and group similar events together. In fact, I split the MIDI stream by the following criteria:

  • MIDI channel
  • Type of event (Note, Control Change, Program Change, Pitch Bend, Channel Pressure, Rest)
  • for Controller events: Number of Controller

So, after the split, I had a few hundred single streams (some of them of course empty) which carried data like “All changes of controller 2 on channel 10”. You see that in our bass drum example, a good LZ compressor will process the “notes on channel 1” stream, think something like “oh, the same event every quarter note throughout the song” and replace all those notes with one word: “tekkno”.
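A rough sketch of such a split. The event struct, the enum and splitStreams are all invented for this example, and the actual converter's data layout is not described here, so treat this purely as an illustration of the grouping key (channel, event type, controller number).

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

enum class EventType : std::uint8_t {
    Note, ControlChange, ProgramChange, PitchBend, ChannelPressure
};

// Hypothetical, already-parsed MIDI event.
struct MidiEvent {
    std::uint32_t time;     // absolute time in ticks
    std::uint8_t  channel;  // 0..15
    EventType     type;
    std::uint8_t  data1;    // note number or controller number
    std::uint8_t  data2;    // velocity or controller value
};

// One stream per (channel, event type, controller number).
using StreamKey = std::tuple<std::uint8_t, EventType, std::uint8_t>;

inline std::map<StreamKey, std::vector<MidiEvent>>
splitStreams(const std::vector<MidiEvent>& events) {
    std::map<StreamKey, std::vector<MidiEvent>> streams;
    for (const MidiEvent& e : events) {
        // Only controller events are split further by controller number.
        std::uint8_t controller =
            (e.type == EventType::ControlChange) ? e.data1 : 0;
        streams[StreamKey(e.channel, e.type, controller)].push_back(e);
    }
    return streams;
}
```

Events keep their original order inside each stream, so a player can still merge the streams back by timestamp.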

But, of course, that’s not enough. If your song contains more than a bass drum, it will most probably contain sequences which get transposed to other pitches. Normally, an LZ packer won’t recognize such transposed patterns, and it won’t recognize a continuous controller slide from 0 to 127, either. Therefore, apply delta coding to everything… the timestamps, the note numbers, velocity information, controller value, simply everything. A controller slide from 0 to 127 will become something like {always_the_same_time_delta, 1} per event, and all sequences of notes will become “decoupled” from their base note, as only the pitch distances between the notes are encoded, and an LZ compressor will recognize all those slight repetitions as such.
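Delta coding a byte stream takes only a few lines. This sketch uses invented function names and wraps differences modulo 256 so that every delta still fits in a byte, which is my assumption and not necessarily how the original converter stores them.

```cpp
#include <cstdint>
#include <vector>

// Replace each value by its difference to the previous one (mod 256).
inline std::vector<std::uint8_t> deltaEncode(const std::vector<std::uint8_t>& values) {
    std::vector<std::uint8_t> deltas;
    deltas.reserve(values.size());
    std::uint8_t previous = 0;
    for (std::uint8_t v : values) {
        deltas.push_back(static_cast<std::uint8_t>(v - previous));
        previous = v;
    }
    return deltas;
}

// Inverse operation: a running sum (mod 256) restores the values.
inline std::vector<std::uint8_t> deltaDecode(const std::vector<std::uint8_t>& deltas) {
    std::vector<std::uint8_t> values;
    values.reserve(deltas.size());
    std::uint8_t current = 0;
    for (std::uint8_t d : deltas) {
        current = static_cast<std::uint8_t>(current + d);
        values.push_back(current);
    }
    return values;
}
```

A slide 0, 1, 2, 3, ... turns into 0, 1, 1, 1, ..., and the note sequences 60, 64, 67 and 62, 66, 69 differ only in their first delta, which is exactly the kind of repetition an LZ packer can pick up.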

Then, almost all executable compressors compress their data byte-wise. Make sure that the compressor you chose likes your stream structures. I won’t comment that further (keeping a small advantage for oneself rocks), but you’ll find out what I mean if you think about it, promised :)

You might (or hopefully will) also find out that writing a .mid player that has to keep about 200 delta encoded streams and their timestamps in “mind” and schedule every event at the right time isn’t too trivial a task. In fact, it makes the player code somewhat bigger… a simple .mid player should be possible in 500 bytes, the player for my converted format is about 1.5K uncompressed in not-at-all-optimized C++. Still, this pays off. The 11-minute main tune for FR-08 is about 120K in size in .mid format (PKZIP makes something along the lines of 20K of it). After conversion and adding a few hundred bytes of sound bank data, the ready-for-playing file is about 180K in size… and after applying the executable compressor, only 4K of it are left. That’s a compression ratio of 1:30, which is, well, quite cool IMHO (and definitely better than the 1:10 ratio I would’ve got with standard .mid files)

So, we have a concept, we have a file format for the music… let’s get some sound out of the PC :)

The Workings of FR-08’s Sound System, Part I

Part 1 : The concept

Back in the summer of 2000, I was asked by Chaos / Ex-Sanity (who looks somewhat similar to the fabulous Powa of Elitegroup, but that’s a completely different story) if I wanted to contribute the musical part to, as he called it, “the best 64k intro of all times”. After I asked him to see that intro, he said that there wasn’t anything finished until then, but that Fiver2 and he would deliver something in November which would definitely convince me.

And so they did. Damned. Every year I just want to take a rest from all those scene activities and every year the same story: “Let’s win The Party”. I’ll NEVER escape this.

But so what, I should stop pretending that I’d have something called “life” and, well, it’s fun and I had some ideas slumbering in my brain and waiting to be tried out anyway. So let’s start.

1.1: A solution that doesn’t suck?

But start with WHAT? I mean, I never was a good chip tune composer, I always took the memory I wanted or needed, and in the recent past, I only composed using “real” music equipment and all that stuff. And finally, I was really addicted to vocals :)

So, it HAS to be possible somehow to put a soundtrack into the 16K of compressed executable size that Chaos granted me… one that leaves me as many of those things I’d learned to love as possible.

But how? Let’s have a look at the solutions which existed before:

  • Normal tracked music using MXMPlay/W or MiniFMOD or whatever:
    Well, to put it short: NO. Samples take up memory, and even those chippy, noisy crap samples I’d have to use would take too much. Apart from that, that’s just how all of those 64k intros sound. There are enough musicians capable of doing nice tunes with such low resources, but most 64k intro soundtracks wouldn’t survive long in “the real world”.
  • Using modules with pre-calced samples from a softsynth (“the Stash way”):
    Nice. But still not what I wanted… You have to keep a software synthesizer AND a module player in the executable and there’s no such thing as “usability”… I respect Probe and the others for the great results they got, but I as a musician would rather quit the scene than compose a demo score typing hex bytes into the softsynth, re-compiling it, generating the samples, switching to the tracker, loading the samples again, finding out that they don’t REALLY fit… and so on. This solution may work, but really kills the whole work flow and the creativity.
    Additionally, I had tried that a year earlier, and apart from the sad fact that the intro that we’d been planning then was never even half finished, I did the music for the Mekka&Symposium 2000 invitation intro using that method and found out that it sucked.
  • And well, there are always FM synths and General Midi wavetable chips on today’s sound cards. But as all they can do is “sound cheesy and cheap”, this solution is really none. And first of all, you can’t be sure that the song will sound the same on all computers.

If you really think about it, tracked music sucks anyway. Face it: The whole format of modules, the way the patterns are displayed, organized in memory and played, was designed just for one single reason: Paula. MODs were made to have an easy way (as in “easy for the coder and the CPU”, not “easy for the musician”) to make the Amiga’s sound chip (called Paula) play some samples, and that was it. And sadly, all other programs followed that paradigm, even though the MOD format itself was rendered obsolete by the first occurrence of better Amiga CPUs or even software mixers. And to make it worse, all newer and well-known tracker programs carry TONS of overhead to make them at least a bit compatible with the original SoundTracker. And IMHO, for no convincing reason.
Additionally, we’re living in the year 2001. CPUs are somewhat faster now than back then, music programs have evolved, there’s a plethora of cool programs like Buzz or plugins for whatever format… and I just can’t imagine why the heck anyone would consider a handful of mono samples without any effects or modulations enough for making good music. In terms of sound quality, every M***x Music Maker kiddie can just laugh about trackers’ capabilities… and he would be right.

Ok, let’s conclude: Every Existing Solution Sucks.

But what to do now? I want a flexible, good sounding sound system, an intuitive approach to composing, state-of-the-art sound quality (and vocals :) … AND it has to fit into like 16K together with all music data. That sounds quite impossible, doesn’t it?

1.2: Innovation through established ideas

Not at all.

Let’s dig out one of the oldest standards for computer aided music there is: MIDI. When I’m talking about MIDI, I’m talking about the original “Musical Instrument Digital Interface” standard which includes the definitions for a serial interface (standard RS232 asynchronous protocol, 31250 bits/sec, one start- and one stop-bit, no parity, no handshaking, using TTL voltages and this notorious 5-pin DIN plug with those even more notorious decoupling circuits) and the definitions for a simple protocol capable of transferring musical events such as “key pressed”, “key released” or “modulation wheel turned a bit up”. I’m also talking about the SMF file format (standard midi files, those .mid things) which contains nothing more than a time-stamped “log” of all events belonging to a song.

What I’m NOT talking about is the “General MIDI” standard (and all its derivatives) which defines various standard sound banks and is what most people think of when they hear the word MIDI. That’s a nice toy for distributing bad cover versions of chart songs over the net or for old DOS games or whatever. As mentioned above, this would sound, well, cheap.

“So… today’s CPUs are quite fast. Also, today’s the age of TnL 3D cards, so the CPU really doesn’t have to do much in a demo. I’ve got all the time in the world.”

(to which Chaos replied something along the lines of “if my intro drops to the second frame because of your synth, you’ll die”, but who cares)

Well, then… how about coding a complete realtime synthesis system which simply processes MIDI events, hooking it to a .mid file player and there you go? Ok, it will (and in fact DOES) suck up a lot of CPU time, but there are many advantages:

  • A .mid file player is MUCH smaller than an XM player… just read the file, wait until the next timestamp and send some commands to the synth… no effect processing, nothing.
  • Sound quality wise, realtime synthesis is much better than precalculated samples… Lines will sound more “natural” (as with analog equipment, no two sounds sound the same) and we can do really long filter sweeps and modulations without having to sacrifice several Megabytes of memory and precious seconds of precalc time for those sounds.
  • If the synth processes MIDI commands anyway, well, why not simply hook a small GUI for editing sounds to it and connect it to the MIDI sequencing program of your choice? That way, you can compose the whole song in Cubase (or Logic or Cakewalk or zTracker or Modix or whatever you prefer) and any sound change is just two mouse clicks (or one twist of a knob on the MIDI controller board next to you :) away and happens immediately, instead of “change, render, load, try, doesn’t fit, change, render …”.
  • One nice side-effect is that the musical data in .mid files is stored in a much more sensible way than tracker patterns (which were stored for easy access with an M68000 CPU, and that never really changed), and is generally smaller (in fact, that was not enough for me, but more about that later)
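To illustrate how little a player of this kind has to do, here is a sketch of the "wait until the next timestamp, send commands to the synth" loop. It assumes the file has already been parsed into a list of events with absolute timestamps; TimedEvent, EventPlayer and advanceTo are invented names, and the synth is just a callback.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, already-parsed event with an absolute timestamp in ticks.
struct TimedEvent {
    std::uint32_t time;
    std::uint8_t  status;  // MIDI status byte, e.g. 0x90 = note on, channel 1
    std::uint8_t  data1;
    std::uint8_t  data2;
};

class EventPlayer {
public:
    using Synth = std::function<void(const TimedEvent&)>;

    EventPlayer(std::vector<TimedEvent> events, Synth synth)
        : events_(std::move(events)), synth_(std::move(synth)), next_(0) {}

    // Send every event that is due at time `now` to the synth.
    // Returns the number of events sent in this call.
    std::size_t advanceTo(std::uint32_t now) {
        std::size_t sent = 0;
        while (next_ < events_.size() && events_[next_].time <= now) {
            synth_(events_[next_++]);
            ++sent;
        }
        return sent;
    }

private:
    std::vector<TimedEvent> events_;  // must be sorted by time
    Synth synth_;
    std::size_t next_;
};
```

The render loop would call advanceTo() once per synth frame and then let the synth render that frame. There is no effect processing in the player at all; everything audible happens in the synth.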

Maybe some of you will now complain that this way, you can’t play samples anymore. But adding sample playback support to a softsynth is like 20 lines of ASM code or 5 lines of C++ if your concept is good. So maybe it’s more coding work, but not really something I’d call a problem.

Ok, for the next chapter, you’ll need a bit of knowledge about the MIDI protocol and maybe the .mid file format, so I’ll leave this as a homework before you continue reading.

Better HAVE a life after christmas

a The Party 8 report by The Artist Formerly Known As Doc Roole / ELITEGROUP

Sure, maybe I made the same mistake again I made the last two years. As always, shortly after The Party you think “no, I WON’T go there again, it will surely suck even more than this year” – and what happens? You show up there again 363 days later and get disappointed quite the same way as you got the year before.

Well, so I felt after my first few impressions of The Party 8 (which was my 4th The Party in a row and my about 20th to 30th party in total, who cares). We arrived there about two hours before the party actually opened – and the first funny thing was that we met a huge bunch of sceners already before the entrance, as they were reasonable enough not to pay the raised entrance fee when you arrived too early, just unlike those few hundred quake lamers which already filled up the halls and were busy sitting in front of their computers and doing whatever they did, at least nothing of any importance.

About twenty minutes before 8AM, the organizers then showed their immense generosity and scene-friendliness (sarcasm) and let us in, so we could finally move our equipment and ourselves into the hall. Basically, the hall had the same layout as the year before, only most of the tables in hall 1 faced 90 degrees to the big screen, as there were rumours that some of the quakers felt annoyed by those nasty compos last year and complained. Whatever, we took our seat at our reserved tables in hall 3 and the few tables around, where most of the known sceners were anyway.

It was luck for me that I have one of those 100Mbps network cards in my PC, as instead of the promised 10/100M autosensing network hubs, there were only pure 100M ones in hall 3, so everyone owning only a 10Mbit card wouldn’t have been able to connect to the network had there not been some people (basically the Smash Designs guys) who brought their own network hubs with ‘em and supplied 10Mbit to half of the hall.

Speaking of organizers and especially lacking competence, this syndrome was truly widespread among the guys wearing those “The Party Organizer” shirts. Whatever problem you had, whatever you asked them, whether it was a problem with your TCP/IP-configuration, whether you wanted to deliver a contribution on disk instead of via the online contribution system or whether you just had an idea concerning how to do something better concerning whatever, the only “answers” you could get from them were things like “uh?”, “no, we can’t do that”, “err, I don’t know who exactly is responsible for this, come back later” or “No, alcohol isn’t allow… what was your question again?”

So, every one of our expectations of a scenish party or of organizers helping the attendees was brutally destroyed after a few hours and, like last year, we had to face that all fun and scene spirit at this party would depend on ourselves once again. And so we tried:

experiment 0x01 – ELITEGROUP tries to find sceners to be in war with

mission: we scanned all halls for people doing anything scene related, such as coding, pixeling, modelling or tracking.
results: there were about 4000 people at the party place. Of these 4000 people, there were

  • 2 people coding
  • about 5 people using FastTracker or Impulse Tracker (half of them only listening to tunes)
  • and 3 people using LightWave or 3DSMax for 3D scenes

conclusion: As The Party is (in their own words) “The biggest scene event of the year”, the days of sceners using editors, paint programs and trackers for making their demos are definitely over. Nowadays’ demo creators’ tools are definitely

  • Windows 9x EXPLORER.EXE
  • Netscape
  • Quake
  • and of course Winamp


Slowly, there was that certain feeling that the only interesting people to meet at the party were the guys we already knew and are in good contact with, anyway – and we started to ask ourselves what the heck we wanted there except winning the Classic WiLD compo with our Sony PSX demo (which we did) and if there weren’t better things to spend our time with, like preparing ourselves for better parties (like the Mekka/Symposium or Summer Encounter), caring about our girlfriends we left at home (as there aren’t any more boring places for non-scene-girls than scene parties) or just sitting around at home and watching TV…

… but hey.. aren’t there things like competitions and such at a scene party? Why didn’t we see ONE of them so far? What’s up?

Looking at the timetable, we then had to find out that some of the compos had already been held without the slightest urge of the organizers to actually NOTIFY anyone not watching the big screen all the time. What kind of so-called “scene event” is it when the people behind it don’t even care that everyone is able to see the most important thing there is at a scene party, namely the competitions? I would say, none at all.

Well, in a certain way, the organizers even announced the compos, at least for the people in hall 1: They had some totally useless lighting equipment at the stage and did a sort of “light show” right before the compos… did I say “right before”? No, it was sadly NOT right before the compos, but about 15 minutes before the compo and five minutes long – so the compos didn’t start right after the light show but ten to thirty minutes later, to give everyone time to go back to their seats, as this time, the quake kids seem to have complained about the crowd of people standing around and making noise every time a compo starts. I mean, imagine YOU are a quake player and have paid 400DK to kill a previously unseen amount of “friends”… wouldn’t you feel very offended if there were a bunch of people around you having fun that conventional, peaceful way right before your eyes? I definitely would… I think.

Speaking of compos, let’s talk a bit about them. For example, this year there were no 4K intro compos, which was kind of a shock for the scene, but, viewed from my current point, nothing but pure wisdom of the omniscient organization team, as they surely already knew that almost no coder capable of this rare discipline would show up there (and they were quite right, don’t you agree?). The compo PC was changed from a Pentium MMX/233 to first a P2/233 and then a full P2/300 in the last few hours, which of course has improved the quality of the compos, but was a kick in the back for all groups who designed their effects to run smooth on the compo machine instead of following nowadays’ “buy a faster computer” way of coding things. Anyway, the overall compo quality was VERY mediocre for such a BIG “scene” event. Some few good demos (Bomb, Blasphemy+Purple), some few good intros (Halcyon, Fudge), the usual good graphics and, to put it short, that was it. The music compos consisted to 99% of pure techno (neglecting the fact that the multichannel compo took place on the last morning, so almost no one was listening to it) without any really known name in the attendee list – and the wild compos showed the usual videos and bad renderings, I can say without arrogance that our Playstation demo (SCHLEUDERTRAUMA) was the only entry worth watching there. So, even the compos were far too disappointing to make The Party anything like “The scene event of the year” or a good scene event at all.

So, when will the organizers finally realize that the legend is over and The Party has become nothing more than a big Danish quake meeting with some sceners who haven’t realized this sad fact yet? Never, I guess. The horrendous entrance fees make the event pay off too well for them to just stop – and sceners will still be attracted, be it by the fame of past days or by the compo prizes, which make the party still worth attending if you have a winner production in your backpack. But to all others, I can only recommend not even thinking about spending your life after Christmas in Aars if you want to go to a scene party, as The Party ISN’T. Not anymore.

ELITEGROUP – .we piss on you.

F***ing Learn To Code Again

Published in the scene disk magazine Hugi, Issue 14

While reading Hugi 13, I ran across many articles speculating about whether the scene is dying, whether people become inactive, why they do and how much innovation is lacking in demos. But, on top of all this, even the TECHNICAL quality of demos is becoming steadily worse. Lately, I downloaded tons of productions from demo.cat.hu, looked through all of them, and almost started to cry and wonder whether it wouldn’t be better if I quit this mess calling itself “scene” nowadays.

What I had to see were first of all many demos which didn’t even start, but crashed or failed for this and that reason (and I’d think a 72meg machine with GUS and SB and a Trio64V+ in it (yes, UniVBE is loaded) covers enough standards to have at least one of them supported) – and the rest were mainly reincarnations of Mode 13h with effects I saw many times before and BETTER. So what’s up? Have people entirely forgotten that apart from all that design-hype nowadays, coding should still be kind of an art? Or does just nobody care anymore? I don’t know.

The most astonishing thing about it is that in former times, people were actually ABLE to produce elegant code and good-looking fast demos – and all this with PCs we would just laugh at today. And look at all those Amiga and C64 sceners still producing stuff which some of “us” PC guys ‘n grrrls just envy. And do you know what? It’s friggin’ easy. Just CARE about what you code and don’t lean back once your desired effect shows the first correct-looking still picture on your screen.

On the other hand, most of that ‘leet underground knowledge of how to code demos is lost. There is no tutorial at all on how to code demo effects in a way that they look good, work on more than your machine and maybe run at the same overall speed (not frame rate of course) independent of the viewer’s cpu. And sadly, nobody seems to be able to find out how all this worked and still works (ok, maybe, as said, people don’t care).
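
For the “same overall speed on every cpu” part, the usual trick is to derive every effect parameter from elapsed time instead of counting frames. A minimal sketch, assuming an effect whose only state is a rotation angle; `angleAt`, `secondsSince` and `kDegreesPerSecond` are names made up for this example, not taken from any real demo:

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical effect parameter: rotation speed in degrees per second.
constexpr double kDegreesPerSecond = 90.0;

// The angle depends only on elapsed time, never on how many frames were drawn.
double angleAt(double secondsSinceStart) {
    assert(secondsSinceStart >= 0.0);
    return std::fmod(secondsSinceStart * kDegreesPerSecond, 360.0);
}

// Seconds elapsed since `start`, measured with a monotonic clock.
double secondsSince(std::chrono::steady_clock::time_point start) {
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Each frame would then call something like `angleAt(secondsSince(demoStart))` before drawing, so a slow machine simply shows fewer frames of the same animation instead of a slower animation.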

So, I’ll just state a set of rules or proposals now on how to make better demos (or demos at all)…


… and I’ll start right with the thing none of you seems to want to hear:


["Optimize? Are you NUTS? We have Pentium IIs, we don't need to optimize!"]

Yes, I admit it, if you have a P2 machine with at least 300MHz, completely losing the overview of what you do is a permanently imminent danger. Whatever mess you type into your favourite editor or IDE, the result will most probably be what is called “fucking fast”. One good example of this is that thread about rotozoomers on news.scene.org’s coders newsgroup where a guy posted a rotozoomer which was about 10 times slower than a rotozoomer (which is a 1992 effect, just by the way) should normally be. And you know what? He didn’t even notice. No, in fact, he considered the rotozoomer I sent him back (1 hour of coding, some not-so-well-optimized written-down asm inner loops, rest C++, ran 70fps on a P60 (instead of HIS 10fps)) buggy – and another guy even dared to answer me things like “your ASM code will lead you nowhere”. Nowhere.

In fact, “nowhere” is exactly where I want to be if THIS attitude towards DEMO CODING (let’s repeat this: DEMO CODING) is the common one. We have such powerful machines nowadays, capable of displaying tens of thousands of particles, thousands of triangles and an unmeasurable number of shadebobs per video frame (read: 70fps) and trying to actually DO something with this power instead of repeating the same old effects again and again (just worse each time) leads me or any other person doing this NOWHERE? Get a life.

Now, how does one write optimized code? Well, it’s easy if you think of a few things.

  • CHOOSE THE RIGHT LANGUAGE: Demo effects are supposed to run fast (at least I hope so). So, the FIRST choice at least for your inner loops is of course Assembler. I don’t say that 100% ASM is necessary at all, in fact it causes more problems than it solves, but a good trick is to count how often a part of code is called per calculated frame – and if this number exceeds 1000, better use ASM code for this. And mostly, those portions are small inner loops and don’t take more than one or two screen pages of code, so it isn’t that much work actually.
    • Know How To Code In Assembler (or: what is the difference between hand-written and compiled code): Don’t try to code your ASM routine by simply converting your C++ or whatever prototype line by line. No, try to use all the registers the cpu has, avoid variables, avoid jumps and restructure your loop in a way that it fits perfectly to the cpu’s structure (this includes rearranging all your data types etc, too). Because simple Pentium optimization doesn’t help anything in most cases if the algorithm itself is not optimized (the P2 does all that pairing etc by itself, so the speed gain is at about 0%) – apart from one thing: USING MEMORY IS BAD. Don’t mess around with tables, 32bit image data and whatever; since we have that thing called “cache”, simply calculating things is often faster.
    • and Use The Right Language For The Rest: Do NOT bother with Borland Pascal, Visual Basic and all those other toys. Don’t say “C++ looks too complicated for me”; imperative programming languages are basically all the same, and once you get used to the new syntax, it’s just as easy as before. So: use C or C++ (for DOS, preferably Watcom, for Win, VC++) – and don’t be too afraid of OOP, if you use it wisely and don’t throw around abstract classes or virtual functions, it isn’t noticeably slower at all (just remember one thing: In demo coding, there is no such thing as “code reusability” or that crap ;)
  • DON’T CALCULATE THINGS YOU SHOULD ALREADY KNOW: This one is a bit tougher, let me just explain. Look eg at this rectangle fill routine (a bad example, but I just want to show the principles):

    for (int y=y1; y<y2; y++)
      for (int x=x1; x<x2; x++)
        vidmem[320*y+x]=color;

    In its inner loop, the cpu has to calculate vidmem+320*y each time, though this value NEVER changes within a line (and in fact, it’s at least 20 cpu cycles you waste per pixel). Why not use THIS version:

    whatever *vptr=vidmem+320*y1;
    for (int y=y1; y<y2; y++) {
      for (int x=x1; x<x2; x++)
        vptr[x]=color;
      vptr+=320;
    }

    Isn’t this a BIT better? Well, in my opinion (and in the opinion of every other ‘leet demo coder of course) not really. Still, the cpu has to compare the x value with the right border every pixel. So (if we have a well optimizing compiler), the following version is again a bit faster:

    whatever *vptr=vidmem+320*y1+x1;
    int xwidth = x2-x1;
    for (int y=y2-y1; y; y--) {
      for (int x=xwidth; x; x--)
        *vptr++=color;
      vptr+=320-xwidth;
    }

    and if you remember the things I said above (considering an 8bit mode):

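    char *vptr=vidmem+320*y1+x1;
    _asm {
      mov edi, [vptr]    ; edi = start of the first line
      mov al, [color]
      mov ebx, [x2]
      sub ebx, [x1]      ; ebx = width
      mov edx, [y2]
      sub edx, [y1]      ; edx = number of lines
    yloop:
      mov ecx,ebx
      rep stosb          ; fill one line
      sub edi, ebx
      add edi, 320       ; next line
      dec edx
      jnz yloop
    }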

    (now, is this too long or too complicated or do you need too much effort for this? Or does this lead you nowhere?)

    Notice that I didn’t write EVERYTHING in asm (it isn’t necessary, as said) and that I got completely rid of the variables x,y and xwidth, as well as the whole inner loop, which I cut down to ONE asm instruction. So, this version should be about ten times faster than the first C++ one (and all you asm haters out there, tell me ONE compiler which would be able to optimize the c++ code THIS way). Ok, “rep stosb” isn’t the fastest command, there are means of making such routines still MUCH faster, but I don’t want to go into that deep level of detail now.

  • KNOW WHAT U WANT: Don’t try to code your routines as universally usable as possible, don’t think of huge data structures, hundreds of abstraction layers and all this other crap CS students get flooded with during their studies – when it comes to reality (the thing we exist in), no demo effect code will ever be reused for other purposes. If you want to code a rotozoomer, don’t code a texture mapper and think “hey, I can use the code in my engine, too” – you WON’T. So just code a rotozoomer which does what you want it to do (in this case, zoom and rotate ;) and optimize it. Your engine (which you won’t use in your next demo and you DEFINITELY won’t use in your upcoming game) will need a completely different approach for its texture mappers anyway (apart from the fact we have those nifty 3d cards in our computers. Hello scene.).
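
Coming back to the rectangle fill from the previous point: the same idea (set up the row pointer once, then only add the line length per row) can also be written without inline asm. This is a sketch under assumptions, not a replacement for the asm version: it assumes an 8bit, 320 pixel wide linear buffer, `fillRect` and `kPitch` are names made up here, and `std::memset` merely plays the role of `rep stosb`. Whether it ends up faster or slower than the asm depends on your compiler and runtime library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr int kPitch = 320;  // bytes per scanline (assumed 320x200, 8bit)

// Fills the rectangle [x1,x2) x [y1,y2) with one colour. The start address
// is computed once; inside the loop only the pitch is added per row.
void fillRect(unsigned char *vidmem, int x1, int y1, int x2, int y2,
              unsigned char color) {
    assert(vidmem != nullptr);
    const int width = x2 - x1;
    if (width <= 0 || y2 <= y1) return;  // nothing to draw
    unsigned char *row = vidmem + kPitch * y1 + x1;
    for (int rows = y2 - y1; rows; --rows) {
        std::memset(row, color, static_cast<std::size_t>(width));
        row += kPitch;
    }
}
```

Note that there is no clipping here, just like in the versions above; the caller has to make sure the rectangle lies inside the buffer.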

Many of you will now probably realize that this problem would not exist without optimization, as optimized routines aren’t really open to changes if you want to reuse them and realize you have to change them here and there. But this is (at least in my completely unhumble opinion) the fate of every “true” demo coder and this is the point where you HAVE to care and HAVE to put MUCH work into your productions. Or you can just stay what people like me call lame. It’s up to you.

INCOMING MESSAGE: Hello Java coders. Thanks again for showing me that the definition of platform independence is that the code won’t run at all, independent of the platform. EOT.


["It doesn't work? That's strange, HERE it does... maybe you should buy a better PC!"]

A common practice is that demos nowadays run on exactly two computers:

1. The computer of the coder
2. The compo machine (needs to reboot after the end)

Sometimes the demo will even run on one or two other group members’ pcs, but this is rather rare. Some good examples of this were that Windows 3DFX demo at Evoke (which didn’t even run on ANY machine in its “final” version) and of course “Perfect Drug” by Elitegroup which was a brilliant example of code and design, but is known to run on almost no pc people have, for whatever reason (I was in the lucky situation to watch it on Dominator/Elitegroup’s PC once, but it never worked for me either).

Anyway, what causes this dilemma? There is only one answer for this – and it’s an answer we know: LAZINESS. You know Second Reality, don’t you? Of course you do. And I can surely say you watched it, probably several times. How come that demo works on every pc from 386SX16 to the latest Xeon machines and today’s demos DON’T? Isn’t there something wrong with that?

Definitely there is. People code something which SEEMS to work on their own pc, call it “demo” and send it out into the world without having tested it on ONE other computer or even realizing that their hardware dependent DOS code is what it is called: hardware dependent.

So what are the reasons why a demo only works on specific machines? And what can be done about it?

  • BUGZ ‘N MEMORY LEAKZ: Ever considered that your code does not really work and that having it run on your pc is pure LUCK? No? Then think about the following: Let’s say your effect works correctly. At least it seems so. Most probably, you have allocated plenty of memory for your tables, textures, virtual screens and all that stuff. Now, what happens if some code writes beyond the boundaries of those memory chunks? If you’re lucky, nothing – at least nothing noticeable. But in fact, with most compilers and operating systems, those things destroy vital information the runtime needs for heap management. And guess what MAY happen if you then try to free that memory again or allocate more of it… right. Our (Smash Designs) Demo wasn’t released at TP8 just because of THOSE bugs.

    So, watch your steps. Or better, know what you do and what your code does. If you have the time for it, a VERY good idea is to redefine malloc(), free(), new, delete and whatever you use and make them monitor what happens and how much you alloc and free. And if you don’t see the beautiful number “0” at the end of whatever you did, you can be sure there is something wrong. And if this doesn’t suffice to track your code’s quirks, set up a so-called “memwall” – modify your malloc/new routines so that they allocate some more mem, put the desired memory chunk in the middle of the bigger area and fill the rest with a special sequence of bytes – thus, another routine can simply test if this sequence is still intact and you instantly know WHICH memory boundaries get overwritten.

    And if you have then tracked down all those bugs and finally, everything works, you may realize that there is no such thing as occasional crashes ["Your Windows drivers seem to suck"].

  • THE CHOICE OF THE OS: Ok, I know, this question is widely regarded as a religious one. If someone sold weapons to the demo scene, the Gulf War would look like a bad joke compared to the carnage which would arise just because of this rather unimportant point. So, I’ll just try to explain my thoughts from the viewpoint of a coder who desires that his demo will be seen by as many people as possible. Today, there are three possibilities: DOS, Linux and Windows. Let’s just discuss all of them.

    DOS is still the most often used “Operating System” for demos. But sadly, it is also most often the cause of all the problems we have, as there are no drivers for your hardware and no standard API to access it. Therefore, each demo has to support different standards to work on more than the coder’s pc. On the audio side, there are luckily libraries like MIDAS (uhm no, forget that), USMP and IMS (which I’d highly recommend, despite its bitch-ness), so there are only problems with newer PCI cards which most often don’t work at all with DOS programs. On the video side, though, there is the VESA standard and almost no one manages to support it correctly (or rather no one wants to ["HERE it works..."]). Every card manufacturer brings his own quirks and bugs into the VESA standard – and the widespread UniVBE isn’t any better concerning this point. Speaking of UniVBE: What the heck is the point in requiring VBE2.0 anyway? If you use virtual screens in the computer’s main mem (which is a necessary thing anyway, I’ll come to this point later), writing banked blit routines is no problem at all. It only takes some time. If you don’t have this time, go play Quake again, you’re not worth being here. But even if you support banked modes, the VESA standard offers MANY traps for you, like modes where the number of bytes per line differs from the x-resolution – and if you want to support more than ONE mode (which is much better, do you REALLY know every graphics card supports the mode you’ve chosen?), the REAL problems have only just begun.

    To come to a conclusion, as cool as DOS is regarding things like stable timers and enough CPU time, the hardware and API chaos is just a mess. It takes MONTHS to get a universal VBE code working on, let’s say, 90% of all PCs. And in the current era of PCI sound cards, you can be sure that enough people won’t be able to enjoy the cool music of your demo at all (or in rare cases in more than kewl 22kHz 8bit super-duper hifi sound).

    Ok, Linux could be the answer. A free (and therefore extremely scene-friendly) OS, stable, great multitasking and well optimizing compilers… if only there were a decent standard to get your demo onto the screen. Admit it, nobody wants demos running in a small X11 window (preferably only in exactly that color depth you never use) – and svgalib only works with, uhrm, nothing, and needs to be SUID 0. Hooray. There are things like libggi etc, but as long as there is no decent standard means of accessing the graphics card’s frame buffer, Linux is nothing but unacceptable as a demo OS (not to speak of ["you need libvgagl 100.14, kernel 2.7.444pre3, X11R6.27, TCL8.4++ and to be precise a complete sunsite image taken at December 12th, 2014AD, 5:37pm to run this demo"]).

    So, as sad as it is, and as much as it hurts to admit it, today the only REAL choice for demos is Windows. At least, it is the only OS which provides a rather nice, fast, standardized and WORKING way of bringing your code to the screen and the speakers – and this is DirectX. If you just set up a DSound primary buffer and a DirectDraw primary surface, you have exactly what you had under DOS with VESA and your sound drivers. And all that setup code, fiddling with COM interfaces, messaging etc is about two days of work – and then you have your wrapper, your FlipScreen() and everything goes on as usual, with a BIG difference: Your demo will most probably work on MORE than your hardware. Ok, there still are some quirks and inconsistencies in the DirectX API, but the problems you encounter when you want your code to run on another pc are very small compared to those you get when trying the same under DOS.

    So, Windows is unstable, bad and EVIL – but for demo coding, it’s at the moment the only choice. And once the hurt stops, think about all those nice image and sound codecs and other libraries you have in a standard Win95/98/NT which happily do all those things for you which were such a torture to code under DOS. With the right wrappers (max one week of work), you can finally concentrate on what you really want – coding effects.

  • TEST DA THANG OUT: Yes, you are one of those persons who finish a demo three minutes before the compo deadline and don’t have the time to test it on any other pc… Come on. Don’t say you coded your whole VBE code five minutes before the deadline and did all the effects at home without actually seeing them… Ok, maybe this explains how ugly some demos are, but every coder should have and in fact HAS enough time to test his production on other computers. And “other” computers are not only your Dad’s one, but also those of your real-life and scene friends, your girlfriend’s brother and (yes) eg. random people on IRC. Also, small scene meetings (do you still have them?) are an IDEAL place to test out your code and talk to others if it doesn’t work and you don’t know why.
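
The allocation counter and the “memwall” from the BUGZ ‘N MEMORY LEAKZ point could look roughly like this. Everything in it is made up for illustration (`wallAlloc`, `wallCheck`, `wallFree`, `g_liveBlocks`, the 0xA5 fill byte, the 16 byte wall and header sizes); a real version would hook malloc()/free()/new/delete themselves and probably keep a list of all live blocks so they can be checked in one go.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

constexpr std::size_t kHeader = 16;    // room for the stored block size (assumed)
constexpr std::size_t kWall = 16;      // guard bytes on each side (assumed)
constexpr unsigned char kFill = 0xA5;  // guard pattern (assumed)

long g_liveBlocks = 0;  // allocations minus frees; should read 0 at exit

// Block layout: [header: size][front wall][user data][back wall]
void *wallAlloc(std::size_t size) {
    unsigned char *raw = static_cast<unsigned char *>(
        std::malloc(kHeader + 2 * kWall + size));
    if (!raw) return nullptr;
    std::memcpy(raw, &size, sizeof size);
    unsigned char *user = raw + kHeader + kWall;
    std::memset(user - kWall, kFill, kWall);  // front wall
    std::memset(user + size, kFill, kWall);   // back wall
    ++g_liveBlocks;
    return user;
}

// True if both walls around the block are still intact.
bool wallCheck(const void *p) {
    assert(p != nullptr);
    const unsigned char *user = static_cast<const unsigned char *>(p);
    std::size_t size;
    std::memcpy(&size, user - kWall - kHeader, sizeof size);
    const unsigned char *front = user - kWall;
    const unsigned char *back = user + size;
    for (std::size_t i = 0; i < kWall; ++i)
        if (front[i] != kFill || back[i] != kFill) return false;
    return true;
}

// Checks the walls, frees the block, returns whether it was still intact.
bool wallFree(void *p) {
    if (!p) return true;
    const bool intact = wallCheck(p);
    std::free(static_cast<unsigned char *>(p) - kWall - kHeader);
    --g_liveBlocks;
    return intact;
}
```

Calling `wallCheck()` on your big buffers after each effect narrows down which routine writes out of bounds, and a `g_liveBlocks` other than 0 at exit means something was never freed.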

(hm, maybe I should consider wearing one of those neato “Hello – are YOU the PC scene?” shirts at parties, too)
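
And since “writing banked blit routines is no problem at all” is easy to say: here is a sketch of what such a routine has to do when copying a virtual screen from main memory, including the case where the bytes per line differ from the x-resolution. It is written against assumptions, not against a real VBE binding: `blitBanked` is a made-up name, `setBank` is a hypothetical callback standing in for the actual bank switch call, and the window is assumed to be 64K with 64K granularity (real cards report both values in the mode info and they can differ).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

constexpr std::size_t kBankSize = 65536;  // assumed window size AND granularity

// Copies an 8bit virtual screen to banked video memory.
// window : pointer to the 64K window (e.g. the mapped A000 segment)
// pitch  : bytes per scanline as reported by the mode, may exceed width
// setBank: hypothetical callback that maps the given bank into the window
void blitBanked(const unsigned char *src, int width, int height,
                std::size_t pitch, unsigned char *window,
                const std::function<void(int)> &setBank) {
    assert(pitch >= static_cast<std::size_t>(width));
    int bank = -1;  // nothing mapped yet
    for (int y = 0; y < height; ++y) {
        const unsigned char *line = src + static_cast<std::size_t>(y) * width;
        std::size_t offset = static_cast<std::size_t>(y) * pitch;  // linear
        std::size_t left = static_cast<std::size_t>(width);
        while (left) {
            const int wanted = static_cast<int>(offset / kBankSize);
            if (wanted != bank) {  // switch only when really needed
                bank = wanted;
                setBank(bank);
            }
            const std::size_t inBank = offset % kBankSize;
            // A scanline may cross a bank boundary, so copy it in pieces.
            const std::size_t chunk = std::min(left, kBankSize - inBank);
            std::memcpy(window + inBank, line, chunk);
            line += chunk;
            offset += chunk;
            left -= chunk;
        }
    }
}
```

The inner `while` is the part people tend to forget: with a pitch that isn’t a power of two, some scanline will straddle two banks sooner or later.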

To come to the end, I’ll finally ask you one question: Is all this really too hard or too much work? What’s the problem with optimizing code or making your demo run on other computers than yours? Face it – it’s only your own laziness. Nothing of all this is impossible, many groups showed us that it works. So why doesn’t it work for you? Don’t you feel ashamed? You’d better.

If I ever write a next issue of this article, its content will most probably be “how to make your demo look good”, covering topics from how to query the escape key ["escape key? Isn't the reset key enough?"] to DOS related ones like how to synch your effects to the retrace ["you mean, those stripes aren't necessary?"] and maybe some Win95 coding issues. We’ll see.