WorldForge Audio Whitepaper

Here's my conclusion. The preferred music format is one of the MOD derivatives, i.e. XM/S3M/IT. The preferred decoder library is ModPlug, with all but one of its global effects turned off.

Here's my reasoning, with my detailed conclusion at the end.

Here is some hard data to evaluate the merits of the various music formats. These benchmarks were recorded on a dual Pentium II 300 with 128 MB of RAM, running Linux kernel 2.2.16 compiled natively. The software used for benchmarking was the tcsh implementation of the "time" command. The software used for playback was the Mandrake RPM of xmms v1.2.2, the Mandrake RPM of the xmms mikmod plugin v1.2.2, an i386 RPM of the xmms modplug plugin v1.3a, and the i386 binary of the xmms Vorbis plugin Beta 2 release.

The song used for testing is Purple Motion's release of his parts of the Second Reality soundtrack, an 8-channel ScreamTracker 3 file nominally 6 minutes 42 seconds long. It uses 41 of 45 instrument slots. 73 out of 84 patterns are unique, and 4 patterns are unused. 12 effects are used. It was chosen as a well-written, middle-of-the-road piece in terms of instruments and effects. The MP3 version was encoded at a data rate of 128 kilobits per second using BladeEnc 0.91. The Vorbis version was encoded in mode 2, which is an approximate data rate of 128 kilobits per second.

The mikmod plugin was configured to output 16 bit stereo 44 kHz sound, with surround mixing and interpolation turned on.

The modplug plugin was configured two different ways. Both configurations were set to output 16 bit stereo 44 kHz sound. One configuration enabled all features, which are oversampling, noise reduction, volume ramping, fast playlist info, reverb, bass boost, and surround. The last three of those were enabled with default settings. The other configuration enabled only oversampling, disabling all others.

The MPEG Audio Layer III decoder used was the MPG123 decoder v1.2.2 which ships with xmms, configured to output 16 bit stereo 44 kHz sound.

The Ogg Vorbis decoder used was the Beta 2 decoder available from the Vorbis developers.

The WAV decoder used was the Wave Player v1.2.2 which ships with xmms, playing back a 16 bit stereo 44 kHz file. It has no settings of its own.

Here is the raw output of the time command:

ModPlug All Features   48.400u  8.240s 6:50.43 13.8% 0+0k 0+0io  1682pf+0w
ModPlug One Feature    32.900u  8.170s 6:50.42 10.0% 0+0k 0+0io  1684pf+0w
MikMod                 23.750u  7.140s 6:51.94  7.4% 0+0k 0+0io  1532pf+0w
MPG123                 40.130u  8.900s 6:56.51 11.7% 0+0k 0+0io  3538pf+0w
Ogg Vorbis            117.200u 12.800s 6:53.99 31.4% 0+0k 0+0io  2135pf+0w
WAV                    18.560u  9.770s 6:54.19  6.8% 0+0k 0+0io 18990pf+0w

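This is the default tcsh output format, which is: %Uu %Ss %E %P %X+%Dk %I+%Oio %Fpf+%Ww

In order, the fields are user-mode CPU seconds (%U), kernel-mode CPU seconds (%S), elapsed wall-clock time (%E), CPU percentage computed as (%U + %S) / %E (%P), average shared text and unshared data/stack space in kilobytes (%X and %D), input and output operations (%I and %O), major page faults (%F), and swaps (%W).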

The number we're most interested in is the first one, which is the time the process spent in user mode, in CPU seconds. The raw number is not the precise amount of CPU time required to do the decoding, because it includes a nontrivial amount of xmms overhead. The CPU time required by the bare decoder libraries would be less. That extra time is irrelevant for purposes of comparison, since all of the decoders suffered equally.
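For anyone who wants to work with these numbers programmatically, here is a minimal Python sketch that pulls the user, system, and elapsed times out of one line of the output above. The names TimeSample, parse_elapsed, and parse_time_line are made up for this illustration, and the sketch assumes each line starts with the decoder label used in this whitepaper, followed by the first four fields of the default tcsh format.

```python
import re
from typing import NamedTuple


class TimeSample(NamedTuple):
    label: str        # decoder name prepended to each line in this whitepaper
    user_s: float     # %U: CPU seconds spent in user mode
    system_s: float   # %S: CPU seconds spent in kernel mode
    elapsed_s: float  # %E: wall-clock time, converted to seconds
    cpu_pct: float    # %P: CPU percentage as printed by time


# Label, then "<user>u <system>s <elapsed> <pct>%"; later fields are ignored.
LINE_RE = re.compile(
    r"^(?P<label>.+?)\s+"
    r"(?P<user>\d+\.\d+)u\s+"
    r"(?P<system>\d+\.\d+)s\s+"
    r"(?P<elapsed>\d+(?::\d+)*(?:\.\d+)?)\s+"
    r"(?P<pct>\d+(?:\.\d+)?)%"
)


def parse_elapsed(text: str) -> float:
    """Convert 'm:ss.ss' or 'h:mm:ss' into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line: str) -> TimeSample:
    """Parse one labeled line of time output into a TimeSample."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized time output: {line!r}")
    return TimeSample(
        label=match["label"],
        user_s=float(match["user"]),
        system_s=float(match["system"]),
        elapsed_s=parse_elapsed(match["elapsed"]),
        cpu_pct=float(match["pct"]),
    )


sample = parse_time_line(
    "ModPlug All Features 48.400u 8.240s 6:50.43 13.8% 0+0k 0+0io 1682pf+0w"
)
print(sample.label, sample.user_s, sample.elapsed_s)
```

Dividing user_s + system_s by elapsed_s should land close to the printed percentage, which is a quick sanity check that a line was parsed correctly.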

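The following matrix shows the percentage difference in user CPU time between the various decoders. The abbreviations correspond to the rows of the time output above, in the same order: MAF is ModPlug All Features, MOF is ModPlug One Feature, MIK is MikMod, MP3 is MPG123, OGG is Ogg Vorbis, and WAV is WAV. This table is read: (vertical label) uses (table value) (more/less) CPU than (horizontal label). If the sign is negative, it's less, otherwise it's more. Each value is expressed as a percentage of the vertical label's own CPU time.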

          MAF       MOF       MIK       MP3       OGG       WAV
MAF     0.00%    32.02%    50.93%    17.11%  -142.15%    61.65%
MOF   -47.11%     0.00%    27.81%   -21.95%  -256.23%    43.59%
MIK  -103.79%   -38.53%     0.00%   -68.93%  -393.47%    21.85%
MP3   -20.64%    18.00%    40.80%     0.00%  -192.12%    53.74%
OGG    58.70%    71.93%    79.74%    65.77%     0.00%    84.16%
WAV  -160.78%   -77.26%   -27.96%  -116.16%  -531.47%     0.00%
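The matrix can be regenerated from the user times with a few lines of Python. This is a sketch under one assumption: the formula (row - column) / row is inferred from the table values and is not stated anywhere in the text. The function names are made up for the illustration. With the user times as printed in the time output, the sketch matches the table to two decimal places everywhere except the MP3 row and column, which come out less than a tenth of a percentage point different. Those table entries look like they were computed from 40.12 rather than 40.130.

```python
# User-mode CPU seconds (%U) copied from the time output in this whitepaper.
USER_SECONDS = {
    "MAF": 48.400,
    "MOF": 32.900,
    "MIK": 23.750,
    "MP3": 40.130,
    "OGG": 117.200,
    "WAV": 18.560,
}


def pct_difference(row_s: float, col_s: float) -> float:
    """Extra CPU used by the row decoder, as a percentage of its own time.

    Assumption: this formula is inferred from the table values, not stated
    in the text.
    """
    return (row_s - col_s) / row_s * 100.0


def build_matrix(times):
    """Return matrix[row][col] for every pair of decoders."""
    return {
        row: {col: pct_difference(row_s, col_s) for col, col_s in times.items()}
        for row, row_s in times.items()
    }


def format_matrix(matrix) -> str:
    """Render the matrix as aligned plain text, one row per decoder."""
    labels = list(matrix)
    lines = ["   " + "".join(f"{label:>10}" for label in labels)]
    for row in labels:
        cells = "".join(f"{matrix[row][col]:>9.2f}%" for col in labels)
        lines.append(f"{row:<3}" + cells)
    return "\n".join(lines)


matrix = build_matrix(USER_SECONDS)
print(format_matrix(matrix))
```

Because each value is divided by the row's own time, the matrix is not antisymmetric: MAF against MOF is 32.02%, while MOF against MAF is -47.11%.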

Reading from this chart, we see that libmikmod requires considerably less CPU time than anything except WAV playback. We also see that the Vorbis format requires far and away the most CPU time. The most salient point is that modplug requires more CPU time than MP3 decoding if all of its effects are turned on, but about 22% less if only the one useful effect is enabled.

One more set of statistics. Here are the filesizes of each of the files used in testing.

-rw------- 1 dragonm dragonm 71427740 Sep 22 20:05 Unreal ][ \ PM.wav
-rw------- 1 dragonm dragonm 6478786 Sep 22 20:16 Unreal ][ \ PM.mp3
-rw------- 1 dragonm dragonm 6097719 Sep 23 16:14 Unreal ][ \ PM.ogg
-rw------- 1 dragonm dragonm 626656 Aug 20 16:16 2ND_PM.S3M

MP3 and OGG files are essentially the same size, for our purposes. A WAV is of course huge. And the S3M is an order of magnitude smaller than the MP3 rendering of the same song.
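To put numbers on those size comparisons, here is a small Python sketch that derives the playing time from the WAV size and then works out an average data rate and a size ratio for each file. Two things are assumed rather than taken from the text: that "44 kHz" means 44100 Hz, and that the WAV file has a canonical 44-byte header. The function names are made up for the illustration.

```python
# File sizes in bytes, copied from the directory listing in this whitepaper.
SIZES = {
    "WAV": 71427740,
    "MP3": 6478786,
    "OGG": 6097719,
    "S3M": 626656,
}

# Assumptions, not stated in the text: "44 kHz" means 44100 Hz, and the WAV
# file carries a canonical 44-byte header in front of the PCM data.
SAMPLE_RATE_HZ = 44100
BYTES_PER_FRAME = 2 * 2  # 16-bit samples, two channels
WAV_HEADER_BYTES = 44


def wav_duration_seconds(size_bytes: int) -> float:
    """Playing time of an uncompressed 16-bit stereo WAV of the given size."""
    return (size_bytes - WAV_HEADER_BYTES) / (SAMPLE_RATE_HZ * BYTES_PER_FRAME)


def average_kbps(size_bytes: int, duration_s: float) -> float:
    """Average data rate in kilobits per second, container overhead included."""
    return size_bytes * 8 / duration_s / 1000.0


def size_ratio(larger: str, smaller: str) -> float:
    """How many times bigger one test file is than another."""
    return SIZES[larger] / SIZES[smaller]


duration = wav_duration_seconds(SIZES["WAV"])
print(f"duration: {duration:.1f} s")
for name, size in SIZES.items():
    print(f"{name}: {average_kbps(size, duration):8.1f} kbit/s, "
          f"{size_ratio(name, 'S3M'):6.1f}x the S3M")
```

Under those assumptions the WAV holds about 405 seconds of audio, the MP3 averages roughly 128 kbit/s (matching the encoder setting), the Vorbis file roughly 120 kbit/s, and the S3M the equivalent of roughly 12 kbit/s.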

Some additional discussion. ModPlug has some "features" enabled by default which are either unnecessary or actually detrimental to the playback of the majority of MOD-derivative files. Noise reduction, volume ramping, and fast playlist info are all unnecessary. Reverb, bass boost, and pseudo-surround are detrimental. Reverb is apparently a misguided attempt to simulate a concert hall. It's irritating, at the least, and typically not what the composer intended. Bass boost is for use in your lowrider. It has no use in your living room, and it's also not what the composer intended. Pseudo-surround can be helpful for old-style files in the original MOD format, which was not stereo. In modern formats, it again distorts what the composer intended. Nearly all of the discussed mis-features distort what is being played back, sometimes radically altering the composer's intention. The Praxis Hyperspace Woosh sample in Skaven's part of the Second Reality soundtrack contains Dolby Surround data, which is utterly destroyed by ModPlug's helpful mis-features. Given these problems, and the fact that enabling these mis-features heavily impacts the CPU time required for decoding, I say disable them, and do not provide the ability to enable them.
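The idea of disabling the mis-features and not offering a way to turn them back on can be sketched as a fixed feature mask. This is only an illustration in Python: the Feature names are hypothetical labels taken from the feature list in this whitepaper, not ModPlug's actual API, and effective_features is a made-up helper.

```python
from enum import IntFlag


class Feature(IntFlag):
    # Hypothetical labels for the xmms modplug plugin options discussed in
    # this whitepaper. These are NOT the identifiers used by ModPlug itself.
    OVERSAMPLING = 1
    NOISE_REDUCTION = 2
    VOLUME_RAMPING = 4
    FAST_PLAYLIST_INFO = 8
    REVERB = 16
    BASS_BOOST = 32
    SURROUND = 64


# The only feature the game should ever pass on to the decoder.
ALLOWED = Feature.OVERSAMPLING


def effective_features(requested: Feature) -> Feature:
    """Drop every requested feature that is not in the allowed mask."""
    return requested & ALLOWED


wanted = Feature.OVERSAMPLING | Feature.REVERB | Feature.BASS_BOOST
print(effective_features(wanted) == Feature.OVERSAMPLING)
```

Keeping the mask in one constant means a stray configuration value can ask for reverb or bass boost, but the request never reaches the decoder.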

One format which hasn't been mentioned is MIDI. Unfortunately, I didn't have a MIDI version of the test song available, so that format was left out of the statistics. However, it is well known that MIDI requires extremely low CPU time, and is an extremely compact format. The one major disadvantage I see with the format is that the composer can't know how his efforts are going to sound, since that depends entirely on the hardware being used to synthesize the song. A modern soundcard with a heavily tweaked patch set can sound like a fine orchestra. The Sound Blaster 1.5 card in my 486 can sound rather less fine.

One final note. From a programming perspective, MOD-derivative formats have a distinct advantage. The technique of bridging from the middle of one MOD into the middle of another, which I described in an e-mail on the media mailing list, works very well and very easily. The technique was used somewhat primitively in the PC game "Pinball Fantasies." We can do at least as well, and with some effort on the part of composers, quite a bit better.

Here's the executive summary, complete with bullet points:

In summary, I choose ModPlug with only oversampling enabled, because S3M/XM/IT have the advantage in:

* compact filesize
* low CPU usage
* ease of achieving game programming goals
* faithful reproduction of the composer's intent (with some care in decoder settings)

Appendix:

Package names, as installed:
modplug-xmms-1.3a-1
xmms-1.2.2-4mdk
xmms-mikmod-1.2.2-4mdk