Anti-alias filters: the invisible distortion mechanism in digital audio?

Richard Black

Independent consultant

London SE15 5UW, UK

Email: richard@musaeus.co.uk

ABSTRACT

The majority of digital recording and replay equipment includes anti-alias filters which allow a degree of aliasing in the near-ultrasonic. This is normally accepted as unimportant. However, consideration of this in conjunction with the intermodulation distortion generated in loudspeakers suggests that it may have audible effects. Simple measurements on digital equipment and on speakers are presented to support this conjecture.

0. INTRODUCTION

The advent of higher sampling rates (96kHz and above) has given new impetus to claims that the intrinsic bandwidth limitations of CD-format digital audio make the system inadequate to meet the aural criteria. However, it has also facilitated investigation of the mechanisms responsible for the perceived imperfections of current digital audio. One particular report [1] detailing an experiment whereby 96kHz-sampled material was low-pass filtered to 20kHz bandwidth with no apparent quality loss was a major motivation behind the present paper.

Most of the proposed explanations for the perceived superiority of high-sampling rate systems have centred either simply on the requirement for higher frequencies to be reproduced [2,3], or, more subtly (though coming mathematically to much the same thing in practice), on the energy dispersion effects of sharp-cutoff filters necessary to implement sampling systems having a passband extending very nearly to the Nyquist limit [4].

A paper by Julian Dunn [5] discusses three specific features of sharp-cutoff filters as implemented in the majority of digital audio recording and reproducing apparatus: pre- and post-echo due to filter passband ripple; clipping in filters; and inadequate transition/stopband rejection of such filters. The present paper deals with the third of these and its potential for generating audible intermodulation distortion in conjunction with practical (non-linear) loudspeakers.

1. ADC/DAC FILTERS

1.1 The Filtering Requirement

It is well known that in a discrete-time (sampled) system the bandwidth of the audio signal must be limited, a priori, in order to prevent the introduction of severe distortion known as ‘aliasing’ or ‘imaging’. These terms are sometimes used interchangeably: for the purpose of this paper ‘aliasing’ will be used for the production, on analogue-to-digital conversion, of spurious lower-frequency components by signals above half the sampling frequency, Fs, while ‘images’ are higher-frequency signals above Fs/2 created on digital-to-analogue conversion or sampling rate conversion. In both cases the spurious signal will generally be at a frequency of

F(spurious) = F(sampling) - F(input).

Higher-frequency aliases/images can and do occur but are much easier to deal with.
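The relation above can be illustrated with a minimal sketch. The function names are illustrative only, and the 23kHz and 21kHz test frequencies are arbitrary choices, not values taken from any measurement in this paper.

```python
def folded_frequency(f_in_hz: float, fs_hz: float) -> float:
    """Frequency in the range 0..Fs/2 at which f_in_hz appears after sampling."""
    f = f_in_hz % fs_hz  # sampling cannot distinguish f from f + k*Fs
    return fs_hz - f if f > fs_hz / 2 else f


def first_image(f_in_hz: float, fs_hz: float) -> float:
    """Lowest image (above Fs/2) of a baseband component, i.e. Fs - F(input)."""
    return fs_hz - f_in_hz


FS_HZ = 44_100.0
alias_hz = folded_frequency(23_000.0, FS_HZ)  # ADC side: 23kHz input
image_hz = first_image(21_000.0, FS_HZ)       # DAC side: 21kHz component

print(f"23kHz input aliases to {alias_hz / 1000:.1f}kHz")
print(f"21kHz component images at {image_hz / 1000:.1f}kHz")
```

With 44.1kHz sampling, a 23kHz input therefore appears at 21.1kHz, and a 21kHz component has its first image at 23.1kHz.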

The sampling rate of CD, 44.1kHz, is only just adequate for an audio bandwidth of 20kHz. In order to realise fully the performance potential of the system, anti-alias and anti-image (or reconstruction) filters must be perfectly flat to 20kHz but attenuate signals strongly above 22.05kHz, requiring a rolloff on the order of 600dB/octave. Early generations of recording and replay equipment made a more or less good job of this with sophisticated analogue filters, but even at the outset of CD Philips-based players made use of ‘oversampling’ to upsample the 44.1kHz data stream to 176.4kHz, performing anti-image filtering digitally and relaxing the requirement for the analogue filters so that the stopband need only start at 154.35kHz.
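The figures in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below assumes an attenuation target of 96dB (the nominal 16-bit resolution), which is an assumption made for illustration; the text does not state the target used to arrive at its rolloff figure.

```python
import math

PASSBAND_EDGE_HZ = 20_000.0
FS_HZ = 44_100.0
NYQUIST_HZ = FS_HZ / 2
ASSUMED_ATTENUATION_DB = 96.0  # assumption: nominal 16-bit resolution

# Width of the transition band (20kHz to Fs/2) expressed in octaves.
octaves = math.log2(NYQUIST_HZ / PASSBAND_EDGE_HZ)
slope_db_per_octave = ASSUMED_ATTENUATION_DB / octaves

# With 4x oversampling, the first image band left for the analogue filter
# starts at (4 * Fs) - Fs/2.
OVERSAMPLED_FS_HZ = 4 * FS_HZ
analogue_stopband_start_hz = OVERSAMPLED_FS_HZ - NYQUIST_HZ

print(f"transition band: {octaves:.3f} octave")
print(f"average slope: {slope_db_per_octave:.0f}dB/octave")
print(f"analogue stopband may start at {analogue_stopband_start_hz / 1000:.2f}kHz")
```

Under that assumption the average slope comes out somewhat under 700dB/octave, the same order as the figure quoted above, and the analogue stopband edge is 154.35kHz.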

1.2 Filtering Performance in the Real World

The vast majority of modern recording and replay equipment uses oversampling techniques, often in conjunction with low-bit modulators and decimators, thus shifting most of the work of anti-alias/image filtering to the digital domain. However, even with fairly powerful DSP readily available, implementing a suitable filter is not trivial and in practice many - perhaps the majority - of chipset and equipment manufacturers elect to relax the filter requirements and allow a degree of aliasing/imaging to occur. The common criterion is to permit spurious signals above the nominal resolution floor of the converter (e.g. -96dB for a 16-bit part) as long as the frequency of such spuriae is above 20kHz and they are therefore inaudible.

Dunn [op. cit.] gives sample figures for spurious rejection of commercially available ADC and DAC filters, and taking these together with other data sheets and measurements on equipment to hand, I have come to the conclusion that a ‘typical’ response is certainly flat to 20kHz and is attenuated by only 6 - 10dB at Fs/2, with the true stopband beginning at about 24kHz (assuming, as in general throughout this paper, 44.1kHz sampling). Frequently, signal attenuation at 23kHz is only of the order of 20dB. [Fig.1] shows an illustrative ‘composite’ graph of typical performance, an amalgam of measured or documented performance of several current-production ADCs, DACs and oversampling filters. Manufacturers do not always provide detailed specifications of transition-band performance but data sheets from leading manufacturers such as Crystal Semiconductor, AKM Semiconductor, NPC and Burr-Brown all quote figures or show graphs.

Whether or not it is audible, energy above 20kHz most certainly exists in real music, and even if the response of practical microphones and amplifiers is not perfectly flat above 20kHz it will not in general be greatly attenuated within a few kHz of that frequency. Boyk [6] has conducted studies of the ultrasonic spectra of musical instruments, revealing in some cases clear harmonics to well in excess of 50kHz and signals significantly above the noise floor to 100kHz. It is therefore inevitable that in an analogue-to-digital converter with a non-ideal filter, aliasing will occur. By the same token, signal components only just below Fs/2, recorded almost unattenuated by most ADCs, will give rise to images above Fs/2. Boyk’s measurements, and my own informal sampling of CDs in my collection (including a variety of musical styles and instruments), suggest that in many cases the per-unit-frequency energy around 20kHz may be as little as 30dB below that in the fundamental frequency range of 0 - 2kHz, with individual harmonic components in the region of -40dB. It can therefore be expected that aliases offset from the input frequency by a few hundred Hz will occur with amplitudes in the region of -50 to -55dB relative to peak signal level.
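The alias level estimate above can be roughly reproduced as follows. This is only a sketch under stated assumptions: the 22.2kHz input frequency is a hypothetical choice giving an alias offset of 300Hz, and the filter attenuation is linearly interpolated (in dB) between 10dB at Fs/2 and 20dB at 23kHz, figures taken from the ‘typical’ response described earlier rather than from a measured curve.

```python
def interpolate_db(f_hz, f0_hz, a0_db, f1_hz, a1_db):
    """Linear interpolation of attenuation (in dB) between two frequencies."""
    return a0_db + (a1_db - a0_db) * (f_hz - f0_hz) / (f1_hz - f0_hz)


FS_HZ = 44_100.0
HARMONIC_LEVEL_DB = -40.0  # individual harmonic level quoted in the text
input_hz = 22_200.0        # hypothetical ultrasonic harmonic just above Fs/2

alias_hz = FS_HZ - input_hz
offset_hz = input_hz - alias_hz

# Assumed attenuation curve: 10dB at Fs/2 rising to 20dB at 23kHz.
attenuation_db = interpolate_db(input_hz, 22_050.0, 10.0, 23_000.0, 20.0)
alias_level_db = HARMONIC_LEVEL_DB - attenuation_db

print(f"alias at {alias_hz / 1000:.1f}kHz, offset {offset_hz:.0f}Hz from input")
print(f"estimated alias level: {alias_level_db:.1f}dB")
```

With these assumed numbers the alias lands within the -50 to -55dB range given above; a different choice of input frequency or attenuation curve would shift the result by a few dB.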

Similarly, on replay, images will occur with like amplitude (an image of an alias, of course, is the original signal frequency but this result, though exploited by more than one hi-fi company to market anomalous slow-rolloff anti-image filters in CD players, is of little practical value). Such signals are still ultrasonic and likely to be inaudible.

 

2. LOUDSPEAKERS

It is well known that real loudspeakers are not perfectly linear and therefore have the potential to generate both harmonic and intermodulation distortion. (Amplifiers also have this potential, although for the purposes of this paper we will assume that ‘blameless’ amplifiers are being used since their performance can and should significantly outstrip that of most loudspeakers.)

Dunn [op. cit.] suggests that simple second-order intermodulation between a signal and its image (both of which will be output by a DAC if the former is close to Fs/2) may be audible. In fact in extreme cases this will occur, but my initial tests suggested that with amplifiers and speakers of even quite modest attainment and cost this is not a major issue.

Problems arise, however, when one considers the nature of musical material likely to contain ultrasonic components. It will of necessity also contain substantial energy in the whole treble band - no acoustic instrument of any kind produces stronger harmonics above 10kHz than below, nor does any synthesiser of more than novelty value! This means that a loudspeaker exposed to spurious components above 20kHz must be reproducing relatively high-level signals, probably right back to 1kHz or below, at the same time and is therefore the more likely to produce intermodulation distortion.

A test was set up to investigate this. Two oscillators were added in a simple passive mixer and fed, via a commercial amplifier of known good distortion performance, to a selection of loudspeakers, both bare drivers and assembled chassis. Initially, one oscillator was set to a nominal 22kHz, while the other was swept over the region of 5 - 20kHz. Levels were set so that the lower frequency was presented to the loudspeakers at 2V p-p (0.7VRMS, equivalent to approx. 60mW or -12dBW), while the level of the 22kHz signal was varied in steps of a few dB, from equal level with the lower frequency down to about 30dB below it.
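The level conversion quoted above can be verified as follows. The sketch assumes a nominal 8 ohm load, which the text does not state explicitly but which is consistent with the quoted 60mW figure.

```python
import math

ASSUMED_LOAD_OHMS = 8.0  # assumption: nominal loudspeaker impedance


def vpp_to_vrms(v_pp: float) -> float:
    """Convert a sinusoid's peak-to-peak voltage to RMS voltage."""
    return v_pp / (2 * math.sqrt(2))


def power_dbw(v_rms: float, load_ohms: float) -> float:
    """Power delivered to a resistive load, in dB relative to 1W."""
    return 10 * math.log10(v_rms ** 2 / load_ohms)


v_rms = vpp_to_vrms(2.0)
power_w = v_rms ** 2 / ASSUMED_LOAD_OHMS
level_dbw = power_dbw(v_rms, ASSUMED_LOAD_OHMS)

print(f"{v_rms:.3f}VRMS, {power_w * 1000:.1f}mW, {level_dbw:.1f}dBW")
```

This gives about 0.71VRMS, 62.5mW and -12dBW, matching the rounded figures in the text.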

In a later, similar test two composite signals were recorded on CD so that a larger selection of loudspeakers could be tested at other sites with only a portable oscilloscope for level checking. Each signal consisted of a swept sinusoid, varied randomly between about 7 and 10kHz at constant level, plus a fixed 21kHz sinusoid at a lower level — 12dB lower in one signal and 18dB lower in the other.

It was found that for most of the conventional electrodynamic drive units tested an intermodulation tone was clearly audible when the frequency of the lower tone was just under, or just over, half that of the higher. That is, the intermodulation was third-order, of the form

F(intermod.) = 2F1 - F2

With some drive units it was also possible to hear higher order intermodulation products, for instance

3F1 - F2,

but this was not investigated further. Most obvious distortion occurred, therefore, while the lower-frequency tone was in the region of 7 - 9kHz.
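To show where these products fall, the short sketch below evaluates both expressions for a few lower-tone frequencies, taking the upper tone as the nominal 22kHz of the first test. The function name and the particular F1 values are illustrative.

```python
def imd_product_hz(f1_hz: float, f2_hz: float, multiple_of_f1: int = 2) -> float:
    """Frequency of the difference product |n*F1 - F2|."""
    return abs(multiple_of_f1 * f1_hz - f2_hz)


F2_HZ = 22_000.0

for f1_hz in (7_000.0, 8_000.0, 9_000.0):
    third_order = imd_product_hz(f1_hz, F2_HZ, 2)   # 2F1 - F2
    fourth_order = imd_product_hz(f1_hz, F2_HZ, 3)  # 3F1 - F2
    print(f"F1 = {f1_hz / 1000:.0f}kHz: "
          f"2F1 - F2 -> {third_order / 1000:.0f}kHz, "
          f"3F1 - F2 -> {fourth_order / 1000:.0f}kHz")
```

For F1 between 7 and 9kHz the 2F1 - F2 product moves from 8kHz down to 4kHz, that is, well inside the audio band even though F2 itself is ultrasonic.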

The levels required to produce audible intermodulation were surprisingly small and are summarised in Tables 1 and 2. Loudspeakers tested covered the range up to approx. UK£2000 per pair, while the bare drivers tested comprised a high quality tweeter, a moderate quality small bass/mid unit and a budget full-range driver of the type used in good quality car stereo systems. The tweeter in particular was tested in rather more detail and was found to give audible intermodulation when fed with 9kHz (approx.) at -12dBW and 21kHz at -47dBW or 9kHz at -18dBW and 21kHz at -44dBW. Allowing that this tweeter would commonly be found in a complete loudspeaker with about 10dB of attenuation to match its sensitivity to that of the bass driver, these levels translate to approximately -19dB and -54dB, or -25dB and -51dB respectively, referred to the maximum output of a modest 50W amplifier.
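The translation of the tweeter drive levels into levels referred to amplifier maximum output can be checked as follows. The 10dB pad and the 50W amplifier are the values given in the text; the function name is illustrative.

```python
import math

AMP_MAX_W = 50.0
TWEETER_PAD_DB = 10.0  # attenuation between amplifier and tweeter

amp_max_dbw = 10 * math.log10(AMP_MAX_W)  # about 17dBW


def level_re_amp_max_db(tweeter_level_dbw: float) -> float:
    """Amplifier output level, relative to its maximum, needed to deliver
    the given level to the tweeter through the pad."""
    return tweeter_level_dbw + TWEETER_PAD_DB - amp_max_dbw


for low_dbw, high_dbw in ((-12.0, -47.0), (-18.0, -44.0)):
    print(f"9kHz: {level_re_amp_max_db(low_dbw):.0f}dB, "
          f"21kHz: {level_re_amp_max_db(high_dbw):.0f}dB")
```

The results round to -19dB and -54dB for the first pair of levels and -25dB and -51dB for the second, as stated above.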

In other words, audible distortion occurred at levels of the order of those expected from aliasing and imaging in current ADCs and DACs.

 

| Loudspeaker/driver | UK Price (£) | Approx. sensitivity (dB/1W/1m) | Level of 22kHz signal for just-audible IMD (VRMS) |
|---|---|---|---|
| 3cm soft-dome tweeter | - | - | 55mV |
| 8cm full-range driver | - | - | <30mV |
| 10cm bass/mid driver | - | - | 35mV |
| ‘Ribbon’ (planar) driver | - | - | >1V |
| 2-way bookshelf speaker, metal-dome tweeter | 200 | 88 | 110mV |
| 2-way small monitor speaker, soft-dome tweeter | 1,750 | 83 | 170mV |

Table 1: level of 22kHz signal required for just-audible IMD in presence of constant level signal of 8.5kHz ± 1.5kHz at 0.7VRMS.
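For comparison with the alias levels discussed in section 1.2, the Table 1 thresholds can be expressed in dB relative to the 0.7VRMS lower tone. The sketch below does this for the four entries with definite values; the two entries given only as bounds (<30mV and >1V) are omitted.

```python
import math

LOWER_TONE_VRMS = 0.7

# Just-audible 22kHz levels from Table 1, in volts RMS.
thresholds_vrms = {
    "3cm soft-dome tweeter": 0.055,
    "10cm bass/mid driver": 0.035,
    "2-way bookshelf speaker": 0.110,
    "2-way small monitor speaker": 0.170,
}


def relative_db(v_rms: float, reference_vrms: float) -> float:
    """Voltage ratio expressed in dB."""
    return 20 * math.log10(v_rms / reference_vrms)


relative_levels_db = {
    name: relative_db(v, LOWER_TONE_VRMS) for name, v in thresholds_vrms.items()
}

for name, level_db in relative_levels_db.items():
    print(f"{name}: {level_db:.1f}dB relative to the lower tone")
```

On these figures the thresholds lie between roughly 12dB and 26dB below the lower tone.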

 

| Loudspeaker | UK Price (£) | Approx. sensitivity (dB/1W/1m) | Level of composite signal 1 for just-audible IMD (VRMS) | Level of composite signal 2 for just-audible IMD (VRMS) |
|---|---|---|---|---|
| Small 2-way monitor, metal-dome tweeter | 480 | 87 | - | 1.1V |
| Medium-size 2-way, plastic dome tweeter | 1,300 | 94 | - | 0.6V |
| Miniature 2-way, unidentified tweeter | 140 | 92 | 0.14V | 0.24V |
| Small 2-way, soft-dome tweeter | 600 | 85 | 0.7V | 1.1V |
| Miniature 2-way, plastic dome tweeter | 140 | 88 | 0.3V | 0.5V |

Table 2: level of two composite signals required for just-audible IMD.

Signal 1: varying frequency of 8.5kHz ± 1.5kHz plus fixed 21kHz at -12dB relative level.

Signal 2: as signal 1 but 21kHz at -18dB relative level.

 

3. DISCUSSION

Because of the frequency-differencing nature of alias/image distortion, spurious components arising will not in general be harmonically related to the input signal. Normally, music consists of harmonically related components and it therefore follows that intermodulation between these will give only other harmonics (granted that many instruments produce somewhat inexact harmonics). Intermodulation products which arise between harmonics and non-harmonic aliases will themselves be non-harmonically related and are therefore likely to be highly audible, especially if they occur in the sensitive part of the audio band (roughly, 500Hz to 5kHz).

As a simplistic example, consider a tone of fundamental 2.25kHz with strong harmonics, recorded via an aliasing ADC. The 10th harmonic, at 22.5kHz, aliases to 21.6kHz. On the third-order model presented above, intermodulation could occur on replay between the fourth harmonic (9kHz) and this 21.6kHz alias, giving a product at 3.6kHz, which bears no simple harmonic relation to 2.25kHz (it is in fact about 8.1 semitones above 2.25kHz).
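The arithmetic of this example can be followed step by step in the sketch below, which uses only the values given in the text (44.1kHz sampling, 2.25kHz fundamental, fourth and tenth harmonics).

```python
import math

FS_HZ = 44_100.0
fundamental_hz = 2_250.0

fourth_harmonic_hz = 4 * fundamental_hz  # 9kHz, within the audio band
tenth_harmonic_hz = 10 * fundamental_hz  # 22.5kHz, above Fs/2
alias_hz = FS_HZ - tenth_harmonic_hz     # alias created in the ADC

# Third-order product 2F1 - F2 generated in the loudspeaker on replay.
imd_hz = abs(2 * fourth_harmonic_hz - alias_hz)

ratio = imd_hz / fundamental_hz
semitones_above = 12 * math.log2(ratio)
is_harmonic = abs(ratio - round(ratio)) < 1e-9

print(f"alias of 10th harmonic: {alias_hz / 1000:.1f}kHz")
print(f"intermodulation product: {imd_hz / 1000:.1f}kHz")
print(f"ratio to fundamental: {ratio:.2f} ({semitones_above:.1f} semitones)")
print(f"integer multiple of the fundamental: {is_harmonic}")
```

The product falls at 1.6 times the fundamental, which is not an integer multiple and so does not coincide with any harmonic of the original tone.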

As with any distortion, this type (which I term ‘alias-intermodulation distortion’ or AID) is subject to considerations of psychoacoustical masking, which is a major reason for not wishing to be dogmatic at this stage about its audibility or otherwise. Nevertheless, the test described above was based entirely on listening for low-level distortions in the presence of relatively high-level, high-frequency sinusoids, and might therefore be expected to produce some masking of its own, making the test less sensitive than real music listening conditions may prove. (Attempts to measure the intermodulation distortion levels with a microphone carry their own baggage of microphone distortion etc. As it was, a simple check with 9kHz and 21kHz tones replayed via separate speakers sufficed to assure that distortion was indeed a function of speakers rather than the human ear.)

Consideration has been given above to sampling at 44.1kHz, but clearly the situation when sampling at 48kHz is not substantially different and it is indeed perfectly possible for exactly the same effect to occur, still with AID artefacts in the audio band, when sampling at 96kHz (or even 192kHz) with inadequate filter transition-band performance. Clearly, however, distortion amplitudes will be considerably lower in this case due to the rapid fall-off in musical energy above 20kHz. Sony’s DSD system will presumably suffer no such effect.

4. REMEDIES

Because filtering performance seems to be non-ideal in the majority of recording and replay equipment, it is not a sufficient criterion that ADC filter attenuation should reach some arbitrarily high figure by Fs/2: images could still escape via the DAC reconstruction filters. The ideal solution, of having all CD players also reach high attenuation by Fs/2 (and all loudspeakers produce negligible IMD!), is not realisable given the installed base of players which fail to meet that specification. In fact, in my survey of consumer players the only ones which approximated that specification were those using the Pacific Microsonics PMD100 filter: whether by coincidence or otherwise, almost all such players have been remarkably well reviewed.

It therefore appears necessary, if one wishes to avoid the possibility of AID occurring, to filter even more stringently in the ADC so that attenuation is already high by, say, (Fs/2 - 1kHz). Older CD players (and a few ‘maverick’ modern designs) have even more relaxed reconstruction filters, with imaging visible on an oscilloscope even when replaying a 20kHz tone, but most modern designs give little output above about 23.5kHz. In addition, it is not likely to be necessary to lower AID by many tens of dB in order to eliminate its audibility, and it is to be hoped that further research will allow figures to be put to this.

There is a third option available which can greatly reduce not only DAC imaging but also ADC aliasing in existing recordings, after the fact. This is to employ very high performance filters at the mastering stage - this of course has the great advantage that such filtering need not be done in real time and therefore can be of arbitrary complexity. A hypothetical filter with negligible attenuation at 20kHz and infinite attenuation at 20.001kHz would remove practically all aliases generated in recording (on the assumption that these meet the usual ‘no aliases below 20kHz’ criterion) and also remove any spectral components that might cause imaging on replay. It is important to note that this should be done before any noise-shaping is carried out. Remastering in this way on highly noise-shaped recordings, or those using systems like Apogee Digital’s UV22, could have unpredictable results!
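As an indication of what a non-real-time filter can achieve, the sketch below designs a long linear-phase FIR low-pass by the windowed-sinc method and evaluates its response at a few frequencies. This is not a filter proposed or tested in this paper: the Blackman window, the 2001-tap length and the 21kHz cutoff are all illustrative assumptions, and the hypothetical 20kHz/20.001kHz filter described above would need a far longer impulse response.

```python
import cmath
import math


def blackman_sinc_lowpass(num_taps: int, cutoff_hz: float, fs_hz: float) -> list:
    """Windowed-sinc low-pass FIR taps (odd num_taps), unity gain at DC."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz  # cutoff as a fraction of the sampling rate
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal (infinitely long) low-pass impulse response, sampled at x.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        # Blackman window to control stopband ripple.
        window = (0.42
                  - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (num_taps - 1)))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def gain_db(taps: list, f_hz: float, fs_hz: float) -> float:
    """Magnitude response of the FIR filter at f_hz, in dB."""
    w = 2 * math.pi * f_hz / fs_hz
    response = sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps))
    return 20 * math.log10(max(abs(response), 1e-12))


FS_HZ = 44_100.0
taps = blackman_sinc_lowpass(num_taps=2001, cutoff_hz=21_000.0, fs_hz=FS_HZ)

for f_hz in (20_000.0, 21_500.0, 22_050.0):
    print(f"{f_hz / 1000:.2f}kHz: {gain_db(taps, f_hz, FS_HZ):.1f}dB")
```

Even this modest design is essentially flat at 20kHz while attenuating strongly well before Fs/2; since the processing is offline, the tap count can be increased freely to narrow the transition band further.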

It should also be pointed out that, if the conjectures of Story [4] et al. are correct, such very fast rolloff filters would almost certainly worsen energy dispersion effects even while avoiding AID. The only alternative, without changing sampling rate, would then be to use anti-alias filters which encroach somewhat on the audio band, as for instance in the CS5397 ADC (Crystal Semiconductor), where the filter attenuates by 117dB above Fs/2 but the passband (in a 44.1kHz system) extends only to 18.1kHz. [7]

5. CONCLUSION

From the above experiments and theoretical considerations, it appears perfectly possible that audible AID is in fact occurring when existing recordings are played via existing CD players and loudspeakers. Significant further research would appear to be necessary before firm estimates of the importance of AID can be made, research which may, I believe, have consequences as important for loudspeaker manufacturers as for those of digital audio equipment. However, elimination of this effect, combined with advances in better-documented areas such as linearity, jitter and noise shaping, holds out promise for a better approximation to CD’s original claims for ‘Perfect Sound, Forever’.

 

6. REFERENCES

 

[1] Bob Katz: report posted to ‘Pro-Audio’ mailing list (Internet discussion group), summer 1998

[2] T. Oohashi, E. Nishina, N. Kawai, Y. Fuwamoto and H. Imai: ‘High-Frequency Sound Above the Audible Range Affects Brain Electric Activity and Sound Perception’. Preprint 3207 of the 91st AES Convention, New York, 1991.

[3] E.-J. Volker and W. Teuber: ‘The Importance of Early Sound for Recording and Reproduction — is the Quality of Digital Sound Transmission Sufficient?’ Preprint 4579 of the 103rd AES Convention, New York, 1997.

[4] M. Story: ‘A Suggested Explanation for (Some of) the Audible Differences Between High Sample Rate and Conventional Sample Rate Audio Material’. ‘White Paper’ published by dCS Ltd, UK, September 1997.

[5] J. Dunn: ‘Anti-Alias and Anti-Image Filtering: the Benefits of 96kHz Sampling Rate Formats for Those Who Cannot Hear Above 20kHz’. Preprint 4734 of the 104th AES Convention, Amsterdam, 1998.

[6] J. Boyk: ‘There’s Life Above 20kHz!’ Published at http://www.cco.caltech.edu/~boyk/spectra/spectra.htm, 1998.

[7] Crystal Semiconductor data sheet: ‘CS5396/CS5397 120dB 96kHz Audio A/D Converter’. 1997.