Guys, this is the 2nd post on MPEG-4 technology; the first was an introduction to it. This one is about the high audio quality of MPEG-4.
3.1 Introduction
MPEG-4 Audio facilitates a wide variety of applications which could range from intelligible speech to high quality multichannel audio, and from natural sounds to synthesized sounds. In particular, it supports the highly efficient representation of audio objects consisting of:
3.1.1 General Audio Signals
Support for coding general audio ranging from very low bitrates up to high quality is provided by transform coding techniques. With this functionality, a wide range of bitrates and bandwidths is covered. It starts at a bitrate of 6 kbit/s and a bandwidth below 4 kHz and extends to broadcast quality audio from mono up to multichannel. High quality can be achieved with low delays. Parametric Audio Coding allows sound manipulation at low speeds. Fine Granularity Scalability (FGS) is also available, with a scalability resolution down to 1 kbit/s per channel.
3.1.2 Speech signals
Speech coding can be done at bitrates from 2 kbit/s up to 24 kbit/s using the speech coding tools. Lower bitrates, such as an average of 1.2 kbit/s, are also possible when variable rate coding is allowed. Low delay is possible for communications applications. When using the HVXC tools, speed and pitch can be modified under user control during playback. If the CELP tools are used, a change of the playback speed can be achieved by using an additional tool for effects processing.
3.1.3 Synthetic Audio
MPEG-4 Structured Audio is a language to describe 'instruments' (little programs that generate sound) and 'scores' (input that drives those objects). These objects are not necessarily musical instruments. They are in essence mathematical formulae that could generate the sound of a piano, of falling water, or of something 'unheard' in nature.
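To make the instrument/score split a bit more concrete, here is a minimal Python sketch. It is only an analogy and not the actual Structured Audio language syntax; the names `sine_instrument` and `render`, the sample rate, and the layout of the score tuples are all made up for this example.

```python
import math

SAMPLE_RATE = 8000  # samples per second, chosen arbitrarily for this sketch


def sine_instrument(freq_hz, duration_s, amplitude=0.5):
    """A hypothetical 'instrument': a small program that generates samples."""
    n = int(duration_s * SAMPLE_RATE)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def render(score, total_s):
    """Mix every score event into a single output buffer."""
    out = [0.0] * int(total_s * SAMPLE_RATE)
    for start_s, instrument, freq_hz, duration_s in score:
        offset = int(start_s * SAMPLE_RATE)
        for i, sample in enumerate(instrument(freq_hz, duration_s)):
            if offset + i < len(out):
                out[offset + i] += sample
    return out


# The 'score': (start time, instrument, frequency, duration) events
# that drive the instrument.
score = [
    (0.0, sine_instrument, 440.0, 0.5),
    (0.25, sine_instrument, 660.0, 0.5),
]
audio = render(score, total_s=1.0)
```

The point is that only the formula and the events need to be described; the samples are generated at the receiving end.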
3.2 Overview of the MPEG-4 Speech Coding Tools
The MPEG-4 Natural Speech Coding Tool Set provides a generic coding framework for a wide range of applications with speech signals. Its bitrate coverage spans from as low as 2 kbit/s up to 23.4 kbit/s. The tool set contains two algorithms: HVXC (Harmonic Vector eXcitation Coding) and CELP (Code Excited Linear Predictive coding). HVXC is used at low bitrates of 2 or 4 kbit/s. CELP covers 3.85 kbit/s as well as bitrates above 4 kbit/s. The algorithmic delay of either algorithm is comparable to that of other standards for two-way communications, so the tool set is also applicable to such applications. Storage of speech data and broadcast are also promising applications. MPEG-4 is based on tools that can be combined according to the user's needs. HVXC consists of an LSP (line spectral pair) VQ (vector quantization) tool and a harmonic VQ tool. CELP is formed by an RPE (regular pulse excitation) tool, an MPE (multipulse excitation) tool, and an LSP VQ tool.
MPEG-4 Natural Speech Coding Tools are illustrated in Fig. 1.
3.2.1 Functionalities of MPEG-4 Natural Speech Coding
MPEG-4 Natural Speech Coding Tools differ from other existing speech coding standards such as ITU-T G.723.1 and G.729 in three new functionalities: multibitrate coding (an arbitrary bitrate may be selected in 200 bit/s steps simply by changing parameter values), bitrate scalable coding, and bandwidth scalable coding. These new functionalities are what characterize the MPEG-4 Natural Speech Coding Tools. Note that bandwidth scalability is available only for CELP.
3.2.2 Multibitrate Coding
Multibitrate coding provides flexible bitrate selection within the same coding algorithm. This was not available before, and different codecs were needed for different bitrates. In multibitrate coding, a bitrate is selected from the multiple available bitrates when a connection is established between the communicating parties.
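A rough sketch of how a bitrate could be picked at connection setup is shown below. The 200 bit/s step is the one quoted for the speech coding tools; the lower and upper bounds, and the function name `select_bitrate`, are placeholders for illustration only, since the real range depends on the codec mode in use.

```python
STEP_BPS = 200    # step size quoted for multibitrate coding
MIN_BPS = 4000    # assumed lower bound, for this sketch only
MAX_BPS = 12000   # assumed upper bound, for this sketch only


def select_bitrate(channel_bps):
    """Return the highest bitrate on the grid that fits the channel, or None."""
    if channel_bps < MIN_BPS:
        return None  # channel too slow for this (assumed) range
    capped = min(channel_bps, MAX_BPS)
    steps = (capped - MIN_BPS) // STEP_BPS
    return MIN_BPS + steps * STEP_BPS


print(select_bitrate(9750))  # 9600
```

The same encoder serves every case; only the selected parameter value changes.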
3.2.3 Scalable Coding
Bitrate and bandwidth scalability are useful for multicast transmission. The bitrate and the bandwidth can be selected independently for each receiver simply by stripping off part of the bitstream. With scalability, a single encoder is enough to transmit the same data to multiple points connected at different rates: the encoder generates one common scalable bitstream for all recipients instead of independent bitstreams at different bitrates.
Figure 2: Scalabilities in MPEG-4/CELP
Scalabilities include bitrate scalability and bandwidth scalability. They reduce signal distortion, or achieve better speech quality with high-frequency components, by adding enhancement bitstreams to the core bitstream. These enhancement bitstreams contain detailed characteristics of the input signal or components in higher frequency bands. For example, the output of Decoder A in Fig. 2 is the minimum-quality signal decoded from the 6 kbit/s core bitstream. The Decoder B output is a high-quality signal decoded from an 8 kbit/s bitstream. Decoder C provides a higher-quality signal decoded from a 12 kbit/s bitstream. On the other hand, the Decoder D output has a wider bandwidth.
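The stripping idea can be sketched in a few lines of Python. The layer sizes are an assumption derived from the Fig. 2 example, reading the 8 and 12 kbit/s streams as the 6 kbit/s core plus stacked enhancement layers of 2 and 4 kbit/s; the layer names and `strip_for_receiver` are hypothetical.

```python
# Layer sizes in kbit/s, assumed from the Fig. 2 example:
# 6 (core), 6 + 2 = 8 (Decoder B), 8 + 4 = 12 (Decoder C).
LAYERS = [("core", 6), ("enhancement-1", 2), ("enhancement-2", 4)]


def strip_for_receiver(capacity_kbps):
    """Keep the core plus as many enhancement layers as fit, in order."""
    kept, total = [], 0
    for name, rate in LAYERS:
        if total + rate > capacity_kbps:
            break  # drop this layer and everything after it
        kept.append(name)
        total += rate
    return kept, total


print(strip_for_receiver(10))  # (['core', 'enhancement-1'], 8)
```

Note that nothing is re-encoded per receiver: each one simply gets a prefix of the same layered bitstream.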
3.2.4 Bitrate Scalable (BRS) Tool
A block diagram of the BRS tool is shown in Fig. 3. The actual signal to be encoded in the BRS tool is the residual, which is defined as the difference between the input signal and the output of the LP synthesis filter (local decode signal), supplied from the core encoder. This combination of the core encoder and the BRS tool can be considered as multistage encoding of the MPE. However, there is no feedback path for the residual in the BRS tool connected to the MPE in the core encoder. The excitation signal in the BRS tool has no influence on the adaptive codebook in the core encoder. This guarantees that the adaptive codebook in the core decoder at any site is identical to that in the encoder (in terms of the codewords), which keeps quality degradation to a minimum when the bitrate changes frame by frame. The BRS tool adaptively controls the pulse positions so that none of them coincides with a position used in the core encoder. This adaptive pulse position control contributes to more efficient multistage encoding.
Figure 3: Block diagram of the BRS tool
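The pulse position control can be illustrated with a simplified sketch. A real encoder would choose pulses with an analysis-by-synthesis search; the greedy "largest residual sample" rule used here is a stand-in chosen only to show how positions already taken by the core encoder are excluded. The function name and the sample data are made up.

```python
def pick_enhancement_pulses(residual, core_positions, num_pulses):
    """Greedy stand-in: take the largest residual samples, skipping core positions."""
    used = set(core_positions)
    candidates = [i for i in range(len(residual)) if i not in used]
    candidates.sort(key=lambda i: abs(residual[i]), reverse=True)
    return sorted(candidates[:num_pulses])


residual = [0.1, -0.9, 0.3, 0.8, -0.2, 0.7, 0.05, -0.6]
core_positions = [1, 3]  # positions already used by the core MPE
print(pick_enhancement_pulses(residual, core_positions, 2))  # [5, 7]
```

Because positions 1 and 3 are off limits, the enhancement pulses add information at new positions instead of repeating what the core already coded.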
3.2.5 Multipulse Excitation
The excitation signal in the bandwidth extension tool is represented by an adaptive codebook and two MPE signals. The pitch delay of the adaptive codebook is searched in the vicinity of an estimate obtained from the narrowband pitch delay. One of the two MPE signals (MP1) is an upsampled version of the narrowband MPE signal and the other (MP2) is an MPE signal exclusive to the bandwidth extension tool. The adaptive codebook and the gains for MP2 are vector-quantized and the gains for MP1 are scalar-quantized. These quantizations are performed to minimize the perceptually weighted error.
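The restricted pitch delay search can be sketched as follows. The sampling rate ratio of 2, the search radius, and the caller-supplied `error_fn` are assumptions for illustration; in a real encoder the error would be the perceptually weighted error for each candidate delay.

```python
def search_pitch_delay(narrowband_delay, error_fn, rate_ratio=2, radius=2):
    """Search only near the delay estimated from the narrowband pitch delay."""
    estimate = narrowband_delay * rate_ratio
    candidates = range(max(1, estimate - radius), estimate + radius + 1)
    return min(candidates, key=error_fn)


# Toy error function with its minimum at a delay of 81 samples.
print(search_pitch_delay(40, lambda d: abs(d - 81)))  # 81
```

Searching 5 candidates around the estimate instead of the full delay range is what keeps the extra cost of the bandwidth extension tool small.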
3.3 Scalability
(Bitstream) scalability is the ability of an audio codec to support an ordered set of bit streams which can produce a reconstructed sequence. Moreover, the codec can output useful audio when only certain subsets of the bit stream are decoded. The minimum subset that can be decoded is called the base layer. The remaining bit streams in the set are called enhancement or extension layers. Depending on the size of the extension layers, we talk about large step or small step (granularity) scalability. Small step scalability denotes enhancement layers of around 1 kbit/s (or smaller). Typical data rates for the extension layers in a large step scalable system are 16 kbit/s or more. Scalability in MPEG-4 natural audio largely relies on difference encoding, either in the time domain or, as in the case of AAC layers, of the spectral lines (frequency domain).
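Difference encoding can be shown with a toy two-layer example. This is only a sketch of the principle using plain uniform quantizers; the step sizes and function names are arbitrary and have nothing to do with the actual AAC or CELP quantizers.

```python
def quantize(values, step):
    """Uniform quantizer: round each value to the nearest multiple of step."""
    return [round(v / step) * step for v in values]


def encode_layers(signal, base_step=0.5, enh_step=0.1):
    """Base layer is a coarse version; the enhancement layer codes the difference."""
    base = quantize(signal, base_step)
    difference = [s - b for s, b in zip(signal, base)]
    enhancement = quantize(difference, enh_step)
    return base, enhancement


def decode(base, enhancement=None):
    """Decode the base layer alone, or base plus enhancement when available."""
    if enhancement is None:
        return list(base)
    return [b + e for b, e in zip(base, enhancement)]
```

Decoding `base` alone gives a coarse but usable signal; adding `enhancement` brings the reconstruction closer to the input, which is exactly the base layer / extension layer behaviour described above.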
3.3.1 Types of scalability in MPEG-4 natural audio
MPEG-4 natural audio allows for a large number of codec combinations for scalability. The combinations for the speech coders are described in the paragraphs explaining MPEG-4 CELP and HVXC. The following list contains the main combinations for MPEG-4 General Audio (GA):
AAC layers only
Narrow-band CELP base layer plus AAC
TwinVQ base layer plus AAC
Depending on the application, any of these possibilities can provide optimum performance. Where good speech quality at low bitrates is required even when only the core layer is received (for example in a digital broadcasting system using hierarchical channel coding), the speech codec base layer is preferred. If, on the other hand, music should be of reasonable quality for a very low bitrate core layer (for example for Internet streaming of music using scalability), the TwinVQ base layer provides the best quality. If the base layer is allowed to work at somewhat higher bitrates (like 16 kbit/s or more), a system built from AAC layers only can deliver the best overall performance.
Next: MPEG-4 NATURAL VIDEO CODING