FFmpeg 9.1 AAC Encoder Rewrite
FFmpeg 9.1 introduces a completely rewritten native AAC encoder that replaces the previous, suboptimal native implementation. According to the developer's measurements, the new encoder outperforms fdk-aac and Apple's encoder (qaac) at most bitrates on perceptual metrics such as Google's Zimtohrli and ViSQOL.
Technical Architecture and Optimizations
The new encoder is a ground-up rewrite of rate control, Rate-Distortion Optimization (RDO), and all core coding tools. Key technical improvements include:
- Full RDO Integration: All coding tools—including Perceptual Noise Substitution (PNS), Temporal Noise Shaping (TNS), Intensity Stereo (I/S), and Mid/Side (M/S) coding—are integrated into the RDO loop. The encoder avoids arbitrary bitrate cutoffs or heuristics, utilizing a tool if it provides the best quality for the given bit budget.
- Perceptual Optimization: Unlike some competitors that rely on simple band energy curves, the new encoder uses masked band energy for RDO to better allocate bits to audible frequencies.
- Strict CBR Implementation: The encoder is optimized for Constant Bit Rate (CBR), as a fixed bit budget target significantly improves coding efficiency. The developer recommends against using `-q:a` (real VBR mode).
- Strategic Band Zeroing: The encoder intentionally leaves "holes" in spectrograms by zeroing or PNS-ing masked bands. This design choice prioritizes coding the audible bands well over coding every band poorly.
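The idea of "use a tool if it provides the best quality for the given bit budget" is commonly written as a Lagrangian rate-distortion cost. The following is the generic textbook form, offered only for orientation: the source does not state the encoder's actual cost function, and every symbol here ($t$, $T$, $D$, $R$, $\lambda$, $B$) is an assumption introduced for illustration.

```latex
% Generic RDO selection (not confirmed as this encoder's exact formulation):
% among candidate tool configurations t in T, pick the one with lowest cost J.
\[
  t^{*} = \arg\min_{t \in T} \; J(t),
  \qquad J(t) = D(t) + \lambda \, R(t),
  \qquad \text{subject to } R(t^{*}) \le B
\]
% D(t): perceptual distortion (here it would be computed on masked band energy)
% R(t): bits spent when coding with configuration t
% B:    the fixed per-frame bit budget implied by CBR
```

In this reading, a tool such as PNS or I/S is switched on for a band only when doing so lowers $J$, which is consistent with the absence of fixed bitrate cutoffs described above.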
Performance Benchmarks
According to the developer, the new encoder (specifically the nmr coder) outperforms fdk-aac and Apple's encoder across most bitrates when measured by Zimtohrli and ViSQOL (where lower Zim and higher ViS are better).
| Bitrate (kbps) | 8.1 fast | 8.1 twoloop | nmr | fdk-aac | apple | libopus |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 64 | 0.01315 / 2.65 | 0.00696 / 3.24 | 0.00309 / 3.83 | 0.00322 / 3.69 | 0.00612 / 3.29 | 0.00100 / 4.59 |
| 96 | 0.00338 / 3.77 | 0.00268 / 3.99 | 0.00134 / 4.04 | 0.00153 / 3.98 | 0.00175 / 3.87 | 0.00039 / 4.62 |
| 128 | 0.00229 / 4.10 | 0.00170 / 4.28 | 0.00072 / 4.47 | 0.00143 / 4.27 | 0.00081 / 4.44 | 0.00020 / 4.68 |
| 160 | 0.00129 / 4.30 | 0.00108 / 4.44 | 0.00051 / 4.56 | 0.00065 / 4.31 | 0.00117 / 4.51 | 0.00084 / 4.68 |
| 256 | 0.00105 / 4.41 | 0.00121 / 4.55 | 0.00031 / 4.61 | 0.00103 / 4.45 | 0.00067 / 4.63 | 0.00002 / 4.73 |
Note: Each cell lists the Zimtohrli distance followed by the ViSQOL score (Zim / ViS). Opus remains the top performer overall.
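To make the table easier to read, the sketch below picks, for each bitrate, the AAC encoder with the lowest Zimtohrli distance and the one with the highest ViSQOL score. The numbers are copied from the table; libopus is left out so that only AAC encoders are compared. The function name `best_aac` is hypothetical.

```shell
#!/bin/sh
# Sketch: per bitrate, report the AAC encoder with the lowest Zim and the
# one with the highest ViS. Columns per row: bitrate, then Zim/ViS pairs for
# 8.1 fast, 8.1 twoloop, nmr, fdk-aac, apple (values copied from the table).
best_aac() {
awk '
BEGIN { split("8.1-fast 8.1-twoloop nmr fdk-aac apple", name, " ") }
{
    zi = 1; vi = 1                       # index of best Zim / best ViS so far
    for (i = 2; i <= 5; i++) {
        if ($(2*i) + 0 < $(2*zi) + 0) zi = i          # lower Zim is better
        if ($(2*i + 1) + 0 > $(2*vi + 1) + 0) vi = i  # higher ViS is better
    }
    printf "%s kbps: lowest Zim = %s, highest ViS = %s\n", $1, name[zi], name[vi]
}' <<'EOF'
64 0.01315 2.65 0.00696 3.24 0.00309 3.83 0.00322 3.69 0.00612 3.29
96 0.00338 3.77 0.00268 3.99 0.00134 4.04 0.00153 3.98 0.00175 3.87
128 0.00229 4.10 0.00170 4.28 0.00072 4.47 0.00143 4.27 0.00081 4.44
160 0.00129 4.30 0.00108 4.44 0.00051 4.56 0.00065 4.31 0.00117 4.51
256 0.00105 4.41 0.00121 4.55 0.00031 4.61 0.00103 4.45 0.00067 4.63
EOF
}
```

Calling `best_aac` reports nmr for both metrics at every bitrate except 256 kbps, where Apple's encoder has the slightly higher ViSQOL score (4.63 vs 4.61). That matches the "most bitrates" wording above.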
Usage Guidelines and Constraints
To achieve the best results with the new encoder, users should follow these specific configurations:
- Sample Rate: The encoder is primarily optimized for 48kHz audio. While 44.1kHz and 96kHz are supported, 48kHz is recommended for maximum quality.
- Downmixing: If the output is expected to be downmixed, users should use `-aac_is 0 -aac_pns 0` to preserve the original signal phase.
- Bandwidth Cutoff: The developer has suggested reducing bandwidth to 16kHz for 128kbps and 18kHz for 160kbps+ using the `-cutoff` flag.
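The guidelines above can be combined into one command line. The sketch below only prints the command rather than running it, so it can be inspected first. `build_aac_cmd` is a hypothetical helper name, not part of FFmpeg. The cutoff is applied only at the bitrates the developer named (exactly 128 kbps, and 160 kbps or more); for anything else the encoder default is left alone, which is an assumption on my part.

```shell
#!/bin/sh
# Sketch: print (not run) an ffmpeg command that follows the guidelines above.
# Usage: build_aac_cmd INPUT OUTPUT KBPS [yes|no]   (last arg: downmix expected?)
build_aac_cmd() {
    src=$1
    dst=$2
    kbps=$3
    downmix=${4:-no}

    # Cutoff values (in Hz) from the developer's suggestion. No value is given
    # in the text for other bitrates, so none is set there (assumption).
    cutoff=""
    if [ "$kbps" -ge 160 ]; then
        cutoff="-cutoff 18000"
    elif [ "$kbps" -eq 128 ]; then
        cutoff="-cutoff 16000"
    fi

    # Disable Intensity Stereo and PNS only when a downmix is expected.
    phase=""
    if [ "$downmix" = "yes" ]; then
        phase="-aac_is 0 -aac_pns 0"
    fi

    # $cutoff and $phase are intentionally unquoted so empty values vanish.
    echo ffmpeg -i "$src" -map 0:0 -c:a aac -b:a "${kbps}k" $cutoff $phase "$dst"
}
```

For example, `build_aac_cmd input.flac output.m4a 192 yes` prints a command containing `-cutoff 18000 -aac_is 0 -aac_pns 0`, while `build_aac_cmd input.flac output.m4a 96` adds neither option.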
Community Feedback and Known Issues
Early user testing on HydrogenAudio has highlighted several points of interest:
- General Quality: Users report a "sublime listen" at 134kbps and 200kbps, noting a significant improvement over the old native encoder's "chirping" artifacts.
- TNS Ticking: One user reported a "ticking sound" at high bitrates (192kbps) that disappeared when TNS was disabled (`-aac_tns 0`). The developer suggested this may be due to overly aggressive TNS settings in `libavcodec/aacenc_tns.c`.
- Stereo Image at Low Bitrates: At 64kbps stereo, some users found the new encoder to be more "smeary" or "metallic" than the old encoder, though the developer noted that 64kbps is generally too low for high-quality stereo AAC.
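Anyone who wants to check whether the reported ticking affects their own material could encode the same source twice, once with default settings and once with TNS disabled, and compare the results by ear. The sketch below prints the two commands; `tns_ab_cmds` and the output file names are hypothetical, and the 192 kbps default simply mirrors the bitrate in the user report.

```shell
#!/bin/sh
# Sketch: print two encode commands that differ only in the TNS setting.
# Usage: tns_ab_cmds INPUT [KBPS]
tns_ab_cmds() {
    src=$1
    kbps=${2:-192}     # bitrate from the user report; override as needed
    base=${src%.*}     # strip the extension, e.g. input.flac -> input

    # Encode with the encoder's default tool settings.
    echo ffmpeg -i "$src" -map 0:0 -c:a aac -b:a "${kbps}k" "${base}_default.m4a"
    # Same encode with TNS explicitly disabled.
    echo ffmpeg -i "$src" -map 0:0 -c:a aac -b:a "${kbps}k" -aac_tns 0 "${base}_tns_off.m4a"
}
```

Running `tns_ab_cmds input.flac` prints both commands. If the ticking is audible only in the default encode, that points to TNS as the cause for that sample.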
Command Line Examples
For users transitioning to the new encoder, the following commands are recommended:
# Standard high-quality encode
ffmpeg -i input.flac -map 0:0 -c:a aac -b:a 128000 output.m4a
# Encode while disabling Intensity Stereo to maintain phase
ffmpeg -i input.flac -map 0:0 -c:a aac -aac_is 0 -b:a 128000 output.m4a
# Encode while disabling PNS to maintain phase
ffmpeg -i input.flac -map 0:0 -c:a aac -aac_pns 0 -b:a 128000 output.m4a