IR Research Notes
Research references for improving Greybound impulse-response rendering beyond static speaker convolution.
Greybound currently treats the speaker IR as an optional post-amp convolution stage. The runtime implementation is optimized for live playing rather than offline production: it loads a mono cabinet IR WAV, runs the first partition as a direct FIR, and handles the tail with partitioned FFT convolution.
This page tracks research that may guide future IR work. These papers are not implementation requirements yet; they are references for evaluating better interpolation, reconstruction, and spatial rendering approaches.
Current Boundary
- core/src/ir.rs owns the zero-latency hybrid convolver and speaker IR WAV loading.
- The first 256 IR taps run as direct FIR so the direct cabinet response starts at sample 0.
- The remaining IR tail runs through partitioned FFT convolution, where the partition delay aligns naturally with the tail offset.
- IR bypass returns the dry path immediately; it no longer adds a dry compensation delay.
- Amp models should not own room, cabinet, or listener-position state.
- Dynamic or learned IR systems should produce renderable IRs or convolution-ready partitions for the speaker/room stage.
- Any neural or reconstruction model must be optional and benchmarked against the real-time budget before becoming part of the runtime path.
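The head/tail split described above can be checked against a plain convolution. The following is a minimal reference sketch, not the code in core/src/ir.rs: `convolve` and `hybrid_reference` are hypothetical names, and a naive time-domain convolution stands in for the partitioned FFT stage so the example stays self-contained. It only illustrates that convolving the first 256 taps directly and adding the tail back at a 256-sample offset reproduces the full response.

```rust
/// Number of taps handled by the direct FIR head (256, per the notes above).
const HEAD_TAPS: usize = 256;

/// Naive time-domain convolution; output length is x.len() + h.len() - 1.
/// This stands in for both the direct FIR and the FFT stage in this sketch.
fn convolve(x: &[f32], h: &[f32]) -> Vec<f32> {
    if x.is_empty() || h.is_empty() {
        return Vec::new();
    }
    let mut y = vec![0.0f32; x.len() + h.len() - 1];
    for (n, &xn) in x.iter().enumerate() {
        for (k, &hk) in h.iter().enumerate() {
            y[n + k] += xn * hk;
        }
    }
    y
}

/// Hypothetical reference: convolve the head and the tail separately, then
/// add the tail result back delayed by the head length.
fn hybrid_reference(x: &[f32], ir: &[f32], head_taps: usize) -> Vec<f32> {
    if x.is_empty() || ir.is_empty() {
        return Vec::new();
    }
    let split = head_taps.min(ir.len());
    let (head, tail) = ir.split_at(split);
    let mut y = vec![0.0f32; x.len() + ir.len() - 1];
    // Head contribution starts at output sample 0 (no added latency).
    for (i, v) in convolve(x, head).into_iter().enumerate() {
        y[i] += v;
    }
    // Tail contribution is offset by the number of head taps.
    for (i, v) in convolve(x, tail).into_iter().enumerate() {
        y[split + i] += v;
    }
    y
}
```

A comparison like this could serve as a regression fixture for any change to the partition layout, since the hybrid output should match the single-pass convolution within floating-point tolerance.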
Design Directions
Static cabinet IR:
- Keep the current hybrid direct-head/FFT-tail convolver as the live baseline.
- Add import and normalization tooling before adding more complex rendering.
- Measure CPU, peak/RMS stability, and perceived tonal change against the current reference IR.
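As a starting point for the normalization and peak/RMS measurements listed above, a sketch along these lines may help. The function names are hypothetical, and unit-energy normalization is only one possible convention; the notes do not yet specify which normalization Greybound should adopt.

```rust
/// Largest absolute sample value; 0.0 for an empty slice.
fn peak(samples: &[f32]) -> f32 {
    samples.iter().fold(0.0f32, |m, s| m.max(s.abs()))
}

/// Root-mean-square level, accumulated in f64 to limit rounding error.
fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = samples.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / samples.len() as f64).sqrt() as f32
}

/// Assumed convention: scale the IR so its total energy (sum of squares)
/// is 1. A silent IR is left unchanged.
fn normalize_energy(ir: &mut [f32]) {
    let energy: f64 = ir.iter().map(|&s| (s as f64) * (s as f64)).sum();
    if energy > 0.0 {
        let gain = (1.0 / energy.sqrt()) as f32;
        for s in ir.iter_mut() {
            *s *= gain;
        }
    }
}
```

Running `peak` and `rms` on the convolver output for a fixed test signal, before and after swapping in a new IR, gives a simple numeric check to pair with listening tests.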
Sparse measured IR interpolation:
- Useful if we want cabinet, mic, or room positions to move continuously.
- Early reflections and direct sound should be handled more carefully than late diffuse tails.
- Candidate output format: a time-varying IR, a small set of aligned early components plus a late tail, or a bank of crossfaded partitions.
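To illustrate why the direct sound needs careful handling, here is a minimal sketch of blending two measured IRs after aligning their direct peaks. It is an assumption-laden toy, not a proposed Greybound API: `peak_index` and `blend_aligned` are hypothetical names, alignment uses only the largest-magnitude sample, and the blend is a plain linear crossfade over the whole IR rather than a separate treatment of early and late parts.

```rust
/// Index of the largest-magnitude sample, used here as a crude estimate of
/// the direct-sound arrival. Returns 0 for an empty slice.
fn peak_index(ir: &[f32]) -> usize {
    let mut best = 0;
    for (i, s) in ir.iter().enumerate() {
        if s.abs() > ir[best].abs() {
            best = i;
        }
    }
    best
}

/// Blend two IRs after shifting `b` so its direct peak lands on `a`'s.
/// `t = 0.0` returns `a`; `t = 1.0` returns the shifted `b`.
/// The output has the same length as `a`; shifted-out samples read as 0.
fn blend_aligned(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    let t = t.clamp(0.0, 1.0);
    let shift = peak_index(a) as isize - peak_index(b) as isize;
    (0..a.len())
        .map(|i| {
            let j = i as isize - shift;
            let bj = if j >= 0 && (j as usize) < b.len() {
                b[j as usize]
            } else {
                0.0
            };
            (1.0 - t) * a[i] + t * bj
        })
        .collect()
}
```

Without the alignment step, a halfway blend of two IRs whose direct peaks sit one sample apart yields two half-height peaks, which is the comb-filter pattern this section warns about.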
Physics-informed reconstruction:
- Useful when a small number of measurements should reconstruct a larger sound field.
- For Greybound, this is more likely an offline authoring or dataset-building tool than a live audio thread feature.
- The early part of the IR is the critical target because misaligned direct sound and early reflections produce obvious combing and spatial instability.
Neural fields:
- Neural IR fields are promising for source/listener-position-continuous rendering.
- They should be treated as offline or precomputed systems until inference cost, determinism, and artifact behavior are proven.
- If adopted, the runtime should consume cached IRs or compact component parameters rather than invoking a large model in the audio callback.
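One way to picture the "cached IRs" option is a lookup table filled offline and read by the runtime. The sketch below is purely illustrative: `IrCache`, its fields, and the single position axis are assumptions made for the example and do not describe an existing Greybound type.

```rust
/// Hypothetical cache of IRs precomputed offline at evenly spaced positions
/// along one axis (for example, mic distance). Not part of Greybound today.
struct IrCache {
    /// Position of the first cached IR.
    start: f32,
    /// Spacing between cached positions; must be positive.
    step: f32,
    /// One IR per grid position, in order.
    irs: Vec<Vec<f32>>,
}

impl IrCache {
    /// Return the cached IR nearest to `position`, clamped to the grid.
    /// The lookup borrows existing data: no allocation, no model inference.
    fn nearest(&self, position: f32) -> Option<&[f32]> {
        if self.irs.is_empty() || self.step <= 0.0 {
            return None;
        }
        let raw = ((position - self.start) / self.step).round();
        let last = (self.irs.len() - 1) as f32;
        let idx = raw.max(0.0).min(last) as usize;
        self.irs.get(idx).map(|v| v.as_slice())
    }
}
```

With this shape, any neural or reconstruction model only needs to populate `irs` ahead of time, and the audio path never depends on inference latency.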
References
- Zitong Lan, Chenhao Zheng, Zhiwei Zheng, and Mingmin Zhao, "Acoustic Volume Rendering for Neural Impulse Response Fields", NeurIPS 2024. Introduces Acoustic Volume Rendering, adapting volume rendering to impulse-response fields with frequency-domain rendering and spherical integration. Greybound relevance: a possible long-term direction for position-continuous IR fields, but likely offline or cached for now.
- Xenofon Karakonstantis, Diego Caviedes-Nozal, Antoine Richard, and Efren Fernandez-Grande, "Room impulse response reconstruction with physics-informed deep learning", Journal of the Acoustical Society of America 155(2), 1048-1059, 2024, doi: 10.1121/10.0024750. Uses physics-informed neural networks with the wave equation and sparse measured RIRs. Greybound relevance: good candidate for offline reconstruction/validation of early room response fields.
- Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, and Chenliang Xu, "Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields", arXiv:2309.15977, 2023. Uses geometry, material, and spatial context, plus temporal correlation and energy-decay criteria. Greybound relevance: useful design language for any future IR model that conditions on cabinet, room, mic, and listener context.
- Amal Emthyas, Annika Neidhardt, Sebastià V. Amengual Garí, and Enzo De Sena, "Spatial interpolation and extrapolation of binaural room impulse responses via system inversion", AES International Conference on Audio for Virtual and Augmented Reality, 2024. Frames BRIR interpolation/extrapolation as an inverse problem using time-domain equivalent sources. Greybound relevance: a practical path for sparse BRIR datasets if headphone/spatial monitoring becomes a target.
- Jiahong Zhao, Xiguang Zheng, Christian Ritz, and Daeyoung Jang, "Interpolating the Directional Room Impulse Response for Dynamic Spatial Audio Reproduction", Applied Sciences 12(4), 2061, 2022, doi: 10.3390/app12042061. Decomposes first-order Ambisonic directional RIRs into direct, specular, and diffuse components for interpolation. Greybound relevance: supports a component-oriented IR representation where early reflections are parameterized separately from the late tail.
Open Questions
- Should Greybound model only cabinet/mic IRs, or also room/listener spatial IRs?
- Should the IR asset format store raw samples only, or should it also store decomposed early reflections and a late tail?
- What validation fixtures should prove that interpolation does not create combing, unstable peaks, or level jumps?
- Can offline learned/reconstructed IRs be compiled into the same partitioned-convolution format used by SpeakerStage?
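For the validation-fixture question, one possible check is to sweep an interpolation parameter, collect the resulting IRs, and bound the level change between neighbours. The sketch below covers only level jumps, not combing or peak stability; the function names are hypothetical, and any pass/fail threshold would have to be chosen by listening and measurement.

```rust
/// Total IR energy in dB, with a small floor so silence does not produce
/// negative infinity.
fn energy_db(ir: &[f32]) -> f32 {
    let energy: f64 = ir.iter().map(|&s| (s as f64).powi(2)).sum();
    (10.0 * energy.max(1e-20).log10()) as f32
}

/// Largest absolute energy change, in dB, between consecutive IRs in a
/// sweep. Returns 0.0 when the sweep has fewer than two IRs.
fn max_level_jump_db(sweep: &[Vec<f32>]) -> f32 {
    sweep
        .windows(2)
        .map(|w| (energy_db(&w[1]) - energy_db(&w[0])).abs())
        .fold(0.0f32, f32::max)
}
```

A fixture could then assert that `max_level_jump_db` stays under an agreed limit for every interpolation path the tooling supports.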