IR Research Notes

Research references for improving Greybound impulse-response rendering beyond static speaker convolution.

Greybound currently treats the speaker IR as an optional post-amp convolution stage. The runtime implementation is optimized for live playing rather than offline production: it loads a mono cabinet IR WAV, runs the first partition as direct FIR, and runs the tail through partitioned FFT convolution.

This page tracks research that may guide future IR work. These papers are not implementation requirements yet; they are references for evaluating better interpolation, reconstruction, and spatial rendering approaches.

Current Boundary

  • core/src/ir.rs owns the zero-latency hybrid convolver and speaker IR WAV loading.
  • The first 256 IR taps run as direct FIR so the direct cabinet response starts at sample 0.
  • The remaining IR tail runs through partitioned FFT convolution. The block delay that partitioning introduces is absorbed by the tail's offset into the IR, so it adds no extra latency.
  • IR bypass returns the dry path immediately; it no longer adds a dry compensation delay.
  • Amp models should not own room, cabinet, or listener-position state.
  • Dynamic or learned IR systems should produce renderable IRs or convolution-ready partitions for the speaker/room stage.
  • Any neural or reconstruction model must be optional and benchmarked against the real-time budget before becoming part of the runtime path.
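
The head/tail split described above can be illustrated with a minimal sketch. It is not the code in core/src/ir.rs: the names `convolve`, `hybrid_convolve`, and `HEAD_LEN` are hypothetical, and the tail is convolved in the time domain as a stand-in for partitioned FFT convolution. The point is only that the tail's contribution cannot begin before sample `HEAD_LEN`, which is the slack a block-based tail can use.

```rust
/// Hypothetical head length, matching the 256 direct-FIR taps described above.
const HEAD_LEN: usize = 256;

/// Full linear convolution in the time domain.
/// Output length is x.len() + h.len() - 1, or empty if either input is empty.
fn convolve(x: &[f32], h: &[f32]) -> Vec<f32> {
    if x.is_empty() || h.is_empty() {
        return Vec::new();
    }
    let mut y = vec![0.0f32; x.len() + h.len() - 1];
    for (n, &xn) in x.iter().enumerate() {
        for (k, &hk) in h.iter().enumerate() {
            y[n + k] += xn * hk;
        }
    }
    y
}

/// Render `x` through `ir` as a direct head plus a delayed tail.
/// Assumption: the tail uses plain time-domain convolution here; a real-time
/// version would replace it with partitioned FFT convolution.
fn hybrid_convolve(x: &[f32], ir: &[f32], head_len: usize) -> Vec<f32> {
    if x.is_empty() || ir.is_empty() {
        return Vec::new();
    }
    let split = head_len.min(ir.len());
    let (head, tail) = ir.split_at(split);
    let mut y = vec![0.0f32; x.len() + ir.len() - 1];

    // Head: contributes from output sample 0, so there is no added latency.
    for (i, v) in convolve(x, head).into_iter().enumerate() {
        y[i] += v;
    }
    // Tail: its first tap sits `split` samples into the IR, so its output
    // starts at sample `split`.
    for (i, v) in convolve(x, tail).into_iter().enumerate() {
        y[split + i] += v;
    }
    y
}
```

Under these assumptions, `hybrid_convolve(x, ir, HEAD_LEN)` matches `convolve(x, ir)` up to floating-point rounding, which is a convenient property to check in a regression test.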

Design Directions

Static cabinet IR:

  • Keep the current hybrid direct-head/FFT-tail convolver as the live baseline.
  • Add import and normalization tooling before adding more complex rendering.
  • Measure CPU, peak/RMS stability, and perceived tonal change against the current reference IR.
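
As a starting point for the normalization and peak/RMS measurements listed above, a minimal sketch might look like the following. The function names are hypothetical, and peak normalization is only one possible policy; an energy-based or loudness-based target may turn out to be a better fit.

```rust
/// Largest absolute sample value in the IR (0.0 for an empty IR).
fn peak(ir: &[f32]) -> f32 {
    ir.iter().fold(0.0f32, |m, &s| m.max(s.abs()))
}

/// Root-mean-square level of the IR, accumulated in f64 to limit rounding error.
fn rms(ir: &[f32]) -> f32 {
    if ir.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = ir.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / ir.len() as f64).sqrt() as f32
}

/// Scale the IR in place so its absolute peak equals `target_peak`.
/// Returns the gain that was applied; a silent IR is left untouched (gain 1.0).
fn normalize_peak(ir: &mut [f32], target_peak: f32) -> f32 {
    let p = peak(ir);
    if p <= f32::EPSILON {
        return 1.0;
    }
    let gain = target_peak / p;
    for s in ir.iter_mut() {
        *s *= gain;
    }
    gain
}
```

Recording `peak` and `rms` before and after import gives a simple numeric check to sit alongside listening tests against the reference IR.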

Sparse measured IR interpolation:

  • Useful if we want cabinet, mic, or room positions to move continuously.
  • Early reflections and direct sound should be handled more carefully than late diffuse tails.
  • Candidate output format: a time-varying IR, a small set of aligned early components plus a late tail, or a bank of crossfaded partitions.
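
To illustrate why the direct sound needs more care than a plain sample-wise blend, here is a minimal sketch that aligns the direct-sound peaks of two measured IRs before crossfading them. Everything in it is an assumption for illustration: the function names are hypothetical, the largest-magnitude sample is used as a crude proxy for the direct-sound arrival, and the arrival-time difference between the two measurements is discarded rather than interpolated.

```rust
/// Index of the largest-magnitude sample, used as a crude stand-in for the
/// direct-sound arrival. Returns 0 for an empty or silent IR.
fn direct_sound_index(ir: &[f32]) -> usize {
    let mut best = 0;
    let mut best_mag = 0.0f32;
    for (i, &s) in ir.iter().enumerate() {
        if s.abs() > best_mag {
            best_mag = s.abs();
            best = i;
        }
    }
    best
}

/// Blend two IRs at position `t` in [0, 1] after shifting both so that their
/// direct-sound peaks land on the same sample.
fn interpolate_aligned(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    let t = t.clamp(0.0, 1.0);
    let (pa, pb) = (direct_sound_index(a), direct_sound_index(b));

    // Place both peaks at the later of the two arrival indices.
    let peak_at = pa.max(pb);
    let (off_a, off_b) = (peak_at - pa, peak_at - pb);

    let len = (a.len() + off_a).max(b.len() + off_b);
    let mut out = vec![0.0f32; len];
    for (i, &s) in a.iter().enumerate() {
        out[i + off_a] += (1.0 - t) * s;
    }
    for (i, &s) in b.iter().enumerate() {
        out[i + off_b] += t * s;
    }
    out
}
```

Without the alignment step, a halfway blend of two IRs whose peaks sit a few samples apart yields two half-height peaks, which is the kind of combing the bullets above warn about. A fuller design would likely treat the late tail separately and reintroduce the interpolated arrival time as an explicit delay.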

Physics-informed reconstruction:

  • Useful when a small number of measurements should reconstruct a larger sound field.
  • For Greybound, this is more likely an offline authoring or dataset-building tool than a live audio thread feature.
  • The early part of the IR is the critical target because misaligned direct sound and early reflections produce obvious combing and spatial instability.

Neural fields:

  • Neural IR fields are promising for rendering that varies continuously with source and listener position.
  • They should be treated as offline or precomputed systems until inference cost, determinism, and artifact behavior are proven.
  • If adopted, the runtime should consume cached IRs or compact component parameters rather than invoking a large model in the audio callback.
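
One way to picture the "cached IRs" option is a lookup table filled offline and only read at runtime. The sketch below is hypothetical throughout: `IrCache`, `GridKey`, the 2D position grid, and the grid step are illustrative choices, not part of Greybound. It only shows that the runtime side can be a table lookup with no model inference.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical quantized 2D position used as the cache key.
type GridKey = (i32, i32);

/// Cache of IRs rendered offline (for example by a neural field) on a regular grid.
struct IrCache {
    step: f32,
    entries: HashMap<GridKey, Arc<[f32]>>,
}

impl IrCache {
    fn new(step: f32) -> Self {
        assert!(step > 0.0, "grid step must be positive");
        IrCache { step, entries: HashMap::new() }
    }

    /// Snap a continuous position to the nearest grid cell.
    fn key_for(&self, x: f32, y: f32) -> GridKey {
        ((x / self.step).round() as i32, (y / self.step).round() as i32)
    }

    /// Offline step: store an IR that was rendered for position (x, y).
    fn insert(&mut self, x: f32, y: f32, ir: Vec<f32>) {
        let key = self.key_for(x, y);
        self.entries.insert(key, Arc::from(ir));
    }

    /// Runtime step: fetch the IR for the nearest grid cell, if one was cached.
    /// Cloning the Arc shares the samples instead of copying them.
    fn nearest(&self, x: f32, y: f32) -> Option<Arc<[f32]>> {
        self.entries.get(&self.key_for(x, y)).cloned()
    }
}
```

In practice the lookup would probably run on a control thread, with the selected IR handed to the convolver through whatever swap mechanism the speaker stage already uses.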

References

Open Questions

  • Should Greybound model only cabinet/mic IRs, or also room/listener spatial IRs?
  • Should the IR asset format store raw samples only, or should it also store decomposed early reflections and a late tail?
  • What validation fixtures should prove that interpolation does not create combing, unstable peaks, or level jumps?
  • Can offline learned/reconstructed IRs be compiled into the same partitioned-convolution format used by SpeakerStage?
