CUDA kernel

gammatone_kernel

Custom CUDA kernel for batched IIR (SOS) filtering.

Filters one shared input signal with a different SOS cascade per channel in a single kernel launch, replacing a Python-level loop of N independent cupyx.scipy.signal.sosfilt calls (one per channel).

Degrades gracefully: is_available() returns False whenever CuPy or nvrtc cannot be used, so callers should check it first and keep the original loop as a fallback path.
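A minimal sketch of that caller-side dispatch, assuming the kernel module is passed in as an object exposing is_available() and batched_sosfilt(). The names apply_filterbank and loop_sosfilt are hypothetical, and the stand-in objects at the bottom exist only so the snippet runs without a GPU.

```python
from types import SimpleNamespace


def apply_filterbank(kernel, sos, x, loop_sosfilt):
    """Use the fused kernel when it is usable, otherwise the original loop.

    kernel       : object exposing is_available() and batched_sosfilt(sos, x),
                   or None if the module could not be imported at all
    loop_sosfilt : callable implementing the original per-channel loop
    """
    if kernel is not None and kernel.is_available():
        return kernel.batched_sosfilt(sos, x)
    return loop_sosfilt(sos, x)


# Stand-ins for illustration only (not the real module).
no_gpu = SimpleNamespace(is_available=lambda: False, batched_sosfilt=None)
with_gpu = SimpleNamespace(
    is_available=lambda: True,
    batched_sosfilt=lambda sos, x: ("kernel", x),
)


def loop(sos, x):
    return ("loop", x)


slow = apply_filterbank(no_gpu, None, [1.0, 2.0], loop)
fast = apply_filterbank(with_gpu, None, [1.0, 2.0], loop)
missing = apply_filterbank(None, None, [1.0, 2.0], loop)
```

Keeping the loop as an explicit argument means the fallback path stays exercised by tests even on machines where the kernel is available.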

is_available

is_available() -> bool

Return True iff CuPy + nvrtc can compile this kernel on this machine.

The first call triggers a one-time JIT compile of a tiny stub kernel; subsequent calls hit the kernel cache and are essentially free.

RETURNS DESCRIPTION
bool

True if the kernel can be used. False on CPU-only systems or when nvrtc is missing.
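One way such a probe could be written is sketched below. This is an assumption about the approach, not the module's actual code: the name cuda_jit_available and the stub source are hypothetical, and functools.lru_cache is used here to memoise the result in-process, which is a stand-in for the kernel cache mentioned above.

```python
import functools

# Trivial kernel whose only job is to exercise the nvrtc toolchain.
_STUB_SOURCE = r'''
extern "C" __global__ void stub_kernel(float *y) { y[0] = 0.0f; }
'''


@functools.lru_cache(maxsize=None)
def cuda_jit_available() -> bool:
    """Hypothetical probe: True if CuPy can JIT-compile a stub kernel here."""
    try:
        import cupy as cp  # not installed on CPU-only setups

        # RawKernel compiles lazily; compile() forces the build right now.
        cp.RawKernel(_STUB_SOURCE, "stub_kernel").compile()
    except Exception:
        # ImportError, missing nvrtc library, no usable device, ...
        return False
    return True
```

Catching the broad Exception is deliberate in this sketch: any failure along the import or compile path means the fast path is unusable, and the caller only needs a yes/no answer.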

batched_sosfilt

batched_sosfilt(sos: 'cp.ndarray', x: 'cp.ndarray', gain: float = 1.0, out: Optional['cp.ndarray'] = None, precision: str = 'float64') -> 'cp.ndarray'

Apply a per-channel SOS cascade to the same input in one kernel launch.

Equivalent to a loop of cupyx.scipy.signal.sosfilt calls, each with its own SOS coefficients but a shared input signal, fused into a single CUDA kernel.
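The semantics can be spelled out with a pure-Python reference, shown here as a sketch for checking results on small inputs rather than as the kernel's implementation. It assumes SciPy's SOS row layout (b0, b1, b2, a0, a1, a2) with a0 equal to 1, zero initial state, and a direct form II transposed update; the function names are hypothetical.

```python
from typing import List, Sequence


def sosfilt_reference(sections: Sequence[Sequence[float]],
                      x: Sequence[float]) -> List[float]:
    """Run one SOS cascade over x, starting from zero state."""
    y = list(x)
    for b0, b1, b2, _a0, a1, a2 in sections:  # a0 is assumed to be 1
        z1 = z2 = 0.0  # state of this section
        for n, xn in enumerate(y):
            yn = b0 * xn + z1
            z1 = b1 * xn - a1 * yn + z2
            z2 = b2 * xn - a2 * yn
            y[n] = yn  # output of this section feeds the next one
    return y


def batched_sosfilt_reference(sos, x, gain=1.0):
    """Filter the shared input x with every channel's cascade.

    sos is nested as (n_channels, n_sections, 6); the result has one row
    per channel, i.e. shape (n_channels, n_samples).
    """
    return [[gain * v for v in sosfilt_reference(channel, x)]
            for channel in sos]


# Two single-section channels: a pass-through and a one-sample delay.
identity = [[1.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
delay = [[0.0, 1.0, 0.0, 1.0, 0.0, 0.0]]
rows = batched_sosfilt_reference([identity, delay], [1.0, 2.0, 3.0], gain=2.0)
```

The outer comprehension over channels is the loop that the fused kernel replaces; each channel is independent of the others, which is what makes a single launch possible.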

PARAMETER DESCRIPTION
sos

SOS coefficients of shape (n_channels, n_sections, 6). Dtype must match precision (float64 for "float64", float32 for "float32").

TYPE: ndarray

x

Shared 1-D input signal of shape (n_samples,). Dtype must be float32, whatever the value of precision.

TYPE: ndarray

gain

Scalar gain applied to every channel's output before write-back.

TYPE: float DEFAULT: 1.0

out

Optional pre-allocated output buffer of shape (n_channels, n_samples) and dtype float32. Pass one when the function is called many times, so the same buffer is reused instead of a new one being allocated on every call.

TYPE: ndarray DEFAULT: None

precision

Internal compute precision. "float32" is roughly 8x faster on consumer Ampere GPUs (RTX 3090 Ti, Jetson Orin), where FP64 throughput is throttled to 1/64 of FP32. The float32 path matches SciPy to ~1e-3 worst-case relative error; the float64 path matches it to ~1e-9.

TYPE: ('float64', 'float32') DEFAULT: "float64"
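The accuracy gap between the two settings comes from rounding error being fed back through the filter recursion. The pure-Python sketch below illustrates only that mechanism on a one-pole filter, emulating float32 by rounding every intermediate through struct. It is not the kernel, the helper names are hypothetical, and the error it prints is not comparable to the figures quoted above.

```python
import math
import struct


def to_f32(v: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def one_pole(x, a, rnd=lambda v: v):
    """y[n] = x[n] + a * y[n-1], rounding every intermediate with rnd."""
    a = rnd(a)
    y, prev = [], 0.0
    for xn in x:
        prev = rnd(rnd(xn) + rnd(a * prev))
        y.append(prev)
    return y


# A sine input and a pole close to the unit circle, where state persists
# for many samples and rounding errors have time to accumulate.
x = [math.sin(0.05 * n) for n in range(2000)]
y64 = one_pole(x, 0.999)               # native double precision
y32 = one_pole(x, 0.999, rnd=to_f32)   # emulated single precision

peak = max(abs(v) for v in y64)
rel_err = max(abs(p - q) for p, q in zip(y32, y64)) / peak
print(f"worst-case error relative to peak: {rel_err:.2e}")
```

Moving the pole closer to 1 lengthens the filter's memory and generally increases the discrepancy, which is why sharp, narrow-band sections are the ones to check first when choosing "float32".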

RETURNS DESCRIPTION
ndarray

Filtered output of shape (n_channels, n_samples), dtype float32. If out was provided, returns that same buffer.

RAISES DESCRIPTION
RuntimeError

If CuPy is not installed.

ValueError

If precision is not one of the supported values.

TypeError

If sos or x does not match the expected dtype.