Data structures

structs

Data classes for spectrogram and RSF representations.

Spectrogram dataclass

Spectrogram(data: ndarray, times: ndarray, freqs: ndarray, sr: int)

Auditory spectrogram with frequency and time axes.

PARAMETER DESCRIPTION
data

Spectrogram array of shape (n_freq, n_time). This is a NumPy or CuPy array, depending on the backend that produced it.

TYPE: ndarray

times

Time axis in seconds, length n_time.

TYPE: ndarray

freqs

Center frequencies in Hz, length n_freq.

TYPE: ndarray

sr

Sample rate (Hz) of the original audio.

TYPE: int

shape property

shape: tuple

Shape of data as (n_freq, n_time).

n_freqs property

n_freqs: int

Number of frequency channels.

n_times property

n_times: int

Number of time frames.

duration property

duration: float

Total duration in seconds: the last entry of times minus the first, or 0.0 if there is only a single frame.
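The derived properties above follow directly from the stored arrays. A minimal sketch, assuming a hypothetical stand-in class (SpectrogramSketch is not part of the library; it only mirrors the documented fields and property semantics):

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SpectrogramSketch:
    """Hypothetical stand-in that mirrors the documented fields."""

    data: np.ndarray   # (n_freq, n_time)
    times: np.ndarray  # seconds, length n_time
    freqs: np.ndarray  # Hz, length n_freq
    sr: int            # sample rate of the original audio

    @property
    def shape(self) -> tuple:
        return tuple(self.data.shape)

    @property
    def n_freqs(self) -> int:
        return self.data.shape[0]

    @property
    def n_times(self) -> int:
        return self.data.shape[1]

    @property
    def duration(self) -> float:
        # Last time minus first; a single frame has no extent.
        if len(self.times) < 2:
            return 0.0
        return float(self.times[-1] - self.times[0])


# Illustrative values only: 128 channels, 5 frames spaced 0.5 s apart.
spec = SpectrogramSketch(
    data=np.zeros((128, 5)),
    times=np.arange(5) * 0.5,
    freqs=np.geomspace(100.0, 8000.0, 128),
    sr=16000,
)
single = SpectrogramSketch(
    data=np.zeros((128, 1)),
    times=np.array([0.0]),
    freqs=np.geomspace(100.0, 8000.0, 128),
    sr=16000,
)
```

Here spec.duration spans the four 0.5 s gaps between five frames, while single.duration falls back to 0.0.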

to_numpy

to_numpy() -> ndarray

Return data as a numpy array, copying from the GPU if needed.

RETURNS DESCRIPTION
ndarray

Host-side copy of the spectrogram data.
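How the host copy is made is not specified here. One common pattern, shown as a sketch under the assumption that device arrays are CuPy arrays (which expose a .get() method returning a NumPy array), is a duck-typed check so that CuPy does not have to be imported. The helper name to_numpy_sketch is hypothetical.

```python
import numpy as np


def to_numpy_sketch(arr):
    """Return a host-side NumPy array for a NumPy or CuPy-like input."""
    # NumPy arrays are already on the host; return them unchanged.
    if isinstance(arr, np.ndarray):
        return arr
    # CuPy arrays expose .get(), which copies device memory to the host.
    if hasattr(arr, "get"):
        return arr.get()
    # Fallback for lists and other array-likes.
    return np.asarray(arr)


host = to_numpy_sketch(np.ones((4, 3)))
from_list = to_numpy_sketch([[1.0, 2.0], [3.0, 4.0]])
```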

RSF dataclass

RSF(data: ndarray, times: ndarray, rates: ndarray, scales: ndarray, freqs: ndarray)

Rate-Scale-Frequency representation produced by the Gabor stage.

PARAMETER DESCRIPTION
data

RSF array of shape (n_frames, n_rates, n_scales, n_freq). This is a NumPy or CuPy array, depending on the backend that produced it.

TYPE: ndarray

times

Frame center times in seconds, length n_frames.

TYPE: ndarray

rates

Temporal modulation rates in Hz, length n_rates. The first half is negative (upward sweeps) and the second half is positive (downward sweeps).

TYPE: ndarray

scales

Spectral modulation scales in cycles/octave, length n_scales.

TYPE: ndarray

freqs

Center frequencies in Hz, length n_freq.

TYPE: ndarray
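To make the axis layout concrete, the following sketch builds arrays with the documented shapes. All values are illustrative, and the ordering of rates within each half (most negative first, then ascending positive) is an assumption. The text above only states that the first half is negative and the second half is positive.

```python
import numpy as np

# Hypothetical rate magnitudes in Hz.
rate_mags = np.array([2.0, 4.0, 8.0, 16.0])

# Negative (upward) half first, positive (downward) half second.
# Assumed ordering: [-16, -8, -4, -2, 2, 4, 8, 16].
rates = np.concatenate([-rate_mags[::-1], rate_mags])

scales = np.array([0.5, 1.0, 2.0])        # cycles/octave
freqs = np.geomspace(100.0, 8000.0, 64)   # Hz
times = np.arange(10) * 0.1               # frame centers in seconds

# The data axes are (n_frames, n_rates, n_scales, n_freq).
data = np.zeros((len(times), len(rates), len(scales), len(freqs)))
```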

shape property

shape: tuple

Shape of data as (n_frames, n_rates, n_scales, n_freq).

n_frames property

n_frames: int

Number of RSF frames along the time axis.

n_rates property

n_rates: int

Number of temporal modulation rates.

n_scales property

n_scales: int

Number of spectral modulation scales.

n_freqs property

n_freqs: int

Number of frequency channels.

to_numpy

to_numpy() -> ndarray

Return data as a numpy array, copying from the GPU if needed.

RETURNS DESCRIPTION
ndarray

Host-side copy of the RSF data.

mean_over_time

mean_over_time()

Average across the time (frame) axis, the first axis of data.

RETURNS DESCRIPTION
ndarray

Array of shape (n_rates, n_scales, n_freq).

mean_over_freq

mean_over_freq()

Average across the frequency axis.

RETURNS DESCRIPTION
ndarray

Array of shape (n_frames, n_rates, n_scales).
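In plain NumPy terms, the two reductions above correspond to means over the first and last axes of an array shaped (n_frames, n_rates, n_scales, n_freq). This sketch uses a random stand-in array rather than a real RSF object.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for RSF.data: (n_frames, n_rates, n_scales, n_freq).
data = rng.random((10, 8, 3, 64))

# Equivalent of mean_over_time: collapse the frame axis (axis 0).
over_time = data.mean(axis=0)    # (n_rates, n_scales, n_freq)

# Equivalent of mean_over_freq: collapse the frequency axis (last axis).
over_freq = data.mean(axis=-1)   # (n_frames, n_rates, n_scales)
```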

upward_rates

upward_rates() -> ndarray

Negative-rate half of the rates axis (upward-sweeping ripples).

RETURNS DESCRIPTION
ndarray

First half of rates, length n_rates // 2.

downward_rates

downward_rates() -> ndarray

Positive-rate half of the rates axis (downward-sweeping ripples).

RETURNS DESCRIPTION
ndarray

Second half of rates, length n_rates // 2.
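The split described by upward_rates and downward_rates amounts to slicing the rates axis at its midpoint. A small sketch with illustrative rate values, assuming n_rates is even:

```python
import numpy as np

# Illustrative rates axis: negative half first, positive half second.
rates = np.array([-16.0, -8.0, -4.0, -2.0, 2.0, 4.0, 8.0, 16.0])

half = len(rates) // 2
upward = rates[:half]     # negative rates, upward-sweeping ripples
downward = rates[half:]   # positive rates, downward-sweeping ripples
```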

rate_scale_matrix

rate_scale_matrix(fold: bool = False)

Reduce the RSF to a 2D scale-by-rate matrix.

PARAMETER DESCRIPTION
fold

If True, average the upward and downward halves, then mirror the averaged half back to full width, so that the result is symmetric across the two sweep directions and the output shape is preserved.

TYPE: bool DEFAULT: False

RETURNS DESCRIPTION
ndarray

Matrix of shape (n_scales, n_rates).
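The exact reduction is not spelled out here. The following is one plausible sketch, not the library's implementation. It assumes that the 2D matrix is the mean over frames and frequency, and that the rates axis is ordered symmetrically (for example [-16, -8, -4, -2, 2, 4, 8, 16]), so that reversing the upward half aligns it with the downward half. The function name rate_scale_matrix_sketch is hypothetical.

```python
import numpy as np


def rate_scale_matrix_sketch(data, fold=False):
    """Reduce (n_frames, n_rates, n_scales, n_freq) to (n_scales, n_rates)."""
    # Assumption: average over frames (axis 0) and frequency (axis 3),
    # then transpose (n_rates, n_scales) -> (n_scales, n_rates).
    m = data.mean(axis=(0, 3)).T
    if not fold:
        return m

    half = m.shape[1] // 2
    up, down = m[:, :half], m[:, half:]
    # Assumption: reversing the upward half pairs each negative rate
    # with the positive rate of the same magnitude.
    folded = 0.5 * (up[:, ::-1] + down)
    # Mirror back to full width so the shape matches the unfolded case.
    return np.concatenate([folded[:, ::-1], folded], axis=1)


rng = np.random.default_rng(0)
data = rng.random((10, 8, 3, 64))
full = rate_scale_matrix_sketch(data)
folded = rate_scale_matrix_sketch(data, fold=True)
```

With fold=True the result is left-right symmetric, so direction-specific differences between upward and downward sweeps are averaged out.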

rate_scale_matrix_split

rate_scale_matrix_split()

Reduce the RSF to two scale-by-rate matrices, by sweep direction.

RETURNS DESCRIPTION
tuple of ndarray

(upward, downward), each of shape (n_scales, n_rates // 2).
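A sketch of the split variant, under the same assumption that the reduction is a mean over frames and frequency (the actual reduction used by the library is not stated here). The function name rate_scale_split_sketch is hypothetical.

```python
import numpy as np


def rate_scale_split_sketch(data):
    """Return (upward, downward), each of shape (n_scales, n_rates // 2)."""
    # Assumption: average over frames (axis 0) and frequency (axis 3),
    # then transpose to (n_scales, n_rates).
    m = data.mean(axis=(0, 3)).T
    half = m.shape[1] // 2
    # First half of the rates axis is negative (upward sweeps),
    # second half is positive (downward sweeps).
    return m[:, :half], m[:, half:]


rng = np.random.default_rng(0)
data = rng.random((10, 8, 3, 64))
upward, downward = rate_scale_split_sketch(data)
```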