LongRecordingOrganizer#

class neurodent.core.LongRecordingOrganizer(item: str | Path | list[str] | tuple[str] | DiscoveredFile, mode: Literal['si', 'mne', None] = 'si', truncate: bool | int = False, cache_policy: Literal['auto', 'always', 'force_regenerate'] = 'auto', multiprocess_mode: Literal['dask', 'serial'] = 'serial', extract_func: Callable[[...], si.BaseRecording] | Callable[[...], Raw] | str = None, manual_datetimes: datetime | list[datetime] = None, datetimes_are_start: bool = True, n_jobs: int = 1, recording: si.BaseRecording = None, **kwargs)[source]#

Bases: object

Construct a long recording from various file formats or an existing recording object.

Parameters:
  • item (str | Path | list[str] | DiscoveredFile | None) – Input data specification.
      - str/Path: Single file or directory path
      - list[str]: Multiple files to concatenate
      - DiscoveredFile: File(s) discovered by FileDiscoverer (single or multi-file)
      - None: Used when initializing from an existing recording object

  • mode (Literal['si', 'mne', None], optional) – Data loading mode. Defaults to ‘si’.
      - ‘si’: Use SpikeInterface extractors
      - ‘mne’: Use MNE-Python extractors (creates intermediate file)
      - None: No data loading (item must be None, recording must be provided)

  • truncate (bool | int, optional) – If True, truncate to the first 10 files. If an integer, truncate to the first n files. Defaults to False.

  • cache_policy (Literal['auto', 'always', 'force_regenerate'], optional) – Cache policy for intermediate files. Defaults to ‘auto’.

  • multiprocess_mode (Literal['dask', 'serial'], optional) – Processing mode for parallel operations when loading multiple files. Defaults to ‘serial’.

  • extract_func (Callable | str, optional) – Function to extract data.
      - If str: name of a SpikeInterface or MNE extractor (e.g., ‘read_intan’, ‘read_raw_edf’)
      - If Callable: custom extraction function
      - If None: defaults to si.load_extractor for SI mode

  • manual_datetimes (datetime | list[datetime], optional) – Manually provided timestamps.

  • datetimes_are_start (bool, optional) – If True (default), manual_datetimes are start times.

  • n_jobs (int, optional) – Number of parallel jobs for MNE resampling. Defaults to 1.

  • recording (si.BaseRecording, optional) – Existing SpikeInterface recording object for in-memory initialization. Use this when creating LRO wrappers around split recordings.

  • **kwargs – Additional arguments passed to the data loading functions.

Variables:
  • LongRecording (si.BaseRecording) – The SpikeInterface recording object.

  • meta (RecordingMetadata) – Technical metadata (sampling rate, channels, etc.).

  • channel_names (list[str]) – List of channel names.

  • file_durations (list[float]) – Duration of each individual file in seconds.

  • cumulative_file_durations (list[float]) – Cumulative duration timestamps for file boundaries.

  • temppaths (list[str]) – Paths to temporary files created during processing.

  • bad_channel_names (list[str]) – List of channels identified as bad/noisy.

  • _is_in_memory (bool) – True if this LRO was created from an in-memory recording (via split()).

Raises:

ValueError – If no data files are found, if the folder contains mixed file types, or if manual time parameters are invalid.
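The interaction between manual_datetimes and datetimes_are_start can be illustrated with plain datetime arithmetic. This is a sketch of the assumed semantics only, not the library's implementation; the helper name file_start_times is hypothetical:

```python
from datetime import datetime, timedelta

def file_start_times(manual_datetimes, file_durations_s, datetimes_are_start=True):
    """Derive per-file start times from manually supplied timestamps.

    If datetimes_are_start is True, the timestamps are already start times;
    otherwise they are end times, and each start is end minus duration.
    """
    if datetimes_are_start:
        return list(manual_datetimes)
    return [dt - timedelta(seconds=dur)
            for dt, dur in zip(manual_datetimes, file_durations_s)]

# Two one-hour files whose END times were recorded:
ends = [datetime(2022, 1, 21, 10, 0, 0), datetime(2022, 1, 21, 11, 0, 0)]
starts = file_start_times(ends, [3600.0, 3600.0], datetimes_are_start=False)
# starts[0] == datetime(2022, 1, 21, 9, 0, 0)
```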

__init__(item: str | Path | list[str] | tuple[str] | DiscoveredFile, mode: Literal['si', 'mne', None] = 'si', truncate: bool | int = False, cache_policy: Literal['auto', 'always', 'force_regenerate'] = 'auto', multiprocess_mode: Literal['dask', 'serial'] = 'serial', extract_func: Callable[[...], si.BaseRecording] | Callable[[...], Raw] | str = None, manual_datetimes: datetime | list[datetime] = None, datetimes_are_start: bool = True, n_jobs: int = 1, recording: si.BaseRecording = None, **kwargs)[source]#
property display_name: str#

Short display name for logging, derived from the item.

detect_and_load_data(mode: Literal['si', 'mne', None] = 'si', cache_policy: Literal['auto', 'always', 'force_regenerate'] = 'auto', multiprocess_mode: Literal['dask', 'serial'] = 'serial', extract_func: Callable[[...], BaseRecording] | Callable[[...], Raw] | str | None = None, **kwargs)[source]#

Load the recording using the selected backend mode.

Parameters:
  • mode (Literal['si', 'mne', None]) – Backend to use for loading recordings.

  • cache_policy (Literal['auto', 'always', 'force_regenerate']) – Caching strategy for loaded recordings.

  • multiprocess_mode (Literal['dask', 'serial']) – Parallelism strategy.

  • extract_func (Callable | str, optional) – Function (or reference to one) used to load each discovered file into a recording object. When a string, it is resolved in this order:

      1. Short name – looked up in spikeinterface.extractors / spikeinterface (for mode="si") or mne.io (for mode="mne"). Example: "read_intan".

      2. File path (contains :) – loads a function directly from a Python file. The .py extension is required. Example: "tests/integration/readers.py:read_bin_csv_pair" or "/absolute/path/to/readers.py:my_func".

  • **kwargs – Forwarded to the backend loading method.
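The file-path form of extract_func ("path/to/file.py:func_name") can be resolved with standard importlib machinery. The sketch below illustrates that lookup; the actual resolution logic inside detect_and_load_data may differ, and resolve_extract_func is a hypothetical name:

```python
import importlib.util
from pathlib import Path

def resolve_extract_func(spec: str):
    """Resolve 'path/to/readers.py:func_name' to the named callable."""
    path_str, func_name = spec.rsplit(":", 1)
    path = Path(path_str)
    if path.suffix != ".py":
        raise ValueError(f"Expected a .py file, got {path}")
    # Load the module directly from its file location.
    module_spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)
    return getattr(module, func_name)
```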

convert_file_with_si_to_recording(extract_func: Callable[[...], BaseRecording], cache_policy: Literal['auto', 'always', 'force_regenerate'] = 'auto', multiprocess_mode: Literal['dask', 'serial'] = 'serial', **kwargs)[source]#
convert_file_with_mne_to_recording(extract_func: Callable[[...], Raw], intermediate: Literal['edf', 'bin'] = 'edf', intermediate_name=None, intermediate_dir=None, cache_policy: Literal['auto', 'always', 'force_regenerate'] = 'auto', multiprocess_mode: Literal['dask', 'serial'] = 'serial', n_jobs: int | None = None, **kwargs)[source]#
cleanup_rec()[source]#
split(groups: dict[str, list[str]]) dict[str, LongRecordingOrganizer][source]#

Split the current recording into multiple in-memory LongRecordingOrganizer objects.

This creates lightweight LRO wrappers around channel-sliced views of the recording. No disk I/O is performed. Use persist() on individual results to save them to disk if needed.

Parameters:

groups (dict[str, list[str]]) – Dictionary mapping group names (e.g., ‘AnimalA’) to lists of channel names.

Returns:

Dictionary mapping group names to new LRO instances.

Return type:

dict[str, LongRecordingOrganizer]

Raises:
  • ValueError – If requested channels are not found in the recording.

  • ImportError – If SpikeInterface is not available.

Example

>>> lro = LongRecordingOrganizer("/path/to/data", mode="si")
>>> splits = lro.split({"AnimalA": ["Ch1", "Ch2"], "AnimalB": ["Ch3", "Ch4"]})
>>> splits["AnimalA"].persist("/output/AnimalA", format="zarr")
persist(output_dir: str | Path, format: Literal['zarr', 'binary'] = 'zarr', n_jobs: int = 1, chunk_duration: str = '1s', progress_bar: bool = True, **kwargs) Path[source]#

Save this LRO’s recording to disk.

Delegates to SpikeInterface’s save() function.

Parameters:
  • output_dir (Union[str, Path]) – Directory to save the recording.

  • format (Literal["zarr", "binary"], optional) – Save format. Defaults to “zarr”.

  • n_jobs (int, optional) – Number of parallel jobs. Defaults to 1.

  • chunk_duration (str, optional) – Chunk duration for processing. Defaults to “1s”.

  • progress_bar (bool, optional) – Show progress bar. Defaults to True.

  • **kwargs – Additional arguments passed to SI’s save().

Returns:

The output directory where the recording was saved.

Return type:

Path

get_num_fragments(fragment_len_s)[source]#
get_fragment(fragment_len_s, fragment_idx)[source]#
get_dur_fragment(fragment_len_s, fragment_idx)[source]#
get_datetime_fragment(fragment_len_s, fragment_idx)[source]#

Get the datetime for a specific fragment using the timestamp mapper.

Parameters:
  • fragment_len_s (float) – Length of each fragment in seconds.

  • fragment_idx (int) – Index of the fragment to get the datetime for.

Returns:

The datetime corresponding to the start of the fragment

Return type:

datetime

Raises:

ValueError – If timestamp mapper is not initialized (only available in ‘bin’ mode)
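The fragment indexing behind get_num_fragments and get_datetime_fragment follows from simple arithmetic on the recording length and start time. A self-contained sketch of the assumed mapping (not the library's code; num_fragments and fragment_start are illustrative names):

```python
import math
from datetime import datetime, timedelta

def num_fragments(total_duration_s: float, fragment_len_s: float) -> int:
    # A trailing partial fragment still counts, hence the ceiling.
    return math.ceil(total_duration_s / fragment_len_s)

def fragment_start(recording_start: datetime, fragment_len_s: float, idx: int) -> datetime:
    # Fragment idx begins idx * fragment_len_s seconds after the recording start.
    return recording_start + timedelta(seconds=idx * fragment_len_s)

start = datetime(2022, 1, 21, 9, 0, 0)
n = num_fragments(3700.0, 600.0)      # six full 10-minute fragments plus one partial
t = fragment_start(start, 600.0, 6)   # start of the last (partial) fragment
```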

convert_to_mne() RawArray[source]#

Convert this LongRecording object to an MNE RawArray.

Returns:

The converted MNE RawArray

Return type:

mne.io.RawArray

save_to_edf(filename: str | Path, overwrite: bool = False)[source]#

Save the recording to an EDF file via MNE.

Parameters:
  • filename (str | Path) – Path to save the EDF file to.

  • overwrite (bool) – Whether to overwrite if file exists.

compute_bad_channels(lof_threshold: float | None = None, force_recompute: bool = False, lof_chunk_duration_s: float = 60)[source]#

Compute bad channels using LOF analysis with unified score storage.

Parameters:
  • lof_threshold (float, optional) – Threshold for determining bad channels from LOF scores. If None, only computes/loads scores without setting bad_channel_names.

  • force_recompute (bool) – Whether to recompute LOF scores even if they exist.

  • lof_chunk_duration_s (float) – Duration in seconds of each chunk used for the pairwise-distance computation in LOF. Defaults to 60.

apply_lof_threshold(lof_threshold: float)[source]#

Apply threshold to existing LOF scores to determine bad channels.

Parameters:

lof_threshold (float) – Threshold for determining bad channels.

get_lof_scores() dict[source]#

Get LOF scores with channel names.

Returns:

Dictionary mapping channel names to LOF scores.

Return type:

dict
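The relationship between get_lof_scores and apply_lof_threshold can be sketched with plain dictionary filtering. The channel names and scores below are made up for illustration, and the helper is hypothetical; the library's own thresholding may differ in detail:

```python
def channels_above_threshold(lof_scores: dict, threshold: float) -> list:
    """Flag channels whose LOF score exceeds the threshold as bad."""
    return [ch for ch, score in lof_scores.items() if score > threshold]

# Illustrative scores: values near 1 are typical inliers for LOF.
scores = {"Ch1": 1.02, "Ch2": 3.41, "Ch3": 0.98, "Ch4": 2.10}
bad = channels_above_threshold(scores, threshold=2.0)
# bad == ["Ch2", "Ch4"]
```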

finalize_file_timestamps()[source]#

Finalize file timestamps using manual times if provided, otherwise validate CSV times.

get_date_string() str[source]#

Get the string representation of the recording date (Start Time).

Returns:

Date string in format “%b-%d-%Y” (e.g. “Jan-21-2022”).

Return type:

str

Raises:

ValueError – If no timestamps are available in the recording.
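The "%b-%d-%Y" format referenced above is plain strftime behavior, independent of the library:

```python
from datetime import datetime

start_time = datetime(2022, 1, 21, 9, 30)
date_string = start_time.strftime("%b-%d-%Y")
# date_string == "Jan-21-2022"
```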

__str__()[source]#

Return a string representation of critical long recording features.

merge(other_lro)[source]#

Merge another LRO into this one using si.concatenate_recordings.

This creates a new concatenated recording from this LRO and the other LRO. The other LRO should represent a later time period to maintain temporal order.

Parameters:

other_lro (LongRecordingOrganizer) – The LRO to merge into this one

Raises:
  • ValueError – If LROs are incompatible (different channels, sampling rates, etc.)

  • ImportError – If SpikeInterface is not available
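The compatibility requirement for merge can be sketched as a pre-check on channel names and sampling rate. This is an illustrative sketch with a hypothetical helper name; the real validation happens inside merge and may check more properties:

```python
def check_merge_compatible(channels_a, rate_a, channels_b, rate_b):
    """Raise ValueError if two recordings cannot be concatenated."""
    if list(channels_a) != list(channels_b):
        raise ValueError("Channel names differ between recordings")
    if rate_a != rate_b:
        raise ValueError(f"Sampling rates differ: {rate_a} vs {rate_b}")

# Matching channels and rates pass silently:
check_merge_compatible(["Ch1", "Ch2"], 1000.0, ["Ch1", "Ch2"], 1000.0)
```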

__repr__()[source]#

Return a detailed string representation for debugging.