Data Loading Tutorial#

This tutorial covers how to load EEG data from various formats into NeuRodent’s two main organizer classes:

  • LongRecordingOrganizer (LRO) — loads and manages a single recording (one session, one animal).

  • AnimalOrganizer (AO) — discovers and groups multiple recordings for one animal using file-path patterns, then creates LROs internally.

Most users will interact with AnimalOrganizer directly. Understanding the LRO helps when you need fine-grained control over how individual recordings are loaded.

Setup#

import csv
from pathlib import Path
import logging
from datetime import datetime

import numpy as np

from neurodent import core
from neurodent.core.discovery import DiscoveredFile, FileDiscoverer

import spikeinterface.core as si

logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger()

Part 1 — LongRecordingOrganizer#

What LRO Accepts as item#

The first argument (item) of LongRecordingOrganizer accepts several types, depending on your data layout:

| item type | Use case |
|---|---|
| str / Path (single file) | One recording in a standard format (EDF, Intan, NWB, …) |
| list[str] (multiple files) | Several files to concatenate into one long recording |
| DiscoveredFile (single-file) | One file returned by FileDiscoverer |
| DiscoveredFile (multi-file) | Paired files that together form one recording (e.g. .bin + .csv) |
| None | When passing a pre-loaded si.BaseRecording via the recording= parameter |

The mode parameter selects the backend: "si" (SpikeInterface), "mne" (MNE-Python), or None (pre-created recording).

The optional extract_func tells LRO how to read the file(s). It can be a SpikeInterface extractor name (e.g. "read_edf"), a callable, or a file-path string (e.g. "readers.py:read_custom").
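To get a feel for the file-path form, here is a sketch of how a "path/to/file.py:function_name" string could be resolved into a callable using only the standard library (an illustration of the idea; NeuRodent's actual loader may differ, and resolve_extract_func is a hypothetical helper name):

```python
import importlib.util
from pathlib import Path


def resolve_extract_func(spec: str):
    """Resolve a 'path/to/file.py:function_name' spec to a callable.

    Sketch only -- NeuRodent's internal resolution logic may differ.
    """
    file_part, _, func_name = spec.rpartition(":")
    module_spec = importlib.util.spec_from_file_location(Path(file_part).stem, file_part)
    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)  # executes the reader file as a module
    return getattr(module, func_name)
```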

1. Loading a Standard Format (EDF)#

The simplest case: point LRO to a single file and specify a built-in SpikeInterface extractor.

# Load an EDF file by passing a single path string
lro_edf = core.LongRecordingOrganizer(
    item="../../.tests/integration/data/A10/A10_recording.edf",
    mode="si",
    extract_func="read_edf",
    manual_datetimes=datetime(2023, 12, 13),
)

print(f"Sampling frequency: {lro_edf.meta.f_s} Hz")
print(f"Number of channels: {lro_edf.meta.n_channels}")
print(f"Duration: {lro_edf.LongRecording.get_total_duration():.1f} s")
Sampling frequency: 1000.0 Hz
Number of channels: 10
Duration: 5.0 s
2026-04-02 05:38:08,307 - INFO - Applying scale_to_uV to convert raw ADC data to microvolts
2026-04-02 05:38:08,308 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
2026-04-02 05:38:08,308 - INFO - Finalizing file timestamps
2026-04-02 05:38:08,309 - INFO - Using manual timestamps: 1 file end times specified
# Access the underlying SpikeInterface recording
recording = lro_edf.LongRecording

print(f"Recording type: {type(recording).__name__}")
print(f"Duration: {recording.get_total_duration():.1f} seconds")
Recording type: ScaleRecording
Duration: 5.0 seconds

2. Loading Multi-File Formats with DiscoveredFile#

Some formats pair a data file with a metadata sidecar (e.g. a .bin with a .csv). Wrap the paths in a DiscoveredFile so LRO treats them as a single recording.

A custom extract_func receives the DiscoveredFile and returns a si.BaseRecording. First, define the reader function inline:

def read_bin_csv_pair(discovered_file, **kwargs):
    """Read paired ColMajor .bin + Meta .csv files into a recording."""
    bin_path = [p for p in discovered_file.paths if p.endswith(".bin")][0]
    csv_path = [p for p in discovered_file.paths if p.endswith(".csv")][0]

    with open(csv_path) as f:
        rows = list(csv.DictReader(f))

    n_channels = len(rows)
    sampling_rate = float(rows[0]["SampleRate"])
    channel_names = [row["Label"] for row in rows]
    data = np.fromfile(bin_path, dtype=np.float32).reshape(-1, n_channels)

    return si.NumpyRecording(
        traces_list=[data],
        sampling_frequency=sampling_rate,
        channel_ids=channel_names,
    )
# Two files that together form one recording
discovered = DiscoveredFile(
    paths=(
        "../../.tests/integration/data/A10/Cage 2 A10-0_ColMajor.bin",
        "../../.tests/integration/data/A10/Cage 2 A10-0_Meta.csv",
    ),
)

# Pass the inline function as extract_func
lro_bin = core.LongRecordingOrganizer(
    item=discovered,
    mode="si",
    extract_func=read_bin_csv_pair,
    manual_datetimes=datetime(2023, 12, 13),
)

print(f"Sampling frequency: {lro_bin.meta.f_s} Hz")
print(f"Number of channels: {lro_bin.meta.n_channels}")
print(f"Channel names: {lro_bin.meta.channel_names}")
Sampling frequency: 1000.0 Hz
Number of channels: 10
Channel names: ['C-009', 'C-010', 'C-012', 'C-014', 'C-015', 'C-016', 'C-017', 'C-019', 'C-021', 'C-022']
2026-04-02 05:38:08,339 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
2026-04-02 05:38:08,339 - INFO - Finalizing file timestamps
2026-04-02 05:38:08,340 - INFO - Using manual timestamps: 1 file end times specified

File-path string alternative#

Instead of defining the reader inline, you can point to a function in a Python file using the "path/to/file.py:function_name" syntax. The repository includes a pre-packaged reader at tests/integration/readers.py:read_bin_csv_pair:

# Same result, but the reader is loaded from a file
lro_bin_from_file = core.LongRecordingOrganizer(
    item=discovered,
    mode="si",
    extract_func="../../tests/integration/readers.py:read_bin_csv_pair",
    manual_datetimes=datetime(2023, 12, 13),
)

print(f"Sampling frequency: {lro_bin_from_file.meta.f_s} Hz")
print(f"Number of channels: {lro_bin_from_file.meta.n_channels}")
Sampling frequency: 1000.0 Hz
Number of channels: 10
2026-04-02 05:38:08,350 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
2026-04-02 05:38:08,351 - INFO - Finalizing file timestamps
2026-04-02 05:38:08,351 - INFO - Using manual timestamps: 1 file end times specified

3. Other Standard Formats#

Any format supported by SpikeInterface can be loaded via mode="si" by passing the appropriate extractor name:

# Intan .rhd
lro = core.LongRecordingOrganizer(
    item="/path/to/recording.rhd",
    mode="si",
    extract_func="read_intan",
)

# NWB
lro = core.LongRecordingOrganizer(
    item="/path/to/file.nwb",
    mode="si",
    extract_func="read_nwb",
)

MNE-Python formats are available with mode="mne":

import mne

lro = core.LongRecordingOrganizer(
    item="/path/to/recording.fif",
    mode="mne",
    extract_func=mne.io.read_raw_fif,
    manual_datetimes=datetime(2023, 12, 13),
)

4. Concatenating Multiple Files#

Pass a list of paths to have LRO concatenate them in order:

lro_multi = core.LongRecordingOrganizer(
    item=["/path/to/session1.edf", "/path/to/session2.edf"],
    mode="si",
    extract_func="read_edf",
)

5. Pre-Loaded Recording Objects#

If you already have a SpikeInterface BaseRecording in memory (from any source — NumpyRecording, a loaded .nwb, custom processing, etc.), pass it directly to LRO with mode=None.

First, create the recording:

# Create a SpikeInterface recording from raw numpy data
num_channels = 8
sampling_frequency = 1000  # Hz
duration = 10  # seconds
num_samples = int(sampling_frequency * duration)

data = np.random.randn(num_samples, num_channels).astype(np.float32)

recording_custom = si.NumpyRecording(
    traces_list=[data],
    sampling_frequency=sampling_frequency,
)

channel_ids = [f"CH{i:02d}" for i in range(num_channels)]
recording_custom = recording_custom.rename_channels(new_channel_ids=channel_ids)

print(f"Recording type: {type(recording_custom).__name__}")
print(f"Duration: {recording_custom.get_total_duration():.1f} s")
Recording type: ChannelSliceRecording
Duration: 10.0 s
# Pass any si.BaseRecording directly to LRO
lro_custom = core.LongRecordingOrganizer(
    item=None,
    mode=None,
    recording=recording_custom,
)

print(f"Sampling frequency: {lro_custom.meta.f_s}")
print(f"Number of channels: {lro_custom.meta.n_channels}")
print(f"Channel names: {lro_custom.meta.channel_names}")
Sampling frequency: 1000.0
Number of channels: 8
Channel names: ['CH00', 'CH01', 'CH02', 'CH03', 'CH04', 'CH05', 'CH06', 'CH07']
2026-04-02 05:38:08,365 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed

6. Inspecting Loaded Data#

Every LRO exposes a meta attribute (RecordingMetadata) with key properties:

metadata = lro_edf.meta

print(f"Recording metadata: {metadata}")
print(f"Sampling frequency: {metadata.f_s} Hz")
print(f"Number of channels: {metadata.n_channels}")
print(f"Channel names: {metadata.channel_names}")
print(f"Units: {metadata.V_units}")
print(f"Duration: {lro_edf.file_durations} seconds")
Recording metadata: <neurodent.core.core.RecordingMetadata object at 0x7fd783041d50>
Sampling frequency: 1000.0 Hz
Number of channels: 10
Channel names: ['C-009', 'C-010', 'C-012', 'C-014', 'C-015', 'C-016', 'C-017', 'C-019', 'C-021', 'C-022']
Units: µV
Duration: [5.0] seconds

Part 2 — AnimalOrganizer#

In practice you rarely create LROs yourself. Instead, AnimalOrganizer discovers recordings for an animal automatically using patterns — format strings with placeholders that match parts of the file path.

Pattern Placeholders#

| Placeholder | Meaning |
|---|---|
| {animal} | Animal identifier (e.g. A10, F22) |
| {session} | Session or day folder (e.g. day1, 2023-12-13) |
| {index} | File index within a session (when multiple files per session) |
| * | Standard glob wildcard — matches any characters |
Example: "/data/{animal}/{session}/*.edf" matches files like /data/A10/day1/recording.edf and extracts animal="A10", session="day1".
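The matching idea can be sketched by translating a placeholder pattern into a regular expression with named groups. This is an illustration only, not FileDiscoverer's actual implementation, and pattern_to_regex is a hypothetical helper:

```python
import re


def pattern_to_regex(pattern: str):
    """Turn a placeholder pattern into a compiled regex with named groups.

    Sketch only -- FileDiscoverer's internal matching may differ.
    """
    parts = []
    for token in re.split(r"(\{\w+\}|\*)", pattern):
        if token == "*":
            parts.append(r"[^/]*")  # glob wildcard within one path segment
        elif re.fullmatch(r"\{\w+\}", token):
            parts.append(rf"(?P<{token[1:-1]}>[^/]+)")  # named placeholder
        else:
            parts.append(re.escape(token))  # literal path text
    return re.compile("".join(parts) + "$")


rx = pattern_to_regex("/data/{animal}/{session}/*.edf")
print(rx.match("/data/A10/day1/recording.edf").groupdict())
# {'animal': 'A10', 'session': 'day1'}
```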

Key Constructor Parameters#

AO internally uses FileDiscoverer to match files, groups them by session, and creates one LRO per session:

| Parameter | Description |
|---|---|
| pattern | A single pattern string, or a list of patterns for multi-file formats |
| animal_id | Filter discoveries to one animal |
| skip_sessions | Glob patterns for sessions to exclude |
| truncate | Limit the number of sessions loaded |
| assume_from_number | Parse channel aliases from channel-name numbers |
| lro_kwargs | Dict of arguments forwarded to each LongRecordingOrganizer |
7. FileDiscoverer — Finding Recordings#

FileDiscoverer scans the filesystem using placeholder patterns and returns DiscoveredFile objects. For multi-file formats, pass a list of patterns; files that share the same placeholder values are grouped automatically.

# Discover all bin/csv pairs under .tests/integration/data/
discoverer = FileDiscoverer([
    "../../.tests/integration/data/{animal}/*_ColMajor.bin",
    "../../.tests/integration/data/{animal}/*_Meta.csv",
])
discovered_files = discoverer.discover()

for f in discovered_files:
    print(f"Animal {f.metadata['animal']}: {[Path(p).name for p in f.paths]}")
Animal A10: ['Cage 2 A10-0_ColMajor.bin', 'Cage 2 A10-0_Meta.csv']
Animal F22: ['Cage 3 F22-0_ColMajor.bin', 'Cage 3 F22-0_Meta.csv']
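The pairing step amounts to grouping matches on their extracted placeholder values. A standalone sketch of that idea, using hypothetical paths and metadata rather than FileDiscoverer's actual internals:

```python
from collections import defaultdict

# Hypothetical (path, extracted-placeholder) matches from two patterns
matches = [
    ("data/A10/rec_ColMajor.bin", {"animal": "A10"}),
    ("data/A10/rec_Meta.csv", {"animal": "A10"}),
    ("data/F22/rec_ColMajor.bin", {"animal": "F22"}),
    ("data/F22/rec_Meta.csv", {"animal": "F22"}),
]

# Files whose placeholder values agree belong to the same recording
groups = defaultdict(list)
for path, meta in matches:
    groups[tuple(sorted(meta.items()))].append(path)

for key, paths in sorted(groups.items()):
    print(dict(key), "->", paths)
```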

8. Passing Patterns to AnimalOrganizer#

The same pattern syntax goes straight into AnimalOrganizer. The AO runs FileDiscoverer internally, groups the results by session, and builds LROs.

For example, the patterns below discover the Cage 2 A10-0_ColMajor.bin and Cage 2 A10-0_Meta.csv files under .tests/integration/data/A10/ and group them into a single-session LRO:

from neurodent import visualization

# Multi-file pattern for paired bin/csv data
ao = visualization.AnimalOrganizer(
    pattern=[
        "../../.tests/integration/data/{animal}/*_ColMajor.bin",
        "../../.tests/integration/data/{animal}/*_Meta.csv",
    ],
    animal_id="A10",
    assume_from_number=True,
    lro_kwargs={
        "mode": "si",
        "extract_func": "../../tests/integration/readers.py:read_bin_csv_pair",
        "manual_datetimes": datetime(2023, 12, 13),
    },
)

print(f"Animal Organizer created for {ao.animal_id}")
Animal Organizer created for A10
2026-04-02 05:38:08,385 - INFO - Processing manual_datetimes configuration
2026-04-02 05:38:08,385 - INFO - Processing global manual datetimes starting at 2023-12-13 00:00:00
2026-04-02 05:38:08,386 - INFO - Computing continuous timeline for 1 animaldays (1 total items) starting at 2023-12-13 00:00:00
2026-04-02 05:38:08,386 - INFO - Ordered items for timeline: ['Cage 2 A10-0_ColMajor.bin...']
2026-04-02 05:38:08,391 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
2026-04-02 05:38:08,391 - INFO - Finalizing file timestamps
2026-04-02 05:38:08,392 - INFO - Using manual timestamps: 1 file end times specified
2026-04-02 05:38:08,392 - INFO - Item Cage 2 A10-0_ColMajor.bin...: duration = 120.4s (loaded with manual timestamp)
2026-04-02 05:38:08,393 - INFO - Timeline computed: 1 items, total duration 120.4s
2026-04-02 05:38:08,395 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
2026-04-02 05:38:08,396 - INFO - Finalizing file timestamps
2026-04-02 05:38:08,396 - INFO - Using manual timestamps: 1 file end times specified
2026-04-02 05:38:08,397 - INFO - AnimalOrganizer Timeline Summary:
LRO 0: 2023-12-13 00:00:00 -> 2023-12-13 00:02:00.360000 (duration: 120.4s, items: 1, item: Cage 2 A10-0_ColMajor.bin...)

For single-file formats the pattern is just a string:

ao = visualization.AnimalOrganizer(
    pattern="/data/{animal}/{session}/*.edf",
    animal_id="A10",
    lro_kwargs={"mode": "si", "extract_func": "read_edf"},
)

Summary#

In this tutorial, you learned:

  1. The different item types accepted by LongRecordingOrganizer (single path, list, DiscoveredFile, in-memory recording)

  2. How to load standard formats (EDF, Intan, NWB, MNE) and custom multi-file formats

  3. How to inspect loaded data via the meta attribute

  4. How FileDiscoverer finds and pairs recordings using placeholder patterns

  5. How AnimalOrganizer wraps discovery + LRO creation into a single step

Next Steps#