Data Loading Tutorial#
This tutorial covers how to load EEG data from various formats into NeuRodent’s two main organizer classes:
LongRecordingOrganizer (LRO) — loads and manages a single recording (one session, one animal).
AnimalOrganizer (AO) — discovers and groups multiple recordings for one animal using file-path patterns, then creates LROs internally.
Most users will interact with AnimalOrganizer directly. Understanding
the LRO helps when you need fine-grained control over how individual
recordings are loaded.
Setup#
import csv
from pathlib import Path
import logging
from datetime import datetime
import numpy as np
from neurodent import core
from neurodent.core.discovery import DiscoveredFile, FileDiscoverer
import spikeinterface.core as si
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(message)s",
level=logging.INFO,
)
logger = logging.getLogger()
Part 1 — LongRecordingOrganizer#
What LRO Accepts as item#
The first argument (item) of LongRecordingOrganizer accepts several
types, depending on your data layout:
| item | Use case |
|---|---|
| Single path (str or Path) | One recording in a standard format (EDF, Intan, NWB, …) |
| List of paths | Several files to concatenate into one long recording |
| DiscoveredFile | One file returned by FileDiscoverer |
| DiscoveredFile with multiple paths | Paired files that together form one recording (e.g. .bin + .csv) |
| None | When passing a pre-loaded recording object directly |
The mode parameter selects the backend: "si" (SpikeInterface),
"mne" (MNE-Python), or None (pre-created recording).
The optional extract_func tells LRO how to read the file(s). It can
be a SpikeInterface extractor name (e.g. "read_edf"), a callable, or
a file-path string (e.g. "readers.py:read_custom").
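As an illustration of the file-path form, such a string could be resolved with importlib roughly like this. This is a sketch of the mechanism, not NeuRodent's actual resolution logic; load_extract_func and the throwaway read_custom module are hypothetical:

```python
import importlib.util
import tempfile
from pathlib import Path

def load_extract_func(spec_str):
    """Resolve a 'path/to/file.py:function_name' string to a callable."""
    file_part, _, func_name = spec_str.rpartition(":")  # split on the LAST colon
    spec = importlib.util.spec_from_file_location("user_reader", file_part)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # execute the file as a throwaway module
    return getattr(module, func_name)

# Demo with a throwaway reader written to a temporary file
with tempfile.TemporaryDirectory() as d:
    reader_path = Path(d) / "readers.py"
    reader_path.write_text(
        "def read_custom(path, **kwargs):\n"
        "    return f'loaded {path}'\n"
    )
    func = load_extract_func(f"{reader_path}:read_custom")
    print(func("demo.bin"))  # loaded demo.bin
```

Splitting on the last colon keeps the scheme robust even when the file path itself contains a colon (e.g. a Windows drive letter).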
1. Loading a Standard Format (EDF)#
The simplest case: point LRO to a single file and specify a built-in SpikeInterface extractor.
# Load an EDF file by passing a single path string
lro_edf = core.LongRecordingOrganizer(
item="../../.tests/integration/data/A10/A10_recording.edf",
mode="si",
extract_func="read_edf",
manual_datetimes=datetime(2023, 12, 13),
)
print(f"Sampling frequency: {lro_edf.meta.f_s} Hz")
print(f"Number of channels: {lro_edf.meta.n_channels}")
print(f"Duration: {lro_edf.LongRecording.get_total_duration():.1f} s")
Sampling frequency: 1000.0 Hz
Number of channels: 10
Duration: 5.0 s
2026-04-02 05:38:08,307 - INFO - Applying scale_to_uV to convert raw ADC data to microvolts
2026-04-02 05:38:08,308 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
2026-04-02 05:38:08,308 - INFO - Finalizing file timestamps
2026-04-02 05:38:08,309 - INFO - Using manual timestamps: 1 file end times specified
# Access the underlying SpikeInterface recording
recording = lro_edf.LongRecording
print(f"Recording type: {type(recording).__name__}")
print(f"Duration: {recording.get_total_duration():.1f} seconds")
Recording type: ScaleRecording
Duration: 5.0 seconds
2. Loading Multi-File Formats with DiscoveredFile#
Some formats pair a data file with a metadata sidecar (e.g. a .bin
with a .csv). Wrap the paths in a DiscoveredFile so LRO treats
them as a single recording.
A custom extract_func receives the DiscoveredFile and returns a
si.BaseRecording. First, define the reader function inline:
def read_bin_csv_pair(discovered_file, **kwargs):
"""Read paired ColMajor .bin + Meta .csv files into a recording."""
bin_path = [p for p in discovered_file.paths if p.endswith(".bin")][0]
csv_path = [p for p in discovered_file.paths if p.endswith(".csv")][0]
with open(csv_path) as f:
rows = list(csv.DictReader(f))
n_channels = len(rows)
sampling_rate = float(rows[0]["SampleRate"])
channel_names = [row["Label"] for row in rows]
data = np.fromfile(bin_path, dtype=np.float32).reshape(-1, n_channels)
return si.NumpyRecording(
traces_list=[data],
sampling_frequency=sampling_rate,
channel_ids=channel_names,
)
# Two files that together form one recording
discovered = DiscoveredFile(
paths=(
"../../.tests/integration/data/A10/Cage 2 A10-0_ColMajor.bin",
"../../.tests/integration/data/A10/Cage 2 A10-0_Meta.csv",
),
)
# Pass the inline function as extract_func
lro_bin = core.LongRecordingOrganizer(
item=discovered,
mode="si",
extract_func=read_bin_csv_pair,
manual_datetimes=datetime(2023, 12, 13),
)
print(f"Sampling frequency: {lro_bin.meta.f_s} Hz")
print(f"Number of channels: {lro_bin.meta.n_channels}")
print(f"Channel names: {lro_bin.meta.channel_names}")
Sampling frequency: 1000.0 Hz
Number of channels: 10
Channel names: ['C-009', 'C-010', 'C-012', 'C-014', 'C-015', 'C-016', 'C-017', 'C-019', 'C-021', 'C-022']
2026-04-02 05:38:08,339 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
2026-04-02 05:38:08,339 - INFO - Finalizing file timestamps
2026-04-02 05:38:08,340 - INFO - Using manual timestamps: 1 file end times specified
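If the repository's sample data is not at hand, a throwaway pair in the same layout is easy to generate. The filenames and values below are arbitrary, and the re-read at the end mirrors the parsing logic of read_bin_csv_pair above:

```python
import csv
import tempfile
from pathlib import Path
import numpy as np

with tempfile.TemporaryDirectory() as d:
    bin_path = Path(d) / "demo_ColMajor.bin"
    csv_path = Path(d) / "demo_Meta.csv"
    n_channels, rate = 4, 1000.0

    # Raw float32 samples, one column per channel
    data = np.random.randn(2000, n_channels).astype(np.float32)
    data.tofile(bin_path)

    # Metadata sidecar: one row per channel
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["Label", "SampleRate"])
        writer.writeheader()
        for i in range(n_channels):
            writer.writerow({"Label": f"CH{i:02d}", "SampleRate": rate})

    # Re-read the pair the same way read_bin_csv_pair does
    with open(csv_path) as f:
        rows = list(csv.DictReader(f))
    loaded = np.fromfile(bin_path, dtype=np.float32).reshape(-1, len(rows))
    print(loaded.shape, float(rows[0]["SampleRate"]))  # (2000, 4) 1000.0
```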
File-path string alternative#
Instead of defining the reader inline, you can point to a function in a
Python file using the "path/to/file.py:function_name" syntax. The
repository includes a pre-packaged reader at
tests/integration/readers.py:read_bin_csv_pair:
# Same result, but the reader is loaded from a file
lro_bin_from_file = core.LongRecordingOrganizer(
item=discovered,
mode="si",
extract_func="../../tests/integration/readers.py:read_bin_csv_pair",
manual_datetimes=datetime(2023, 12, 13),
)
print(f"Sampling frequency: {lro_bin_from_file.meta.f_s} Hz")
print(f"Number of channels: {lro_bin_from_file.meta.n_channels}")
Sampling frequency: 1000.0 Hz
Number of channels: 10
2026-04-02 05:38:08,350 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
2026-04-02 05:38:08,351 - INFO - Finalizing file timestamps
2026-04-02 05:38:08,351 - INFO - Using manual timestamps: 1 file end times specified
3. Other Standard Formats#
Any format supported by SpikeInterface can be loaded via mode="si" by
passing the appropriate extractor name:
# Intan .rhd
lro = core.LongRecordingOrganizer(
item="/path/to/recording.rhd",
mode="si",
extract_func="read_intan",
)
# NWB
lro = core.LongRecordingOrganizer(
item="/path/to/file.nwb",
mode="si",
extract_func="read_nwb",
)
MNE-Python formats are available with mode="mne":
import mne
lro = core.LongRecordingOrganizer(
item="/path/to/recording.fif",
mode="mne",
extract_func=mne.io.read_raw_fif,
manual_datetimes=datetime(2023, 12, 13),
)
4. Concatenating Multiple Files#
Pass a list of paths to have LRO concatenate them in order:
lro_multi = core.LongRecordingOrganizer(
item=["/path/to/session1.edf", "/path/to/session2.edf"],
mode="si",
extract_func="read_edf",
)
5. Pre-Loaded Recording Objects#
If you already have a SpikeInterface BaseRecording in memory (from
any source — NumpyRecording, a loaded .nwb, custom processing, etc.),
pass it directly to LRO with mode=None.
First, create the recording:
# Create a SpikeInterface recording from raw numpy data
num_channels = 8
sampling_frequency = 1000 # Hz
duration = 10 # seconds
num_samples = int(sampling_frequency * duration)
data = np.random.randn(num_samples, num_channels).astype(np.float32)
recording_custom = si.NumpyRecording(
traces_list=[data],
sampling_frequency=sampling_frequency,
)
channel_ids = [f"CH{i:02d}" for i in range(num_channels)]
recording_custom = recording_custom.rename_channels(new_channel_ids=channel_ids)
print(f"Recording type: {type(recording_custom).__name__}")
print(f"Duration: {recording_custom.get_total_duration():.1f} s")
Recording type: ChannelSliceRecording
Duration: 10.0 s
# Pass any si.BaseRecording directly to LRO
lro_custom = core.LongRecordingOrganizer(
item=None,
mode=None,
recording=recording_custom,
)
print(f"Sampling frequency: {lro_custom.meta.f_s}")
print(f"Number of channels: {lro_custom.meta.n_channels}")
print(f"Channel names: {lro_custom.meta.channel_names}")
Sampling frequency: 1000.0
Number of channels: 8
Channel names: ['CH00', 'CH01', 'CH02', 'CH03', 'CH04', 'CH05', 'CH06', 'CH07']
2026-04-02 05:38:08,365 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
6. Inspecting Loaded Data#
Every LRO exposes a meta attribute (RecordingMetadata) with key
properties:
metadata = lro_edf.meta
print(f"Recording metadata: {metadata}")
print(f"Sampling frequency: {metadata.f_s} Hz")
print(f"Number of channels: {metadata.n_channels}")
print(f"Channel names: {metadata.channel_names}")
print(f"Units: {metadata.V_units}")
print(f"Duration: {lro_edf.file_durations} seconds")
Recording metadata: <neurodent.core.core.RecordingMetadata object at 0x7fd783041d50>
Sampling frequency: 1000.0 Hz
Number of channels: 10
Channel names: ['C-009', 'C-010', 'C-012', 'C-014', 'C-015', 'C-016', 'C-017', 'C-019', 'C-021', 'C-022']
Units: µV
Duration: [5.0] seconds
Part 2 — AnimalOrganizer#
In practice you rarely create LROs yourself. Instead, AnimalOrganizer
discovers recordings for an animal automatically using patterns —
format strings with placeholders that match parts of the file path.
Pattern Placeholders#
| Placeholder | Meaning |
|---|---|
| {animal} | Animal identifier (e.g. A10) |
| {session} | Session or day folder (e.g. day1) |
| … | File index within a session (when multiple files per session) |
| * | Standard glob wildcard — matches any characters |
Example: "/data/{animal}/{session}/*.edf" matches files like
/data/A10/day1/recording.edf and extracts animal="A10", session="day1".
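As a rough model of that matching, a placeholder pattern can be compiled into a regular expression with named groups. This from-scratch sketch (parse_pattern is hypothetical, not part of NeuRodent) shows the idea:

```python
import re

def parse_pattern(pattern, path):
    """Match a placeholder pattern against a path and extract placeholder values."""
    regex, pos = "", 0
    for m in re.finditer(r"\{(\w+)\}|\*", pattern):
        regex += re.escape(pattern[pos:m.start()])  # literal text between tokens
        if m.group(0) == "*":
            regex += r"[^/]*"                       # glob wildcard: one path segment
        else:
            regex += rf"(?P<{m.group(1)}>[^/]+)"    # named capture for a placeholder
        pos = m.end()
    regex += re.escape(pattern[pos:]) + "$"
    match = re.match(regex, path)
    return match.groupdict() if match else None

print(parse_pattern("/data/{animal}/{session}/*.edf", "/data/A10/day1/recording.edf"))
# {'animal': 'A10', 'session': 'day1'}
```

Non-matching paths return None, which is how a discoverer could filter candidates while extracting metadata from the hits.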
Key Constructor Parameters#
AO internally uses FileDiscoverer to match files, groups them by
session, and creates one LRO per session:
| Parameter | Description |
|---|---|
| pattern | A single pattern string, or a list of patterns for multi-file formats |
| animal_id | Filter discoveries to one animal |
| … | Glob patterns for sessions to exclude |
| … | Limit the number of sessions loaded |
| assume_from_number | Parse channel aliases from channel-name numbers |
| lro_kwargs | Dict of arguments forwarded to each LongRecordingOrganizer |
7. FileDiscoverer — Finding Recordings#
FileDiscoverer scans the filesystem using placeholder patterns and
returns DiscoveredFile objects. For multi-file formats, pass a
list of patterns; files that share the same placeholder values are
grouped automatically.
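The grouping step can be pictured as bucketing matched paths by their extracted placeholder values. This standalone sketch (the hits list is invented illustrative data, not FileDiscoverer's real return type) shows the idea:

```python
from collections import defaultdict

# Hypothetical (metadata, path) matches, as a discoverer might produce them
hits = [
    ({"animal": "A10"}, "data/A10/Cage 2 A10-0_ColMajor.bin"),
    ({"animal": "A10"}, "data/A10/Cage 2 A10-0_Meta.csv"),
    ({"animal": "F22"}, "data/F22/Cage 3 F22-0_ColMajor.bin"),
    ({"animal": "F22"}, "data/F22/Cage 3 F22-0_Meta.csv"),
]

# Files with identical placeholder values land in the same group
groups = defaultdict(list)
for meta, path in hits:
    groups[tuple(sorted(meta.items()))].append(path)

for key, paths in groups.items():
    print(dict(key), [p.rsplit("/", 1)[-1] for p in paths])
# {'animal': 'A10'} ['Cage 2 A10-0_ColMajor.bin', 'Cage 2 A10-0_Meta.csv']
# {'animal': 'F22'} ['Cage 3 F22-0_ColMajor.bin', 'Cage 3 F22-0_Meta.csv']
```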
# Discover all bin/csv pairs under .tests/integration/data/
discoverer = FileDiscoverer([
"../../.tests/integration/data/{animal}/*_ColMajor.bin",
"../../.tests/integration/data/{animal}/*_Meta.csv",
])
discovered_files = discoverer.discover()
for f in discovered_files:
print(f"Animal {f.metadata['animal']}: {[Path(p).name for p in f.paths]}")
Animal A10: ['Cage 2 A10-0_ColMajor.bin', 'Cage 2 A10-0_Meta.csv']
Animal F22: ['Cage 3 F22-0_ColMajor.bin', 'Cage 3 F22-0_Meta.csv']
8. Passing Patterns to AnimalOrganizer#
The same pattern syntax goes straight into AnimalOrganizer. The AO
runs FileDiscoverer internally, groups the results by session, and
builds LROs.
For example, the patterns below discover the Cage 2 A10-0_ColMajor.bin
and Cage 2 A10-0_Meta.csv files under .tests/integration/data/A10/ and group
them into a single-session LRO:
from neurodent import visualization
# Multi-file pattern for paired bin/csv data
ao = visualization.AnimalOrganizer(
pattern=[
"../../.tests/integration/data/{animal}/*_ColMajor.bin",
"../../.tests/integration/data/{animal}/*_Meta.csv",
],
animal_id="A10",
assume_from_number=True,
lro_kwargs={
"mode": "si",
"extract_func": "../../tests/integration/readers.py:read_bin_csv_pair",
"manual_datetimes": datetime(2023, 12, 13),
},
)
print(f"Animal Organizer created for {ao.animal_id}")
Animal Organizer created for A10
2026-04-02 05:38:08,385 - INFO - Processing manual_datetimes configuration
2026-04-02 05:38:08,385 - INFO - Processing global manual datetimes starting at 2023-12-13 00:00:00
2026-04-02 05:38:08,386 - INFO - Computing continuous timeline for 1 animaldays (1 total items) starting at 2023-12-13 00:00:00
2026-04-02 05:38:08,386 - INFO - Ordered items for timeline: ['Cage 2 A10-0_ColMajor.bin...']
2026-04-02 05:38:08,391 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
2026-04-02 05:38:08,391 - INFO - Finalizing file timestamps
2026-04-02 05:38:08,392 - INFO - Using manual timestamps: 1 file end times specified
2026-04-02 05:38:08,392 - INFO - Item Cage 2 A10-0_ColMajor.bin...: duration = 120.4s (loaded with manual timestamp)
2026-04-02 05:38:08,393 - INFO - Timeline computed: 1 items, total duration 120.4s
2026-04-02 05:38:08,395 - INFO - Recording already at target sampling rate (1000 Hz) or unable to determine, no resampling needed
2026-04-02 05:38:08,396 - INFO - Finalizing file timestamps
2026-04-02 05:38:08,396 - INFO - Using manual timestamps: 1 file end times specified
2026-04-02 05:38:08,397 - INFO - AnimalOrganizer Timeline Summary:
LRO 0: 2023-12-13 00:00:00 -> 2023-12-13 00:02:00.360000 (duration: 120.4s, items: 1, item: Cage 2 A10-0_ColMajor.bin...)
For single-file formats the pattern is just a string:
ao = visualization.AnimalOrganizer(
pattern="/data/{animal}/{session}/*.edf",
animal_id="A10",
lro_kwargs={"mode": "si", "extract_func": "read_edf"},
)
Summary#
In this tutorial, you learned:
The different item types accepted by LongRecordingOrganizer (single path, list, DiscoveredFile, in-memory recording)
How to load standard formats (EDF, Intan, NWB, MNE) and custom multi-file formats
How to inspect loaded data via the meta attribute
How FileDiscoverer finds and pairs recordings using placeholder patterns
How AnimalOrganizer wraps discovery + LRO creation into a single step
Next Steps#
Basic Usage Tutorial: Complete workflow from loading to visualization
Windowed Analysis Tutorial: Extract features from loaded data
Spike Analysis Tutorial: Work with spike-sorted data