Windowed Analysis Tutorial#

This tutorial provides a deep dive into Neurodent’s windowed analysis capabilities for extracting features from continuous EEG data.

Overview#

The Windowed Analysis Result (WAR) is the core feature extraction system in Neurodent. It:

  1. Divides continuous EEG data into time windows

  2. Computes features for each window

  3. Aggregates results across time and channels

  4. Provides filtering and quality control methods

This approach is efficient for long recordings and enables parallel processing.
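
To make the windowing idea concrete, here is a minimal NumPy sketch of the same pattern, independent of Neurodent's API (the window length and sampling rate are illustrative):

import numpy as np

fs = 1000                          # sampling rate in Hz (illustrative)
samples_per_window = 4 * fs        # 4-second windows (illustrative)
signal = np.random.randn(10 * fs)  # 10 s of fake single-channel EEG

# Drop the trailing partial window, then reshape to (n_windows, samples_per_window)
n_windows = len(signal) // samples_per_window
windows = signal[:n_windows * samples_per_window].reshape(n_windows, samples_per_window)

# One feature value per window, e.g. RMS amplitude
rms_per_window = np.sqrt((windows ** 2).mean(axis=1))
print(rms_per_window.shape)  # (2,)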

import sys
from pathlib import Path
import logging

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from neurodent import core, visualization, constants

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()

1. Feature Categories#

Neurodent extracts three main categories of features:

Linear Features (per channel)#

Single-value metrics for each channel in each time window:

# Available linear features
print("Linear features:")
for feature in constants.LINEAR_FEATURES:
    print(f"  - {feature}")

# Examples:
# - rms: Root mean square amplitude
# - logrms: Log of RMS amplitude
# - ampvar: Amplitude variance
# - psdtot: Total power spectral density
# - psdslope: Slope of PSD on log-log scale

Band Features (per frequency band)#

Features computed for each frequency band (delta, theta, alpha, beta, gamma):

# Available band features
print("\nBand features:")
for feature in constants.BAND_FEATURES:
    print(f"  - {feature}")

# Frequency bands
print("\nFrequency bands:")
print(f"  Delta: 0.1-4 Hz")
print(f"  Theta: 4-8 Hz")
print(f"  Alpha: 8-13 Hz")
print(f"  Beta: 13-25 Hz")
print(f"  Gamma: 25-40 Hz")

Matrix Features (connectivity)#

Features measuring relationships between channels:

# Available matrix features
print("\nMatrix features:")
for feature in constants.MATRIX_FEATURES:
    print(f"  - {feature}")

# Examples:
# - cohere: Spectral coherence between channel pairs
# - pcorr: Pearson correlation between channels

2. Computing Windowed Analysis#

Basic Usage#

# Load data (see Data Loading tutorial)
data_path = Path("/path/to/data")
animal_id = "animal_001"

lro = core.LongRecordingOrganizer(
    base_folder=data_path,
    animal_id=animal_id,
    mode="bin"
)

ao = visualization.AnimalOrganizer(lro)

# Compute all features
war_all = ao.compute_windowed_analysis(
    features=['all'],
    exclude=['nspike', 'lognspike'],  # Exclude spike features if no spikes
    multiprocess_mode='serial'
)

Selective Feature Computation#

For faster processing, compute only needed features:

# Compute specific features
war_selective = ao.compute_windowed_analysis(
    features=['rms', 'logrms', 'psdband', 'cohere'],
    multiprocess_mode='serial'
)

print(f"Computed features: {war_selective.features}")

Parallel Processing#

For large datasets, use parallel processing:

# Option 1: Multiprocessing (uses all CPU cores)
war_mp = ao.compute_windowed_analysis(
    features=['rms', 'psdband'],
    multiprocess_mode='multiprocess'
)

# Option 2: Dask (for distributed computing)
# Requires Dask cluster setup
# war_dask = ao.compute_windowed_analysis(
#     features=['rms', 'psdband'],
#     multiprocess_mode='dask'
# )

3. Data Quality and Filtering#

Configuration-Driven Filtering#

Multiple filters can be applied in a single pass with a configuration dictionary, where each key names a filter (see Available Filters below) and each value holds that filter's parameters:

filter_config = {
    'logrms_range': {'z_range': 3},
    'high_rms': {'max_rms': 500},
    'low_rms': {'min_rms': 50},
    'high_beta': {'max_beta_prop': 0.4},
    'reject_channels_by_session': {},
    'morphological_smoothing': {'smoothing_seconds': 8.0}
}

war_filtered = war_all.apply_filters(
    filter_config,
    min_valid_channels=3
)

Available Filters#

Each key in the filter configuration above corresponds to one of these filter methods:

  • filter_logrms_range(z_range): Remove windows whose log-RMS amplitude falls more than z_range standard deviations from the mean

  • filter_high_rms(max_rms): Remove high-amplitude artifacts

  • filter_low_rms(min_rms): Remove low-amplitude periods (e.g., flat or disconnected channels)

  • filter_high_beta(max_beta_prop): Remove windows with a high proportion of beta-band power (often muscle artifact)

  • filter_reject_channels_by_session(): Identify and reject consistently bad channels within each recording session

  • morphological_smoothing(smoothing_seconds): Smooth filtering decisions across time using morphological operations
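
If your version of Neurodent exposes these as methods on the WAR object (as the signatures above suggest), filters can also be applied individually. A sketch, assuming each method returns a filtered copy (verify against the API reference; some versions may modify in place):

# Hedged sketch: apply filters step by step instead of via a config dict
war_step = war_all.filter_logrms_range(z_range=3)
war_step = war_step.filter_high_rms(max_rms=500)
war_step = war_step.filter_low_rms(min_rms=50)
war_step = war_step.filter_high_beta(max_beta_prop=0.4)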

4. Data Aggregation#

Aggregate data across time windows:

# Aggregate time windows
war_filtered.aggregate_time_windows()

# This combines data from multiple windows for statistical analysis
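
For finer control, the same kind of aggregation can also be done manually on the underlying xarray features. The dimension name 'window' below is an assumption; check the actual dims first (they are printed in section 6):

# Hedged manual aggregation: per-channel mean RMS across time windows
# 'window' is an assumed dimension name; inspect war_filtered.rms.dims first
mean_rms = war_filtered.rms.mean(dim="window", skipna=True)
print(mean_rms)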

5. Channel Management#

Reorder and Pad Channels#

Ensure consistent channel ordering across animals:

# Define standard channel order
standard_channels = [
    "LMot", "RMot",  # Motor cortex
    "LBar", "RBar",  # Barrel cortex
    "LAud", "RAud",  # Auditory cortex
    "LVis", "RVis",  # Visual cortex
    "LHip", "RHip"   # Hippocampus
]

war_filtered.reorder_and_pad_channels(
    standard_channels,
    use_abbrevs=True  # Use abbreviated channel names
)

print(f"Channels: {war_filtered.channels}")

6. Accessing Computed Features#

WAR objects store features as xarray DataArrays:

# Access RMS data
rms_data = war_filtered.rms
print(f"RMS shape: {rms_data.shape}")
print(f"RMS dims: {rms_data.dims}")
print(f"RMS coords: {list(rms_data.coords)}")

# Access band power data
psdband_data = war_filtered.psdband
print(f"\nPSD Band shape: {psdband_data.shape}")
print(f"PSD Band dims: {psdband_data.dims}")
print(f"Bands: {list(psdband_data.coords['band'].values)}")

# Access coherence data (matrix feature)
cohere_data = war_filtered.cohere
print(f"\nCoherence shape: {cohere_data.shape}")
print(f"Coherence dims: {cohere_data.dims}")

7. Metadata and Grouping Variables#

WAR objects contain metadata for grouping and analysis:

# Access metadata
print(f"Animal ID: {war_filtered.animal_id}")
print(f"Genotype: {war_filtered.genotype}")
print(f"Recording day: {war_filtered.animal_day}")

# Add unique identifier
war_filtered.add_unique_hash()
print(f"Unique hash: {war_filtered.unique_hash}")

8. Saving and Loading#

Save WAR objects for later analysis:

# Save WAR
output_path = Path("./output") / animal_id
output_path.mkdir(parents=True, exist_ok=True)

war_filtered.to_pickle_and_json(output_path)
print(f"Saved to {output_path}")

# Load WAR
war_loaded = visualization.WindowAnalysisResult.load_pickle_and_json(output_path)
print(f"Loaded from {output_path}")

9. Best Practices#

Feature Selection#

  • Start with basic features (rms, psdband) before computing expensive ones (cohere, psd)

  • Exclude spike features if you don’t have spike data

  • Use selective feature computation for faster iteration

Filtering#

  • Always inspect data before and after filtering (see the sketch after this list)

  • Use conservative thresholds initially, then adjust

  • Consider biological significance (e.g., high beta may indicate muscle artifacts)
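
One simple before-and-after sanity check, assuming rejected windows are masked to NaN (adjust if your version drops them instead):

# Hedged sanity check: count finite RMS values surviving each filter pass
# Assumes rejected windows are marked NaN rather than dropped
before = int(np.isfinite(war_all.rms.values).sum())
after = int(np.isfinite(war_filtered.rms.values).sum())
print(f"Finite RMS values: {before} before, {after} after filtering")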

Processing#

  • Use serial mode for debugging

  • Use multiprocess for local analysis of large datasets

  • Use Dask for cluster computing

Quality Control#

  • Check channel consistency across animals (see the sketch after this list)

  • Verify metadata (genotype, day, etc.)

  • Save intermediate results frequently
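
A sketch of checking channel consistency across saved WARs; the paths are placeholders for your own output directories:

# Hedged sketch: verify all animals share one channel layout after
# reorder_and_pad_channels; paths are placeholders
war_paths = [Path("./output/animal_001"), Path("./output/animal_002")]
wars = [visualization.WindowAnalysisResult.load_pickle_and_json(p) for p in war_paths]
channel_sets = {tuple(w.channels) for w in wars}
assert len(channel_sets) == 1, f"Inconsistent channels: {channel_sets}"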

Summary#

This tutorial covered:

  1. Feature categories and types

  2. Computing windowed analysis with different options

  3. Data quality control and filtering

  4. Channel management and standardization

  5. Accessing computed features

  6. Metadata and grouping variables

  7. Saving and loading results

  8. Best practices

Next Steps#