Windowed Analysis Tutorial#

This tutorial provides a deep dive into Neurodent’s windowed analysis capabilities for extracting features from continuous EEG data.

Overview#

The Windowed Analysis Result (WAR) is the core feature extraction system in Neurodent. It:

  1. Divides continuous EEG data into time windows

  2. Computes features for each window

  3. Aggregates results across time and channels

  4. Provides filtering and quality control methods

This approach is efficient for long recordings and enables parallel processing.
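
To make the windowing idea concrete, here is a minimal NumPy sketch of the same pattern, independent of Neurodent's API (the window length and sampling rate are illustrative):

import numpy as np

fs = 1000                          # sampling rate in Hz (illustrative)
samples_per_window = 4 * fs        # 4-second windows (illustrative)
signal = np.random.randn(10 * fs)  # 10 s of fake single-channel EEG

# Drop the trailing partial window, then reshape to (n_windows, samples_per_window)
n_windows = len(signal) // samples_per_window
windows = signal[:n_windows * samples_per_window].reshape(n_windows, samples_per_window)

# One feature value per window, e.g. RMS amplitude
rms_per_window = np.sqrt((windows ** 2).mean(axis=1))
print(rms_per_window.shape)  # (2,)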

import sys
from pathlib import Path
import logging

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from neurodent import core, visualization, constants

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()

1. Feature Categories#

Neurodent extracts three main categories of features:

Linear Features (per channel)#

Single-value metrics for each channel in each time window:

# Available linear features
print("Linear features:")
for feature in constants.LINEAR_FEATURES:
    print(f"  - {feature}")

# Examples:
# - rms: Root mean square amplitude
# - logrms: Log of RMS amplitude
# - ampvar: Amplitude variance
# - psdtot: Total power spectral density
# - psdslope: Slope of PSD on log-log scale

Band Features (per frequency band)#

Features computed for each frequency band (delta, theta, alpha, beta, gamma):

# Available band features
print("\nBand features:")
for feature in constants.BAND_FEATURES:
    print(f"  - {feature}")

# Frequency bands
print("\nFrequency bands:")
print(f"  Delta: 0.1-4 Hz")
print(f"  Theta: 4-8 Hz")
print(f"  Alpha: 8-13 Hz")
print(f"  Beta: 13-25 Hz")
print(f"  Gamma: 25-40 Hz")

Matrix Features (connectivity)#

Features measuring relationships between channels:

# Available matrix features
print("\nMatrix features:")
for feature in constants.MATRIX_FEATURES:
    print(f"  - {feature}")

# Examples:
# - cohere: Spectral coherence between channel pairs
# - pcorr: Pearson correlation between channels

2. Computing Windowed Analysis#

Basic Usage#

# Load data (see Data Loading tutorial)
data_path = Path("/path/to/data")
animal_id = "animal_001"

lro = core.LongRecordingOrganizer(
    base_folder=data_path,
    animal_id=animal_id,
    mode="bin"
)

ao = visualization.AnimalOrganizer(lro)

# Compute all features
war_all = ao.compute_windowed_analysis(
    features=['all'],
    exclude=['nspike', 'lognspike'],  # Exclude spike features if no spikes
    multiprocess_mode='serial'
)

Selective Feature Computation#

For faster processing, compute only needed features:

# Compute specific features
war_selective = ao.compute_windowed_analysis(
    features=['rms', 'logrms', 'psdband', 'cohere'],
    multiprocess_mode='serial'
)

print(f"Computed features: {war_selective.features}")

Parallel Processing#

For large datasets, use parallel processing:

# Option 1: Multiprocessing (uses all CPU cores)
war_mp = ao.compute_windowed_analysis(
    features=['rms', 'psdband'],
    multiprocess_mode='multiprocess'
)

# Option 2: Dask (for distributed computing)
# Requires Dask cluster setup
# war_dask = ao.compute_windowed_analysis(
#     features=['rms', 'psdband'],
#     multiprocess_mode='dask'
# )

3. Data Quality and Filtering#

Configuration-Driven Filtering#

Multiple filters can be applied in a single pass with a configuration dictionary, where each key names a filter (see Available Filters below) and each value holds that filter's parameters:

filter_config = {
    'logrms_range': {'z_range': 3},
    'high_rms': {'max_rms': 500},
    'low_rms': {'min_rms': 50},
    'high_beta': {'max_beta_prop': 0.4},
    'reject_channels_by_session': {},
    'morphological_smoothing': {'smoothing_seconds': 8.0}
}

war_filtered = war_all.apply_filters(
    filter_config,
    min_valid_channels=3
)

Available Filters#

Each key in the filter configuration above corresponds to one of these filter methods:

  • filter_logrms_range(z_range): Remove windows whose log-RMS amplitude falls more than z_range standard deviations from the mean

  • filter_high_rms(max_rms): Remove high-amplitude artifacts

  • filter_low_rms(min_rms): Remove low-amplitude periods (e.g., flat or disconnected channels)

  • filter_high_beta(max_beta_prop): Remove windows with a high proportion of beta-band power (often muscle artifact)

  • filter_reject_channels_by_session(): Identify and reject consistently bad channels within each recording session

  • morphological_smoothing(smoothing_seconds): Smooth filtering decisions across time using morphological operations
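
If your version of Neurodent exposes these as methods on the WAR object (as the signatures above suggest), filters can also be applied individually. A sketch, assuming each method returns a filtered copy (verify against the API reference; some versions may modify in place):

# Hedged sketch: apply filters step by step instead of via a config dict
war_step = war_all.filter_logrms_range(z_range=3)
war_step = war_step.filter_high_rms(max_rms=500)
war_step = war_step.filter_low_rms(min_rms=50)
war_step = war_step.filter_high_beta(max_beta_prop=0.4)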

4. Data Aggregation#

Aggregate data across time windows:

# Aggregate time windows
war_filtered.aggregate_time_windows()

# This combines data from multiple windows for statistical analysis
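
For finer control, the same kind of aggregation can also be done manually on the underlying xarray features. The dimension name 'window' below is an assumption; check the actual dims first (they are printed in section 6):

# Hedged manual aggregation: per-channel mean RMS across time windows
# 'window' is an assumed dimension name; inspect war_filtered.rms.dims first
mean_rms = war_filtered.rms.mean(dim="window", skipna=True)
print(mean_rms)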

5. Channel Management#

Reorder and Pad Channels#

Ensure consistent channel ordering across animals:

# Define standard channel order
standard_channels = [
    "LMot", "RMot",  # Motor cortex
    "LBar", "RBar",  # Barrel cortex
    "LAud", "RAud",  # Auditory cortex
    "LVis", "RVis",  # Visual cortex
    "LHip", "RHip"   # Hippocampus
]

war_filtered.reorder_and_pad_channels(
    standard_channels,
    use_abbrevs=True  # Use abbreviated channel names
)

print(f"Channels: {war_filtered.channels}")

6. Accessing Computed Features#

WAR objects store features as xarray DataArrays:

# Access RMS data
rms_data = war_filtered.rms
print(f"RMS shape: {rms_data.shape}")
print(f"RMS dims: {rms_data.dims}")
print(f"RMS coords: {list(rms_data.coords)}")

# Access band power data
psdband_data = war_filtered.psdband
print(f"\nPSD Band shape: {psdband_data.shape}")
print(f"PSD Band dims: {psdband_data.dims}")
print(f"Bands: {list(psdband_data.coords['band'].values)}")

# Access coherence data (matrix feature)
cohere_data = war_filtered.cohere
print(f"\nCoherence shape: {cohere_data.shape}")
print(f"Coherence dims: {cohere_data.dims}")

7. Metadata and Grouping Variables#

WAR objects contain metadata for grouping and analysis:

# Access metadata
print(f"Animal ID: {war_filtered.animal_id}")
print(f"Genotype: {war_filtered.genotype}")
print(f"Recording day: {war_filtered.animal_day}")

# Add unique identifier
war_filtered.add_unique_hash()
print(f"Unique hash: {war_filtered.unique_hash}")

8. Saving and Loading#

Save WAR objects for later analysis:

# Save WAR
output_path = Path("./output") / animal_id
output_path.mkdir(parents=True, exist_ok=True)

war_filtered.to_pickle_and_json(output_path)
print(f"Saved to {output_path}")

# Load WAR
war_loaded = visualization.WindowAnalysisResult.load_pickle_and_json(output_path)
print(f"Loaded from {output_path}")

9. Best Practices#

Feature Selection#

  • Start with basic features (rms, psdband) before computing expensive ones (cohere, psd)

  • Exclude spike features if you don’t have spike data

  • Use selective feature computation for faster iteration

Filtering#

  • Always inspect data before and after filtering (see the sketch after this list)

  • Use conservative thresholds initially, then adjust

  • Consider biological significance (e.g., high beta may indicate muscle artifacts)
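
One simple before-and-after sanity check, assuming rejected windows are masked to NaN (adjust if your version drops them instead):

# Hedged sanity check: count finite RMS values surviving each filter pass
# Assumes rejected windows are marked NaN rather than dropped
before = int(np.isfinite(war_all.rms.values).sum())
after = int(np.isfinite(war_filtered.rms.values).sum())
print(f"Finite RMS values: {before} before, {after} after filtering")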

Processing#

  • Use serial mode for debugging

  • Use multiprocess for local analysis of large datasets

  • Use Dask for cluster computing

Quality Control#

  • Check channel consistency across animals (see the sketch after this list)

  • Verify metadata (genotype, day, etc.)

  • Save intermediate results frequently
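
A sketch of checking channel consistency across saved WARs; the paths are placeholders for your own output directories:

# Hedged sketch: verify all animals share one channel layout after
# reorder_and_pad_channels; paths are placeholders
war_paths = [Path("./output/animal_001"), Path("./output/animal_002")]
wars = [visualization.WindowAnalysisResult.load_pickle_and_json(p) for p in war_paths]
channel_sets = {tuple(w.channels) for w in wars}
assert len(channel_sets) == 1, f"Inconsistent channels: {channel_sets}"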

Summary#

This tutorial covered:

  1. Feature categories and types

  2. Computing windowed analysis with different options

  3. Data quality control and filtering

  4. Channel management and standardization

  5. Accessing computed features

  6. Metadata and grouping variables

  7. Saving and loading results

  8. Best practices

Next Steps#