Utilities#

Utility functions for the core module.

Functions#

neurodent.core.utils.convert_units_to_multiplier(current_units: str, target_units: str = 'µV') → float[source]#

Convert between different voltage units and return the multiplication factor.

This function calculates the conversion factor needed to transform values from one voltage unit to another (e.g., from mV to µV).

Parameters:
  • current_units (str) – The current unit of the values. Must be one of: ‘µV’, ‘mV’, ‘V’, ‘nV’.

  • target_units (str, optional) – The target unit to convert to. Defaults to ‘µV’. Must be one of: ‘µV’, ‘mV’, ‘V’, ‘nV’.

Returns:

The multiplication factor to convert from current_units to target_units.

To convert values, multiply your data by this factor.

Return type:

float

Raises:

AssertionError – If current_units or target_units are not supported.

Examples

>>> convert_units_to_multiplier("mV", "µV")
1000.0
>>> convert_units_to_multiplier("V", "mV")
1000.0
>>> convert_units_to_multiplier("µV", "V")
1e-06
neurodent.core.utils.extract_mne_unit_info(raw_info: dict) → tuple[str | None, float | None][source]#

Extract unit information from MNE Raw info object.

Parameters:

raw_info (dict) – MNE Raw.info object containing channel information

Returns:

A tuple (unit_name, mult_to_uV), where unit_name is the consistent unit across all channels and mult_to_uV is the conversion factor to µV.

Return type:

tuple[str | None, float | None]

Raises:

ValueError – If channel units are inconsistent across channels
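
Examples

A minimal sketch, assuming MNE-Python is installed; the exact unit name returned depends on the channel units in the info object:

>>> import mne
>>> import numpy as np
>>> info = mne.create_info(["ch1", "ch2"], sfreq=1000.0, ch_types="eeg")
>>> raw = mne.io.RawArray(np.zeros((2, 1000)), info)
>>> unit_name, mult_to_uV = extract_mne_unit_info(raw.info)
>>> # MNE stores EEG data in volts internally, so mult_to_uV would be 1e6 here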

neurodent.core.utils.is_day(dt: datetime, sunrise=6, sunset=18)[source]#

Check if a datetime object is during the day.

Parameters:
  • dt (datetime) – Datetime object to check

  • sunrise (int, optional) – Sunrise hour (0-23). Defaults to 6.

  • sunset (int, optional) – Sunset hour (0-23). Defaults to 18.

Returns:

True if the datetime is during the day, False otherwise

Return type:

bool

Raises:

TypeError – If dt is not a datetime object
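
Examples

Based on the documented defaults (sunrise=6, sunset=18):

>>> from datetime import datetime
>>> is_day(datetime(2023, 1, 1, 12, 0))
True
>>> is_day(datetime(2023, 1, 1, 3, 0))
False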

neurodent.core.utils.convert_colpath_to_rowpath(rowdir_path: str | Path, col_path: str | Path, gzip: bool = True, aspath: bool = True) → str | Path[source]#

Convert a ColMajor file path to its corresponding RowMajor file path.

This function transforms file paths from column-major format to row-major format, which is used when converting between different data storage layouts in NeuRodent.

Parameters:
  • rowdir_path (str | Path) – Directory path where the RowMajor file should be located.

  • col_path (str | Path) – Path to the ColMajor file to be converted. Must contain ‘ColMajor’ in the path.

  • gzip (bool, optional) – If True, append ‘.npy.gz’ extension. If False, append ‘.bin’. Defaults to True.

  • aspath (bool, optional) – If True, return as Path object. If False, return as string. Defaults to True.

Returns:

The converted RowMajor file path, either as string or Path object based on aspath parameter.

Return type:

str | Path

Raises:

ValueError – If ‘ColMajor’ is not found in col_path.

Examples

>>> convert_colpath_to_rowpath("/data/row/", "/data/col/file_ColMajor_001.bin")
PosixPath('/data/row/file_RowMajor_001.npy.gz')
>>> convert_colpath_to_rowpath("/data/row/", "/data/col/file_ColMajor_001.bin", gzip=False)
PosixPath('/data/row/file_RowMajor_001.bin')
>>> convert_colpath_to_rowpath("/data/row/", "/data/col/file_ColMajor_001.bin", aspath=False)
'/data/row/file_RowMajor_001.npy.gz'
neurodent.core.utils.filepath_to_index(filepath) → int[source]#

Extract the index number from a filepath.

This function extracts the last number found in a filepath after removing common suffixes and file extensions. For example, from “/path/to/data_ColMajor_001.bin” it returns 1.

Parameters:

filepath (str | Path) – Path to the file to extract index from.

Returns:

The extracted index number.

Return type:

int

Examples

>>> filepath_to_index("/path/to/data_ColMajor_001.bin")
1
>>> filepath_to_index("/path/to/data_2023_015_ColMajor.bin")
15
>>> filepath_to_index("/path/to/data_Meta_010.json")
10
neurodent.core.utils.parse_truncate(truncate: int | bool) → int[source]#

Parse the truncate parameter to determine how many characters to truncate.

If truncate is a boolean, returns 10 if True and 0 if False. If truncate is an integer, returns that integer value directly.

Parameters:

truncate (int | bool) – If bool, True=10 chars and False=0 chars. If int, specifies exact number of chars.

Returns:

Number of characters to truncate (0 means no truncation)

Return type:

int

Raises:

ValueError – If truncate is not a boolean or integer
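
Examples

These follow directly from the rules documented above:

>>> parse_truncate(True)
10
>>> parse_truncate(False)
0
>>> parse_truncate(25)
25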

neurodent.core.utils.nanaverage(A: ndarray, weights: ndarray, axis: int = -1) → ndarray[source]#

Compute weighted average of an array, ignoring NaN values.

This function computes a weighted average along the specified axis while properly handling NaN values by masking them out of the calculation.

Parameters:
  • A (np.ndarray) – Input array containing the values to average.

  • weights (np.ndarray) – Array of weights corresponding to the values in A. Must be broadcastable with A along the specified axis.

  • axis (int, optional) – Axis along which to compute the average. Defaults to -1 (last axis).

Returns:

Weighted average with NaN values properly handled. If all values along an axis are NaN, the result will be NaN for that position.

Return type:

np.ndarray

Examples

>>> import numpy as np
>>> A = np.array([[1.0, 2.0, np.nan], [4.0, np.nan, 6.0]])
>>> weights = np.array([1, 2, 1])
>>> nanaverage(A, weights, axis=1)
array([1.66666667, 5.        ])

Note

Be careful with zero or negative weights as they may produce unexpected results. The function uses numpy’s masked array functionality for robust NaN handling.

neurodent.core.utils.parse_path_to_animalday(filepath: str | Path, animal_param: tuple[int, str] | str | list[str] = (0, None), day_sep: str | None = None, mode: Literal['nest', 'concat', 'base', 'noday'] = 'concat', **day_parse_kwargs)[source]#

Parses the filename of a binfolder to get the animalday identifier (animal id, genotype, and day).

Parameters:
  • filepath (str | Path) – Filepath of the binfolder.

  • animal_param (tuple[int, str] | str | list[str], optional) – Parameter specifying how to parse the animal ID:

    • tuple[int, str]: (index, separator) for simple split and index

    • str: regex pattern to extract ID

    • list[str]: list of possible animal IDs to match against

  • day_sep (str, optional) – Separator for day in filename. Defaults to None.

  • mode (Literal['nest', 'concat', 'base', 'noday'], optional) –

    Mode to parse the filename. Defaults to ‘concat’.

    • ‘nest’: Extracts genotype/animal from parent directory name and date from filename. Example: “/WT_A10/recording_2023-04-01.*”

    • ‘concat’: Extracts all info from filename, expects genotype_animal_date format. Example: “/WT_A10_2023-04-01.*”

    • ‘base’: Same as ‘concat’

    • ‘noday’: Extracts only genotype and animal ID, uses default date. Example: “/WT_A10_recording.*”

  • **day_parse_kwargs – Additional keyword arguments to pass to parse_str_to_day function. Common options include parse_params dict for dateutil.parser.parse.

Returns:

Dictionary with keys “animal”, “genotype”, “day”, and “animalday” (concatenated).

Example: {“animal”: “A10”, “genotype”: “WT”, “day”: “Apr-01-2023”, “animalday”: “A10 WT Apr-01-2023”}

Return type:

dict[str, str]

Raises:
  • ValueError – If mode is invalid or required components cannot be extracted

  • TypeError – If filepath is not str or Path
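
Examples

A minimal sketch based on the documented ‘concat’ mode and the return format shown above:

>>> parse_path_to_animalday("/data/WT_A10_2023-04-01.bin", animal_param=(1, "_"), mode="concat")
{'animal': 'A10', 'genotype': 'WT', 'day': 'Apr-01-2023', 'animalday': 'A10 WT Apr-01-2023'}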

neurodent.core.utils.parse_str_to_genotype(string: str, strict_matching: bool = False) → str[source]#

Parses the filename of a binfolder to get the genotype.

Parameters:
  • string (str) – String to parse.

  • strict_matching (bool, optional) – If True, ensures the input matches exactly one genotype. If False, allows overlapping matches and uses longest. Defaults to False for backward compatibility.

Returns:

Genotype.

Return type:

str

Raises:

ValueError – When string cannot be parsed or contains ambiguous matches in strict mode.

Examples

>>> parse_str_to_genotype("WT_A10_data")
'WT'
>>> parse_str_to_genotype("WT_KO_comparison", strict_matching=True)  # Would raise error
ValueError: Ambiguous match...
>>> parse_str_to_genotype("WT_KO_comparison", strict_matching=False)  # Uses longest match
'WT'  # or 'KO' depending on which alias is longer
neurodent.core.utils.parse_str_to_animal(string: str, animal_param: tuple[int, str] | str | list[str] = (0, None)) → str[source]#

Parses the filename of a binfolder to get the animal id.

Parameters:
  • string (str) – String to parse.

  • animal_param (tuple[int, str] | str | list[str] (default: (0, None))) – Parameter specifying how to parse the animal ID:

    • tuple[int, str]: (index, separator) for simple split and index. Not recommended for inconsistent naming conventions.

    • str: regex pattern to extract ID. Most general use case. If multiple matches are found, returns the first match.

    • list[str]: list of possible animal IDs to match against. Returns the first match in list order, case-sensitive, ignoring empty strings.

Returns:

Animal id.

Return type:

str

Examples

>>> # Tuple format: (index, separator)
>>> parse_str_to_animal("WT_A10_2023-01-01_data.bin", (1, "_"))
'A10'
>>> parse_str_to_animal("A10_WT_recording.bin", (0, "_"))
'A10'

>>> # Regex pattern format
>>> parse_str_to_animal("WT_A10_2023-01-01_data.bin", r"A\d+")
'A10'
>>> parse_str_to_animal("subject_123_data.bin", r"\d+")
'123'

>>> # List format: possible IDs to match
>>> parse_str_to_animal("WT_A10_2023-01-01_data.bin", ["A10", "A11", "A12"])
'A10'
>>> parse_str_to_animal("WT_A10_data.bin", ["B15", "C20"])  # No match
ValueError: No matching ID found in WT_A10_data.bin from possible IDs: ['B15', 'C20']

neurodent.core.utils.parse_str_to_day(string: str, sep: str | None = None, parse_params: dict | None = None, parse_mode: Literal['full', 'split', 'window', 'all'] = 'split', date_patterns: list[tuple[str, str]] | None = None) → datetime[source]#

Parses the filename of a binfolder to get the day.

Parameters:
  • string (str) – String to parse.

  • sep (str, optional) – Separator to split string by. If None, split by whitespace. Defaults to None.

  • parse_params (dict, optional) – Parameters to pass to dateutil.parser.parse. Defaults to {‘fuzzy’:True}.

  • parse_mode (Literal["full", "split", "window", "all"], optional) –

    Mode for parsing the string. Defaults to “split”.

    • “full”: Try parsing the entire cleaned string only

    • “split”: Try parsing individual tokens only

    • “window”: Try parsing sliding windows of tokens (2-4 tokens) only

    • “all”: Use all three approaches in the order “full”, “split”, “window”

  • date_patterns (list[tuple[str, str]], optional) – List of (regex_pattern, strptime_format) tuples to try before falling back to token-based parsing. This allows users to specify exact formats to handle ambiguous cases like MM/DD/YYYY vs DD/MM/YYYY. Only used in “split” and “all” modes. Defaults to None (no regex patterns).

Returns:

Datetime object corresponding to the day of the binfolder.

Return type:

datetime

Raises:
  • ValueError – If no valid date token is found in the string.

  • TypeError – If date_patterns is not a list of tuples.

Examples

>>> # Handle ambiguous date formats with explicit patterns
>>> patterns = [(r'(19\d{2}|20\d{2})-(\d{1,2})-(\d{1,2})', '%Y-%m-%d')]
>>> parse_str_to_day('2001_2023-07-04_data', date_patterns=patterns)
datetime.datetime(2023, 7, 4, 0, 0)
>>> # European format pattern
>>> patterns = [(r'(\d{1,2})/(\d{1,2})/(19\d{2}|20\d{2})', '%d/%m/%Y')]
>>> parse_str_to_day('04/07/2023_data', date_patterns=patterns)
datetime.datetime(2023, 7, 4, 0, 0)  # July 4th, not April 7th

Note

When date_patterns is provided, users have full control over date interpretation. Without date_patterns, the function falls back to token-based parsing which may be ambiguous for formats like MM/DD/YYYY vs DD/MM/YYYY.

neurodent.core.utils.parse_chname_to_abbrev(channel_name: str, assume_from_number=False, strict_matching=True) → str[source]#

Parses the channel name to get the abbreviation.

Parameters:
  • channel_name (str) – Name of the channel.

  • assume_from_number (bool, optional) – If True, assume the abbreviation based on the last number in the channel name when normal parsing fails. Defaults to False.

  • strict_matching (bool, optional) – If True, ensures the input matches exactly one L/R alias and one channel alias. If False, allows multiple matches and uses longest. Defaults to True.

Returns:

Abbreviation of the channel name.

Return type:

str

Raises:
  • ValueError – When channel_name cannot be parsed or contains ambiguous matches in strict mode.

  • KeyError – When assume_from_number=True but the detected number is not a valid channel ID.

Examples

>>> parse_chname_to_abbrev("left Aud")
'LAud'
>>> parse_chname_to_abbrev("Right VIS")
'RVis'
>>> parse_chname_to_abbrev("channel_9", assume_from_number=True)
'LAud'
>>> parse_chname_to_abbrev("LRAud", strict_matching=False)  # Would work in non-strict mode
'LAud'  # Uses longest L/R match
neurodent.core.utils.set_temp_directory(path: str | Path) → None[source]#

Set the temporary directory for NeuRodent operations.

This function configures the temporary directory used by NeuRodent for intermediate files and operations. The directory will be created if it doesn’t exist.

Parameters:

path (str | Path) – Path to the temporary directory. Will be created if it doesn’t exist.

Return type:

None

Examples

>>> set_temp_directory("/tmp/neurodent_temp")
>>> set_temp_directory(Path.home() / "neurodent_workspace" / "temp")

Note

This function modifies the TMPDIR environment variable, which affects the behavior of other temporary file operations in the process.

neurodent.core.utils.get_temp_directory() → Path[source]#

Get the current temporary directory used by NeuRodent.

Returns:

Path object representing the current temporary directory.

Return type:

Path

Raises:

KeyError – If TMPDIR environment variable is not set.

Examples

>>> temp_dir = get_temp_directory()
>>> print(f"Current temp directory: {temp_dir}")
Current temp directory: /tmp/neurodent_temp

neurodent.core.utils.cache_fragments_to_zarr(np_fragments: ndarray, n_fragments: int, tmpdir: str | None = None) → tuple[str, zarr.Array][source]#

Cache numpy fragments array to zarr format for efficient memory management.

This function converts a numpy array of recording fragments to a zarr array stored in a temporary location. This allows better memory management and garbage collection by avoiding keeping large numpy arrays in memory for extended periods.

Parameters:
  • np_fragments (np.ndarray) – Numpy array of shape (n_fragments, n_samples, n_channels) containing the recording fragments to cache.

  • n_fragments (int) – Number of fragments to cache (allows for subset caching).

  • tmpdir (str, optional) – Directory path for temporary zarr storage. If None, uses get_temp_directory(). Defaults to None.

Returns:

A tuple containing:
  • str: Path to the temporary zarr file

  • zarr.Array: The zarr array object for accessing cached data

Return type:

tuple[str, zarr.Array]

Raises:

ImportError – If zarr is not available
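
Examples

A minimal sketch, assuming zarr is installed; shapes follow the documented (n_fragments, n_samples, n_channels) layout:

>>> import numpy as np
>>> fragments = np.zeros((10, 1000, 4))  # 10 fragments, 1000 samples, 4 channels
>>> zarr_path, zarr_arr = cache_fragments_to_zarr(fragments, n_fragments=10)
>>> zarr_arr.shape
(10, 1000, 4)
>>> del fragments  # the in-memory numpy array can now be garbage collected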

neurodent.core.utils.get_file_stem(filepath: str | Path) → str[source]#

Get the true stem for files, handling double extensions like .npy.gz.

Parameters:

filepath (str | Path)

Return type:

str
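
Examples

Sketches based on the documented double-extension handling; exact outputs assume the stem is everything before the first extension:

>>> get_file_stem("/data/recording_RowMajor_001.npy.gz")
'recording_RowMajor_001'
>>> get_file_stem("analysis.csv")
'analysis'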

neurodent.core.utils.nanmean_series_of_np(x: Series, axis: int = 0) → ndarray[source]#

Efficiently compute NaN-aware mean of a pandas Series containing numpy arrays.

This function is optimized for computing the mean across a Series where each element is a numpy array. It uses different strategies based on the size of the Series for optimal performance.

Parameters:
  • x (pd.Series) – Series containing numpy arrays as elements.

  • axis (int, optional) – Axis along which to compute the mean. Defaults to 0.

    • axis=0: Mean across the Series elements (most common)

    • axis=1: Mean within each array element

Returns:

Array containing the computed means with NaN values properly handled.

Return type:

np.ndarray

Examples

>>> import pandas as pd
>>> import numpy as np
>>> # Create a Series of numpy arrays
>>> arrays = [np.array([1.0, 2.0, np.nan]),
...           np.array([4.0, np.nan, 6.0]),
...           np.array([7.0, 8.0, 9.0])]
>>> series = pd.Series(arrays)
>>> nanmean_series_of_np(series)
array([4. , 5. , 7.5])
Performance Notes:
  • For Series with more than 1000 elements containing numpy arrays, uses np.stack() for better performance

  • Falls back to list conversion for smaller Series or mixed types

  • Handles shape mismatches gracefully by falling back to the slower method

neurodent.core.utils.log_transform(rec: ndarray, **kwargs) → ndarray[source]#

Log-transform the signal.

Parameters:

rec (np.ndarray) – The signal to log transform.

Returns:

ln(rec + 1)

Return type:

np.ndarray
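
Examples

Since the function returns ln(rec + 1), zeros map to zero:

>>> import numpy as np
>>> log_transform(np.array([0.0, np.e - 1]))
array([0., 1.])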

neurodent.core.utils.sort_dataframe_by_plot_order(df: DataFrame, df_sort_order: dict | None = None) → DataFrame[source]#

Sort DataFrame columns according to predefined orders.

Parameters:
  • df (pd.DataFrame) – DataFrame to sort

  • df_sort_order (dict) – Dictionary mapping column names to the order of the values in the column.

Returns:

Sorted DataFrame

Return type:

pd.DataFrame

Raises:

ValueError – If df_sort_order is not a valid dictionary or contains invalid categories
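
Examples

A minimal sketch assuming rows are reordered by the value order given for each column; the sort-order dictionary here is hypothetical:

>>> import pandas as pd
>>> df = pd.DataFrame({"genotype": ["KO", "WT", "KO"], "value": [1, 2, 3]})
>>> sorted_df = sort_dataframe_by_plot_order(df, {"genotype": ["WT", "KO"]})
>>> # Rows are now ordered with WT first, then KO, per the given order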

class neurodent.core.utils.Natural_Neighbor[source]#

Bases: object

Natural Neighbor algorithm implementation for finding natural neighbors in a dataset.

This class implements the Natural Neighbor algorithm which finds mutual neighbors in a dataset by iteratively expanding the neighborhood radius until convergence.

__init__()[source]#

Initialize the Natural Neighbor algorithm.

nan_edges#

Graph of mutual neighbors

Type:

dict

nan_num#

Number of natural neighbors for each instance

Type:

dict

repeat#

Data structure that counts repetitions of the count method

Type:

dict

target#

Set of classes

Type:

list

data#

Set of instances

Type:

list

knn#

Structure that stores neighbors of each instance

Type:

dict

load(filename)[source]#

Load dataset from a CSV file, separating attributes and classes.

Parameters:

filename (str) – Path to the CSV file containing the dataset

read(data: ndarray)[source]#

Load data directly from a numpy array.

Parameters:

data (np.ndarray) – Input data array

asserts()[source]#

Initialize data structures for the algorithm.

Sets up the necessary data structures, including:

  • nan_edges as an empty set

  • knn, nan_num, and repeat dictionaries for each instance

count()[source]#

Count the number of instances that have no natural neighbors.

Returns:

Number of instances with zero natural neighbors

Return type:

int

findKNN(inst, r, tree)[source]#

Find the indices of the k nearest neighbors.

Parameters:
  • inst – Instance to find neighbors for

  • r (int) – Radius/parameter for neighbor search

  • tree – KDTree object for efficient neighbor search

Returns:

Array of neighbor indices (excluding the instance itself)

Return type:

np.ndarray

algorithm()[source]#

Execute the Natural Neighbor algorithm.

The algorithm iteratively expands the neighborhood radius until convergence, finding mutual neighbors between instances.

Returns:

The final radius value when convergence is reached

Return type:

int
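
Examples

A minimal usage sketch, assuming read() followed by algorithm() is sufficient; initialization details and the returned radius depend on the data:

>>> import numpy as np
>>> nn = Natural_Neighbor()
>>> nn.read(np.random.rand(100, 2))  # 100 two-dimensional instances
>>> r = nn.algorithm()  # neighborhood radius at convergence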

class neurodent.core.utils.TimestampMapper(file_end_datetimes: list[datetime], file_durations: list[float])[source]#

Bases: object

Map each fragment to its source file’s timestamp.

This class provides functionality to map data fragments back to their original file timestamps when data has been concatenated from multiple files with different recording times.

Parameters:
  • file_end_datetimes (list[datetime])

  • file_durations (list[float])

file_end_datetimes#

The end datetimes of each source file.

Type:

list[datetime]

file_durations#

The durations of each source file in seconds.

Type:

list[float]

file_start_datetimes#

Computed start datetimes of each file.

Type:

list[datetime]

cumulative_durations#

Cumulative sum of file durations.

Type:

np.ndarray

Examples

>>> from datetime import datetime, timedelta
>>> # Set up files with known end times and durations
>>> end_times = [datetime(2023, 1, 1, 12, 0), datetime(2023, 1, 1, 13, 0)]
>>> durations = [3600.0, 1800.0]  # 1 hour, 30 minutes
>>> mapper = TimestampMapper(end_times, durations)
>>>
>>> # Get timestamp for fragment at index 2 with 60s fragments
>>> timestamp = mapper.get_fragment_timestamp(2, 60.0)
>>> print(timestamp)
2023-01-01 11:02:00
__init__(file_end_datetimes: list[datetime], file_durations: list[float])[source]#

Initialize the TimestampMapper.

Parameters:
  • file_end_datetimes (list[datetime]) – The end datetimes of each file.

  • file_durations (list[float]) – The durations of each file in seconds.

Raises:

ValueError – If the lengths of file_end_datetimes and file_durations don’t match.

get_fragment_timestamp(fragment_idx: int, fragment_len_s: float) → datetime[source]#

Get the timestamp for a specific fragment based on its index and length.

Parameters:
  • fragment_idx (int) – The index of the fragment (0-based).

  • fragment_len_s (float) – The length of each fragment in seconds.

Returns:

The timestamp corresponding to the start of the specified fragment.

Return type:

datetime

Examples

>>> # Get timestamp for the 5th fragment (index 4) with 30-second fragments
>>> timestamp = mapper.get_fragment_timestamp(4, 30.0)
>>> # This returns the timestamp 2 minutes into the first file
neurodent.core.utils.validate_timestamps(timestamps: list[datetime], gap_threshold_seconds: float = 60) → list[datetime][source]#

Validate that timestamps are in chronological order and check for large gaps.

Parameters:
  • timestamps (list[datetime]) – List of timestamps to validate

  • gap_threshold_seconds (float, optional) – Threshold in seconds for warning about large gaps. Defaults to 60.

Returns:

The validated timestamps in chronological order

Return type:

list[datetime]

Raises:

ValueError – If no valid timestamps are provided
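
Examples

Based on the documented behavior of returning the timestamps in chronological order:

>>> from datetime import datetime
>>> ts = [datetime(2023, 1, 1, 12, 0), datetime(2023, 1, 1, 12, 1)]
>>> validate_timestamps(ts)
[datetime.datetime(2023, 1, 1, 12, 0), datetime.datetime(2023, 1, 1, 12, 1)]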

neurodent.core.utils.should_use_cached_file(cache_path: str | Path, source_paths: list[str | Path], use_cached: Literal['auto', 'always', 'never', 'error'] = 'auto') → bool[source]#

Determine whether to use a cached intermediate file based on caching policy and file timestamps.

Parameters:
  • cache_path (str | Path) – Path to the cached intermediate file

  • source_paths (list[str | Path]) – List of source file paths that the cache depends on

  • use_cached (Literal['auto', 'always', 'never', 'error'] (default: 'auto')) – Caching policy:

    • “auto”: Use cached if exists and newer than all sources (default)

    • “always”: Always use cached if it exists

    • “never”: Never use cached (always regenerate)

    • “error”: Raise error if cached doesn’t exist

Returns:

True if cached file should be used, False if it should be regenerated

Return type:

bool

Raises:
  • FileNotFoundError – When use_cached=”error” and cache doesn’t exist

  • ValueError – For invalid use_cached values
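
Examples

Sketches that follow directly from the documented policies; the file paths are hypothetical:

>>> should_use_cached_file("features.pkl", ["raw_001.bin"], use_cached="never")
False
>>> # With "auto", the cache is used only if it exists and is newer than raw_001.bin
>>> use_it = should_use_cached_file("features.pkl", ["raw_001.bin"], use_cached="auto")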

neurodent.core.utils.get_cache_status_message(cache_path: str | Path, use_cached: bool) → str[source]#

Generate a descriptive message about cache usage for logging.

Parameters:
  • cache_path (str | Path)

  • use_cached (bool)

Return type:

str

neurodent.core.utils.should_use_cache_unified(cache_path: str | Path, source_paths: list[str | Path], cache_policy: Literal['auto', 'always', 'force_regenerate']) → bool[source]#

Unified cache decision logic for all intermediate files.

Parameters:
  • cache_path (str | Path) – Path to the cache file

  • source_paths (list[str | Path]) – List of source file paths to check timestamps against

  • cache_policy (Literal['auto', 'always', 'force_regenerate']) – Caching policy:

    • “auto”: Use cache if exists and newer than sources, regenerate with logging if missing/invalid

    • “always”: Use cache if exists, raise error if missing/invalid

    • “force_regenerate”: Always regenerate and overwrite existing cache

Returns:

True if cache should be used, False if should regenerate

Return type:

bool

Raises:

ValueError – If cache_policy is invalid
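
Examples

The “force_regenerate” case follows directly from the policy documented above; the paths are hypothetical:

>>> should_use_cache_unified("features.zarr", ["raw_001.bin"], cache_policy="force_regenerate")
False
>>> # With "auto", the cache is used only if it exists and is newer than all sources
>>> use_it = should_use_cache_unified("features.zarr", ["raw_001.bin"], cache_policy="auto")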