Utilities#
Utility functions for the core module.
Functions#
- neurodent.core.utils.convert_units_to_multiplier(current_units: str, target_units: str = 'µV') → float[source]#
Convert between different voltage units and return the multiplication factor.
This function calculates the conversion factor needed to transform values from one voltage unit to another (e.g., from mV to µV).
- Parameters:
  - current_units (str) – The current unit of the values. Must be one of: ‘µV’, ‘mV’, ‘V’, ‘nV’.
  - target_units (str, optional) – The target unit to convert to. Defaults to ‘µV’. Must be one of: ‘µV’, ‘mV’, ‘V’, ‘nV’.
- Returns:
The multiplication factor to convert from current_units to target_units. To convert values, multiply your data by this factor.
- Return type:
float
- Raises:
AssertionError – If current_units or target_units are not supported.
Examples
>>> convert_units_to_multiplier("mV", "µV")
1000.0
>>> convert_units_to_multiplier("V", "mV")
1000.0
>>> convert_units_to_multiplier("µV", "V")
1e-06
- neurodent.core.utils.extract_mne_unit_info(raw_info: dict) → tuple[str | None, float | None][source]#
Extract unit information from MNE Raw info object.
- Parameters:
  - raw_info (dict) – MNE Raw.info object containing channel information
- Returns:
(unit_name, mult_to_uV), where unit_name is the consistent unit across all channels and mult_to_uV is the conversion factor to µV
- Return type:
tuple[str | None, float | None]
- Raises:
ValueError – If channel units are inconsistent across channels
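Examples

A hedged sketch, assuming an MNE Raw object named raw whose channels are all stored in volts (the returned values depend on the recording):

>>> unit_name, mult_to_uV = extract_mne_unit_info(raw.info)
>>> unit_name, mult_to_uV
('V', 1000000.0)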
- neurodent.core.utils.is_day(dt: datetime, sunrise=6, sunset=18)[source]#
Check if a datetime object is during the day.
- Parameters:
  - dt (datetime) – Datetime object to check
  - sunrise (int, optional) – Sunrise hour (0-23). Defaults to 6.
  - sunset (int, optional) – Sunset hour (0-23). Defaults to 18.
- Returns:
True if the datetime is during the day, False otherwise
- Return type:
bool
- Raises:
TypeError – If dt is not a datetime object
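Examples

A minimal sketch using the default sunrise/sunset hours (exact boundary behavior at the sunrise/sunset hours is not specified here):

>>> from datetime import datetime
>>> is_day(datetime(2023, 1, 1, 12, 0))
True
>>> is_day(datetime(2023, 1, 1, 3, 0))
False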
- neurodent.core.utils.convert_colpath_to_rowpath(rowdir_path: str | Path, col_path: str | Path, gzip: bool = True, aspath: bool = True) → str | Path[source]#
Convert a ColMajor file path to its corresponding RowMajor file path.
This function transforms file paths from column-major format to row-major format, which is used when converting between different data storage layouts in NeuRodent.
- Parameters:
  - rowdir_path (str | Path) – Directory path where the RowMajor file should be located.
  - col_path (str | Path) – Path to the ColMajor file to be converted. Must contain ‘ColMajor’ in the path.
  - gzip (bool, optional) – If True, append ‘.npy.gz’ extension. If False, append ‘.bin’. Defaults to True.
  - aspath (bool, optional) – If True, return as Path object. If False, return as string. Defaults to True.
- Returns:
The converted RowMajor file path, either as string or Path object based on aspath parameter.
- Return type:
str | Path
- Raises:
ValueError – If ‘ColMajor’ is not found in col_path.
Examples
>>> convert_colpath_to_rowpath("/data/row/", "/data/col/file_ColMajor_001.bin")
PosixPath('/data/row/file_RowMajor_001.npy.gz')
>>> convert_colpath_to_rowpath("/data/row/", "/data/col/file_ColMajor_001.bin", gzip=False)
PosixPath('/data/row/file_RowMajor_001.bin')
>>> convert_colpath_to_rowpath("/data/row/", "/data/col/file_ColMajor_001.bin", aspath=False)
'/data/row/file_RowMajor_001.npy.gz'
- neurodent.core.utils.filepath_to_index(filepath) → int[source]#
Extract the index number from a filepath.
This function extracts the last number found in a filepath after removing common suffixes and file extensions. For example, from “/path/to/data_ColMajor_001.bin” it returns 1.
- Parameters:
  - filepath (str | Path) – Path to the file to extract index from.
- Returns:
The extracted index number.
- Return type:
int
Examples
>>> filepath_to_index("/path/to/data_ColMajor_001.bin")
1
>>> filepath_to_index("/path/to/data_2023_015_ColMajor.bin")
15
>>> filepath_to_index("/path/to/data_Meta_010.json")
10
- neurodent.core.utils.parse_truncate(truncate: int | bool) → int[source]#
Parse the truncate parameter to determine how many characters to truncate.
If truncate is a boolean, returns 10 if True and 0 if False. If truncate is an integer, returns that integer value directly.
- Parameters:
  - truncate (int | bool) – If bool, True=10 chars and False=0 chars. If int, specifies exact number of chars.
- Returns:
Number of characters to truncate (0 means no truncation)
- Return type:
int
- Raises:
ValueError – If truncate is not a boolean or integer
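Examples

A minimal sketch following the documented behavior:

>>> parse_truncate(True)
10
>>> parse_truncate(False)
0
>>> parse_truncate(25)
25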
- neurodent.core.utils.nanaverage(A: ndarray, weights: ndarray, axis: int = -1) → ndarray[source]#
Compute weighted average of an array, ignoring NaN values.
This function computes a weighted average along the specified axis while properly handling NaN values by masking them out of the calculation.
- Parameters:
  - A (np.ndarray) – Input array containing the values to average.
  - weights (np.ndarray) – Array of weights corresponding to the values in A. Must be broadcastable with A along the specified axis.
  - axis (int, optional) – Axis along which to compute the average. Defaults to -1 (last axis).
- Returns:
Weighted average with NaN values properly handled. If all values along an axis are NaN, the result will be NaN for that position.
- Return type:
np.ndarray
Examples
>>> import numpy as np
>>> A = np.array([[1.0, 2.0, np.nan], [4.0, np.nan, 6.0]])
>>> weights = np.array([1, 2, 1])
>>> nanaverage(A, weights, axis=1)
array([1.66666667, 5.        ])
Note
Be careful with zero or negative weights as they may produce unexpected results. The function uses numpy’s masked array functionality for robust NaN handling.
- neurodent.core.utils.parse_path_to_animalday(filepath: str | Path, animal_param: tuple[int, str] | str | list[str] = (0, None), day_sep: str | None = None, mode: Literal['nest', 'concat', 'base', 'noday'] = 'concat', **day_parse_kwargs)[source]#
Parses the filename of a binfolder to get the animalday identifier (animal id, genotype, and day).
- Parameters:
  - filepath (str | Path) – Filepath of the binfolder.
  - animal_param (tuple[int, str] | str | list[str], optional) – Parameter specifying how to parse the animal ID:
    - tuple[int, str]: (index, separator) for simple split and index
    - str: regex pattern to extract ID
    - list[str]: list of possible animal IDs to match against
  - day_sep (str, optional) – Separator for day in filename. Defaults to None.
  - mode (Literal['nest', 'concat', 'base', 'noday'], optional) – Mode to parse the filename. Defaults to ‘concat’.
    - ‘nest’: Extracts genotype/animal from parent directory name and date from filename. Example: “/WT_A10/recording_2023-04-01.*”
    - ‘concat’: Extracts all info from filename, expects genotype_animal_date format. Example: “/WT_A10_2023-04-01.*”
    - ‘base’: Same as ‘concat’
    - ‘noday’: Extracts only genotype and animal ID, uses default date. Example: “/WT_A10_recording.*”
**day_parse_kwargs – Additional keyword arguments to pass to parse_str_to_day function. Common options include parse_params dict for dateutil.parser.parse.
- Returns:
Dictionary with keys “animal”, “genotype”, “day”, and “animalday” (concatenated). Example: {“animal”: “A10”, “genotype”: “WT”, “day”: “Apr-01-2023”, “animalday”: “A10 WT Apr-01-2023”}
- Return type:
dict[str, str]
- Raises:
ValueError – If mode is invalid or required components cannot be extracted
TypeError – If filepath is not str or Path
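Examples

A hedged sketch of ‘concat’ mode, assuming the documented default day formatting:

>>> parse_path_to_animalday("/data/WT_A10_2023-04-01.bin", animal_param=(1, "_"), mode="concat")
{'animal': 'A10', 'genotype': 'WT', 'day': 'Apr-01-2023', 'animalday': 'A10 WT Apr-01-2023'}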
- neurodent.core.utils.parse_str_to_genotype(string: str, strict_matching: bool = False) → str[source]#
Parses the filename of a binfolder to get the genotype.
- Parameters:
  - string (str) – String to parse.
  - strict_matching (bool, optional) – If True, ensures the input matches exactly one genotype. If False, allows overlapping matches and uses the longest. Defaults to False for backward compatibility.
- Returns:
Genotype.
- Return type:
str
- Raises:
ValueError – When string cannot be parsed or contains ambiguous matches in strict mode.
Examples
>>> parse_str_to_genotype("WT_A10_data")
'WT'
>>> parse_str_to_genotype("WT_KO_comparison", strict_matching=True)  # Would raise error
ValueError: Ambiguous match...
>>> parse_str_to_genotype("WT_KO_comparison", strict_matching=False)  # Uses longest match
'WT'  # or 'KO', depending on which alias is longer
- neurodent.core.utils.parse_str_to_animal(string: str, animal_param: tuple[int, str] | str | list[str] = (0, None)) → str[source]#
Parses the filename of a binfolder to get the animal id.
- Parameters:
  - string (str) – String to parse.
  - animal_param (tuple[int, str] | str | list[str], optional (default: (0, None))) – Parameter specifying how to parse the animal ID:
    - tuple[int, str]: (index, separator) for simple split and index. Not recommended for inconsistent naming conventions.
    - str: regex pattern to extract ID. Most general use case. If multiple matches are found, returns the first match.
    - list[str]: list of possible animal IDs to match against. Returns first match in list order, case-sensitive, ignoring empty strings.
- Returns:
Animal id.
- Return type:
str
Examples
# Tuple format: (index, separator)
>>> parse_str_to_animal("WT_A10_2023-01-01_data.bin", (1, "_"))
'A10'
>>> parse_str_to_animal("A10_WT_recording.bin", (0, "_"))
'A10'

# Regex pattern format
>>> parse_str_to_animal("WT_A10_2023-01-01_data.bin", r"A\d+")
'A10'
>>> parse_str_to_animal("subject_123_data.bin", r"\d+")
'123'

# List format: possible IDs to match
>>> parse_str_to_animal("WT_A10_2023-01-01_data.bin", ["A10", "A11", "A12"])
'A10'
>>> parse_str_to_animal("WT_A10_data.bin", ["B15", "C20"])  # No match
ValueError: No matching ID found in WT_A10_data.bin from possible IDs: ['B15', 'C20']
- neurodent.core.utils.parse_str_to_day(string: str, sep: str | None = None, parse_params: dict | None = None, parse_mode: Literal['full', 'split', 'window', 'all'] = 'split', date_patterns: list[tuple[str, str]] | None = None) → datetime[source]#
Parses the filename of a binfolder to get the day.
- Parameters:
  - string (str) – String to parse.
  - sep (str, optional) – Separator to split string by. If None, split by whitespace. Defaults to None.
  - parse_params (dict, optional) – Parameters to pass to dateutil.parser.parse. Defaults to {‘fuzzy’: True}.
  - parse_mode (Literal["full", "split", "window", "all"], optional) – Mode for parsing the string. Defaults to "split".
    - "full": Try parsing the entire cleaned string only
    - "split": Try parsing individual tokens only
    - "window": Try parsing sliding windows of tokens (2-4 tokens) only
    - "all": Use all three approaches in the order "full", "split", "window"
  - date_patterns (list[tuple[str, str]], optional) – List of (regex_pattern, strptime_format) tuples to try before falling back to token-based parsing. This allows users to specify exact formats to handle ambiguous cases like MM/DD/YYYY vs DD/MM/YYYY. Only used in "split" and "all" modes. Defaults to None (no regex patterns).
- Returns:
Datetime object corresponding to the day of the binfolder.
- Return type:
datetime
- Raises:
ValueError – If no valid date token is found in the string.
TypeError – If date_patterns is not a list of tuples.
Examples
>>> # Handle ambiguous date formats with explicit patterns
>>> patterns = [(r'(19\d{2}|20\d{2})-(\d{1,2})-(\d{1,2})', '%Y-%m-%d')]
>>> parse_str_to_day('2001_2023-07-04_data', date_patterns=patterns)
datetime.datetime(2023, 7, 4, 0, 0)

>>> # European format pattern
>>> patterns = [(r'(\d{1,2})/(\d{1,2})/(19\d{2}|20\d{2})', '%d/%m/%Y')]
>>> parse_str_to_day('04/07/2023_data', date_patterns=patterns)
datetime.datetime(2023, 7, 4, 0, 0)  # July 4th, not April 7th
Note
When date_patterns is provided, users have full control over date interpretation. Without date_patterns, the function falls back to token-based parsing which may be ambiguous for formats like MM/DD/YYYY vs DD/MM/YYYY.
- neurodent.core.utils.parse_chname_to_abbrev(channel_name: str, assume_from_number=False, strict_matching=True) → str[source]#
Parses the channel name to get the abbreviation.
- Parameters:
  - channel_name (str) – Name of the channel.
  - assume_from_number (bool, optional) – If True, assume the abbreviation based on the last number in the channel name when normal parsing fails. Defaults to False.
  - strict_matching (bool, optional) – If True, ensures the input matches exactly one L/R alias and one channel alias. If False, allows multiple matches and uses longest. Defaults to True.
- Returns:
Abbreviation of the channel name.
- Return type:
str
- Raises:
ValueError – When channel_name cannot be parsed or contains ambiguous matches in strict mode.
KeyError – When assume_from_number=True but the detected number is not a valid channel ID.
Examples
>>> parse_chname_to_abbrev("left Aud")
'LAud'
>>> parse_chname_to_abbrev("Right VIS")
'RVis'
>>> parse_chname_to_abbrev("channel_9", assume_from_number=True)
'LAud'
>>> parse_chname_to_abbrev("LRAud", strict_matching=False)  # Works in non-strict mode
'LAud'  # Uses longest L/R match
- neurodent.core.utils.set_temp_directory(path: str | Path) → None[source]#
Set the temporary directory for NeuRodent operations.
This function configures the temporary directory used by NeuRodent for intermediate files and operations. The directory will be created if it doesn’t exist.
- Parameters:
  - path (str | Path) – Path to the temporary directory. Will be created if it doesn’t exist.
- Return type:
None
Examples
>>> set_temp_directory("/tmp/neurodent_temp")
>>> set_temp_directory(Path.home() / "neurodent_workspace" / "temp")
Note
This function modifies the TMPDIR environment variable, which affects the behavior of other temporary file operations in the process.
- neurodent.core.utils.get_temp_directory() → Path[source]#
Get the current temporary directory used by NeuRodent.
- Returns:
Path object representing the current temporary directory.
- Return type:
Path
Examples
>>> temp_dir = get_temp_directory()
>>> print(f"Current temp directory: {temp_dir}")
Current temp directory: /tmp/neurodent_temp
- Raises:
KeyError – If TMPDIR environment variable is not set.
- neurodent.core.utils.cache_fragments_to_zarr(np_fragments: ndarray, n_fragments: int, tmpdir: str | None = None) → tuple[str, zarr.Array][source]#
Cache numpy fragments array to zarr format for efficient memory management.
This function converts a numpy array of recording fragments to a zarr array stored in a temporary location. This allows better memory management and garbage collection by avoiding keeping large numpy arrays in memory for extended periods.
- Parameters:
  - np_fragments (np.ndarray) – Numpy array of shape (n_fragments, n_samples, n_channels) containing the recording fragments to cache.
  - n_fragments (int) – Number of fragments to cache (allows for subset caching).
  - tmpdir (str, optional) – Directory path for temporary zarr storage. If None, uses get_temp_directory(). Defaults to None.
- Returns:
A tuple containing:
  - str: Path to the temporary zarr file
  - zarr.Array: The zarr array object for accessing cached data
- Return type:
tuple[str, zarr.Array]
- Raises:
ImportError – If zarr is not available
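Examples

A hedged sketch (the zarr file lands under the configured temp directory; shapes are illustrative):

>>> import numpy as np
>>> np_fragments = np.random.rand(10, 1000, 4)  # (n_fragments, n_samples, n_channels)
>>> zarr_path, zarr_arr = cache_fragments_to_zarr(np_fragments, n_fragments=10)
>>> zarr_arr.shape
(10, 1000, 4)
>>> del np_fragments  # the large numpy array can now be garbage collected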
- neurodent.core.utils.get_file_stem(filepath: str | Path) → str[source]#
Get the true stem for files, handling double extensions like .npy.gz.
- Parameters:
  - filepath (str | Path)
- Return type:
str
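Examples

A minimal sketch of the double-extension handling (filenames are illustrative):

>>> get_file_stem("/data/recording_RowMajor_001.npy.gz")
'recording_RowMajor_001'
>>> get_file_stem("/data/recording.bin")
'recording'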
- neurodent.core.utils.nanmean_series_of_np(x: Series, axis: int = 0) → ndarray[source]#
Efficiently compute NaN-aware mean of a pandas Series containing numpy arrays.
This function is optimized for computing the mean across a Series where each element is a numpy array. It uses different strategies based on the size of the Series for optimal performance.
- Parameters:
  - x (pd.Series) – Series containing numpy arrays as elements.
  - axis (int, optional) – Axis along which to compute the mean. Defaults to 0.
    - axis=0: Mean across the Series elements (most common)
    - axis=1: Mean within each array element
- Returns:
Array containing the computed means with NaN values properly handled.
- Return type:
np.ndarray
Examples
>>> import pandas as pd
>>> import numpy as np
>>> # Create a Series of numpy arrays
>>> arrays = [np.array([1.0, 2.0, np.nan]),
...           np.array([4.0, np.nan, 6.0]),
...           np.array([7.0, 8.0, 9.0])]
>>> series = pd.Series(arrays)
>>> nanmean_series_of_np(series)
array([4. , 5. , 7.5])
- Performance Notes:
  - For Series with more than 1000 elements containing numpy arrays, uses np.stack() for better performance
  - Falls back to list conversion for smaller Series or mixed types
  - Handles shape mismatches gracefully by falling back to the slower method
- neurodent.core.utils.log_transform(rec: ndarray, **kwargs) → ndarray[source]#
Log transform the signal.
- Parameters:
  - rec (np.ndarray) – The signal to log transform.
- Returns:
ln(rec + 1)
- Return type:
np.ndarray
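Examples

A minimal sketch of the documented ln(rec + 1) transform:

>>> import numpy as np
>>> log_transform(np.array([0.0, np.e - 1]))
array([0., 1.])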
- neurodent.core.utils.sort_dataframe_by_plot_order(df: DataFrame, df_sort_order: dict | None = None) → DataFrame[source]#
Sort DataFrame columns according to predefined orders.
- Parameters:
  - df (pd.DataFrame) – DataFrame to sort
  - df_sort_order (dict, optional) – Dictionary mapping column names to the order of the values in the column.
- Returns:
Sorted DataFrame
- Return type:
pd.DataFrame
- Raises:
ValueError – If df_sort_order is not a valid dictionary or contains invalid categories
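Examples

A hedged sketch, assuming rows are reordered to follow the category order given per column:

>>> import pandas as pd
>>> df = pd.DataFrame({"genotype": ["KO", "WT", "KO"], "power": [1.0, 2.0, 3.0]})
>>> sorted_df = sort_dataframe_by_plot_order(df, {"genotype": ["WT", "KO"]})
>>> sorted_df["genotype"].tolist()
['WT', 'KO', 'KO']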
- class neurodent.core.utils.Natural_Neighbor[source]#
Bases: object
Natural Neighbor algorithm implementation for finding natural neighbors in a dataset.
This class implements the Natural Neighbor algorithm which finds mutual neighbors in a dataset by iteratively expanding the neighborhood radius until convergence.
- __init__()[source]#
Initialize the Natural Neighbor algorithm.
- nan_edges#
Graph of mutual neighbors
- Type:
dict
- nan_num#
Number of natural neighbors for each instance
- Type:
dict
- repeat#
Data structure that counts repetitions of the count method
- Type:
dict
- target#
Set of classes
- Type:
list
- data#
Set of instances
- Type:
list
- knn#
Structure that stores neighbors of each instance
- Type:
dict
- load(filename)[source]#
Load dataset from a CSV file, separating attributes and classes.
- Parameters:
  - filename (str) – Path to the CSV file containing the dataset
- read(data: ndarray)[source]#
Load data directly from a numpy array.
- Parameters:
  - data (np.ndarray) – Input data array
- asserts()[source]#
Initialize data structures for the algorithm.
Sets up the necessary data structures, including:
  - nan_edges as an empty set
  - knn, nan_num, and repeat dictionaries for each instance
- count()[source]#
Count the number of instances that have no natural neighbors.
- Returns:
Number of instances with zero natural neighbors
- Return type:
int
- findKNN(inst, r, tree)[source]#
Find the indices of the k nearest neighbors.
- Parameters:
  - inst – Instance to find neighbors for
  - r (int) – Radius/parameter for neighbor search
  - tree – KDTree object for efficient neighbor search
- Returns:
Array of neighbor indices (excluding the instance itself)
- Return type:
np.ndarray
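Examples

A hedged usage sketch of the class, using only the methods documented above (the method that runs the full neighbor search to convergence is not listed here):

>>> import numpy as np
>>> nn = Natural_Neighbor()
>>> nn.read(np.random.rand(100, 3))  # load instances from an array
>>> nn.asserts()                     # initialize nan_edges, knn, nan_num, repeat
>>> nn.count()                       # instances that currently have no natural neighbors
100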
- class neurodent.core.utils.TimestampMapper(file_end_datetimes: list[datetime], file_durations: list[float])[source]#
Bases: object
Map each fragment to its source file’s timestamp.
This class provides functionality to map data fragments back to their original file timestamps when data has been concatenated from multiple files with different recording times.
- Parameters:
file_end_datetimes (list[datetime])
file_durations (list[float])
- file_end_datetimes#
The end datetimes of each source file.
- Type:
list[datetime]
- file_durations#
The durations of each source file in seconds.
- Type:
list[float]
- file_start_datetimes#
Computed start datetimes of each file.
- Type:
list[datetime]
- cumulative_durations#
Cumulative sum of file durations.
- Type:
np.ndarray
Examples
>>> from datetime import datetime, timedelta
>>> # Set up files with known end times and durations
>>> end_times = [datetime(2023, 1, 1, 12, 0), datetime(2023, 1, 1, 13, 0)]
>>> durations = [3600.0, 1800.0]  # 1 hour, 30 minutes
>>> mapper = TimestampMapper(end_times, durations)
>>>
>>> # Get timestamp for fragment at index 2 with 60s fragments
>>> timestamp = mapper.get_fragment_timestamp(2, 60.0)
>>> print(timestamp)
2023-01-01 11:02:00
- __init__(file_end_datetimes: list[datetime], file_durations: list[float])[source]#
Initialize the TimestampMapper.
- Parameters:
  - file_end_datetimes (list[datetime]) – The end datetimes of each file.
  - file_durations (list[float]) – The durations of each file in seconds.
- Raises:
ValueError – If the lengths of file_end_datetimes and file_durations don’t match.
- get_fragment_timestamp(fragment_idx: int, fragment_len_s: float) → datetime[source]#
Get the timestamp for a specific fragment based on its index and length.
- Parameters:
  - fragment_idx (int) – The index of the fragment (0-based).
  - fragment_len_s (float) – The length of each fragment in seconds.
- Returns:
The timestamp corresponding to the start of the specified fragment.
- Return type:
datetime
Examples
>>> # Get timestamp for the 5th fragment (index 4) with 30-second fragments
>>> timestamp = mapper.get_fragment_timestamp(4, 30.0)
>>> # This returns the timestamp 2 minutes into the first file
- neurodent.core.utils.validate_timestamps(timestamps: list[datetime], gap_threshold_seconds: float = 60) → list[datetime][source]#
Validate that timestamps are in chronological order and check for large gaps.
- Parameters:
  - timestamps (list[datetime]) – List of timestamps to validate
  - gap_threshold_seconds (float, optional) – Threshold in seconds for warning about large gaps. Defaults to 60.
- Returns:
The validated timestamps in chronological order
- Return type:
list[datetime]
- Raises:
ValueError – If no valid timestamps are provided
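Examples

A hedged sketch (in-order timestamps pass through unchanged; gaps above the threshold are expected to trigger a warning):

>>> from datetime import datetime
>>> ts = [datetime(2023, 1, 1, 12, 0), datetime(2023, 1, 1, 12, 1)]
>>> validate_timestamps(ts)
[datetime.datetime(2023, 1, 1, 12, 0), datetime.datetime(2023, 1, 1, 12, 1)]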
- neurodent.core.utils.should_use_cached_file(cache_path: str | Path, source_paths: list[str | Path], use_cached: Literal['auto', 'always', 'never', 'error'] = 'auto') → bool[source]#
Determine whether to use a cached intermediate file based on caching policy and file timestamps.
- Parameters:
  - cache_path (str | Path) – Path to the cached intermediate file
  - source_paths (list[str | Path]) – List of source file paths that the cache depends on
  - use_cached (Literal['auto', 'always', 'never', 'error'], optional (default: 'auto')) – Caching policy:
    - "auto": Use cached if exists and newer than all sources (default)
    - "always": Always use cached if it exists
    - "never": Never use cached (always regenerate)
    - "error": Raise error if cached doesn’t exist
- Returns:
True if cached file should be used, False if it should be regenerated
- Return type:
bool
- Raises:
FileNotFoundError – When use_cached=”error” and cache doesn’t exist
ValueError – For invalid use_cached values
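Examples

A hedged sketch of the default "auto" policy (paths are illustrative; the result depends on file existence and modification times):

>>> use_cache = should_use_cached_file("derived/features.npy.gz", ["raw/rec_001.bin"])
>>> # True if the cache exists and is newer than every source file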
- neurodent.core.utils.get_cache_status_message(cache_path: str | Path, use_cached: bool) → str[source]#
Generate a descriptive message about cache usage for logging.
- Parameters:
  - cache_path (str | Path)
  - use_cached (bool)
- Return type:
str
- neurodent.core.utils.should_use_cache_unified(cache_path: str | Path, source_paths: list[str | Path], cache_policy: Literal['auto', 'always', 'force_regenerate']) → bool[source]#
Unified cache decision logic for all intermediate files.
- Parameters:
  - cache_path (str | Path) – Path to the cache file
  - source_paths (list[str | Path]) – List of source file paths to check timestamps against
  - cache_policy (Literal['auto', 'always', 'force_regenerate']) – Caching policy:
    - "auto": Use cache if exists and newer than sources, regenerate with logging if missing/invalid
    - "always": Use cache if exists, raise error if missing/invalid
    - "force_regenerate": Always regenerate and overwrite existing cache
- Returns:
True if cache should be used, False if should regenerate
- Return type:
bool
- Raises:
ValueError – If cache_policy is invalid