Preprocessing

Preprocessing functions for WSI registration

core.preprocessing.preprocessing.adjust_gamma(image, gamma=0.5)[source]

Apply gamma correction to an image

Parameters:
  • image – Input image

  • gamma – Gamma value

Returns:

Gamma corrected image
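
The docstring does not spell out the formula, but the standard gamma correction for uint8 input is `out = 255 * (in / 255) ** gamma`. A minimal numpy sketch (the function name and details here are illustrative, not the library's implementation):

```python
import numpy as np

def adjust_gamma_sketch(image: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Apply gamma correction: out = 255 * (in / 255) ** gamma for uint8 input."""
    normalized = image.astype(np.float64) / 255.0
    corrected = np.power(normalized, gamma)
    return (corrected * 255.0).astype(np.uint8)

# gamma < 1 brightens mid-tones; gamma > 1 darkens them
patch = np.full((2, 2), 64, dtype=np.uint8)
brightened = adjust_gamma_sketch(patch, gamma=0.5)
```

The default gamma of 0.5 brightens dark regions, which makes faintly stained tissue easier to threshold.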

core.preprocessing.preprocessing.extract_tissue_masks(source_prep, target_prep, artefacts)[source]

Extract tissue masks from preprocessed images

Parameters:
  • source_prep – Preprocessed source image

  • target_prep – Preprocessed target image

  • artefacts – When True, the mask extractor isolates control tissue artefacts (passed through to the tissue mask extractor)

Returns:

(source_mask, target_mask)

Return type:

tuple

core.preprocessing.preprocessing.gamma_corrections(img, gamma)[source]
core.preprocessing.preprocessing.load_slide(image_path_1: str, resolution: float = 0.625)[source]
core.preprocessing.preprocessing.load_wsi_images(source_path, target_path, resolution=0.625, data='', obj_power='')[source]

Load source and target WSI images

Parameters:
  • source_path – Path to source WSI

  • target_path – Path to target WSI

  • resolution – Resolution for loading

  • data – Dataset type (e.g. ‘anhir’)

  • obj_power – Objective power for VirtualWSIReader

Returns:

(source_wsi, target_wsi, source_image, target_image)

Return type:

tuple

Raises:

FileNotFoundError – If either WSI path does not exist on disk.

core.preprocessing.preprocessing.pad_images_np(source, target)[source]
core.preprocessing.preprocessing.pad_single(image, new_shape)[source]
core.preprocessing.preprocessing.pad_to_same_size(image_1: numpy.ndarray, image_2: numpy.ndarray, pad_value: float = 1.0)[source]

Pad two images to the same size.

Parameters:
  • image_1 – First image array with shape (height, width, channel), (batch, height, width, channel), (height, width), or (batch, channel, height, width)

  • image_2 – Second image array with shape (height, width, channel), (batch, height, width, channel), (height, width), or (batch, channel, height, width)

  • pad_value – Value to use for padding

Returns:

(padded_image_1, padded_image_2, padding_params)

Return type:

tuple
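
The documented behaviour (two arrays padded to a common size, plus the padding parameters) can be sketched in numpy. This is an illustrative implementation for the (height, width, channel) case only; the real function also handles batched layouts, and the name and return structure here are assumptions:

```python
import numpy as np

def pad_to_same_size_sketch(img1, img2, pad_value=1.0):
    """Pad two (H, W[, C]) arrays to the elementwise max of their spatial shapes."""
    h = max(img1.shape[0], img2.shape[0])
    w = max(img1.shape[1], img2.shape[1])

    def _pad(img):
        pad_h = h - img.shape[0]
        pad_w = w - img.shape[1]
        # split padding evenly between both sides (extra pixel goes to the end)
        pads = [(pad_h // 2, pad_h - pad_h // 2),
                (pad_w // 2, pad_w - pad_w // 2)] + [(0, 0)] * (img.ndim - 2)
        return np.pad(img, pads, mode="constant", constant_values=pad_value), pads

    p1, pads1 = _pad(img1)
    p2, pads2 = _pad(img2)
    return p1, p2, {"pad_1": pads1, "pad_2": pads2}
```

The default `pad_value` of 1.0 corresponds to a white background in images normalized to [0, 1].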

core.preprocessing.preprocessing.preprocess_images(source, target, normalize_stain: bool = False)[source]

Preprocess source and target images.

Parameters:
  • source – Source image array (RGB, uint8).

  • target – Target image array (RGB, uint8).

  • normalize_stain – When True, apply Macenko stain normalization to both images so that colour differences caused by staining variation do not interfere with the coarse registration step.

Returns:

(source_prep, target_prep)

Return type:

tuple

core.preprocessing.preprocessing.process_nuclei_patch(img, threshold, gamma=None, min_area=200)[source]

Process a single patch to detect nuclei

Parameters:
  • img – Input image patch

  • threshold – Binary threshold value

  • gamma – Gamma correction value (optional)

  • min_area – Minimum area for nuclei detection

Returns:

(binary_image, stats, centroids)

Return type:

tuple
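
The return tuple (binary_image, stats, centroids) suggests a threshold-then-connected-components pipeline. A simplified sketch using scipy instead of whatever the library uses internally (the function name, the dark-nuclei assumption, and the exact return shapes are illustrative):

```python
import numpy as np
from scipy import ndimage

def process_nuclei_patch_sketch(img, threshold, gamma=None, min_area=200):
    """Threshold a grayscale patch and return per-nucleus stats and centroids."""
    gray = img.astype(np.float64)
    if gamma is not None:
        gray = 255.0 * (gray / 255.0) ** gamma
    # nuclei are assumed dark on a bright background: keep pixels below threshold
    binary = gray < threshold
    labels, n = ndimage.label(binary)
    stats, centroids = [], []
    for lab in range(1, n + 1):
        component = labels == lab
        area = int(component.sum())
        if area >= min_area:            # discard specks below min_area
            cy, cx = ndimage.center_of_mass(component)
            stats.append({"area": area})
            centroids.append((cx, cy))
    return binary.astype(np.uint8) * 255, stats, np.array(centroids)
```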

core.preprocessing.preprocessing.resize_and_compute_translation(moving_image, fixed_image)[source]

Resize the fixed and moving images to their common maximum dimensions using black padding, and compute initial translation offsets for 2D or 3D images (where the third dimension is the channel).

Parameters:
  • moving_image (np.ndarray) – The moving image (2D or 3D).

  • fixed_image (np.ndarray) – The fixed image (2D or 3D).

Returns:

(fixed_padded, moving_padded, translation) – fixed_padded (np.ndarray): the fixed image with padding; moving_padded (np.ndarray): the moving image with padding; translation (tuple): the translation offsets (tx, ty) for 2D or (tx, ty, channels) for 3D.

Return type:

tuple
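
One plausible reading of this behaviour, sketched with numpy: zero-pad both images (bottom/right) to their common maximum shape, and report the centre-to-centre offset of the moving image relative to the fixed one. The padding side and the offset convention here are assumptions, not confirmed by the docstring:

```python
import numpy as np

def resize_and_compute_translation_sketch(moving, fixed):
    """Zero-pad both images to their common max shape; return a centre offset."""
    h = max(moving.shape[0], fixed.shape[0])
    w = max(moving.shape[1], fixed.shape[1])

    def _pad(img):
        # pad bottom/right with black (0) up to the target shape
        pads = [(0, h - img.shape[0]), (0, w - img.shape[1])]
        pads += [(0, 0)] * (img.ndim - 2)
        return np.pad(img, pads, mode="constant", constant_values=0)

    fixed_padded, moving_padded = _pad(fixed), _pad(moving)
    # offset that would align the original image centres after padding
    tx = (fixed.shape[1] - moving.shape[1]) / 2.0
    ty = (fixed.shape[0] - moving.shape[0]) / 2.0
    return fixed_padded, moving_padded, (tx, ty)
```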

core.preprocessing.preprocessing.scale_transformation_matrix(transform_matrix, input_res, output_res)[source]

Scale transformation matrix to different resolution

Parameters:
  • transform_matrix – Input transformation matrix

  • input_res – Input resolution

  • output_res – Output resolution

Returns:

Scaled transformation matrix
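
For a 3x3 homogeneous affine, rescaling between resolutions is a conjugation by a diagonal scale matrix: the rotation/shear terms are unchanged and only the translation column is rescaled. A sketch, assuming resolution is proportional to pixel density so the coordinate scale factor is `output_res / input_res` (the function name and this convention are assumptions):

```python
import numpy as np

def scale_transformation_matrix_sketch(transform_matrix, input_res, output_res):
    """Rescale a 3x3 homogeneous affine from input_res to output_res coordinates."""
    s = output_res / input_res          # assumed: coordinates scale with resolution
    S = np.diag([s, s, 1.0])
    S_inv = np.diag([1.0 / s, 1.0 / s, 1.0])
    # conjugation keeps the linear part intact and rescales only the translation
    return S @ np.asarray(transform_matrix, dtype=np.float64) @ S_inv
```

For example, a pure translation of (10, 20) at resolution 0.625 becomes (20, 40) at resolution 1.25.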

class core.preprocessing.tissuemask.FlorenceTissueMaskExtractor(unet_model_path: str = '', unet_device: str = 'cuda')[source]

Bases: object

extract(image: numpy.ndarray, artefacts: bool) numpy.ndarray[source]

Extracts the tissue mask from an image using instance segmentation or fallback methods.

Extraction order:
  1. Florence-2 + SAM2 prompt-based instance segmentation.

  2. If that fails and a UNet model path is provided, the UNet extractor.

  3. Final fallback: Otsu threshold with morphological cleanup.

Parameters:
  • image (np.ndarray) – Input RGB image.

  • artefacts (bool) – When True, return only the first (largest) segment mask so that control tissue artefacts are isolated.

Returns:

Binary tissue mask (uint8, values 0 or 255).

Return type:

np.ndarray
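
The final fallback (Otsu threshold with morphological cleanup) can be sketched without any segmentation model. This is an illustrative stand-in using a hand-rolled Otsu plus scipy morphology, not the extractor's actual code path:

```python
import numpy as np
from scipy import ndimage

def otsu_tissue_mask_sketch(image: np.ndarray) -> np.ndarray:
    """Fallback tissue mask: Otsu threshold on grayscale + morphological cleanup."""
    gray = image.mean(axis=2) if image.ndim == 3 else image.astype(np.float64)
    # Otsu: pick the threshold maximizing between-class variance of the histogram
    hist, _ = np.histogram(gray, bins=256, range=(0, 255))
    p = hist / hist.sum()
    w0 = np.cumsum(p)
    mu = np.cumsum(p * np.arange(256))
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_t * w0 - mu) ** 2 / (w0 * (1 - w0))
    t = np.nanargmax(between)
    mask = gray <= t                    # tissue is darker than the white background
    mask = ndimage.binary_opening(mask, iterations=2)   # drop small specks
    mask = ndimage.binary_fill_holes(mask)              # close holes inside tissue
    return (mask * 255).astype(np.uint8)
```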

class core.preprocessing.tissuemask.UNetTissueMaskExtractor(model_path: str, device: str = 'cuda')[source]

Bases: object

__init__(model_path: str, device: str = 'cuda')[source]
Parameters:
  • model_path (str) – Path to the pretrained UNet checkpoint.

  • device (str) – ‘cuda’ or ‘cpu’.

static convert_pytorch_checkpoint(net_state_dict)[source]

Convert checkpoint from DataParallel to single-GPU format.
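
`torch.nn.DataParallel` prefixes every parameter name with `module.`, so loading such a checkpoint into a plain model fails with key mismatches. The conversion amounts to stripping that prefix; a sketch (pure dict manipulation, no torch required, and the function name is illustrative):

```python
from collections import OrderedDict

def convert_pytorch_checkpoint_sketch(net_state_dict):
    """Strip the 'module.' prefix that torch.nn.DataParallel adds to parameter names."""
    converted = OrderedDict()
    for key, value in net_state_dict.items():
        new_key = key[len("module."):] if key.startswith("module.") else key
        converted[new_key] = value
    return converted
```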

extract_masks(image: numpy.ndarray) numpy.ndarray[source]

Generate a tissue mask for a single image using UNet segmentation.

Parameters:

image (np.ndarray) – Input RGB image.

Returns:

Processed binary tissue mask.

Return type:

np.ndarray

static post_processing_mask(mask: numpy.ndarray) numpy.ndarray[source]

Fill holes and keep only the largest object in the binary mask.
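
Both steps are standard morphological operations; a sketch with scipy.ndimage (illustrative, not the class's actual implementation):

```python
import numpy as np
from scipy import ndimage

def post_processing_mask_sketch(mask: np.ndarray) -> np.ndarray:
    """Fill holes, then keep only the largest connected component of a binary mask."""
    filled = ndimage.binary_fill_holes(mask > 0)
    labels, n = ndimage.label(filled)
    if n == 0:
        return np.zeros_like(mask, dtype=np.uint8)
    # component sizes for labels 1..n; keep the biggest one
    sizes = ndimage.sum(filled, labels, index=range(1, n + 1))
    largest = int(np.argmax(sizes)) + 1
    return ((labels == largest) * 255).astype(np.uint8)
```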

Nuclei detection and analysis functions (tissue-aware, watershed-based) with accurate area estimation.

core.preprocessing.nuclei_analysis.create_nuclei_dataframe_from_points(points, area_values=None)[source]
core.preprocessing.nuclei_analysis.detect_nuclei_patch_watershed(img, min_area=25)[source]

Detect nuclei in a patch using watershed and estimate area via contours.

Returns:

(nuclei_stats, nuclei_centroids) – nuclei_stats: list of dicts {‘area’: area}; nuclei_centroids: np.array of centroids [[x1, y1], [x2, y2], …].

Return type:

tuple

core.preprocessing.nuclei_analysis.extract_nuclei_points(df, columns=['global_x', 'global_y'])[source]
core.preprocessing.nuclei_analysis.extract_patches_from_wsi(wsi, mask, patch_size=(1000, 1000), stride=(1000, 1000))[source]
core.preprocessing.nuclei_analysis.load_nuclei_coordinates(csv_path)[source]
core.preprocessing.nuclei_analysis.process_fixed_patch(patch_extractor, patch_idx)[source]
core.preprocessing.nuclei_analysis.process_moving_patch(patch_extractor, tfm, patch_idx)[source]
core.preprocessing.nuclei_analysis.process_nuclei_in_patches(fixed_patch_extractor, tfm, start_index=0, end_index=None)[source]
core.preprocessing.nuclei_analysis.save_nuclei_data_to_csv(fixed_nuclei_data, moving_nuclei_data, fixed_csv_path, moving_csv_path)[source]
core.preprocessing.nuclei_analysis.subsample_nuclei(df, n_samples, random_state=42)[source]
class core.preprocessing.stainnorm.StainNormalizer(Io=240, alpha=1, beta=0.15)[source]

Bases: object

Class to normalize staining appearance of H&E stained images. Based on Macenko et al., ISBI 2009 and Vink et al., J Microscopy, 2013.

__init__(Io=240, alpha=1, beta=0.15)[source]

Initialize with default parameters.

compute_svd(ODhat)[source]

Computes SVD to find stain vectors.

extract_H_E(C2)[source]

Extracts Hematoxylin and Eosin stain images.

find_stain_vectors(eigvecs, ODhat)[source]

Finds the hematoxylin and eosin stain vectors.

process(img_path)[source]

Main workflow to normalize image and extract stains.

read_image(path)[source]

Reads and converts an image to RGB.

recreate_image(C2)[source]

Recreates the normalized image from the separated components.

remove_transparent_pixels(OD)[source]

Removes pixels with OD intensity less than beta.

rgb_to_od(img)[source]

Converts RGB image to Optical Density (OD).
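
In the Macenko method the OD transform is `OD = -log10((I + 1) / Io)`, where `Io` is the transmitted-light intensity (240 by default here). A standalone sketch of the conversion (the method itself takes `Io` from the instance; this free function is illustrative):

```python
import numpy as np

def rgb_to_od_sketch(img: np.ndarray, Io: int = 240) -> np.ndarray:
    """Convert RGB intensities to optical density: OD = -log10((I + 1) / Io)."""
    # +1 avoids log(0) for fully dark pixels; Io is the bright background level
    return -np.log10((img.astype(np.float64) + 1.0) / Io)
```

White background pixels map to OD near 0, while strongly stained pixels get large OD values, which is what makes the stain vectors separable.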

separate_stains(OD, HE)[source]

Separates the image into Hematoxylin and Eosin components.

core.preprocessing.padding.apply_padding_landmarks(landmarks: numpy.ndarray, pad: List[Tuple[int, int]]) numpy.ndarray[source]

Applies padding to landmark coordinates.

core.preprocessing.padding.calculate_pad_value(size_1: Iterable[int], size_2: Iterable[int]) Tuple[List[Tuple[int, int]], List[Tuple[int, int]]][source]

Calculates the padding required to make two images the same size.

core.preprocessing.padding.pad_image_src(source: numpy.ndarray, target: numpy.ndarray, pad_value: int = 255) Tuple[numpy.ndarray, numpy.ndarray, Dict, numpy.ndarray, numpy.ndarray][source]

Pads or crops the source image so it matches the target’s size. Target remains unchanged.

Positive padding if source < target. Negative padding (crop) if source > target.

core.preprocessing.padding.pad_images(image_1: numpy.ndarray, image_2: numpy.ndarray, pad_value: float = 1.0) Tuple[numpy.ndarray, numpy.ndarray, Dict, numpy.ndarray, numpy.ndarray][source]

Pads two images to the same size and optionally adjusts landmarks accordingly.

core.preprocessing.padding.pad_landmarks(padding_params, landmarks_1: numpy.ndarray | None = None, landmarks_2: numpy.ndarray | None = None) Tuple[numpy.ndarray, numpy.ndarray, Dict, numpy.ndarray, numpy.ndarray][source]
core.preprocessing.padding.remove_padding(image, pad_tuple)[source]

Crop image or deformation field back to original (unpadded) size.
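
Undoing padding is a per-axis slice using the stored (before, after) pairs. A sketch, assuming `pad_tuple` has the same layout produced by the padding functions above (function name illustrative):

```python
import numpy as np

def remove_padding_sketch(image, pad_tuple):
    """Undo per-axis (before, after) padding by slicing the array back down."""
    slices = tuple(
        # an 'after' of 0 must map to slice(before, None), not slice(before, 0)
        slice(before, image.shape[i] - after if after else None)
        for i, (before, after) in enumerate(pad_tuple)
    )
    return image[slices]
```

Round-tripping through `np.pad` with the same pad tuple recovers the original array.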