Hide code cell content
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Part 3: Segmentation#


Separating an image into one or more regions of interest.#

Everyone has heard or seen Photoshop or a similar graphics editor take a person from one image and place them into another. The first step of doing this is identifying where that person is in the source image.

In popular culture, the Terminator’s vision segments humans out of the overall scene:

Segmentation is a fundamental operation in scientific image analysis because we often want to measure properties of real, physical objects such as cells embedded in our image. As such, we want to find those objects within our image.

Computationally, segmentations are most often represented as images, of the same size as the original image, containing integer labels, with one value representing one object.

Here is a very simple image and segmentation, taken from this scikit-image gallery example:

import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage as ndi

import napari

from skimage.segmentation import watershed
from skimage.feature import peak_local_max
from skimage.util import label_points
from skimage.color import label2rgb


# Generate an initial image with two overlapping circles
x, y = np.indices((80, 80))
x1, y1, x2, y2 = 28, 28, 44, 52
r1, r2 = 16, 20
mask_circle1 = (x - x1)**2 + (y - y1)**2 < r1**2
mask_circle2 = (x - x2)**2 + (y - y2)**2 < r2**2
image = np.logical_or(mask_circle1, mask_circle2)

# Now we want to separate the two objects in image
# Generate the markers as local maxima of the distance to the background
distance = ndi.distance_transform_edt(image)
coords = peak_local_max(distance, footprint=np.ones((3, 3)), labels=image)
markers = label_points(coords, distance.shape)
labels = watershed(-distance, markers, mask=image)

# finally, plot the result
fig, ax = plt.subplots(1, 3)
ax[0].imshow(image)
ax[0].set_title('image')
ax[1].imshow(label2rgb(labels))
ax[1].set_title('labels')
ax[2].imshow(labels)
ax[2].set_title('labels interpreted\nas image');
_images/07f6a86efa9aba3d73d96f727a8080b92fccf3e29cace377cfd56c8420b57ce1.png

Notice that “labels” is just a NumPy array with integer values. We have to be careful to interpret it as labels and not as an image.

The napari viewer#

To look at images and associated data, napari provides additional functionality over matplotlib, including:

  • the canvas is OpenGL, interactive, and responsive

  • the canvas has layers, which can be individually shown or hidden

  • the viewer is n-dimensional, which comes in handy when your images are more than 2D, such as MRI data or (as below) microscopy data

Here’s how to look at the same images with napari:

viewer = napari.Viewer()
image_layer = viewer.add_image(image)
labels_layer = viewer.add_labels(labels)
labels_as_image_layer = viewer.add_image(
    labels, name='labels as image'
)

Unfortuntely, unlike other jupyter widgets, napari is not embedded inside the jupyter notebook. This is because the graphical parts of napari are written in Qt, making it hard to embed on the web.

However, we can take a screenshot of the current state of napari viewer and embed that in the notebook. This can be useful for teaching or sharing purposes where you might want to share key steps in an analysis which makes use of interactive components.

To do this, we use the nbscreenshot utility function

from napari.utils import nbscreenshot

nbscreenshot(viewer)

Note

Unfortunately, in contrast with the real napari viewer, these screenshots won’t be interactive!

Exercise 0: play with the napari viewer#

  • zoom with scroll, pan with click and drag

  • try clicking on the 👁️ icon on each layer in the layer list to make them invisible or visible again

  • try alt-clicking on the icon: this makes every other layer invisible

  • try changing the opacity, colormap, or interpolation on a layer

  • try turning on “grid mode”, from the group of buttons at the bottom-left of the viewer.

  • select the labels layer, click on the paintbrush tool, and tweak the segmentation where the two labels meet.

Segmenting nuclei and measuring cell properties#

In the rest of this notebook, we will segment cell nuclei from a small sample image provided by the Allen Institute for Cell Science.

from skimage import data

cells3d = data.cells3d()
Downloading file 'data/cells3d.tif' from 'https://gitlab.com/scikit-image/data/-/raw/2cdc5ce89b334d28f06a58c9f0ca21aa6992a5ba/cells3d.tif' to '/home/runner/.cache/scikit-image/0.24.0'.
cells3d.shape  # zcyx
(60, 2, 256, 256)
cells3d.dtype
dtype('uint16')
np.max(cells3d)
65535

The pixel spacing in this dataset is 0.29µm in the z (leading!) axis, and 0.26µm in the x and y axes.

spacing = np.array([0.29, 0.26, 0.26])

We can view the 3D image using napari.

viewer, (membrane_layer, nuclei_layer) = napari.imshow(
    cells3d,
    channel_axis=1,  # remember, Python uses 0-based indexing!
    scale=spacing,
    name=['membranes', 'nuclei'],
    ndisplay=3,
)
Hide code cell source
viewer.dims.ndisplay = 3
viewer.camera.angles = (-30, 25, 120)
nbscreenshot(viewer, canvas_only=True)

Edge detection#

We saw the Sobel operator in the filters lesson. It is an edge detection algorithm that approximates the gradient of the image intensity, and is fast to compute. The Scharr filter is a slightly more sophisticated version, with smoothing weights [3, 10, 3]. Finally, the Farid & Simoncelli filter has even more sophisticated weights. All three work for n-dimensional images in scikit-image.

# grab individual channels and convert to float in [0, 1]

membranes = cells3d[:, 0, :, :] / np.max(cells3d)
nuclei = cells3d[:, 1, :, :] / np.max(cells3d)
from skimage import filters


edges = filters.farid(nuclei)

edges_layer = viewer.add_image(
    edges,
    scale=spacing,
    blending='additive',
    colormap='yellow',
)

Thresholding#

Thresholding is used to create binary images. A threshold value determines the intensity value separating foreground pixels from background pixels. Foregound pixels are pixels brighter than the threshold value, background pixels are darker. In many cases, images can be adequately segmented by thresholding followed by labelling of connected components, which is a fancy way of saying “groups of pixels that touch each other”.

Different thresholding algorithms produce different results. Otsu’s method and Li’s minimum cross entropy threshold are two common algorithms. Below, we use Li. You can use skimage.filters.threshold_<TAB> to find different thresholding methods.

denoised = ndi.median_filter(nuclei, size=3)
li_thresholded = denoised > filters.threshold_li(denoised)
threshold_layer = viewer.add_image(
    li_thresholded,
    scale=spacing,
    opacity=0.3,
)
Hide code cell source
nbscreenshot(viewer)

We can see that the thresholded value is full of small holes, because of variation in pixel intensity inside the nuclei. Before proceeding further, we’d like to clean up those holes. Additionally, because of variations in the background intensity, some spurious small clumps of background pixels appear as foreground.

Morphological operations#

Mathematical morphology operations and structuring elements are defined in skimage.morphology. Structuring elements are shapes which define areas over which an operation is applied. The response to the filter indicates how well the neighborhood corresponds to the structuring element’s shape.

There are a number of two and three dimensional structuring elements defined in skimage.morphology. Not all 2D structuring element have a 3D counterpart. The simplest and most commonly used structuring elements are the disk/ball and square/cube.

Functions operating on connected components can remove small undesired elements while preserving larger shapes.

skimage.morphology.remove_small_holes fills holes and skimage.morphology.remove_small_objects removes bright regions. Both functions accept a size parameter, which is the minimum size (in pixels) of accepted holes or objects. It’s useful in 3D to think in linear dimensions, then cube them. In this case, we remove holes / objects of the same size as a cube 20 pixels across.

from skimage import morphology
width = 20

remove_holes = morphology.remove_small_holes(
    li_thresholded, width ** 3
)
width = 20

remove_objects = morphology.remove_small_objects(
    remove_holes, width ** 3
)

viewer.add_image(
    remove_objects,
    name='cleaned',
    scale=spacing,
    opacity=0.3,
);

viewer.layers['li_thresholded'].visible = False
Hide code cell source
nbscreenshot(viewer)

Segmentation#

Now we are ready to label the connected components of this image.

from skimage import measure

labels = measure.label(remove_objects)

viewer.add_labels(
    labels,
    scale=spacing,
    opacity=0.5,
)

viewer.layers['cleaned'].visible = False
Hide code cell source
nbscreenshot(viewer)

We can see that tightly packed cells connected in the binary image are assigned the same label.

A better segmentation would assign different labels to different nuclei.

Typically we use watershed segmentation for this purpose. We place markers at the center of each object, and these labels are expanded until they meet an edge or an adjacent marker.

The trick, then, is how to find these markers.

A common approach, which we saw above with the two overlapping circles, is to compute the distance transform of a binary mask, and place markers at the local maxima of that transform.

As you will see below, it can be quite challenging to find markers with the right location. A slight amount of noise in the image can result in very wrong point locations.

transformed = ndi.distance_transform_edt(
    remove_objects, sampling=spacing
)

maxima = morphology.local_maxima(transformed)
viewer.add_points(
    np.transpose(np.nonzero(maxima)),
    name='bad points',
    scale=spacing,
    size=4,
    n_dimensional=True,  # points have 3D "extent"
)
<Points layer 'bad points' at 0x7fd46c624790>
Hide code cell source
nbscreenshot(viewer)

You can see that these points are actually terrible, with many markers found within each nuclei.

Exercise 1: improve the points#

Try to improve the segmentation to assign one point for each nucleus. Some ideas:

  • use a smoothed version of the nuclei image directly

  • smooth the distance map

  • use morphological operations to smooth the surface of the nuclei to ensure that they are close to spherical

  • use peak_local_max with min_distance parameter instead of morphology.local_maxima

  • find points on a single plane, then prepend the plane index to the found coordinates

# Solution here

Mixing manual annotation and algorithms#

As you will have seen from the previous exercise, there are many approaches to find better seed points, but they are often fiddly and tedious, and sensitive to parameters — when you encounter a new image, you might have to start all over again!

With napari, in many cases, a little interactivity, combined with the segmentation algorithms in scikit-image and elsewhere, can quickly get us the segmentation we want.

Below, you can use full manual annotation, or light editing of the points you found automatically.

viewer.layers['bad points'].visible = False
viewer.dims.ndisplay = 2
viewer.dims.set_point(0, 30 * spacing[0])

points = viewer.add_points(
    name='interactive points',
    scale=spacing,
    ndim=3,
    size=4,
    n_dimensional=True,
)
points.mode = 'add'


# now, we annotate the centers of the nuclei in your image

Once you have marked all the points, you can grab the data back, and make a markers image for skimage.segmentation.watershed:

from skimage import segmentation, util

marker_locations = points.data


markers = util.label_points(marker_locations, nuclei.shape)
markers_big = morphology.dilation(markers, morphology.ball(5))

segmented = segmentation.watershed(
    edges,
    markers_big,
    mask=remove_objects,
)

final_segmentation = viewer.add_labels(
    segmented,
    scale=spacing,
    blending='translucent_no_depth',  # don't hide enclosed points
)

viewer.layers['labels'].visible = False

After watershed, we have better disambiguation between internal cells!

Making measurements#

Once we have defined our objects, we can make measurements on them using skimage.measure.regionprops and skimage.measure.regionprops_table. These measurements include features such as area or volume, bounding boxes, and intensity statistics.

Before measuring objects, it helps to clear objects from the image border. Measurements should only be collected for objects entirely contained in the image.

Given the layer-like structure of our data, we only want to clear the objects touching the sides of the volume, but not the top and bottom, so we pad and crop the volume along the 0th axis to avoid clearing the mitotic nucleus.

segmented_padded = np.pad(
    segmented,
    ((1, 1), (0, 0), (0, 0)),
    mode='constant',
    constant_values=0,
)
interior_labels = segmentation.clear_border(segmented_padded)[1:-1]

skimage.measure.regionprops automatically measures many labeled image features. Optionally, an intensity_image can be supplied and intensity features are extracted per object. It’s good practice to make measurements on the original image.

Not all properties are supported for 3D data. Below we build a list of supported and unsupported 3D measurements.

regionprops = measure.regionprops(interior_labels, intensity_image=nuclei)

supported = [] 
unsupported = []

for prop in regionprops[0]:
    try:
        regionprops[0][prop]
        supported.append(prop)
    except NotImplementedError:
        unsupported.append(prop)

print("3D/nD properties:")
print("  " + "\n  ".join(supported))
print()
print("2D-only properties:")
print("  " + "\n  ".join(unsupported))
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[35], line 6
      3 supported = [] 
      4 unsupported = []
----> 6 for prop in regionprops[0]:
      7     try:
      8         regionprops[0][prop]

IndexError: list index out of range

scikit-image 0.18 adds support for passing custom functions for region properties as extra_properties. After this tutorial, you might want to try it out to determine the surface area of the nuclei or cells!

skimage.measure.regionprops ignores the 0 label, which represents the background.

regionprops_table returns a dictionary of columns compatible with creating a pandas dataframe of properties of the data:

import pandas as pd


info_table = pd.DataFrame(
    measure.regionprops_table(
        interior_labels,
        intensity_image=nuclei,
        properties=['label', 'slice', 'area', 'mean_intensity', 'solidity'],
    )
).set_index('label')
info_table.head()

We can now use pandas and seaborn for some analysis!

import seaborn as sns

sns.scatterplot(x='area', y='solidity', data=info_table, hue='mean_intensity');

We can see that the mitotic nucleus is a clear outlier from the others in terms of solidity and intensity.

Exercise 2: physical measurements#

The “area” property above is actually the volume of the region, measured in voxels. Add a new column to your dataframe, 'area_um3', containing the volume in µm³. Hint: remember the voxel spacing we noted at the start of the tutorial. You only need pandas to do this.

# Solution here

Exercise 3: displaying regionprops (or other values)#

Now that you have segmented cells, (or even with just the nuclei), use skimage.util.map_array to display a volume of the value of a regionprop (say, ‘solidity’ of the cells) on top of the segments.

# Solution here