AI and Machine Learning for Segmentation

The Quantitative Bottleneck in Volume Electron Microscopy

Serial block-face scanning electron microscopy (SBF-SEM) has fundamentally reshaped the landscape of structural biology by permitting the acquisition of three-dimensional tissue ultrastructure at nanoscopic resolution [8, 35, 76, 100, 130, 149, 175]. An automated ultramicrotome mounted within the vacuum chamber of a scanning electron microscope sequentially removes ultrathin sections of a resin-embedded sample, scanning the newly exposed block face iteratively [8, 35, 180]. While this dramatically reduces the z-axis misalignment artifacts common in traditional serial section transmission electron microscopy (ssTEM) [14, 48], the resulting scale of the data is staggering. A single SBF-SEM imaging session can routinely capture volumes spanning hundreds of micrometres [115], yielding complex, highly detailed datasets that range in size from tens of gigabytes to several terabytes—often encompassing tens of thousands of mitochondria or millions of densely packed neurites [130, 149, 175] (Figure 24).
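To make the scale concrete, the following back-of-the-envelope sketch estimates the raw storage needed for an imaged volume. The field of view (200 × 200 × 100 µm), the voxel size (10 × 10 × 50 nm), and the 8-bit depth are illustrative assumptions, not values taken from the cited studies.

```python
def volume_size_bytes(extent_um, voxel_nm, bytes_per_voxel=1):
    """Estimate the raw (uncompressed) size of an imaged volume.

    extent_um: (x, y, z) physical extent in micrometres
    voxel_nm:  (x, y, z) voxel edge lengths in nanometres
    """
    n_voxels = 1
    for extent, voxel in zip(extent_um, voxel_nm):
        # 1 um = 1000 nm; round to a whole number of voxels per axis
        n_voxels *= round(extent * 1000 / voxel)
    return n_voxels * bytes_per_voxel


# Hypothetical acquisition: 200 x 200 x 100 um at 10 x 10 x 50 nm, 8-bit
size = volume_size_bytes((200, 200, 100), (10, 10, 50))
print(f"{size / 1e9:.0f} GB")  # 800 GB
```

Under these assumptions a single block already approaches the terabyte range, before any derived label volumes or probability maps are stored alongside it.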

Figure 24. An Integrated Solution for the Complete Serial Block-Face Scanning Electron Microscopy Workflow: From Image Acquisition to Data Processing [19]. Schematic depiction of an automated SBF-SEM experiment using TESCAN SEM, ConnectomX Katana microtome and open-source acquisition software SBEMimage. The automated execution of an actual SBF-SEM experiment is ensured by open-source Python-based acquisition software SBEMimage [4]. The applicat

This explosion in data acquisition capability has shifted the primary experimental bottleneck from tissue sectioning and imaging to downstream data processing and analysis [76, 150, 177]. To extract meaningful quantitative morphological metrics or map complex connectomic wiring diagrams, structural elements within the volumetric dataset must be reliably delineated [14, 150]. Historically, this process of image segmentation was achieved through manual annotation, requiring a human expert to trace the boundaries of target structures across thousands of individual two-dimensional slices [14, 100, 150, 171, 175, 177, 180]. Manual segmentation is excruciatingly slow, prone to inter-operator bias, and entirely unscalable; reconstructing a mere 1 mm³ of mammalian cortex would theoretically require over two millennia of continuous human labor [14]. Consequently, the development of computer-aided automated and semi-automated segmentation algorithms has become a paramount priority in the field of volume electron microscopy (vEM) [14, 149, 175] (Figures 25–27).

Figure 25. SyConn2: dense synaptic connectivity inference for volume electron microscopy [173]. SyConn2 processing on the voxel level. Neuron segmentation and ultrastructure prediction (synaptic junctions (sj) in red; mitochondria (mi) in blue; vesicle clouds in green) derived from raw data. Contact sites and synaptic junction are assigned as queryable instances to neuron reconstructio

Figure 26. Example test image from the 2D Electron Microscopy Cell dataset, along with the corresponding ground-truth. Gray elements in the ground-truth indicate parts whose label cannot be decided with certainty by a human expert.

Figure 27. A workflow for the automatic segmentation of organelles in electron microscopy image stacks [150]. The manual segmentation of organelles from SBEM image stacks represents a significant bottleneck to quantitative analyses. (A) A typical SBEM dataset consists of individual image slices collected in increments of δ nm, with the values of δ reported in the literature typically falling in the

From Heuristics to Interactive Machine Learning

Early attempts to automate the segmentation of vEM datasets relied upon traditional computer vision algorithms such as intensity thresholding, watershed transforms, morphological filtering, and region-growing paradigms [9, 20, 35, 48, 100, 115, 149, 150, 171]. Thresholding assigns voxels to a specific class if their grayscale intensity falls within a predefined range [20, 35, 100, 149]. While highly computationally efficient, these heuristic methods are exceptionally fragile when applied to SBF-SEM data. Electron micrographs lack the distinct multichannel fluorophore targeting of light microscopy; instead, all structures are rendered in a single grayscale channel dependent on the differential uptake of heavy metal stains [8, 76, 170, 172]. Because functionally distinct organelles—such as lysosomes and lipid-rich myelin sheaths—may exhibit nearly identical grayscale intensities, simple thresholding inevitably leads to severe over-segmentation or under-segmentation [76, 172] (Figure 28).
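The limitation of intensity thresholding can be illustrated with a minimal sketch. The toy 8-bit slice below is invented for illustration: two different dark structures share nearly the same grayscale range, so a single threshold selects both and cannot tell them apart.

```python
def threshold(image, low, high):
    """Return a binary mask: 1 where low <= intensity <= high, else 0."""
    return [[1 if low <= px <= high else 0 for px in row] for row in image]


# Toy slice (invented values): intensities near 35 stand for one stained
# structure, intensities near 38 for a different one, 200 for background.
slice_2d = [
    [200, 35, 35, 200, 200],
    [200, 35, 35, 200, 38],
    [200, 200, 200, 200, 38],
]

mask = threshold(slice_2d, 0, 60)
for row in mask:
    print(row)
```

Both structures end up in the same foreground class. Narrowing the range to separate 35 from 38 would make the result depend on a few gray levels, which is exactly the fragility described above.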

Figure 28. A guide to analysis and reconstruction of serial block face scanning electron microscopy data [35]. Proposed workflow to aid in decision making when choosing appropriate segmentation methods for analysis of SBF-SEM data. The majority of these segmentation methods can be used in MIB and Amira, with some exceptions, the watershed segmentation which is not shown in this paper. The decision to

To overcome the limitations of strictly heuristic approaches, the field transitioned toward shallow, interactive machine learning (ML) models [9, 76, 169]. One of the most widespread frameworks in this domain is the Random Forest algorithm [150, 169], popularized by open-source platforms such as *ilastik* [9, 16, 100, 115, 149, 153, 169, 179] and the *Trainable Weka Segmentation* plugin for Fiji/ImageJ [16, 172]. In these interactive workflows, the user provides sparse annotations by "painting" brush strokes over the foreground objects and the background [9, 169]. The software computes a rich array of local image features for every voxel—including Gaussian smoothing for intensity, Hessian eigenvalues for ridges and edges, and structure tensors for texture—which are then fed into the Random Forest classifier [169, 179].

These interactive ML tools offer several advantages: they train rapidly, provide real-time feedback allowing for iterative correction, and demand very little computational power relative to deep learning [169]. However, traditional pixel-classification workflows are inherently limited by their restricted receptive fields. Because they classify voxels based primarily on local texture and edge features without synthesizing broad spatial context, they often fail to capture the complex, long-range morphological characteristics of highly branched organelles like the endoplasmic reticulum, or intertwined neuronal processes [169].

The Deep Learning Revolution: U-Net and Volumetric Architectures

The most consequential shift in SBF-SEM image segmentation has been the adoption of deep convolutional neural networks (CNNs) [16, 162, 169, 170, 171, 174]. Unlike traditional ML models that rely on hand-crafted feature filters, CNNs learn hierarchical feature representations directly from the training data through backpropagation [162, 169]. Early CNN applications in EM used sliding-window patch classification [16, 150], but these were computationally inefficient, because overlapping patches are processed redundantly, and lacked global context.

The breakthrough for biomedical imaging came with the introduction of the fully convolutional U-Net architecture [149, 163, 169, 170, 175]. U-Net derives its name from its symmetric, U-shaped design, comprising a contracting (encoder) pathway and an expansive (decoder) pathway [169, 170, 171, 181]. The encoder employs sequential convolutional and pooling layers to compress the spatial dimensions of the image while increasing the number of feature channels (typically doubling it at each level), thereby capturing semantic context [169, 170]. The decoder progressively upsamples this low-resolution feature map to restore the original spatial resolution [170, 171]. Crucially, U-Net uses "skip connections" that concatenate the high-resolution feature maps from the encoder directly with the upsampled representations in the decoder [169, 170, 171, 181]. This design allows the network to combine broad contextual understanding with precise, pixel-level boundary localization, achieving high segmentation accuracy even with limited training data [169, 170, 175, 181].
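The encoder, decoder, and skip-connection bookkeeping can be traced with a small sketch that tracks only tensor shapes, not weights. It assumes "same"-padded convolutions, 2x pooling and upsampling, channel doubling per level, and an input size divisible by 2 to the power of the depth; the function name and defaults are hypothetical.

```python
def unet_shapes(size, base_channels=64, depth=4):
    """Trace (spatial size, channels) through a U-Net-style network.

    Assumes padded convolutions, so only pooling and upsampling
    change the spatial size.
    """
    encoder = []
    channels = base_channels
    for _ in range(depth):
        encoder.append((size, channels))  # kept for the skip connection
        size //= 2        # 2x pooling halves each spatial dimension
        channels *= 2     # channel count doubles at each level
    bottleneck = (size, channels)

    decoder = []
    for skip_size, skip_channels in reversed(encoder):
        size *= 2         # upsample back to the skip's resolution
        channels //= 2    # up-convolution halves the channel count
        assert size == skip_size, "input size must be divisible by 2**depth"
        concat = channels + skip_channels  # skip connection: concatenation
        # (size, channels after concatenation, channels after the convs)
        decoder.append((size, concat, channels))
    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet_shapes(256)
print(encoder)     # [(256, 64), (128, 128), (64, 256), (32, 512)]
print(bottleneck)  # (16, 1024)
print(decoder)     # [(32, 1024, 512), (64, 512, 256), (128, 256, 128), (256, 128, 64)]
```

The concatenated channel counts show why the skip connections matter: at every decoder level, half of the input channels come straight from the high-resolution encoder features.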

While the original 2D U-Net is highly effective, SBF-SEM datasets are inherently volumetric. Processing 3D volumes slice-by-slice discards valuable axial context, which human annotators naturally rely upon to trace continuous structures [149, 175]. To exploit this third spatial dimension, researchers developed 3D variants such as the 3D U-Net [16, 149, 163, 169, 170] and V-Net [149, 163]. These models replace 2D convolutions with 3D convolutional kernels, analyzing contiguous blocks of tissue simultaneously [16, 149, 169]. Although 3D architectures require substantially more graphics processing unit (GPU) memory, they markedly reduce "flickering", that is, structural discontinuities across the z-axis [149, 175]. Furthermore, deeper feature extractors such as ResNet and Inception-ResNet-v2 have been integrated into the U-Net encoding pathway [16, 170, 171]. Residual connections in these networks bypass intermediate layers, which mitigates the vanishing gradient problem and allows the training of much deeper networks capable of representing complex biological shapes [16, 170, 171].
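The memory cost of moving from 2D to 3D can be estimated with simple arithmetic. The sketch below compares one convolution layer and one feature map in both settings; the layer width (64 channels), kernel size (3), patch sizes, and float32 storage are illustrative assumptions.

```python
def conv_params(kernel, c_in, c_out, dims):
    """Trainable parameters of one convolution layer (weights + biases)."""
    return kernel ** dims * c_in * c_out + c_out


def activation_bytes(shape, channels, bytes_per_value=4):
    """Memory for one feature map of the given spatial shape (float32)."""
    n = channels
    for s in shape:
        n *= s
    return n * bytes_per_value


params_2d = conv_params(3, 64, 64, dims=2)
params_3d = conv_params(3, 64, 64, dims=3)
mem_2d = activation_bytes((256, 256), 64)      # one 2D slice
mem_3d = activation_bytes((32, 256, 256), 64)  # a block of 32 slices

print(params_2d, params_3d)                              # 36928 110656
print(mem_2d // 2**20, "MiB vs", mem_3d // 2**20, "MiB")  # 16 MiB vs 512 MiB
```

The parameter count grows by roughly a factor of three, but the activations grow with the number of slices in the block, which is why patch size is usually the limiting factor for 3D networks.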

Because the optimal deep learning configuration depends heavily on the specific dataset, the field has recently gravitated toward self-configuring frameworks like nnU-Net [149, 163, 169]. Rather than requiring manual hyperparameter tuning, nnU-Net analyzes the geometric and intensity properties of the input SBF-SEM dataset (its "dataset fingerprint") and derives the preprocessing steps, network topology, patch size, and batch size from them, frequently outperforming manually engineered networks [149, 163, 169].

Instance Segmentation and Connectomics Paradigms

Deep learning in SBF-SEM is broadly categorized into two distinct tasks: semantic segmentation and instance segmentation [149, 171]. Semantic segmentation predicts a class probability for every voxel (e.g., "mitochondrion" versus "background") [149, 171]. However, if two mitochondria are physically touching, semantic segmentation will fuse them into a single continuous blob. Instance segmentation goes a step further by distinctly delineating separate instances of the same class [149, 171].
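The difference between the two tasks can be shown with a minimal sketch: applying connected-component labelling to a semantic mask yields one label per connected blob, so objects that touch are fused. The toy mask and the pure-Python labelling routine below are illustrative and not taken from any cited pipeline.

```python
from collections import deque


def label_components(mask):
    """4-connected component labelling of a binary 2D mask."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                current += 1                 # start a new component
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:                 # breadth-first flood of the blob
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current


# Three annotated objects: two touching 2x2 blocks and one isolated column.
semantic_mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
]

labels, n_components = label_components(semantic_mask)
print(n_components)  # 2, although three objects are present
```

Recovering the third object requires information beyond the binary mask, such as boundary predictions, distance transforms, or shape priors, which is what instance segmentation methods supply.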

Instance segmentation is the foundational challenge of "connectomics", the comprehensive mapping of neuronal circuits [16, 86]. Densely packed axons and dendrites in neural tissue follow highly irregular, tortuous paths that span vast distances, rendering standard bounding-box object detection or simple watershed algorithms highly susceptible to split and merge errors [16, 171]. To address this, researchers developed Flood-Filling Networks (FFNs) [16, 163, 171, 173, 180]. Unlike standard CNNs that predict all voxels simultaneously in a bottom-up approach, FFNs merge semantic and instance segmentation using a top-down, single-object tracking technique [16, 171]. An FFN is seeded at a specific voxel and applies its network recurrently, sequentially "flooding" the interior of the cellular process and feeding previous shape predictions back in to inform subsequent tracking steps [16, 171]. This maintains the topological continuity of individual neurons over very large volumes and has been reported to achieve superhuman accuracy on dense circuit reconstructions [16, 163, 171, 180]. Similar multi-object tracking methods utilizing cross-classification clustering have also been successfully deployed [171].

For lower-resolution or large field-of-view SBF-SEM datasets, hybrid geometric pipelines like DeepACSON have been developed [171]. DeepACSON first uses a deep convolutional network to generate semantic probability maps of white matter ultrastructures. To separate touching, under-segmented myelinated axons, it subsequently applies a Cylindrical Shape Decomposition (CSD) algorithm that uses the tubular geometry of axons as a prior [171]. Similarly, sphericity priors and elastic deformations are used to disentangle adjacent cell nuclei, demonstrating how deep learning can be combined with traditional geometric constraints [171].

The Ground Truth Bottleneck and Data Augmentation

Despite their unparalleled accuracy, deep learning architectures are notoriously data-hungry, requiring massive amounts of meticulously labeled "ground truth" data for supervised learning [16, 76, 162]. Because generating voxel-perfect annotations in SBF-SEM requires highly specialized domain expertise, acquiring sufficient training data represents the most significant barrier to widespread AI adoption [14, 16, 76, 162, 163, 167, 171].

To circumvent this bottleneck, the bioimage analysis community has embraced multiple strategies. One of the most critical is aggressive data augmentation [16, 149, 171, 181]. By applying stochastic transformations to a small pool of manually annotated images, researchers can artificially inflate the size and diversity of the training set. Common augmentations for SBF-SEM include horizontal and vertical flipping, arbitrary rotations, and shifting [149, 171, 181]. More advanced techniques deploy elastic deformations to simulate the natural biological variance in organelle shape [171], as well as intensity variations, contrast adjustments, and the injection of additive Gaussian noise to mimic varying microscope beam conditions [181]. Some pipelines even intentionally simulate EM-specific imaging artifacts, such as missing z-sections or local misalignments, forcing the network to learn robust, artifact-resistant features [165].
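The basic augmentations listed above can be sketched in a few lines. This minimal example works on 2D lists and restricts rotations to multiples of 90 degrees for simplicity; the function names, the noise level, and the toy data are assumptions for illustration. The key point is that geometric transforms are applied identically to the image and its label, while intensity noise touches only the image.

```python
import random


def flip_horizontal(img):
    return [row[::-1] for row in img]


def flip_vertical(img):
    return [row[:] for row in img[::-1]]


def rotate90(img):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clip to the 8-bit range."""
    return [[min(255, max(0, round(px + rng.gauss(0, sigma)))) for px in row]
            for row in img]


def augment(image, label, rng):
    """Same random geometric transform for image and label; noise on image only."""
    if rng.random() < 0.5:
        image, label = flip_horizontal(image), flip_horizontal(label)
    if rng.random() < 0.5:
        image, label = flip_vertical(image), flip_vertical(label)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        image, label = rotate90(image), rotate90(label)
    image = add_gaussian_noise(image, sigma=5.0, rng=rng)
    return image, label


image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
label = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
aug_image, aug_label = augment(image, label, random.Random(0))
```

Elastic deformations and simulated missing sections follow the same pattern: any transform that moves voxels must be applied to the label as well, whereas artifacts that only degrade the image leave the label unchanged.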

Another approach to the annotation deficit is crowdsourcing and citizen science [16, 76, 86, 115, 150]. Platforms like *EyeWire* and *FlyWire* distribute massive unannotated connectomic datasets to thousands of lay volunteers globally, framing the tracing of neurons as an online puzzle game [16, 86]. The consensus of multiple amateur tracers is aggregated to correct algorithmic errors and generate high-quality ground truth [16, 76, 86]. Similarly, distributed collaborative annotation environments such as *CATMAID* and *webKnossos* permit multi-institutional teams of scientists to synchronously skeleton-trace and proofread vast datasets in a shared, cloud-based space [16, 86, 115].

When analyzing inherently sparse structures, such as synaptic clefts or rare vesicular events, the extreme foreground-to-background class imbalance severely destabilizes standard CNN training [16, 167, 171, 181]. If 99% of a volume is background, a network can achieve 99% accuracy simply by classifying everything as background, while detecting none of the target structures. To counteract this, dynamic batch sampling is used to heavily favor regions containing the sparse target, and weighted loss functions (such as weighted Dice or cross-entropy loss) are implemented to disproportionately penalize false negatives [181]. Furthermore, some models use sparse annotation techniques, predicting dense 3D distance transforms rather than simple binary boundaries, which forces the network to capture broader spatial relationships from very weakly labeled data [16, 167].
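The accuracy pitfall and the effect of a weighted loss can be checked numerically. The sketch below uses a flattened toy volume with 1% foreground; the positive-class weight of 50 is an arbitrary illustrative value, and the functions are simplified stand-ins for the losses used in real training frameworks.

```python
import math


def accuracy(pred, target):
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def dice_score(pred, target, eps=1e-7):
    """Dice overlap between two binary label lists."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2 * intersection + eps) / (sum(pred) + sum(target) + eps)


def weighted_bce(probs, target, pos_weight):
    """Binary cross-entropy with foreground errors scaled by pos_weight."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, 1e-7), 1 - 1e-7)  # avoid log(0)
        total -= pos_weight * t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(target)


target = [1] + [0] * 99        # 1% foreground
all_background = [0] * 100     # trivial "predict nothing" output

print(accuracy(all_background, target))    # 0.99
print(dice_score(all_background, target))  # close to 0

confident_background = [0.01] * 100        # predicted foreground probabilities
plain = weighted_bce(confident_background, target, pos_weight=1)
weighted = weighted_bce(confident_background, target, pos_weight=50)
print(plain < weighted)                    # True
```

Dice exposes the failure that accuracy hides, and the weighted cross-entropy makes the single missed foreground voxel dominate the loss instead of being averaged away by the background.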

Transfer Learning and Domain Adaptation

A persistent challenge in deep learning for SBF-SEM is the problem of "domain shift." An architecture meticulously trained to segment mitochondria in isotropic FIB-SEM mouse brain tissue will frequently fail catastrophically when applied to anisotropic SBF-SEM datasets of rat liver or human HeLa cells [33, 39, 118]. This fragility stems from variations in the physical appearance of organelles across cell types, differences in heavy metal staining protocols, and discrepancies in voxel resolution between different microscopic modalities [162, 163].

To overcome the lack of generalized models, developers increasingly rely on Transfer Learning [162, 163]. Instead of initializing a network with random parameters, transfer learning initializes the network with weights that have already been pre-trained on massive, diverse datasets [162, 163]. Because the initial convolutional layers of a CNN learn fundamental, low-level visual features—such as texture gradients, membranes, and edges—these features are highly conserved across almost all vEM datasets [162, 163]. Once pre-trained, the model undergoes "fine-tuning," where the weights of the deeper, class-specific layers are updated using only a minuscule subset (often just a few slices) of manually annotated data from the target SBF-SEM dataset [162, 163]. This dramatically reduces the time and labor required to segment novel tissue types.

An even more advanced paradigm is Domain Adaptation (DA), which attempts to correct domain shifts without requiring any new ground-truth annotations in the target dataset (unsupervised DA) [33, 39, 104, 118, 135, 153]. DA techniques align the high-dimensional feature distributions of the source domain (where labels are abundant) with those of the target domain (where labels are absent) [33, 39, 118]. Several architectures have been proposed for vEM domain adaptation. For example, Y-Net architectures append a secondary, parallel decoder pathway to the standard U-Net [33, 39, 104, 118, 153]. This secondary decoder functions as an auto-encoder, forced to accurately reconstruct both the source and target raw images from the shared latent space [33, 39, 104, 118, 153]. By minimizing the reconstruction loss across both domains simultaneously, the shared encoder is implicitly forced to learn domain-invariant structural representations [33, 39, 118, 153]. Alternatively, Domain-Adversarial Neural Networks (DANN) employ an adversarial classifier that attempts to guess which domain a feature map belongs to; the encoder is optimized to fool this classifier, thereby suppressing domain-specific features [104, 118, 153].
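The two strategies can be summarized as training objectives. The formulation below is a hedged sketch rather than the exact losses of the cited works: the symbols are assumptions introduced here, with E the shared encoder, D_seg and D_rec the segmentation and reconstruction decoders, C the domain classifier, (x_s, y_s) a labeled source sample, x_t an unlabeled target sample, d the domain label of a sample x, and λ a weighting factor.

```latex
% Y-Net-style objective (sketch): supervised segmentation on the source
% domain plus reconstruction of both domains through the shared encoder E.
\mathcal{L}_{\text{Y-Net}}
  = \mathcal{L}_{\text{seg}}\bigl(D_{\text{seg}}(E(x_s)),\, y_s\bigr)
  + \lambda \Bigl[ \mathcal{L}_{\text{rec}}\bigl(D_{\text{rec}}(E(x_s)),\, x_s\bigr)
                 + \mathcal{L}_{\text{rec}}\bigl(D_{\text{rec}}(E(x_t)),\, x_t\bigr) \Bigr]

% DANN-style objective (sketch): the domain classifier C minimizes its own
% classification loss, while the encoder E is trained to maximize it.
\min_{E,\, D_{\text{seg}}} \; \max_{C} \;
  \mathcal{L}_{\text{seg}}\bigl(D_{\text{seg}}(E(x_s)),\, y_s\bigr)
  - \lambda\, \mathcal{L}_{\text{dom}}\bigl(C(E(x)),\, d\bigr)
```

In both cases only the segmentation term needs labels, and it is evaluated on the source domain alone; the target domain contributes through the reconstruction or domain-classification term.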

Generalist Models and the Democratization of AI Tools

The culmination of transfer learning and massive data aggregation has led to the recent emergence of highly robust "generalist" models, echoing the "Segment Anything" paradigms seen in natural image processing [149, 163]. Instead of relying on bespoke, single-use algorithms, laboratories can now leverage out-of-the-box foundation models.

A prominent example is *Cellpose*, a generalist deep learning architecture initially developed for fluorescence microscopy but adaptable to SBF-SEM via transfer learning [9, 124, 162, 167]. Cellpose replaces the traditional binary boundary predictions with vector gradient fields [124, 162, 167]. The network predicts maps that simulate a physical "flow" pointing toward the center of every individual object [162]. Pixels whose flows converge on the same center are grouped into one object, which resolves touching objects and yields strong instance segmentation. Researchers have demonstrated that fine-tuning a pre-trained Cellpose model on a fraction of SBF-SEM platelet and thrombi images yields near-perfect segmentation of densely packed, complex cellular domains where base models previously failed entirely [162].
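The idea behind the flow representation can be illustrated with a deliberately simplified sketch. Here each labeled pixel receives a unit vector pointing straight at its object's centroid; this is a stand-in chosen for brevity and is not how Cellpose itself constructs its flow targets. The function name and the toy label map are hypothetical.

```python
import math


def center_flows(labels):
    """Unit vector (dy, dx) from each labelled pixel toward its object's centroid."""
    sums = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab:
                sy, sx, n = sums.get(lab, (0, 0, 0))
                sums[lab] = (sy + y, sx + x, n + 1)
    centroids = {lab: (sy / n, sx / n) for lab, (sy, sx, n) in sums.items()}

    flows = [[(0.0, 0.0)] * len(row) for row in labels]
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab:
                cy, cx = centroids[lab]
                dy, dx = cy - y, cx - x
                norm = math.hypot(dy, dx)
                if norm > 0:  # the centroid pixel itself keeps a zero vector
                    flows[y][x] = (dy / norm, dx / norm)
    return flows


# Two touching objects that a binary mask would fuse into one blob.
labels = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
]
flows = center_flows(labels)
print(flows[1][2], flows[1][3])  # (0.0, -1.0) (0.0, 1.0)
```

The two neighbouring pixels on either side of the shared border point in opposite directions, so the border is encoded in the flow field even though no boundary class is predicted.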

Similarly, *MitoNet* (often accessed via the Empanada plugin in Napari) provides a generalized convolutional architecture explicitly trained on a massive and heterogeneous corpus of electron microscopy imagery containing over 1.5 million bounding boxes and masks [163, 178]. MitoNet is capable of highly accurate mitochondrial segmentation across radically different tissue types—from HeLa cells to rat pancreas—frequently in a "zero-shot" capacity, meaning it requires zero fine-tuning on the user's local dataset [163, 178]. The *Segment Anything Model 2 (SAM2)* has also been introduced to vEM; pre-trained on billions of masks across diverse visual media, SAM2 utilizes promptable bounding boxes or sparse click coordinates to instantly generate volumetric segmentation masks across an SBF-SEM z-stack [149, 163].

For these models to have a tangible impact, they must be accessible to biologists who lack computational or coding expertise [9, 19, 35]. To bridge this usability gap, the community has developed a rich ecosystem of graphical user interfaces and cloud-based platforms [9, 19, 35]. *DeepMIB* (Microscopy Image Browser) provides a comprehensive, open-source MATLAB environment integrating data alignment, manual curation, and deep learning network training in a single workflow [9, 16, 28, 81, 115, 171]. To mitigate the hardware cost of GPU arrays, cloud-computing approaches like *CDeep3M* have been introduced, allowing researchers to upload SBF-SEM data to remote clusters where pre-configured CNNs execute the segmentation in the cloud [9, 16, 149, 176]. Furthermore, *ZeroCostDL4Mic* [9, 16, 149] provides pre-written Jupyter notebooks that run on free Google Colab servers, and *DeepImageJ* [16, 149] runs pre-trained models inside the familiar ImageJ/Fiji environment. Together, these tools give laboratories without dedicated hardware access to GPU computing.

By integrating U-Net architectures, self-configuring pipelines like nnU-Net, generalist transfer-learning models, and user-friendly open-source software, the analysis of SBF-SEM datasets is transitioning from a prohibitive, multi-year manual endeavor to a highly streamlined, automated computational pipeline.

References cited in this section (41)

[8] P. Borghgraef et al. (2023). Resin comparison for serial block face scanning volume electron microscopy. Methods in Cell Biology.

[9] Christopher J. Peddie et al. (2022). Volume electron microscopy. Nature Reviews Methods Primers.

[14] Christopher J. Peddie, Lucy Collinson (2014). Exploring the third dimension: Volume electron microscopy comes of age. Micron.

[16] Arent J. Kievits et al. (2022). How innovations in methodology offer new prospects for volume electron microscopy. Journal of Microscopy.

[19] M. Koban, Markéta Machálková, Jakub Javůrek (2023). An Integrated Solution for the Complete Serial Block-Face Scanning Electron Microscopy Workflow: From Image Acquisition to Data Processing. Microscopy and Microanalysis.

[20] David J. Smith, T. Starborg (2019). Serial block face scanning electron microscopy in cell biology: Applications and technology. Tissue & Cell.

[28] C. Guerin et al. (2019). Targeted Studies Using Serial Block Face and Focused Ion Beam Scan Electron Microscopy. Journal of Visualized Experiments (JoVE).

[33] J. Mancuso (2010). 3 Dimensional Microscopy, Serial Block Face SEM. Microscopy and Microanalysis.

[35] E. Cocks et al. (2017). A guide to analysis and reconstruction of serial block face scanning electron microscopy data. Journal of Microscopy.

[39] Mariah J. Berner et al. (2024). Three-dimensional analysis of mitochondria in a patient-derived xenograft model of triple negative breast cancer reveals mitochondrial network remodeling following chemotherapy treatments. bioRxiv.

[48] Aaron Brookhouse et al. (2024). Anatomics MLT, an AI tool for large scale quantification of ultrastructural traits. bioRxiv.

[76] C. Karabağ et al. (2019). Segmentation and Modelling of the Nuclear Envelope of HeLa Cells Imaged with Serial Block Face Scanning Electron Microscopy. Journal of Imaging.

[81] Unknown. Untitled.

[86] A. Wanner, Moritz A. Kirschmann, Christel Genoud (2015). Challenges of microtome-based serial block-face scanning electron microscopy in neuroscience. Journal of Microscopy.

[100] Akter Hussain et al. (2018). An automated workflow for segmenting single adult cardiac cells from large-volume serial block-face scanning electron microscopy data. Journal of Structural Biology.

[104] Daniel G. Booth et al. (2016). 3D-CLEM Reveals that a Major Portion of Mitotic Chromosomes Is Not Chromatin. Molecular Cell.

[115] A. Abdollahzadeh et al. (2018). Automated 3D Axonal Morphometry of White Matter. Scientific Reports.

[118] Florian Kleefeldt et al. (2025). Three-Dimensional Visualization of the Cardiac Stroma. Cells.

[124] Daniel Krentzel et al. (2023). CLEM-Reg: An automated point cloud based registration algorithm for correlative light and volume electron microscopy.

[130] A. Shomorony et al. (2015). Combining quantitative 2D and 3D image analysis in the serial block face SEM: application to secretory organelles of pancreatic islet cells. Journal of Microscopy.

[135] Alan R. F. Godwin et al. (2017). Defining the hierarchical organisation of collagen VI microfibrils at nanometre to micrometre length scales. Acta Biomaterialia.

[149] Yihe Huang et al. (2025). A generalist deep-learning volume segmentation tool for volume electron microscopy of biological samples. Journal of Structural Biology.

[150] Alex J. Perez et al. (2014). A workflow for the automatic segmentation of organelles in electron microscopy image stacks. Frontiers in Neuroanatomy.

[153] Unknown. Untitled.

[162] Catherine House et al. (2025). From imaging to computational domains for physics-driven molecular biology simulations: Hindered diffusion in platelet masses. PLOS Computational Biology.

[163] Ronald Xie et al. (2025). Transfer learning improves performance in volumetric electron microscopy organelle segmentation across tissues. Bioinformatics Advances.

[165] Benita Scout Mackay et al. (2020). Automated 3D Labelling of Fibroblasts and Endothelial Cells in SEM-Imaged Placenta using Deep Learning.

[167] Larissa Heinrich et al. (2021). Whole-cell organelle segmentation in volume electron microscopy. Nature.

[169] Leonie C. Schadt et al. (2024). Wrapped up: advancements in volume electron microscopy and application in myelin research. Methods in Microscopy.

[170] C. Karabağ et al. (2020). Semantic segmentation of HeLa cells: An objective comparison between one traditional algorithm and four deep-learning architectures. PLoS ONE.

[171] Unknown. Untitled.

[172] Garrett M. Fogo et al. (2021). Machine learning-based classification of mitochondrial morphology in primary neurons and brain. Scientific Reports.

[173] Philipp J. Schubert et al. (2022). SyConn2: dense synaptic connectivity inference for volume electron microscopy. Nature Methods.

[174] M. Guay, Z. Emam, R. Leapman (2019). Two-stage Neural Architecture Search for Microscopy Image Segmentation. Microscopy and Microanalysis.

[175] M. Guay, Z. Emam, R. Leapman (2018). Problems and Progress in Automating Electron Microscopy Segmentation. Microscopy and Microanalysis.

[176] T. Steinkellner et al. (2021). Genetic Probe for Visualizing Glutamatergic Synapses and Vesicles by 3D Electron Microscopy. ACS Chemical Neuroscience.

[177] Unknown. Untitled.

[178] Arent J. Kievits et al. (2024). FAST-EM array tomography: a workflow for multibeam volume electron microscopy. Methods in Microscopy.

[179] Kendrick Cetina, J. M. Buenaposada, L. Baumela (2018). Multi-class segmentation of neuronal structures in electron microscopy images. BMC Bioinformatics.

[180] Joergen Kornfeld, Winfried Denk (2018). Progress and remaining challenges in high-throughput volume electron microscopy. Current Opinion in Neurobiology.

[181] Felix Sigmund et al. (2023). Genetically encoded barcodes for correlative volume electron microscopy. Nature Biotechnology.
