
SAM3 — Segment Anything Model 3

SAM3 (Meta’s Segment Anything Model 3) provides universal segmentation for images and videos. It supports point prompts, bounding box prompts, text prompts, and video tracking.

Input formats

.png, .jpg, .jpeg, .tiff, .bmp, .webp, .mp4, .avi, .mov
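The page does not say how the tool checks file types. The sketch below shows one way a caller could sort inputs into image or video before choosing a mode. `classify_input` is a hypothetical helper, and the extension sets are copied directly from the list above, so `.tif` is rejected because it is not listed.

```python
from pathlib import Path

# Extensions copied from the supported-formats list; anything else is rejected.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}


def classify_input(path: str) -> str:
    """Hypothetical helper: return "image" or "video", or raise ValueError."""
    suffix = Path(path).suffix.lower()  # case-insensitive: "Stack.TIFF" -> ".tiff"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"Unsupported input format: {suffix or '(no extension)'}")


print(classify_input("stack_01.TIFF"))  # image
print(classify_input("timelapse.mov"))  # video
```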

Modes

Promptable visual segmentation (PVS)

Click on your image to provide prompts:
  • Point prompts — click foreground points (label=1) or background points (label=0) to guide segmentation
  • Box prompts — draw a bounding box around the object to segment
Best for: segmenting specific objects you can see and point to.
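Point and box prompts can be combined for a single object. The sketch below shows one plausible way to collect and sanity-check them before a request. `VisualPrompt` is a hypothetical container, not the SAM3 API, and the `(x_min, y_min, x_max, y_max)` pixel box convention is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VisualPrompt:
    """Hypothetical PVS prompt container (not the SAM3 API).

    points: (x, y) pixel coordinates; labels: 1 = foreground, 0 = background.
    box: optional (x_min, y_min, x_max, y_max) in pixels (assumed convention).
    """
    points: List[Tuple[float, float]] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)
    box: Optional[Tuple[float, float, float, float]] = None

    def add_point(self, x: float, y: float, foreground: bool = True) -> None:
        self.points.append((x, y))
        self.labels.append(1 if foreground else 0)

    def validate(self) -> None:
        if not self.points and self.box is None:
            raise ValueError("Provide at least one point or a box")
        if len(self.points) != len(self.labels):
            raise ValueError("Each point needs exactly one label")
        if any(label not in (0, 1) for label in self.labels):
            raise ValueError("Labels must be 1 (foreground) or 0 (background)")
        if self.box is not None:
            x0, y0, x1, y1 = self.box
            if x1 <= x0 or y1 <= y0:
                raise ValueError("Box must have x_max > x_min and y_max > y_min")


prompt = VisualPrompt(box=(40, 30, 200, 180))
prompt.add_point(120, 100)                   # click on the object
prompt.add_point(60, 170, foreground=False)  # click on background inside the box
prompt.validate()
print(prompt.labels)  # [1, 0]
```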

Promptable concept segmentation (PCS)

Describe what you want to segment in natural language:
  • “red neurons”
  • “cell nuclei”
  • “mitochondria”
Set a confidence threshold (0–1) to filter results by match quality: detections scoring below the threshold are discarded.
Best for: segmenting every object that matches a semantic description, without clicking.
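The confidence threshold acts as a simple cut-off on per-detection scores. The sketch below illustrates that filtering step. `Detection` and `filter_by_confidence` are hypothetical names, and keeping scores exactly equal to the threshold (`>=`) is an assumption. The page does not say which side of the boundary the tool uses.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    instance_id: int
    score: float  # model confidence that this instance matches the text prompt


def filter_by_confidence(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep detections whose score is at least `threshold` (default 0.5)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return [d for d in detections if d.score >= threshold]


results = [Detection(1, 0.92), Detection(2, 0.48), Detection(3, 0.61)]
kept = filter_by_confidence(results, threshold=0.5)
print([d.instance_id for d in kept])  # [1, 3]
```

Raising the threshold trades recall for precision: at 0.7 only instance 1 would survive in this example.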

Video tracking

Track objects across video frames:
  • Select an object in one frame using point or box prompts
  • Propagate the mask forward or backward through the video
  • Specify a start frame for tracking
Best for: following cells, animals, or structures through time-lapse or video data.
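The start frame and propagation direction together determine which frames the tracker visits, and in what order. The sketch below makes that ordering explicit. `propagation_order` is a hypothetical helper, and frames are assumed to be indexed from 0, consistent with the start-frame default of 0.

```python
from typing import List


def propagation_order(num_frames: int, start_frame: int = 0, direction: str = "forward") -> List[int]:
    """Frame indices visited when tracking from `start_frame`.

    direction: "forward" (toward the last frame) or "backward" (toward frame 0).
    The start frame, where the object was prompted, is always visited first.
    """
    if not 0 <= start_frame < num_frames:
        raise ValueError("start_frame must be a valid frame index")
    if direction == "forward":
        return list(range(start_frame, num_frames))
    if direction == "backward":
        return list(range(start_frame, -1, -1))
    raise ValueError('direction must be "forward" or "backward"')


print(propagation_order(6, start_frame=2))                       # [2, 3, 4, 5]
print(propagation_order(6, start_frame=2, direction="backward"))  # [2, 1, 0]
```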

Z-stack propagation

For multi-plane images, propagate a segmentation mask across Z-slices automatically.
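The page does not describe how Z-stack propagation works internally. The sketch below uses one simple heuristic, named here plainly and not claimed as the tool's method. Starting from a seed slice, it takes the bounding box of each slice's mask and uses that box as the box prompt for the neighbouring slice. Propagation stops in a direction once a slice comes back empty. `propagate_z`, `mask_to_box`, and the `segment_slice(z, box)` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Mask = List[List[int]]           # 2-D binary mask, 1 = object
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tight bounding box of the nonzero pixels, or None for an empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def propagate_z(num_slices: int, seed_slice: int, seed_mask: Mask,
                segment_slice: Callable[[int, Box], Mask]) -> Dict[int, Mask]:
    """Propagate a mask up and down a Z-stack using box prompts from neighbours."""
    masks = {seed_slice: seed_mask}
    for step in (1, -1):  # first toward higher Z, then toward lower Z
        prev = seed_mask
        z = seed_slice + step
        while 0 <= z < num_slices:
            box = mask_to_box(prev)
            if box is None:  # previous slice was empty: object has ended
                break
            prev = segment_slice(z, box)
            masks[z] = prev
            z += step
    return masks
```

Here `segment_slice` would wrap whatever single-slice box-prompt segmentation is available. A real implementation might instead reuse the model's own tracking memory, as in video mode.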

Parameters

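| Parameter | Range | Default | Description |
|---|---|---|---|
| IoU threshold | 0.5–1.0 | 0.7 | Minimum IoU (mask-quality) score a mask must reach to be kept |
| Multi-mask output | On/Off | Off | Return several candidate masks instead of a single mask |
| Return polygons | On/Off | Off | Also return object outlines as GeoJSON polygons |
| Return instance IDs | On/Off | Off | Label each segmented object with a unique ID |
| Start frame | Integer ≥ 0 | 0 | Index of the frame where video tracking begins |
| Confidence threshold | 0–1 | 0.5 | Minimum confidence for text-prompt (PCS) results |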
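The table's defaults and ranges map naturally onto a validated settings object, and the IoU threshold is easier to reason about once IoU itself is written out. The sketch below does both. `Sam3Params` and its field names are hypothetical, not the tool's real parameter keys. `mask_iou` computes IoU between two masks, |A ∩ B| / |A ∪ B|. At inference time there is no ground-truth mask, so SAM-family models typically compare the threshold against a model-*predicted* IoU score. The page does not confirm which score this tool uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sam3Params:
    # Hypothetical field names; defaults and ranges copied from the table.
    iou_threshold: float = 0.7          # 0.5–1.0
    multimask_output: bool = False
    return_polygons: bool = False
    return_instance_ids: bool = False
    start_frame: int = 0                # video tracking only
    confidence_threshold: float = 0.5   # text prompts only, 0–1

    def __post_init__(self) -> None:
        if not 0.5 <= self.iou_threshold <= 1.0:
            raise ValueError("iou_threshold must be in [0.5, 1.0]")
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be in [0, 1]")
        if self.start_frame < 0:
            raise ValueError("start_frame must be >= 0")


def mask_iou(a: List[List[int]], b: List[List[int]]) -> float:
    """IoU of two equally sized binary masks; 0.0 if both are empty."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            inter += int(bool(va) and bool(vb))
            union += int(bool(va) or bool(vb))
    return inter / union if union else 0.0


params = Sam3Params()
a = [[1, 1, 0], [0, 1, 0]]
b = [[1, 0, 0], [0, 1, 1]]
iou = mask_iou(a, b)
print(iou)                            # 0.5
print(iou >= params.iou_threshold)    # False: below the 0.7 default
```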

Outputs

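| Output | Format | Description |
|---|---|---|
| Segmentation mask | PNG | Binary mask, or labeled mask with one integer value per object |
| Polygon outlines | GeoJSON | Vector boundaries of segmented objects (when Return polygons is on) |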
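Downstream analysis often needs one binary mask per object, or polygon outlines in a standard vector format. The sketch below assumes a labeled mask uses 0 for background and a positive integer per object. That encoding is an assumption, since the page only says "labeled". The sketch splits such a mask into binary masks and wraps an outline as a GeoJSON `Feature`. `split_labeled_mask`, `polygon_feature`, and the `instance_id` property are hypothetical. Closing the ring follows the GeoJSON spec (RFC 7946), which requires a polygon's first and last positions to be identical.

```python
import json
from typing import Dict, List, Sequence, Tuple


def split_labeled_mask(labeled: List[List[int]]) -> Dict[int, List[List[int]]]:
    """Map each nonzero label to its own binary mask (1 = that object)."""
    ids = sorted({v for row in labeled for v in row if v != 0})
    return {i: [[1 if v == i else 0 for v in row] for row in labeled] for i in ids}


def polygon_feature(ring: Sequence[Tuple[float, float]], instance_id: int) -> dict:
    """Wrap one outline (list of (x, y) vertices) as a GeoJSON Feature."""
    coords = [list(p) for p in ring]
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))  # GeoJSON linear rings must be closed
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [coords]},
        "properties": {"instance_id": instance_id},
    }


labeled = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 0, 2],
]
masks = split_labeled_mask(labeled)
print(sorted(masks))  # [1, 2]
print(masks[2])       # [[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]

feature = polygon_feature([(1, 0), (3, 0), (3, 2), (1, 2)], instance_id=1)
print(json.dumps(feature["geometry"]["coordinates"]))
# [[[1, 0], [3, 0], [3, 2], [1, 2], [1, 0]]]
```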

Compute requirements

| Resource | Requirement |
|---|---|
| GPU | T4 minimum (16 GB VRAM); A100 recommended for video |
| Typical duration | ~20 seconds for images, ~60 seconds for video |

Presets

| Preset | Description |
|---|---|
| Visual Prompt | Point/box-based segmentation for images |
| Text Prompt | Segmentation from a natural-language description |
| Video Tracking | Object tracking through video frames |