SAM3 — Segment Anything Model 3
SAM3 (Meta’s Segment Anything Model 3) provides universal segmentation for images and videos. It supports point prompts, bounding box prompts, text prompts, and video tracking.
Input formats
.png, .jpg, .jpeg, .tiff, .bmp, .webp, .mp4, .avi, .mov
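As a minimal sketch of checking a file against the list above before submitting it, the helper below classifies a filename by extension. The split into image and video groups and the name `classify_input` are illustrative assumptions, not part of the tool.

```python
from pathlib import Path

# Extensions copied from the list above; grouping them into image and
# video sets is an assumption made for this sketch.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}


def classify_input(filename: str) -> str:
    """Return "image" or "video"; raise ValueError for unsupported files."""
    suffix = Path(filename).suffix.lower()  # case-insensitive comparison
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"Unsupported input format: {suffix or filename!r}")
```

For example, `classify_input("cells.TIFF")` returns `"image"`, while a `.gif` file raises `ValueError`.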
Modes
Visual prompt segmentation (PVS)
Click on your image to provide prompts:
- Point prompts: click foreground points (label=1) or background points (label=0) to guide segmentation
- Box prompts: draw a bounding box around the object to segment
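The two prompt types above can be pictured as small data records. The sketch below is only one possible representation: the class names, field names, and the `build_prompt_payload` helper are hypothetical, and only the label convention (1 = foreground, 0 = background) comes from the description above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PointPrompt:
    x: float
    y: float
    label: int  # 1 = foreground, 0 = background

    def __post_init__(self):
        if self.label not in (0, 1):
            raise ValueError("label must be 1 (foreground) or 0 (background)")


@dataclass(frozen=True)
class BoxPrompt:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def __post_init__(self):
        if self.x_max <= self.x_min or self.y_max <= self.y_min:
            raise ValueError("box must have positive width and height")


def build_prompt_payload(points=(), box=None):
    """Bundle prompts into a plain dict (hypothetical layout)."""
    payload = {"points": [asdict(p) for p in points]}
    if box is not None:
        payload["box"] = asdict(box)
    return payload
```

A foreground click plus a background click could then be combined with a box, for instance `build_prompt_payload([PointPrompt(120, 80, 1), PointPrompt(10, 10, 0)], BoxPrompt(100, 60, 180, 140))`.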
Text concept segmentation (PCS)
Describe what you want to segment in natural language:
- “red neurons”
- “cell nuclei”
- “mitochondria”
Video tracking
Track objects across video frames:
- Select an object in one frame using point or box prompts
- Propagate the mask forward or backward through the video
- Specify a start frame for tracking
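To make the interaction between the start frame and the propagation direction concrete, the sketch below lists the frame indices a tracker would visit. It assumes zero-based frame indices and that the start frame itself is included; the function name `propagation_order` is hypothetical and does not reflect how SAM3 schedules frames internally.

```python
def propagation_order(num_frames: int, start_frame: int = 0,
                      direction: str = "forward") -> list[int]:
    """Return the frame indices visited when propagating from start_frame."""
    if not 0 <= start_frame < num_frames:
        raise ValueError("start_frame must lie inside the video")
    if direction == "forward":
        # Start frame, then every later frame up to the end of the video.
        return list(range(start_frame, num_frames))
    if direction == "backward":
        # Start frame, then every earlier frame down to frame 0.
        return list(range(start_frame, -1, -1))
    raise ValueError("direction must be 'forward' or 'backward'")
```

With a 5-frame video and a start frame of 2, forward propagation visits frames 2, 3, 4 and backward propagation visits frames 2, 1, 0.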
Z-stack propagation
For multi-plane images, propagate a segmentation mask across Z-slices automatically.
Parameters
| Parameter | Range | Default | Description |
|---|---|---|---|
| IoU Threshold | 0.5–1.0 | 0.7 | Minimum intersection-over-union (IoU) score a mask must reach to be kept |
| Multi-mask output | Toggle | Off | Return multiple candidate masks |
| Return polygons | Toggle | Off | Return GeoJSON polygon outlines |
| Return instance IDs | Toggle | Off | Label each segment with a unique ID |
| Start frame | Integer | 0 | Starting frame for video tracking |
| Confidence threshold | 0–1 | 0.5 | Minimum confidence score for a text-prompt match to be kept |
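For readers unfamiliar with the threshold parameters, the sketch below shows the standard intersection-over-union formula on two small binary masks and a simple score filter. It illustrates the concepts only: how the tool scores masks internally is not documented here, and both function names are hypothetical.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two equally sized binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks have no union; report 0.0 rather than dividing by zero.
    return intersection / union if union else 0.0


def filter_by_threshold(candidates, threshold=0.7):
    """Keep (name, score) pairs whose score meets the threshold."""
    return [(name, score) for name, score in candidates if score >= threshold]
```

With the default of 0.7, a candidate scored 0.92 is kept and one scored 0.65 is dropped; raising the threshold trades recall for cleaner masks.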
Outputs
| Output | Format | Description |
|---|---|---|
| Segmentation mask | PNG | Binary or labeled mask image |
| Polygon outlines | GeoJSON | Vector boundaries of segmented objects |
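A short sketch of consuming the polygon output follows. It assumes the file is a GeoJSON FeatureCollection of Polygon features whose coordinates are in pixel units, and that each feature carries a property named `instance_id`; that property name and the sample data are hypothetical, so check an actual output file before relying on them.

```python
import json

# Hypothetical sample output: one 4 x 3 rectangle with a closed exterior ring.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"instance_id": 1},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]]
      }
    }
  ]
}
"""


def ring_area(ring):
    """Area of a closed ring via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_areas(geojson_text):
    """Map each instance ID to the area of its exterior ring."""
    collection = json.loads(geojson_text)
    areas = {}
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # this sketch ignores MultiPolygon and other types
        exterior = geometry["coordinates"][0]  # first ring is the exterior
        areas[feature["properties"]["instance_id"]] = ring_area(exterior)
    return areas
```

Calling `polygon_areas(SAMPLE)` gives the rectangle's area keyed by its ID. Holes (additional rings) are not subtracted in this sketch.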
Compute requirements
| Resource | Requirement |
|---|---|
| GPU | T4 minimum (16 GB VRAM), A100 recommended for video |
| Duration | ~20 seconds for images, ~60 seconds for video |
Presets
| Preset | Description |
|---|---|
| Visual Prompt | Point/box-based segmentation for images |
| Text Prompt | Natural language description-based segmentation |
| Video Tracking | Object tracking through video frames |