cvat: a professional data annotation platform for building high-quality computer vision datasets

What it solves

CVAT (Computer Vision Annotation Tool) is a data annotation platform that helps teams build high-quality visual datasets for computer vision and visual AI. It reduces the manual effort of labeling images, videos, and 3D point clouds and provides a centralized environment for dataset management and collaboration.

How it works

Users upload visual data to a self-hosted server (deployed via Docker) and use a web-based interface to apply labels such as bounding boxes, polygons, and masks. The platform supports both manual labeling and AI-powered auto-labeling by connecting external ML models (via Nuclio) for tasks like detection, segmentation, and tracking. It also provides a Python SDK, CLI, and REST API for automating the data pipeline.
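The sketch below illustrates the auto-labeling step. It shows the post-processing a detector function might perform before returning results to CVAT. It assumes the JSON shape used by CVAT's bundled serverless detectors: a list of dicts with `confidence`, `label`, `points`, and `type` keys. It also assumes that boxes arrive as `[x_min, y_min, x_max, y_max]` pixel corners. Verify both against your CVAT version. The function name `to_cvat_detections` and its parameters are hypothetical, and the Nuclio handler wiring is omitted.

```python
import json
from typing import Dict, List, Sequence


def to_cvat_detections(
    boxes: Sequence[Sequence[float]],  # assumed: [x_min, y_min, x_max, y_max] in pixels
    scores: Sequence[float],
    class_ids: Sequence[int],
    label_names: Dict[int, str],
    threshold: float = 0.5,
) -> List[dict]:
    """Turn raw detector output into a list of rectangle annotations.

    Hypothetical helper: the dict layout mirrors the one assumed above for
    CVAT's serverless detectors and is not an official CVAT API.
    """
    results = []
    for box, score, cls in zip(boxes, scores, class_ids):
        if score < threshold:
            continue  # drop detections below the confidence threshold
        results.append({
            "confidence": str(score),         # serialized as a string (assumed format)
            "label": label_names[cls],        # map class index to a label name
            "points": [float(v) for v in box],
            "type": "rectangle",
        })
    return results


if __name__ == "__main__":
    detections = to_cvat_detections(
        boxes=[[10, 20, 110, 220], [5, 5, 15, 15]],
        scores=[0.92, 0.30],
        class_ids=[0, 1],
        label_names={0: "person", 1: "car"},
    )
    # Only the "person" box survives the default 0.5 threshold.
    print(json.dumps(detections))
```

Keeping the conversion in a pure function like this means it can be unit-tested without a running CVAT or Nuclio instance.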

Who it’s for

It is built for research and production AI teams who need to create and manage large-scale visual datasets while maintaining full control over their data infrastructure.

Highlights

  • Multi-modal Annotation: Supports images, videos, and 3D point clouds.
  • AI-Assisted Labeling: Integrates with models like SAM, YOLO, and Mask RCNN to speed up the annotation process.
  • Enterprise-Grade Collaboration: Includes multi-user support, role-based access, task assignments, and review workflows.
  • Extensive Format Support: Imports and exports data in over 20 industry-standard formats, including COCO, YOLO, and Pascal VOC.
  • Cloud Integration: Connects directly to cloud storage providers such as AWS S3, Azure Blob Storage, and Google Cloud Storage.
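
The formats named above store the same rectangle in different ways. Pascal VOC uses corner coordinates. COCO uses the top-left corner plus width and height in pixels. YOLO uses the box center and size, normalized by the image dimensions. The following minimal sketch converts between the three. The helper names are hypothetical, this is not CVAT's own converter, and it ignores Pascal VOC's 1-based pixel indexing and the class index that YOLO label files carry alongside each box.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def voc_to_coco(box: Box) -> Box:
    """Pascal VOC (x_min, y_min, x_max, y_max) -> COCO (x, y, width, height), pixels."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min, x_max - x_min, y_max - y_min)


def coco_to_yolo(box: Box, img_w: float, img_h: float) -> Box:
    """COCO (x, y, width, height) in pixels -> YOLO (cx, cy, w, h) normalized to [0, 1]."""
    x, y, w, h = box
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_to_voc(box: Box, img_w: float, img_h: float) -> Box:
    """YOLO normalized (cx, cy, w, h) -> Pascal VOC corners in pixels."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


if __name__ == "__main__":
    width, height = 640, 480
    voc = (64, 48, 192, 240)
    coco = voc_to_coco(voc)
    yolo = coco_to_yolo(coco, width, height)
    back = yolo_to_voc(yolo, width, height)
    print("COCO:", coco)
    print("YOLO:", yolo)
    print("VOC round trip:", back)
```

A round trip through all three representations should return the original corners up to floating-point error, which is a quick sanity check when writing custom import or export scripts.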
