cvat: a professional data annotation platform for building high-quality computer vision datasets
What it solves
CVAT is a data annotation platform designed to help teams build high-quality visual datasets for computer vision and visual AI. It reduces the manual effort of labeling images, videos, and 3D point clouds and provides a centralized environment for dataset management and collaboration.
How it works
Users upload visual data to a self-hosted server (deployed via Docker) and use a web-based interface to apply labels such as bounding boxes, polygons, and masks. The platform supports both manual labeling and AI-powered auto-labeling by connecting external ML models (via Nuclio) for tasks like detection, segmentation, and tracking. It also provides a Python SDK, CLI, and REST API for automating the data pipeline.
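As a rough illustration of the REST API route, the sketch below builds (but does not send) a task-creation request using only the Python standard library. The host `http://localhost:8080`, the task name, the label names, and the helper `build_create_task_request` are all assumptions made for this example. The `/api/tasks` path and the `name`/`labels` payload shape follow the CVAT REST API as commonly documented, but check them against the API schema of your server version. Authentication is left to the caller because the supported schemes vary between releases.

```python
import json
import urllib.request


def build_create_task_request(host, name, labels, auth_header=None):
    """Build (without sending) a POST request that would create a CVAT task.

    `host`, `name`, and `labels` are illustrative inputs; `auth_header` is
    whatever Authorization value your server expects (assumption: left to
    the caller).
    """
    payload = {
        "name": name,
        # Each label is sent as an object with at least a "name" field.
        "labels": [{"name": label} for label in labels],
    }
    headers = {"Content-Type": "application/json"}
    if auth_header:
        headers["Authorization"] = auth_header
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical local deployment and labels, for illustration only.
req = build_create_task_request(
    "http://localhost:8080", "street-scenes", ["car", "pedestrian"]
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen(req)` would send it to a running server. The Python SDK wraps calls like this one and is likely the more convenient choice for real pipelines.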
Who it’s for
It is built for research and production AI teams who need to create and manage large-scale visual datasets while maintaining full control over their data infrastructure.
Highlights
- Multi-modal Annotation: Supports images, videos, and 3D point clouds.
- AI-Assisted Labeling: Integrates with models like SAM, YOLO, and Mask R-CNN to speed up the annotation process.
- Enterprise-Grade Collaboration: Includes multi-user support, role-based access, task assignments, and review workflows.
- Extensive Format Support: Imports and exports data in over 20 industry-standard formats, including COCO, YOLO, and Pascal VOC.
- Cloud Integration: Connects directly to cloud storage providers like AWS S3, Azure, and Google Cloud.
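To make the format-support highlight concrete, here is a minimal sketch of why conversion between formats matters: the same bounding box is written differently in COCO, YOLO, and Pascal VOC. This is not CVAT's export code. The function names are hypothetical, and the conventions assumed are the commonly used ones: COCO stores `[x_min, y_min, width, height]` in pixels, YOLO stores center and size normalized by the image dimensions, and Pascal VOC stores corner coordinates.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, w, h] (pixels) to YOLO
    (x_center, y_center, w, h), each normalized to the 0..1 range."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / img_w,
        (y + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


def coco_to_voc(bbox):
    """Convert a COCO box [x_min, y_min, w, h] to Pascal VOC corners
    (x_min, y_min, x_max, y_max), still in pixels."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


# One box on a hypothetical 800x400 image.
box = (100, 50, 200, 100)
print(coco_to_yolo(box, 800, 400))
print(coco_to_voc(box))
```

In practice the platform's exporters handle these differences, so a sketch like this is mainly useful for sanity-checking exported files.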
Sources
- cvat-ai/cvat