WeClone: what it is, what problem it solves & why it's gaining traction
WeClone: what it is, what problem it solves & why it's gaining traction
What it solves
WeClone provides an end-to-end pipeline to create a digital avatar of a person based on their actual chat history. It allows users to clone the speaking style and personality of an individual by fine-tuning a Large Language Model (LLM) on exported messaging data, effectively creating a bot that mimics a specific person's "flavor" of conversation.
How it works
The project implements a complete workflow:
- Data Export & Preprocessing: It supports exporting chat records from platforms like Telegram (with support for images) and is building support for WhatsApp, Discord, and Slack. It uses Microsoft Presidio to filter out sensitive private information (phone numbers, emails, etc.) and allows for custom blocklists.
- Fine-tuning: It uses the Qwen2.5-VL-7B-Instruct model by default, employing the LoRA (Low-Rank Adaptation) method for Supervised Fine-Tuning (SFT). It integrates with LLaMA Factory for model training.
- Deployment: The fine-tuned model can be deployed as an API server or integrated into chatbot frameworks like AstrBot or LangBot to be used on platforms such as Discord, Telegram, and Slack.
Who it’s for
- Individuals wanting to create a digital twin or a personalized AI assistant that speaks like them or a loved one.
- Researchers experimenting with personality-driven LLM fine-tuning and multimodal (text and image) chat data.
Highlights
- End-to-End Pipeline: Covers everything from data export and cleaning to training and deployment.
- Multimodal Support: Supports fine-tuning with image data to better capture communication styles.
- Privacy-Focused: Includes built-in PII (Personally Identifiable Information) filtering to protect sensitive data during training.
- ** uma**
- Flexible Deployment: Compatible with various chatbot frameworks and messaging platforms via an OpenAI-compatible API server.
Sources
- undefinedxming521/WeClone