Building a Real-time Transcript System with WebSocket

Introduction
Building a real-time transcript system requires careful consideration of architecture, state management, and performance optimization. In this article, we'll explore how to design and implement a robust transcript system that can handle real-time updates efficiently.
System Architecture
The transcript system follows a distributed architecture with a source-of-truth pattern:
```mermaid
graph TD
    Backend[Backend TranscriptService] --> |WebSocket Events| FrontendService[Frontend TranscriptsService]
    FrontendService --> |Notify| Store[Store Layer]
    Store --> |Render| UI[UI Components]
    UI --> |User Actions| FrontendService
    FrontendService --> |WebSocket Events| Backend
```

Backend Design
The backend serves as the source of truth for all transcript data. It maintains:
- User-specific State Maps
  - Last spoke time tracking
  - Current text buffers
  - Translation states
  - Transcript history
 
- Core Components 
```typescript
class TranscriptService {
    /** Maximum time gap (ms) between speech segments */
    private readonly MAX_GAP = 3000;

    /** User-specific state maps, keyed by user ID */
    private readonly lastSpokeTime: Map<string, {
        interviewer: number;
        candidate: number;
    }>;
    private readonly currentText: Map<string, {
        interviewer: string;
        candidate: string;
    }>;
    /** Transcript history, keyed by user ID and then transcript ID */
    private readonly transcripts: Map<string, Map<string, TranscriptType>>;
}
```
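To illustrate how MAX_GAP and the per-user state interact, here is a minimal sketch of a segment-boundary check. The isNewSegment helper and its signature are illustrative assumptions, not part of the original service:

```typescript
type Speaker = 'interviewer' | 'candidate';

// Hypothetical helper: an incoming chunk starts a new transcript segment
// when the silence since the speaker's last chunk exceeds the allowed gap.
function isNewSegment(
    lastSpokeTime: Map<string, Record<Speaker, number>>,
    userId: string,
    speaker: Speaker,
    now: number,
    maxGapMs = 3000, // mirrors MAX_GAP above
): boolean {
    const lastSpoke = lastSpokeTime.get(userId);
    if (!lastSpoke) return true;                  // no prior speech: start a new segment
    return now - lastSpoke[speaker] > maxGapMs;   // otherwise split only on a long gap
}
```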

Frontend Architecture
The frontend maintains a synchronized local cache and handles:
- State Management
```typescript
class TranscriptsService {
    /** Local cache of transcripts */
    private transcripts: TranscriptType[];

    /** Feature configuration */
    private isAIEnabled: boolean;
    private isTranslateEnabled: boolean;
    private targetLanguage: 'en' | 'zh';
}
```

- Message Types
```typescript
interface TranscriptMessage {
    type: 'transcript';
    transcript: TranscriptType;
    aiEnabled: boolean;
    role: Role;
    sessionId: string;
}
```
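As a rough usage sketch of this message shape on the frontend (the sendTranscript helper and the inline socket typing are assumptions; only the transcript event name is confirmed by the backend handler shown later):

```typescript
// Hypothetical helper showing how the frontend might emit a TranscriptMessage.
function sendTranscript(
    socket: { emit(event: string, payload: unknown): void },
    transcript: TranscriptType,
    role: Role,
    sessionId: string,
    aiEnabled: boolean,
) {
    const message: TranscriptMessage = {
        type: 'transcript',
        transcript,
        aiEnabled,
        role,
        sessionId,
    };
    socket.emit('transcript', message);
}
```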

Real-time Communication
WebSocket Events
- Backend to Frontend
  - transcript_update: New or updated transcript
  - sync_transcripts: Initial state sync
  - clear_transcripts: Reset notification
 
- Frontend to Backend (a minimal sketch of this direction follows the list)
  - Configuration updates
  - User actions
  - State synchronization requests
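
The frontend-to-backend direction is not shown in code in the original, so here is a minimal sketch assuming a socket.io-style client. The event names update_config and request_sync are illustrative assumptions; only the transcript event appears in the backend handler below:

```typescript
// Sketch of the frontend-to-backend events. Event names other than
// 'transcript' are assumptions, not confirmed by the source.
class TranscriptsService {
    constructor(private socket: { emit(event: string, payload?: unknown): void }) {}

    /** Push a configuration change (e.g. toggling translation) to the backend. */
    updateConfig(config: { isTranslateEnabled: boolean; targetLanguage: 'en' | 'zh' }) {
        this.socket.emit('update_config', config);
    }

    /** Ask the backend to re-send the authoritative transcript state. */
    requestSync(sessionId: string) {
        this.socket.emit('request_sync', { sessionId });
    }
}
```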
 
Data Flow
- Initial Load
```typescript
class TranscriptsService {
    private setupSocketListeners() {
        this.socket.on('sync_transcripts', (data) => {
            this.transcripts = data;
            this.notifyStores();
        });
    }
}
```
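
notifyStores is not defined in the article; a minimal sketch of the notify step from the architecture diagram, assuming a simple subscriber list (the subscribe mechanism is an assumption):

```typescript
// Hypothetical store-notification fan-out: stores subscribe once and
// re-render whenever the local transcript cache changes.
type TranscriptListener = (transcripts: TranscriptType[]) => void;

class TranscriptsService {
    private transcripts: TranscriptType[] = [];
    private listeners: TranscriptListener[] = [];

    /** The store layer registers a callback to be re-run on updates. */
    subscribe(listener: TranscriptListener) {
        this.listeners.push(listener);
    }

    private notifyStores() {
        for (const listener of this.listeners) {
            listener(this.transcripts);
        }
    }
}
```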

- Real-time Updates
```typescript
@SubscribeMessage('transcript')
async handleTranscript(message: TranscriptMessage) {
    // Update source of truth
    const updated = await this.transcriptService.update(message);
    // Broadcast to all clients
    this.broadcast(updated);
}
```

Error Handling
Connection Management
- Automatic Reconnection
```typescript
class TranscriptsService {
    private setupReconnection() {
        this.socket.on('disconnect', () => {
            this.reconnectAttempts = 0;
            this.scheduleReconnect();
        });
    }

    private scheduleReconnect() {
        if (this.reconnectAttempts >= MAX_RECONNECT_ATTEMPTS) {
            this.notifyError('Connection failed');
            return;
        }
        setTimeout(() => this.connect(), this.getBackoffDelay());
    }
}
```
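
getBackoffDelay is not shown in the original snippet; a minimal sketch assuming capped exponential backoff with jitter (the constants are illustrative):

```typescript
// Hypothetical backoff calculation for the reconnect timer above.
// Base delay, cap, and jitter are assumptions, not values from the source.
function getBackoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
    const exponential = baseMs * 2 ** attempt; // 1s, 2s, 4s, 8s, ...
    const jitter = Math.random() * baseMs;     // spread out simultaneous reconnects
    return Math.min(exponential + jitter, maxMs);
}
```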

- State Recovery
```typescript
class TranscriptService {
    async recoverState(userId: string) {
        const transcripts = this.transcripts.get(userId);
        // Guard against users with no stored transcripts yet
        return transcripts ? Array.from(transcripts.values()) : [];
    }
}
```
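
To tie recovery back to the reconnection flow, here is a hedged sketch of how the gateway might serve a sync request from a reconnected client; the request_sync event name and the way the user ID is obtained are assumptions:

```typescript
// Hypothetical gateway handler: on a sync request, return the recovered
// transcripts to the requesting client.
@SubscribeMessage('request_sync')
async handleSyncRequest(client: { userId: string; emit(event: string, payload: unknown): void }) {
    const transcripts = await this.transcriptService.recoverState(client.userId);
    client.emit('sync_transcripts', transcripts);
}
```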
