Building a Real-time Transcript System with WebSocket

Introduction
Building a real-time transcript system requires careful consideration of architecture, state management, and performance optimization. In this article, we'll explore how to design and implement a robust transcript system that can handle real-time updates efficiently.
System Architecture
The transcript system follows a distributed architecture with a source-of-truth pattern:
```mermaid
graph TD
    Backend[Backend TranscriptService] --> |WebSocket Events| FrontendService[Frontend TranscriptsService]
    FrontendService --> |Notify| Store[Store Layer]
    Store --> |Render| UI[UI Components]
    UI --> |User Actions| FrontendService
    FrontendService --> |WebSocket Events| Backend
```

Backend Design
The backend serves as the source of truth for all transcript data. It maintains:
- User-specific State Maps
  - Last spoke time tracking
  - Current text buffers
  - Translation states
  - Transcript history
 
- Core Components 
```typescript
class TranscriptService {
    /** Maximum time gap (ms) between speech segments */
    private readonly MAX_GAP = 3000;

    /** User-specific state maps, keyed by user ID */
    private readonly lastSpokeTime: Map<string, {
        interviewer: number;
        candidate: number;
    }>;
    private readonly currentText: Map<string, {
        interviewer: string;
        candidate: string;
    }>;
    /** Transcript history, keyed by user ID and then transcript ID */
    private readonly transcripts: Map<string, Map<string, TranscriptType>>;
}
```
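To illustrate how MAX_GAP and the per-user state interact, here is a minimal sketch of a segment-boundary check. The isNewSegment helper and its signature are illustrative assumptions, not part of the original service:

```typescript
type Speaker = 'interviewer' | 'candidate';

// Hypothetical helper: an incoming chunk starts a new transcript segment
// when the silence since the speaker's last chunk exceeds the allowed gap.
function isNewSegment(
    lastSpokeTime: Map<string, Record<Speaker, number>>,
    userId: string,
    speaker: Speaker,
    now: number,
    maxGapMs = 3000, // mirrors MAX_GAP above
): boolean {
    const lastSpoke = lastSpokeTime.get(userId);
    if (!lastSpoke) return true;                  // no prior speech: start a new segment
    return now - lastSpoke[speaker] > maxGapMs;   // otherwise split only on a long gap
}
```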

Frontend Architecture
The frontend maintains a synchronized local cache and handles:
- State Management
```typescript
class TranscriptsService {
    /** Local cache of transcripts */
    private transcripts: TranscriptType[];

    /** Feature configuration */
    private isAIEnabled: boolean;
    private isTranslateEnabled: boolean;
    private targetLanguage: 'en' | 'zh';
}
```

- Message Types
```typescript
interface TranscriptMessage {
    type: 'transcript';
    transcript: TranscriptType;
    aiEnabled: boolean;
    role: Role;
    sessionId: string;
}
```
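As a rough usage sketch of this message shape on the frontend (the sendTranscript helper and the inline socket typing are assumptions; only the transcript event name is confirmed by the backend handler shown later):

```typescript
// Hypothetical helper showing how the frontend might emit a TranscriptMessage.
function sendTranscript(
    socket: { emit(event: string, payload: unknown): void },
    transcript: TranscriptType,
    role: Role,
    sessionId: string,
    aiEnabled: boolean,
) {
    const message: TranscriptMessage = {
        type: 'transcript',
        transcript,
        aiEnabled,
        role,
        sessionId,
    };
    socket.emit('transcript', message);
}
```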

Real-time Communication
WebSocket Events
- Backend to Frontend
  - transcript_update: New or updated transcript
  - sync_transcripts: Initial state sync
  - clear_transcripts: Reset notification
 
- Frontend to Backend (a minimal sketch of this direction follows the list)
  - Configuration updates
  - User actions
  - State synchronization requests
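
The frontend-to-backend direction is not shown in code in the original, so here is a minimal sketch assuming a socket.io-style client. The event names update_config and request_sync are illustrative assumptions; only the transcript event appears in the backend handler below:

```typescript
// Sketch of the frontend-to-backend events. Event names other than
// 'transcript' are assumptions, not confirmed by the source.
class TranscriptsService {
    constructor(private socket: { emit(event: string, payload?: unknown): void }) {}

    /** Push a configuration change (e.g. toggling translation) to the backend. */
    updateConfig(config: { isTranslateEnabled: boolean; targetLanguage: 'en' | 'zh' }) {
        this.socket.emit('update_config', config);
    }

    /** Ask the backend to re-send the authoritative transcript state. */
    requestSync(sessionId: string) {
        this.socket.emit('request_sync', { sessionId });
    }
}
```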
 
Data Flow
- Initial Load
```typescript
class TranscriptsService {
    private setupSocketListeners() {
        this.socket.on('sync_transcripts', (data) => {
            this.transcripts = data;
            this.notifyStores();
        });
    }
}
```
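
notifyStores is not defined in the article; a minimal sketch of the notify step from the architecture diagram, assuming a simple subscriber list (the subscribe mechanism is an assumption):

```typescript
// Hypothetical store-notification fan-out: stores subscribe once and
// re-render whenever the local transcript cache changes.
type TranscriptListener = (transcripts: TranscriptType[]) => void;

class TranscriptsService {
    private transcripts: TranscriptType[] = [];
    private listeners: TranscriptListener[] = [];

    /** The store layer registers a callback to be re-run on updates. */
    subscribe(listener: TranscriptListener) {
        this.listeners.push(listener);
    }

    private notifyStores() {
        for (const listener of this.listeners) {
            listener(this.transcripts);
        }
    }
}
```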

- Real-time Updates
```typescript
@SubscribeMessage('transcript')
async handleTranscript(message: TranscriptMessage) {
    // Update source of truth
    const updated = await this.transcriptService.update(message);
    // Broadcast to all clients
    this.broadcast(updated);
}
```

Error Handling
Connection Management
- Automatic Reconnection
```typescript
class TranscriptsService {
    private setupReconnection() {
        this.socket.on('disconnect', () => {
            this.reconnectAttempts = 0;
            this.scheduleReconnect();
        });
    }

    private scheduleReconnect() {
        if (this.reconnectAttempts >= MAX_RECONNECT_ATTEMPTS) {
            this.notifyError('Connection failed');
            return;
        }
        setTimeout(() => this.connect(), this.getBackoffDelay());
    }
}
```
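
getBackoffDelay is not shown in the original snippet; a minimal sketch assuming capped exponential backoff with jitter (the constants are illustrative):

```typescript
// Hypothetical backoff calculation for the reconnect timer above.
// Base delay, cap, and jitter are assumptions, not values from the source.
function getBackoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
    const exponential = baseMs * 2 ** attempt; // 1s, 2s, 4s, 8s, ...
    const jitter = Math.random() * baseMs;     // spread out simultaneous reconnects
    return Math.min(exponential + jitter, maxMs);
}
```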

- State Recovery
```typescript
class TranscriptService {
    async recoverState(userId: string) {
        const transcripts = this.transcripts.get(userId);
        // Guard against users with no stored transcripts yet
        return transcripts ? Array.from(transcripts.values()) : [];
    }
}
```
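
To tie recovery back to the reconnection flow, here is a hedged sketch of how the gateway might serve a sync request from a reconnected client; the request_sync event name and the way the user ID is obtained are assumptions:

```typescript
// Hypothetical gateway handler: on a sync request, return the recovered
// transcripts to the requesting client.
@SubscribeMessage('request_sync')
async handleSyncRequest(client: { userId: string; emit(event: string, payload: unknown): void }) {
    const transcripts = await this.transcriptService.recoverState(client.userId);
    client.emit('sync_transcripts', transcripts);
}
```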
