Refactoring Speech Recognition Service for Multi-client Support

Introduction

When building real-time speech recognition services, handling multiple clients correctly is crucial. In this article, we'll explore how to refactor a speech recognition service to properly support multiple concurrent clients while maintaining clean architecture and efficient resource management.

The Problem

Initial Architecture Issues

The initial implementation of our speech recognition service had a critical architectural flaw: using a singleton pattern for the recognition service caused audio streams from multiple clients to interfere with each other. Here's why this was problematic:

State Confusion
- All clients shared the same recognition service instance
- Multiple audio streams were sent to the same recognition stream
- Callback functions were overwritten by newly connected clients

Technical Limitations

@Injectable()
class SpeechRecognitionService {
    private recognitionStream: any;
    private resultCallback: (result: any) => void;
    
    // This callback would be overwritten by each new client
    setResultCallback(callback: (result: any) => void) {
        this.resultCallback = callback;
    }
}
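
To make the overwrite problem concrete, here is a minimal sketch that reproduces it without NestJS. `SharedRecognitionService` and `emitResult` are hypothetical stand-ins for the singleton service and its result delivery, not the original implementation.

```typescript
type ResultCallback = (text: string) => void;

// Hypothetical stand-in for the singleton service: one callback slot shared by everyone.
class SharedRecognitionService {
    private resultCallback: ResultCallback | undefined;

    setResultCallback(callback: ResultCallback): void {
        this.resultCallback = callback; // replaces whatever the previous client registered
    }

    emitResult(text: string): void {
        this.resultCallback?.(text);
    }
}

const shared = new SharedRecognitionService();
const receivedByA: string[] = [];
const receivedByB: string[] = [];

shared.setResultCallback((text) => receivedByA.push(text)); // client A connects
shared.setResultCallback((text) => receivedByB.push(text)); // client B connects, A's callback is lost

// A transcript produced from client A's audio is delivered to client B instead.
shared.emitResult("transcript of A's audio");
```

After the last call, `receivedByA` is still empty while `receivedByB` holds client A's transcript, which is exactly the cross-talk described above.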

Root Causes

Streaming Nature
- Speech APIs stream audio over a long-lived connection (WebSocket or gRPC, depending on the provider)
- Each stream maintains its own state
- Streams are stateful and can't be shared

Service Instance Limitations
- Each recognition service can only maintain one active stream
- New audio data affects current stream's results
- Multiple clients' audio data contaminate each other's context

NestJS Dependency Injection

@Injectable()
class SpeechGateway {
    constructor(
        // Nest providers are singleton-scoped by default, so every client shares this instance
        private readonly recognitionService: SpeechRecognitionService
    ) {}
}
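
The effect of singleton scope can be illustrated with a toy container. This is only a sketch of the caching behavior, not how NestJS is implemented internally; `ToyContainer` and `RecognitionServiceStub` are hypothetical names.

```typescript
type Constructor<T> = new () => T;

// Toy container that mimics singleton provider scope: one cached instance per class.
class ToyContainer {
    private readonly instances = new Map<Constructor<unknown>, unknown>();

    resolve<T>(token: Constructor<T>): T {
        if (!this.instances.has(token)) {
            this.instances.set(token, new token());
        }
        return this.instances.get(token) as T;
    }
}

class RecognitionServiceStub {
    readonly audioChunks: string[] = [];
}

const container = new ToyContainer();
const forClientA = container.resolve(RecognitionServiceStub);
const forClientB = container.resolve(RecognitionServiceStub);

forClientA.audioChunks.push('chunk from A');

// Both names point at the same object, so B's "own" service already contains A's audio.
const sameInstance = forClientA === forClientB;
```

However many consumers ask for the service, they all receive the object created on the first request, which is why constructor injection alone cannot give each client its own stream.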

The Solution

1. Factory Pattern Implementation

Instead of using singleton services, we implement a factory pattern to create dedicated service instances for each client:

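import { Injectable } from '@nestjs/common';
import { ConfigService } from '@nestjs/config'; // assumes the standard @nestjs/config package

@Injectable()
export class RecognitionFactory {
    constructor(private readonly configService: ConfigService) {}

    // Builds a new, fully initialized service for a single client
    async create(source: RecognitionSource): Promise<IRecognitionService> {
        const service = source === RecognitionSource.BAIDU
            ? new BaiduRecognitionService(this.configService)
            : new GoogleRecognitionService(this.configService);

        // Nest does not run lifecycle hooks on objects created with `new`,
        // so the hook is invoked manually here
        await service.onModuleInit();
        return service;
    }
}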
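
The isolation the factory buys us can be checked with a framework-free sketch. The stub service, the string-based `RecognitionSource` type, and `SimpleRecognitionFactory` are simplifications assumed for illustration. The real factory is asynchronous and wraps provider SDKs.

```typescript
type RecognitionSource = 'google' | 'baidu';

interface RecognitionService {
    readonly provider: RecognitionSource;
    readonly transcript: string[];
    processAudio(chunk: string): void;
}

// Stub that records the chunks it receives instead of calling a speech API.
class StubRecognitionService implements RecognitionService {
    readonly provider: RecognitionSource;
    readonly transcript: string[] = [];

    constructor(provider: RecognitionSource) {
        this.provider = provider;
    }

    processAudio(chunk: string): void {
        this.transcript.push(chunk);
    }
}

class SimpleRecognitionFactory {
    // Every call returns a fresh object, so no state is shared between callers.
    create(source: RecognitionSource): RecognitionService {
        return new StubRecognitionService(source);
    }
}

const simpleFactory = new SimpleRecognitionFactory();
const serviceA = simpleFactory.create('google');
const serviceB = simpleFactory.create('google');

// Audio sent to A's service never shows up in B's transcript.
serviceA.processAudio('hello from A');
```

Even though both clients use the same provider, `serviceA` and `serviceB` are distinct objects, so `serviceB.transcript` stays empty.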

2. Modular Architecture

Organize the components into a cohesive module:

@Module({
    providers: [
        SpeechGateway,
        RecognitionFactory,
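        // Per-client instances come from RecognitionFactory. Listing the concrete
        // services here additionally creates one shared instance of each.
        GoogleRecognitionService,
        BaiduRecognitionService,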
    ],
    exports: [SpeechGateway],
})
export class SpeechModule {}

3. Connection Management

Implement proper connection tracking and management:

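import { OnGatewayConnection, OnGatewayDisconnect, WebSocketGateway } from '@nestjs/websockets';
import { Socket } from 'socket.io'; // assumes the socket.io adapter

@WebSocketGateway()
export class SpeechGateway implements OnGatewayConnection, OnGatewayDisconnect {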
    private readonly connections = new Map<string, ConnectionInfo>();
 
    constructor(private factory: RecognitionFactory) {}
 
    async handleConnection(client: Socket) {
        const service = await this.factory.create(RecognitionSource.GOOGLE);
        
        this.connections.set(client.id, {
            socket: client,
            recognitionService: service,
            config: this.getDefaultConfig()
        });
    }
 
    async handleDisconnect(client: Socket) {
        const connection = this.connections.get(client.id);
        if (connection) {
            await connection.recognitionService.cleanup();
            this.connections.delete(client.id);
        }
    }
}
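
The gateway above only shows connect and disconnect. A sketch of how incoming audio might be routed through the same map follows. `SessionRegistry`, `handleAudio`, and the string chunks are assumptions made for illustration. In the real gateway the session would hold the recognition service and the chunks would be binary audio.

```typescript
interface ClientSession {
    readonly chunks: string[];
    closed: boolean;
}

class SessionRegistry {
    private readonly sessions = new Map<string, ClientSession>();

    connect(clientId: string): void {
        this.sessions.set(clientId, { chunks: [], closed: false });
    }

    // Audio is routed by client id, so it can only reach that client's session.
    handleAudio(clientId: string, chunk: string): boolean {
        const session = this.sessions.get(clientId);
        if (!session || session.closed) {
            return false; // unknown or already disconnected client
        }
        session.chunks.push(chunk);
        return true;
    }

    disconnect(clientId: string): void {
        const session = this.sessions.get(clientId);
        if (session) {
            session.closed = true; // stand-in for recognitionService.cleanup()
            this.sessions.delete(clientId);
        }
    }

    chunksFor(clientId: string): readonly string[] {
        return this.sessions.get(clientId)?.chunks ?? [];
    }

    get size(): number {
        return this.sessions.size;
    }
}

const registry = new SessionRegistry();
registry.connect('client-a');
registry.connect('client-b');
registry.handleAudio('client-a', 'a1');
registry.handleAudio('client-b', 'b1');
registry.handleAudio('client-a', 'a2');
registry.disconnect('client-b');

// Audio arriving after a disconnect is dropped rather than leaking into another session.
const lateChunkAccepted = registry.handleAudio('client-b', 'b2');
```

Keying everything by client id keeps interleaved audio from different clients apart, and a chunk that arrives after its client disconnected is rejected.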

Best Practices

1. Dependency Management

Follow these principles for clean dependency management:

// Service interface for better abstraction
interface IRecognitionService {
    initialize(): Promise<void>;
    processAudio(data: Buffer): Promise<void>;
    cleanup(): Promise<void>;
}
 
// Factory method for service creation
@Injectable()
class RecognitionFactory {
    create(config: RecognitionConfig): Promise<IRecognitionService> {
        // Create and configure service instance
        return this.createAndConfigure(config);
    }
}

2. Resource Management

Implement proper resource cleanup:

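abstract class BaseRecognitionService implements IRecognitionService {
    // protected so that provider-specific subclasses can assign them
    protected stream: any;
    protected resources: Resource[] = [];

    abstract initialize(): Promise<void>;
    abstract processAudio(data: Buffer): Promise<void>;

    async cleanup(): Promise<void> {
        try {
            // Close stream
            if (this.stream) {
                await this.stream.close();
            }
        } finally {
            // Release resources even if closing the stream failed
            for (const resource of this.resources) {
                await resource.release();
            }
            // Drop references so a second cleanup() call is a no-op
            this.stream = undefined;
            this.resources = [];
        }
    }
}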
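
A disconnect handler can fire more than once, and a single failing resource should not block the rest. The following sketch shows one way to make cleanup idempotent and failure-tolerant. `ResourceBag`, `Releasable`, and `track` are hypothetical names, not part of the service above.

```typescript
interface Releasable {
    release(): Promise<void>;
}

class ResourceBag {
    private resources: Releasable[] = [];
    private cleanedUp = false;

    track(resource: Releasable): void {
        this.resources.push(resource);
    }

    // Releases every resource even if some fail, and is safe to call twice.
    // Returns the errors that were collected along the way.
    async cleanup(): Promise<Error[]> {
        if (this.cleanedUp) {
            return [];
        }
        this.cleanedUp = true;

        const failures: Error[] = [];
        for (const resource of this.resources) {
            try {
                await resource.release();
            } catch (err) {
                failures.push(err instanceof Error ? err : new Error(String(err)));
            }
        }
        this.resources = [];
        return failures;
    }
}

const released: string[] = [];
const bag = new ResourceBag();
bag.track({ release: async () => { released.push('stream'); } });
bag.track({ release: async () => { throw new Error('socket already closed'); } });
bag.track({ release: async () => { released.push('buffer'); } });
```

Calling `bag.cleanup()` releases the stream and the buffer despite the failing socket and resolves with the single collected error. A second call resolves with an empty array and touches nothing.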

3. Error Handling

Implement comprehensive error handling:

import { Logger } from '@nestjs/common';

class SpeechGateway {
    private readonly logger = new Logger(SpeechGateway.name);

    private handleError(client: Socket, error: Error) {
        // Log error
        this.logger.error({
            clientId: client.id,
            error: error.message,
            stack: error.stack
        });
 
        // Notify client
        client.emit('recognition_error', {
            message: error.message,
            code: this.getErrorCode(error)
        });
 
        // Attempt recovery when possible, otherwise tear the connection down
        if (this.isRecoverable(error)) {
            this.handleRecovery(client);
        } else {
            this.handleFatalError(client);
        }
    }
}
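
The handler relies on `getErrorCode` and `isRecoverable`, which are not shown. One possible shape for that classification is sketched below. The error codes and the `classifyError` helper are invented for illustration, and a real list of transient codes depends on the speech provider.

```typescript
type ErrorAction = 'retry' | 'close';

interface CodedError extends Error {
    code: string;
}

// Type guard: true when the error carries a string `code` property.
function hasCode(error: Error): error is CodedError {
    return typeof (error as { code?: unknown }).code === 'string';
}

// Codes treated as transient in this sketch (hypothetical values).
const RECOVERABLE_CODES = new Set(['STREAM_TIMEOUT', 'NETWORK_RESET']);

function classifyError(error: Error): { code: string; action: ErrorAction } {
    const code = hasCode(error) ? error.code : 'UNKNOWN';
    return { code, action: RECOVERABLE_CODES.has(code) ? 'retry' : 'close' };
}

// Helper for building example errors with a code attached.
function codedError(code: string, message: string): CodedError {
    return Object.assign(new Error(message), { code });
}

const timeoutResult = classifyError(codedError('STREAM_TIMEOUT', 'stream idle for too long'));
const authResult = classifyError(codedError('AUTH_FAILED', 'invalid credentials'));
const unknownResult = classifyError(new Error('unexpected failure'));
```

Errors without a known transient code default to `'close'`, so an unexpected failure ends the connection instead of triggering endless retries.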

Performance Monitoring

Implement comprehensive monitoring:

class RecognitionMetrics {
    private readonly metrics = new Map<string, {
        activeStreams: number;
        processedAudio: number;
        errors: number;
        latency: number[];
    }>();
 
    recordMetrics(clientId: string, data: MetricData) {
        const current = this.metrics.get(clientId) || this.getDefaultMetrics();
        this.metrics.set(clientId, {
            ...current,
            ...data,
            latency: [...current.latency, data.latency]
        });
    }
 
    getAggregateMetrics() {
        // Calculate aggregate metrics
        return this.calculateAggregates(this.metrics);
    }
}
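
`calculateAggregates` is left undefined above. As a rough sketch, it could compute totals plus mean and 95th-percentile latency across all clients. The `ClientMetrics` shape is a reduced version of the map value above, and the nearest-rank percentile is one choice among several.

```typescript
interface ClientMetrics {
    errors: number;
    latency: number[]; // milliseconds per recognition result
}

// Nearest-rank percentile; expects a non-empty array sorted in ascending order.
function percentile(sorted: number[], p: number): number {
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
}

function aggregate(perClient: Map<string, ClientMetrics>) {
    const latencies: number[] = [];
    let totalErrors = 0;
    perClient.forEach((m) => {
        latencies.push(...m.latency);
        totalErrors += m.errors;
    });
    latencies.sort((a, b) => a - b);

    const count = latencies.length;
    const sum = latencies.reduce((acc, v) => acc + v, 0);
    return {
        clients: perClient.size,
        totalErrors,
        meanLatency: count ? sum / count : 0,
        p95Latency: count ? percentile(latencies, 95) : 0,
    };
}

const sample = new Map<string, ClientMetrics>([
    ['client-a', { errors: 1, latency: [100, 200] }],
    ['client-b', { errors: 0, latency: [300, 400] }],
]);
const summary = aggregate(sample);
```

For the two sample clients, `summary` reports 2 clients, 1 error, a mean latency of 250 ms, and a p95 latency of 400 ms.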

Future Improvements

Connection Limits

class ConnectionManager {
    private readonly MAX_CONNECTIONS = 100;
    private readonly activeConnections = new Map<string, ConnectionInfo>();
 
    async addConnection(client: Socket): Promise<boolean> {
        if (this.activeConnections.size >= this.MAX_CONNECTIONS) {
            throw new ConnectionLimitError();
        }
        // Add connection logic
        return true;
    }
}

Resource Monitoring

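import { getHeapStatistics } from 'node:v8';

class ResourceMonitor {
    private readonly memoryThreshold = 0.8; // 80% of the V8 heap limit

    checkResources(): boolean {
        // heapTotal only reflects the heap V8 has allocated so far and grows on demand,
        // so measure usage against the configured heap size limit instead
        const { used_heap_size, heap_size_limit } = getHeapStatistics();
        return used_heap_size / heap_size_limit < this.memoryThreshold;
    }
}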

Graceful Degradation

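class ServiceDegrader {
    private readonly strategies = new Map<ErrorType, DegradationStrategy>();

    constructor(private readonly defaultStrategy: DegradationStrategy) {}

    // A plain Error has no `type` field, so accept an error that carries one
    handleDegradation(error: Error & { type: ErrorType }) {
        const strategy = this.strategies.get(error.type);
        if (strategy) {
            return strategy.execute();
        }
        return this.defaultStrategy.execute();
    }
}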
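
To show the lookup-with-fallback idea in runnable form, here is a small sketch. The error types and the actions returned by each strategy are invented examples, and `Degrader` is a simplified stand-in for the class above.

```typescript
type ErrorType = 'quota_exceeded' | 'provider_down' | 'unknown';

interface DegradationStrategy {
    execute(): string; // returns a description of the action taken
}

class Degrader {
    private readonly strategies = new Map<ErrorType, DegradationStrategy>();
    private readonly fallback: DegradationStrategy;

    constructor(fallback: DegradationStrategy) {
        this.fallback = fallback;
    }

    register(type: ErrorType, strategy: DegradationStrategy): void {
        this.strategies.set(type, strategy);
    }

    // Uses the strategy registered for the error type, or the fallback if none exists.
    handle(type: ErrorType): string {
        return (this.strategies.get(type) ?? this.fallback).execute();
    }
}

const degrader = new Degrader({ execute: () => 'reject new connections' });
degrader.register('provider_down', { execute: () => 'switch to backup provider' });
degrader.register('quota_exceeded', { execute: () => 'lower audio sample rate' });

const onProviderDown = degrader.handle('provider_down');
const onUnknown = degrader.handle('unknown');
```

A registered error type gets its specific strategy, while anything unregistered falls through to the default, so the service always has a defined response.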

Conclusion

Refactoring a speech recognition service to properly handle multiple clients requires careful consideration of architecture, resource management, and error handling. By implementing a factory pattern, proper connection management, and comprehensive monitoring, we can create a robust service that efficiently handles multiple concurrent clients.

Key takeaways:

- Use factory pattern for service instances
- Implement proper resource management
- Handle errors gracefully
- Monitor performance and resources
- Plan for scalability