## System Architecture

### EHR/HIS System

The host application embeds the SofIA SDK Web Component, providing configuration and patient context.

### Communication Channels

The SDK communicates through two channels: the SofIA REST API for AI processing, and WebSocket connections for real-time transcription.

### Backend Processing

The REST API routes requests through the Cognitive Framework for report generation, while the Transcription Engine processes audio streams.
## SDK Architecture

The SofIA SDK is a Web Component (`<sofia-sdk>`) built with React and TypeScript, wrapped using `@r2wc/react-to-web-component`. It uses Shadow DOM for complete style encapsulation, ensuring the SDK’s styles never conflict with the host application.
### Provider Hierarchy

The component internally uses a layered provider architecture; each provider manages a specific domain of the SDK’s functionality. The providers wrap each other in this order (outermost to innermost):

ApiConfigProvider → SettingsProvider → I18nProvider → TranscriptorProvider → LangGraphProvider → ProcessingThreadsProvider → ThreadProvider → MessageImagesProvider → StreamProvider → ToastProvider → App

| Provider | Responsibility |
|---|---|
| ApiConfigProvider | Stores API credentials (apikey, baseurl, wssurl), widget visibility, and template configuration state |
| SettingsProvider | Manages user preferences, language selection, template ID, and template fields |
| I18nProvider | Internationalization — resolves UI strings based on the active language (es, en) |
| TranscriptorProvider | Audio recording state, transcription sessions, and real-time transcript management |
| LangGraphProvider | Session data (patient, doctor), report state, thread management, and AI interaction context |
| ProcessingThreadsProvider | Manages background processing threads for report generation |
| ThreadProvider | Chat thread lifecycle — creating, switching, and caching conversation threads |
| MessageImagesProvider | Handles image attachments within chat messages |
| StreamProvider | LangGraph streaming connection for real-time AI responses |
| ToastProvider | UI notification system |
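The outermost-to-innermost nesting above can be sketched as plain function composition. This is an illustrative model only — `nest` and the wrapper names are not part of the SDK API; the real SDK nests React providers in JSX.

```typescript
// Model each provider as a wrapper function so the outermost-to-innermost
// ordering is explicit: the first entry in the array ends up outermost.
type Wrapper = (inner: string) => string;

function nest(wrappers: Wrapper[], app: string): string {
  // reduceRight applies the last wrapper first, so it becomes innermost.
  return wrappers.reduceRight((inner, wrap) => wrap(inner), app);
}

// Abbreviated chain for illustration (the SDK has ten providers).
const order: Wrapper[] = [
  (i) => `ApiConfig(${i})`,
  (i) => `Settings(${i})`,
  (i) => `I18n(${i})`,
];

const tree = nest(order, "App");
// tree === "ApiConfig(Settings(I18n(App)))"
```

The same ordering matters in the real component: inner providers (e.g. StreamProvider) can read context supplied by outer ones (e.g. ApiConfigProvider), but not vice versa.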
### Web Component Registration

The SDK registers itself as a custom element when loaded. Element attributes are converted to component props as follows:

| Attribute Type | Examples | Conversion |
|---|---|---|
| string | apikey, baseurl, userid | Direct pass-through |
| boolean | isopen, debug | String → Boolean |
| json | patientdata, template | JSON.parse automatically |
| function | handle-report, set-is-open | Callback binding |
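The string-typed conversions in the table can be sketched as a small helper. This is a hypothetical re-implementation for illustration — the SDK delegates this work to `@r2wc/react-to-web-component` — and the exact boolean semantics (`"true"` → `true`) are an assumption. Function attributes are omitted because callbacks are bound as properties, not parsed from strings.

```typescript
// Illustrative sketch of how string attributes might map to typed props.
type AttrType = "string" | "boolean" | "json";

function convertAttribute(type: AttrType, raw: string): unknown {
  switch (type) {
    case "string":
      return raw; // direct pass-through, e.g. apikey, baseurl
    case "boolean":
      // assumed semantics: the literal string "true" maps to true
      return raw === "true";
    case "json":
      // e.g. patientdata, template — parsed automatically
      return JSON.parse(raw);
  }
}
```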
## Data Flow

### Chat and Report Generation

#### User Interaction

The healthcare professional interacts via chat (text) or voice (microphone button). Audio is captured through the browser’s MediaStream API.
#### Transcription

Audio streams through a WebSocket connection (wssurl) to the transcription service, which returns real-time text segments. The SDK supports multiple transcription engines.

#### AI Processing
Chat messages (typed or transcribed) are sent to the SofIA REST API (baseurl), which routes them through the cognitive framework using LangGraph streaming for real-time response delivery.

#### Report Generation
When template and templateid are provided, the generate button appears. On click, the AI uses the JSON Schema template to structure clinical data from the conversation into a formatted report.

#### Template Resolution

The SDK resolves template configuration with backward compatibility: report generation is enabled only when both templateid and a valid template are provided. Otherwise, the SDK automatically operates in chat-only mode.
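The resolution rule can be expressed as a small predicate. The helper name and config shape below are illustrative assumptions, not SDK API:

```typescript
// Sketch of the template-resolution rule: report mode requires BOTH a
// templateid and a valid template object; anything else is chat-only.
interface TemplateConfig {
  templateid?: string;
  template?: object | null;
}

function resolveMode(cfg: TemplateConfig): "report" | "chat-only" {
  const hasTemplate = cfg.template != null && typeof cfg.template === "object";
  return cfg.templateid && hasTemplate ? "report" : "chat-only";
}
```

A host that passes only a templateid (or only a template) would therefore still get a working chat widget, just without the generate button.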
## Cognitive Framework

The SofIA processing core is based on a system of specialized agents that collaborate to generate accurate clinical documentation:

- Reasoning engine: A language model specialized in the medical domain that coordinates agent operations
- Documentation agent: Generates structured clinical notes following predefined JSON Schema templates
- Coding agents: Specialized in medical classification systems (SNOMED CT, ICD-10, LOINC)
- Validation agent: Verifies clinical coherence and reduces errors through cross-validation
## Transcription System

The transcription stack is optimized for medical audio processing:

- Audio capture: Browser MediaStream API → AudioContext → WebSocket streaming
- Real-time transcriptor: Provides immediate feedback during consultations
- Medical transcription engine: Specialized in clinical terminology and domain-specific context
- Translation system: Native support for multiple languages
- Speaker separation: Automatic identification of doctor and patient
- Terminological normalization: Correction and standardization of medical terms
## Audio Pipeline

### Audio Processing

The stream passes through an AudioContext for processing, then is sent over a WebSocket connection.
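A common step in this kind of pipeline is converting the Float32 samples produced by an AudioContext into 16-bit PCM before sending them over the socket. The actual frame format expected by the SofIA transcription service is not documented here, so the following is a generic sketch of that conversion, not the SDK's internal code:

```typescript
// Convert Float32 samples (range [-1, 1]) from an AudioContext into
// 16-bit signed PCM, a typical wire format for streaming transcription.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;        // scale to int16 range
  }
  return out;
}
```

The resulting `Int16Array` buffer can then be sent directly over the WebSocket (e.g. `ws.send(pcm.buffer)`).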
## Security Architecture
- Shadow DOM encapsulation: SDK styles and DOM are isolated from the host application
- HTTPS/WSS required: All communications use encrypted protocols
- PII masking: Patient and doctor identifiers are automatically masked in debug logs for HIPAA/GDPR compliance
- Encrypted local storage: Session data is encrypted using the API key as the seed
- No data persistence: Clinical data is not stored locally beyond the active session
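The PII-masking rule above can be illustrated with a small redaction helper. The field names and masking style here are assumptions for the sketch — the SDK's actual masking logic and field list are not shown in this document:

```typescript
// Illustrative debug-log masking: replace known identifier fields with a
// placeholder before anything reaches the console. Field names assumed.
const PII_FIELDS = new Set(["patientId", "doctorId", "patientName"]);

function maskForLogs(entry: Record<string, unknown>): Record<string, unknown> {
  const masked: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(entry)) {
    masked[key] = PII_FIELDS.has(key) ? "***" : value;
  }
  return masked;
}
```

Masking at the logging boundary (rather than scrubbing logs afterward) keeps identifiers out of debug output entirely, which is the safer posture for HIPAA/GDPR compliance.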