Core Terminology Lexicon for the Translation Industry
Terminology Management System (TMS)
Termbase: A database that stores standardized terminology with metadata such as definitions, usage contexts, POS tags, and domain labels. Cross-platform exchange is handled through the TBX format, with automatic term recognition and enforcement functions. Key parameters: term entry capacity (million-level recommended), recognition response time (<200 ms), and multi-field search (regex and fuzzy matching supported). Technical components include a term extraction engine, alignment memory, and a QA validation module, supporting TBX-standard output and API integration with major CAT tools.
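The termbase entry describes multi-field search with regex and fuzzy matching. A minimal sketch of both lookup modes, using a hypothetical in-memory termbase (the field names and the 0.75 fuzzy cutoff are illustrative, not from any specific TBX schema):

```python
import difflib
import re

# Hypothetical in-memory termbase: term -> metadata (POS tag, domain label).
TERMBASE = {
    "translation memory": {"pos": "noun", "domain": "CAT"},
    "termbase": {"pos": "noun", "domain": "terminology"},
    "fuzzy match": {"pos": "noun", "domain": "CAT"},
}

def search(query, pattern=None, fuzzy_cutoff=0.75):
    """Return entries matching a regex pattern or fuzzy-matching the query."""
    hits = []
    for term, meta in TERMBASE.items():
        if pattern and re.search(pattern, term):
            hits.append((term, meta))
        elif difflib.SequenceMatcher(None, query.lower(), term).ratio() >= fuzzy_cutoff:
            hits.append((term, meta))
    return hits

# Fuzzy lookup tolerates the typo "termbse".
print(search("termbse"))
```

Production systems typically back this with an indexed store and n-gram prefiltering rather than a linear scan, so that recognition stays under the <200 ms target at million-entry scale.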
Translation Memory (TM)
Translation Unit (TU): The smallest segment storing a source-target language pair with metadata, including match rate (75% fuzzy threshold), context ID, and timestamp. Key metrics: segment repetition rate (impacts ROI), leverage rate, and perfect-match ratio. Advanced features: context-sensitive matching (CSM), automatic sub-segment splitting, and regex variable protection. Uses SRX rules for sentence segmentation and the TMX format for cross-platform migration; quarterly cleansing and deduplication of the memory are recommended.
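Commercial CAT tools use proprietary scoring for TM matches; a simplified sketch of the idea, computing a match rate as normalized edit distance and applying the 75% fuzzy threshold mentioned above:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def match_rate(source, tm_source):
    """Similarity as 1 - normalized edit distance, as a percentage."""
    if not source and not tm_source:
        return 100.0
    dist = levenshtein(source, tm_source)
    return 100.0 * (1 - dist / max(len(source), len(tm_source)))

FUZZY_THRESHOLD = 75.0  # below this, the segment is priced as a new translation

rate = match_rate("Click the Save button.", "Click the Save button")
print(round(rate, 1))  # -> 95.5, a high fuzzy match
```

Real engines additionally weight tag differences, number substitutions, and context IDs, which is why two tools rarely report identical percentages for the same pair.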
Computer-Assisted Translation (CAT)
Segmentation Rules: SRX-based logic for splitting text into segments using regex-defined separators. Core parameters: maximum segment length (≤150 characters), hard breaks (. / !), soft breaks (, / ;). Advanced features: context propagation, tag protection, and placeholder locking. Supports tiered pricing by TM match rate: repetitions and 100% matches, 95-99% fuzzy matches, and <75% new translations.
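A toy version of such rules, assuming a tiny abbreviation exception list (real SRX files define language-specific break/no-break rule pairs, not hard-coded strings): hard breaks on sentence-final punctuation, abbreviation repair, then soft breaks at commas/semicolons when a segment exceeds the length limit.

```python
import re

# Illustrative exception list; an SRX file would carry per-language rules.
ABBREVIATIONS = ("e.g.", "i.e.", "Dr.", "vs.")

def segment(text, max_len=150):
    # Hard breaks: ., !, ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    # Re-join false splits caused by abbreviations.
    segments = []
    for part in parts:
        if segments and segments[-1].endswith(ABBREVIATIONS):
            segments[-1] += " " + part
        else:
            segments.append(part)
    # Soft breaks: split over-long segments at the last , or ; before max_len.
    result = []
    for seg in segments:
        while len(seg) > max_len:
            cut = max(seg.rfind(",", 0, max_len), seg.rfind(";", 0, max_len))
            if cut == -1:
                break
            result.append(seg[:cut + 1].strip())
            seg = seg[cut + 1:].strip()
        result.append(seg)
    return result

print(segment("See Dr. Smith. Bring the report! Is it ready?"))
```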
Localization QA (LQA)
Severity Levels: Critical (functional failures), Major (comprehension issues), Minor (formatting errors). Evaluation dimensions: terminology consistency (≥98%), measurement conversions (locale-specific), and cultural adaptation (culturally neutral content). Automation: regex bulk checks (date/currency formats), XML tag validation, and termbase enforcement. Manual review uses AQL sampling (AQL 1.5 with a 6.5% defect limit).
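The "regex bulk checks" above can be sketched as a per-locale rule table; the locale codes and patterns here are illustrative assumptions (a real LQA pipeline would derive them from CLDR data rather than hand-written regexes):

```python
import re

# Illustrative rules: a pattern that is wrong for each target locale.
# de-DE expects DD.MM.YYYY, so a US-style DD/MM/YYYY date is flagged.
LOCALE_RULES = {
    "de-DE": {"bad_date": r"\b\d{2}/\d{2}/\d{4}\b"},
    "en-US": {"bad_date": r"\b\d{2}\.\d{2}\.\d{4}\b"},
}

def check_dates(segment, locale):
    """Return date strings that violate the target-locale convention."""
    return re.findall(LOCALE_RULES[locale]["bad_date"], segment)

issues = check_dates("Liefertermin: 03/15/2025", "de-DE")
print(issues)  # -> ['03/15/2025']
```

Flagged hits would then be triaged into the Minor/Major/Critical severity levels above, typically as Minor formatting errors unless the wrong format changes meaning.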
Pseudo-localization
Accent Testing: UI layout validation using diacritics (àçéñ) with 30-40% character expansion. Boundary markers: [##] identifiers for truncation detection. Test scenarios: BiDi rendering, hotkey conflicts (& processing), and multi-byte encoding (UTF-8/UTF-16). Implementation: ICU library transformations, regex-based string expansion, and pseudo-TM automation.
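A minimal pseudo-localization transform combining the three ideas above: accented look-alike substitution, ~35% expansion padding, and boundary markers (the exact marker shape and padding character are assumptions for illustration):

```python
# Map ASCII letters to accented look-alikes for layout/encoding testing.
ACCENT_MAP = str.maketrans("aceinouACEINOU", "àçéîñôüÀÇÉÎÑÔÜ")

def pseudo_localize(s, expansion=0.35, markers=("[##", "##]")):
    """Accent the string, pad it to simulate expansion, and wrap in markers."""
    accented = s.translate(ACCENT_MAP)
    pad = "~" * round(len(s) * expansion)  # simulates 30-40% text growth
    return markers[0] + accented + pad + markers[1]

print(pseudo_localize("Save file"))  # -> [##Sàvé fîlé~~~##]
```

If a truncated UI cuts off the trailing `##]`, the missing marker makes the defect visible without a human reading the text.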
Neural Machine Translation (NMT)
Model Specs: 12-24 encoder layers, multi-head attention (8-16 heads), hidden dimensions of 1024-4096. Training data: ≥5M parallel sentences with 80% domain coverage. Metrics: BLEU (≥50), TER (≤35%), METEOR (≥60). Deployment: ONNX Runtime acceleration, FP16 quantization, GPU-cluster inference (≥5k chars/sec). Features: real-time post-editing (PE), confidence scoring (0-1), and glossary injection.
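Of the metrics listed, BLEU is the most commonly hand-rolled; a sentence-level sketch without smoothing (production evaluation should use a reference implementation such as sacreBLEU, since tokenization and smoothing choices change scores):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, r_ngrams[g]) for g, c in c_ngrams.items())
        total = max(1, sum(c_ngrams.values()))
        if clipped == 0:
            return 0.0  # unsmoothed: any empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))  # brevity penalty
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat")))  # -> 100
```

The unsmoothed form shown here is harsh on short sentences, which is one reason corpus-level BLEU (pooling n-gram counts over the whole test set) is the standard reporting unit.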