{"id":1659,"date":"2026-02-19T11:43:52","date_gmt":"2026-02-19T02:43:52","guid":{"rendered":"https:\/\/datalab.flitto.com\/en\/company\/blog\/?p=1659"},"modified":"2026-02-19T12:05:54","modified_gmt":"2026-02-19T03:05:54","slug":"small-language-models-machine-translation","status":"publish","type":"post","link":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/","title":{"rendered":"[Data Deep Dive Special Edition] Small Language Models for Machine Translation"},"content":{"rendered":"\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Large Language Models have significantly advanced machine translation performance.<\/strong><br>As model sizes scale from billions to trillions of parameters, benchmark scores continue to rise\u2014but so do computational costs and deployment complexity.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1978f0\" class=\"has-inline-color\"><strong>Is bigger always better?<\/strong><\/mark><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In mobile, on-device, and cost-sensitive enterprise environments, efficiency becomes as critical as accuracy. When supported by high-quality parallel corpora and domain-aware data design, small language models can deliver competitive translation performance while remaining practical to deploy.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this fourth installment of Data Deep Dive, we explore how small language models are reshaping machine translation strategy, and why data remains the decisive factor behind model performance.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">As technology has advanced, machine translation has undergone significant changes. 
From <a href=\"https:\/\/en.wikipedia.org\/wiki\/Rule-based_machine_translation\">RBMT<\/a> (Rule-Based Machine Translation), which translated based on linguistic rules systematized by linguists, to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Statistical_machine_translation\">SMT (Statistical Machine Translation)<\/a>, which generated translation rules through statistical analysis of large-scale language data, the field has evolved to <a href=\"https:\/\/www.linkedin.com\/pulse\/what-ai-training-data-why-language-defines-performance-flitto-xpemc\/\">NMT (Neural Machine Translation)<\/a>, which utilizes artificial neural networks.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Recently, machine translation using Large Language Models (LLMs) has also been actively researched. Many large language models are being developed for research and commercial use. As competition over the number of parameters intensifies, models have progressed from tens of billions to hundreds of billions, and now even to trillions of parameters.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">While large language models with a vast number of parameters can achieve strong performance, they require significant GPU resources to operate and demand substantial time and cost for training. In particular, such models are difficult to deploy on personal mobile devices, making them unsuitable for mobile environments.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0);color:#1978f0\" class=\"has-inline-color\">Recently, there have been increasing attempts to optimize smaller language models for specific purposes.<\/mark> <\/strong>The \u201cArabic\u2013English translation model\u201d introduced below follows this approach. 
As Flitto also provides its own English\u2013Arabic translation model, we will examine its characteristics in detail.<\/p>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Arabic \u2194 English Translation Model \u2018Mutarjim\u2019<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Mutarjim: A bidirectional Arabic\u2013English translation model based on Kuwain-1.5B<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">LLMs can understand multiple languages by being trained on various languages such as English, Spanish, French, German, Chinese, Japanese, and Korean. However, since the amount of training data varies by language, performance may be strong in certain languages\u2014such as English\u2014but relatively lower in others. In some cases, LLMs may barely understand certain languages.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Traditionally, machine translation models have used an \u201cEncoder\u2013Decoder Model.\u201d To build a model that translates from Korean to English, for example, a large-scale parallel corpus is required. 
Such a corpus consists of Korean sentences and their corresponding English translations.<\/p>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Korean: \ub098\ub294 \ud559\uad50\uc5d0 \uac11\ub2c8\ub2e4.<br>Korean: \uc624\ub298 \ub0a0\uc528\ub294 \ub9e4\uc6b0 \uc88b\uc2b5\ub2c8\ub2e4.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">English: I go to school.<br>English: The weather is very nice today.<\/p>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Using a parallel corpus, the model can be trained in a Sequence-to-Sequence (Seq2Seq) manner to learn how a sentence in one language should be generated in another language.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With the emergence of LLMs, decoder-only models have also become widely used. In a decoder-only model, given a sentence, the model generates text by predicting the next most probable token. While decoder-only models are particularly suited for generating narratives, they are also used to build translation models through fine-tuning so that they generate corresponding translated sentences when given an input sentence.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A comparison between encoder\u2013decoder models and decoder-only models used in machine translation is shown below. 
<\/p>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Category<\/strong><\/td><td>Encoder\u2013Decoder Model<\/td><td>Decoder-Only Model<\/td><\/tr><tr><td><strong>Architecture<\/strong><\/td><td>Encoder + Decoder<\/td><td>Decoder<\/td><\/tr><tr><td><strong>Objective<\/strong><\/td><td>Transform an input sequence into an output sequence<\/td><td>Generate natural language text based on given text<\/td><\/tr><tr><td><strong>Training Method<\/strong><\/td><td>Learn the relationship between input and output using large-scale parallel corpora<\/td><td>Pre-trained on large-scale monolingual data, then fine-tuned using parallel corpora<\/td><\/tr><tr><td><strong>Characteristics<\/strong><\/td><td>Accurate and consistent sentence-level translation<\/td><td>Able to respond to diverse translation requirements through prompts<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">&lt;Table 1. 
Comparison between Encoder\u2013Decoder Model and Decoder-Only Model&gt;<\/p>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Mutarjim is a bidirectional Arabic\u2013English translation model trained on Arabic\u2013English parallel corpora, using the decoder-only model Kuwain-1.5B as its base model.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Kuwain-1.5B: A small 1.5B-parameter language model, originally trained on English, that was extended to Arabic through language injection.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Training Method of Mutarjim<\/strong><\/h2>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">The base model Kuwain-1.5B was originally trained on English and later injected with Arabic text to enable it to understand Arabic. However, knowing English and Arabic does not automatically mean knowing how English translates into Arabic, or vice versa.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Mutarjim was trained in two stages. First, it was trained on 10 billion Arabic\u2013English tokens. Subsequently, it was fine-tuned using 6 million refined Arabic\u2013English parallel data samples.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Since Kuwain-1.5B is a decoder-only model, training is conducted by predicting the next token in a given sequence. To distinguish languages, special tokens such as &lt;Arabic&gt; and &lt;English&gt; were introduced and inserted into the parallel corpus to construct the training dataset. 
<\/p>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"107\" src=\"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/structure-of-the-training-dataset-600x107.png\" alt=\"\" class=\"wp-image-1663\" style=\"aspect-ratio:5.608295603914123;width:840px;height:auto\" srcset=\"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/structure-of-the-training-dataset-600x107.png 600w, https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/structure-of-the-training-dataset-300x54.png 300w, https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/structure-of-the-training-dataset-768x137.png 768w, https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/structure-of-the-training-dataset-1024x183.png 1024w, https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/structure-of-the-training-dataset.png 1247w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><figcaption class=\"wp-element-caption\">Figure 1. Structure of the Training Dataset<\/figcaption><\/figure>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">As shown in the figure above, in the initial training phase, &lt;Arabic&gt; and &lt;English&gt; tokens were placed before Arabic and English sentences, respectively. 
To support both Arabic-to-English and English-to-Arabic translation, the sentence order was varied so that either language could appear first.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">After the initial training phase, the dataset format was modified to resemble that of encoder\u2013decoder models for fine-tuning, enabling the LLM to more clearly understand translation tasks.<\/p>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Arabic\u2013English Benchmark Dataset Tarjama-25<\/h2>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tarjama-25: An Arabic\u2013English dataset composed of approximately 5,000 parallel sentences, including sentences related to Islamic contexts.<\/li>\n<\/ul>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Existing Arabic\u2013English benchmark datasets have often been constructed by translating English sentences into Arabic. As a result, these datasets did not sufficiently reflect the cultural characteristics of Islamic contexts where Arabic is widely used. Along with the release of Mutarjim, the Tarjama-25 Arabic\u2013English benchmark dataset was also introduced.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Tarjama-25 consists of approximately 5,000 parallel sentences, with part of the dataset originally constructed in Arabic. Additionally, 5.9% of the corpus consists of sentences related to Islamic contexts, allowing for more accurate evaluation of Arabic machine translation performance. 
<\/p>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"400\" src=\"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/Figure-2.-Domain-Distribution-of-Tarjama-25-600x400.png\" alt=\"\" class=\"wp-image-1662\" style=\"width:802px;height:auto\" srcset=\"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/Figure-2.-Domain-Distribution-of-Tarjama-25-600x400.png 600w, https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/Figure-2.-Domain-Distribution-of-Tarjama-25-300x200.png 300w, https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/Figure-2.-Domain-Distribution-of-Tarjama-25.png 629w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">&lt;Figure 2. Domain Distribution of Tarjama-25&gt;<\/p>\n\n\n\n<div style=\"height:35px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"has-text-align-left wp-block-paragraph\">Since Flitto also provides bidirectional Arabic \u2194 English machine translation, we compared the COMET scores of Mutarjim and Flitto using the Tarjama-25 dataset.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>COMET: A translation quality evaluation metric developed by Unbabel. 
Unlike BLEU, which measures word overlap, COMET converts text into vector representations using neural networks and evaluates semantic similarity.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td class=\"has-text-align-center\" data-align=\"center\">Model<\/td><td class=\"has-text-align-center\" data-align=\"center\">Method<\/td><td class=\"has-text-align-center\" data-align=\"center\">Number of Parameters<\/td><td class=\"has-text-align-center\" data-align=\"center\">Arabic \u2192 English Translation (COMET Score)<\/td><td class=\"has-text-align-center\" data-align=\"center\">English \u2192 Arabic Translation (COMET Score)<\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Mutarjim<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>LLM<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>1.5B<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>82.63<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>83.41<\/strong><\/td><\/tr><tr><td class=\"has-text-align-center\" data-align=\"center\"><strong>Flitto<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>NMT<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>0.2B<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>73.18<\/strong><\/td><td class=\"has-text-align-center\" data-align=\"center\"><strong>78.77<\/strong><\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\">&lt;Table 2. 
Comparison of Translation Scores between Mutarjim and Flitto Machine Translation Models&gt;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A direct comparison is difficult because the two systems differ in both architecture and size: Mutarjim is a decoder-only LLM with 1.5B parameters that was trained to achieve high performance as a small language model, whereas Flitto\u2019s translation model is an NMT model with only 0.2B parameters.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Mutarjim achieved higher scores in both Arabic-to-English and English-to-Arabic translation. One possible contributing factor is that the Tarjama-25 benchmark includes sentences related to Islamic contexts, a domain in which Flitto\u2019s training dataset contains relatively little parallel data.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Flitto is currently developing an Arabic \u2194 English translation model based not only on NMT but also on LLM architecture. By referencing the training methodology of Mutarjim, Flitto aims to provide even more advanced machine translation models.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Large Language Models have significantly advanced machine translation performance. As model sizes scale from billions to trillions of parameters, benchmark scores continue to rise\u2014but so do computational costs and deployment complexity. Is bigger always better? In mobile, on-device, and cost-sensitive enterprise environments, efficiency becomes as critical as accuracy. 
When supported by high-quality parallel corpora and domain-aware [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":1664,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[8],"tags":[118,148,147,51,59,13,145,146,144],"class_list":["post-1659","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analysis","tag-ai-training-data","tag-aidata","tag-arabic","tag-flitto","tag-llm","tag-machine-translation","tag-nmt","tag-parallel-corpora","tag-small-language-model"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Small Language Models for Machine Translation<\/title>\n<meta name=\"description\" content=\"How small language models achieve competitive machine translation performance\u2014and why parallel data matters.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Small Language Models for Machine Translation\" \/>\n<meta property=\"og:description\" content=\"How small language models achieve competitive machine translation performance\u2014and why parallel data matters.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/\" \/>\n<meta property=\"og:site_name\" content=\"Flitto DataLab\" \/>\n<meta 
property=\"article:published_time\" content=\"2026-02-19T02:43:52+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-19T03:05:54+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/Small-Language-Models-for-Machine-Translation-600x400.png\" \/>\n\t<meta property=\"og:image:width\" content=\"600\" \/>\n\t<meta property=\"og:image:height\" content=\"400\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Flitto DataLab Admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Flitto DataLab Admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/\"},\"author\":{\"name\":\"Flitto DataLab Admin\",\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/#\\\/schema\\\/person\\\/c09e946fb133658e0475d281e795362e\"},\"headline\":\"[Data Deep Dive Special Edition] Small Language Models for Machine 
Translation\",\"datePublished\":\"2026-02-19T02:43:52+00:00\",\"dateModified\":\"2026-02-19T03:05:54+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/\"},\"wordCount\":1165,\"publisher\":{\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/wp-content\\\/uploads\\\/Small-Language-Models-for-Machine-Translation.png\",\"keywords\":[\"AI Training Data\",\"AIData\",\"Arabic\",\"Flitto\",\"LLM\",\"Machine Translation\",\"NMT\",\"Parallel Corpora\",\"Small Language Model\"],\"articleSection\":[\"Analysis\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/\",\"url\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/\",\"name\":\"Small Language Models for Machine Translation\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/wp-content\\\/uploads\\\/Small-Language-Models-for-Machine-Translation.png\",\"datePublished\":\"2026-02-19T02:43:52+00:00\",\"dateModified\":\"2026-02-19T03:05:54+00:00\",\"description\":\"How small language models achieve competitive machine translation performance\u2014and why parallel data 
matters.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/#primaryimage\",\"url\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/wp-content\\\/uploads\\\/Small-Language-Models-for-Machine-Translation.png\",\"contentUrl\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/wp-content\\\/uploads\\\/Small-Language-Models-for-Machine-Translation.png\",\"width\":1536,\"height\":1024,\"caption\":\"Small Language Models for Machine Translation\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/small-language-models-machine-translation\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"[Data Deep Dive Special Edition] Small Language Models for Machine Translation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/\",\"name\":\"Flitto DataLab\",\"description\":\"Latest AI and Data 
Insights\",\"publisher\":{\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/#organization\",\"name\":\"Flitto DataLab\",\"url\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/datalab.svg\",\"contentUrl\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/wp-content\\\/uploads\\\/2023\\\/07\\\/datalab.svg\",\"width\":1,\"height\":1,\"caption\":\"Flitto DataLab\"},\"image\":{\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.linkedin.com\\\/showcase\\\/flitto-datalab\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/#\\\/schema\\\/person\\\/c09e946fb133658e0475d281e795362e\",\"name\":\"Flitto DataLab Admin\",\"url\":\"https:\\\/\\\/datalab.flitto.com\\\/en\\\/company\\\/blog\\\/author\\\/daeun-lee\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Small Language Models for Machine Translation","description":"How small language models achieve competitive machine translation performance\u2014and why parallel data matters.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/","og_locale":"en_US","og_type":"article","og_title":"Small Language Models for Machine Translation","og_description":"How small language models achieve competitive machine translation performance\u2014and why parallel data matters.","og_url":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/","og_site_name":"Flitto DataLab","article_published_time":"2026-02-19T02:43:52+00:00","article_modified_time":"2026-02-19T03:05:54+00:00","og_image":[{"width":600,"height":400,"url":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/Small-Language-Models-for-Machine-Translation-600x400.png","type":"image\/png"}],"author":"Flitto DataLab Admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Flitto DataLab Admin","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/#article","isPartOf":{"@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/"},"author":{"name":"Flitto DataLab Admin","@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/#\/schema\/person\/c09e946fb133658e0475d281e795362e"},"headline":"[Data Deep Dive Special Edition] Small Language Models for Machine Translation","datePublished":"2026-02-19T02:43:52+00:00","dateModified":"2026-02-19T03:05:54+00:00","mainEntityOfPage":{"@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/"},"wordCount":1165,"publisher":{"@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/#organization"},"image":{"@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/#primaryimage"},"thumbnailUrl":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/Small-Language-Models-for-Machine-Translation.png","keywords":["AI Training Data","AIData","Arabic","Flitto","LLM","Machine Translation","NMT","Parallel Corpora","Small Language Model"],"articleSection":["Analysis"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/","url":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/","name":"Small Language Models for Machine 
Translation","isPartOf":{"@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/#primaryimage"},"image":{"@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/#primaryimage"},"thumbnailUrl":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/Small-Language-Models-for-Machine-Translation.png","datePublished":"2026-02-19T02:43:52+00:00","dateModified":"2026-02-19T03:05:54+00:00","description":"How small language models achieve competitive machine translation performance\u2014and why parallel data matters.","breadcrumb":{"@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/#primaryimage","url":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/Small-Language-Models-for-Machine-Translation.png","contentUrl":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/Small-Language-Models-for-Machine-Translation.png","width":1536,"height":1024,"caption":"Small Language Models for Machine Translation"},{"@type":"BreadcrumbList","@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/small-language-models-machine-translation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/datalab.flitto.com\/en\/company\/blog\/"},{"@type":"ListItem","position":2,"name":"[Data Deep Dive Special Edition] Small Language Models for Machine 
Translation"}]},{"@type":"WebSite","@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/#website","url":"https:\/\/datalab.flitto.com\/en\/company\/blog\/","name":"Flitto DataLab","description":"Latest AI and Data Insights","publisher":{"@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/datalab.flitto.com\/en\/company\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/#organization","name":"Flitto DataLab","url":"https:\/\/datalab.flitto.com\/en\/company\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/2023\/07\/datalab.svg","contentUrl":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-content\/uploads\/2023\/07\/datalab.svg","width":1,"height":1,"caption":"Flitto DataLab"},"image":{"@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.linkedin.com\/showcase\/flitto-datalab\/"]},{"@type":"Person","@id":"https:\/\/datalab.flitto.com\/en\/company\/blog\/#\/schema\/person\/c09e946fb133658e0475d281e795362e","name":"Flitto DataLab 
Admin","url":"https:\/\/datalab.flitto.com\/en\/company\/blog\/author\/daeun-lee\/"}]}},"_links":{"self":[{"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/posts\/1659","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/comments?post=1659"}],"version-history":[{"count":3,"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/posts\/1659\/revisions"}],"predecessor-version":[{"id":1668,"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/posts\/1659\/revisions\/1668"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/media\/1664"}],"wp:attachment":[{"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/media?parent=1659"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/categories?post=1659"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datalab.flitto.com\/en\/company\/blog\/wp-json\/wp\/v2\/tags?post=1659"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}