Flitto Selected to Participate in the 2024 'Korean-Foreign Language Parallel Corpora Project'

Flitto has been selected to participate in the 2024 “Korean-Foreign Language Parallel Corpora Project”! This is Flitto’s fourth consecutive year in the project, having taken part every year since its inception in 2021.

What is the Korean-Foreign Language Parallel Corpora Project?

The “Korean-Foreign Language Parallel Corpora Project” is organized by the National Institute of Korean Language (NIKL) under the South Korean Ministry of Culture, Sports and Tourism. The project aims to stimulate the language and culture industries of various countries by establishing a robust language data environment using artificial intelligence (AI) technology. This year’s project is worth KRW 4.6 billion (approx. USD 3.4 million) and will be led primarily by the Kyung Hee University Industry-Academic Cooperation Foundation.

What is Flitto’s contribution to the K-FL Parallel Corpora Project?

Flitto will be building a total of 12.6 million words of data in nine languages this year: Vietnamese, Indonesian, Thai, Hindi, Khmer, Tagalog, Russian, Uzbek, and English, at 1.4 million words per language. As we did last year, we are including English data to maximize the usability of the Korean-English parallel corpus.

Behind Flitto’s selection for this year’s K-FL Parallel Corpora Project is our expertise in data. Drawing on extensive experience in public and private projects, our proprietary platform, and a diverse pool of professional translators and language experts, we have demonstrated a stable ability to construct corpora over the past three years. This proven track record contributed greatly to our selection this year.

The parallel corpora built through this project are valuable in several respects. First, the government can use the data to develop AI technologies that advance the cultural industries, such as translation software and natural language processing. Second, eight of the nine languages covered are classified as low-resource languages. These languages lack the language data needed for meaningful research in linguistics, cultural industries, and AI development. The data that Flitto builds this year will help correct this imbalance in data resources and promote cultural exchange.

Wrapping up…

Flitto is honored to participate in the government-initiated corpus project for the fourth consecutive year. We will apply the best of our expertise and experience in language data construction to carry out the initiative successfully. As a leading data partner in South Korea, we will continue to provide the high-quality language data that AI development requires.

Through this collaborative project with the National Institute of Korean Language, Flitto will work tirelessly to contribute to the advancement of Korea’s AI industry and to foster active exchange among countries.

By Flitto DataLab
