On May 8, Flitto signed a memorandum of understanding (MOU) with Upstage, an AI company specializing in large language models. The partnership centers on building AI language data for multilingual LLMs.
The collaboration between the two companies is especially meaningful, as each represents a distinct field within Korea’s generative AI industry. Flitto’s data technology can build AI data in 116+ languages and dialects, drawing on a community of 14 million users. Meanwhile, Upstage’s cutting-edge large language model (LLM) technology is a top performer on the global AI performance leaderboard.
What is the focus of the cooperation between Flitto and Upstage?
The main goal of this business cooperation agreement is to expand the range of languages that AI supports. Upstage is currently developing tailored models for various region-specific languages, such as those of Southeast Asia. To achieve this, securing high-quality data for low-resource languages such as Japanese and Thai is crucial.
Other areas of agreement between the two companies include establishing a benchmark dataset for the Korean LLM leaderboard “Ko-LLM”; operating multilingual LLMs through the localization of low-resource language data; and strengthening data supply partnerships for small language models (sLLMs) optimized for business integration. Flitto and Upstage expect tangible outcomes in each of these areas.
Data is key to LLM operation
In the end, the key to high-performance language models lies in combining good models with good data.
Language models of any size, from small language models (sLLMs) such as Upstage’s Solar to well-known models such as ChatGPT, require vast amounts of high-quality language data.
However, these models struggle with languages for which data is scarce. Improving performance on lower-resource languages, including Asian languages such as Thai, Japanese, Vietnamese, Lao, and Khmer, is a challenge that every language model must address.
Over its 12 years of operation, Flitto has been recognized for the expertise and scalability of its AI data technology, including its ability to build the low-resource language data essential to advancing LLM technology. In line with this, Flitto was recently selected to participate in the “Korean-Foreign Language Parallel Corpora Project” organized by the National Institute of Korean Language (NIKL), marking its fourth consecutive year in the project.
Flitto continues to advance its capabilities in building and processing copyright-cleared text, image, and speech datasets for multilingual AI. Meanwhile, Upstage plans to solidify its position as a global AI company by securing a large volume of low-resource language data through its collaboration with Flitto. Both companies expect the partnership to give them a competitive edge in the specialized and highly competitive AI and LLM market.
Wrapping up…
Combined with Upstage’s language model development technology, Flitto’s AI data technology is expected to have a significant impact on the global AI market.
On the partnership, Flitto CEO Simon Lee commented, “Training on low-resource languages has emerged as a key factor in the performance of large language models.” He added, “Through the partnership between the two companies, we aim to demonstrate how the synergy between high-quality AI data and advanced technologies can positively influence the domestic generative AI ecosystem.”
Upstage CEO Sung Kim noted, “Together with the generative AI boom driven by language models, securing quality language data is an essential task,” adding, “Through our partnership with Flitto, Upstage will endeavor to build advanced datasets, enabling a wider global audience to experience the innovation brought by generative AI technology.”
Read more about the MOU between Flitto and Upstage in the press release.