The Rise of Domestic AI Startup Upstage's 'Solar Open 100B': The Beginning of a Storm
"Did they blatantly copy a Chinese AI model?" What lies behind the unexpected controversy surrounding Upstage's ambitious creation?
In 2025, a significant event shook the South Korean AI industry. It revolves around 'Solar Open 100B', a large language model (LLM) developed by the AI startup Upstage, which has been accused of being a replica of another model. This incident transcends a mere question of a single company's technical capabilities, raising critical concerns about transparency and trustworthiness within the domestic AI industry.
The Spark Behind the Upstage Solar-Open-100B Controversy
The trigger was surprisingly straightforward. Ko Seok-hyun, CEO of Sionic AI, raised concerns via GitHub about Upstage's Solar-Open-100B, presenting a detailed analysis claiming that Solar Open 100B was derived from the Chinese Zhipu AI's 'GLM-4.5-Air' model.
What gave this claim its credibility was not mere speculation but technical evidence. A comparison of the LayerNorm parameter weights between the two models revealed a cosine similarity of an astonishing 0.989 — that is, 98.9%. Statistically, it is extraordinarily unlikely that two independently trained models would coincidentally exhibit such a high degree of similarity.
Further fueling suspicion was the discovery of debugging code from the GLM developers embedded within the Solar source code, followed by a swift revision of licensing information immediately after the controversy surfaced.
The Weight of Being a Government-Supported Project
This controversy escalated beyond a technical debate because it involves a government project funded by taxpayers' money. Solar Open 100B is the product of the "Independent AI Foundation Model Development Project." Upstage leads a consortium including Naver Cloud, SK Telecom, NC AI, LG AI Research, and others, receiving substantial government resources including GPUs, data, and top talent.
Ko Seok-hyun put it bluntly: "It is deeply disappointing that a model suspected to be a fine-tuned copy of a Chinese model has been submitted under a project funded by the national budget." This issue thus threatens not just a single company's reputation but the very credibility of South Korea's AI investments.
Upstage’s Immediate Rebuttal
In response to the accusations, Upstage swiftly denied the claims. CEO Kim Sung-hoon emphasized, “The assertion that Solar Open 100B is a result of copying and fine-tuning a Chinese model is factually incorrect.”
Going further, Upstage asserted that the model was developed 'from scratch', meaning every step—from data collection and model architecture design to training and optimization—was conducted independently. In other words, they insist the model was entirely built in-house, not merely adjusted based on existing weights.
Diverging Opinions Among Technical Experts
Intriguingly, domestic open-source AI expert Kevin Ko offered a different perspective on the analysis. He dismissed the allegations as “a misinterpretation of the statistical indicators,” highlighting that even with the same technical data, experts can draw varying conclusions.
This controversy exposes how ambiguous the standards for verifying AI model development can be and how much expert judgment is needed to interpret technical evidence. A high similarity index alone is insufficient; what truly matters is how those numbers are interpreted—and that remains a nuanced and contentious issue.
The Blurred Boundaries of Technology: Shocking Similarity in LayerNorm Weights—The Core of the Upstage Solar-Open-100B Controversy
A staggering 98.9% match in core parameters between two models—could this be mere coincidence, or is it evidence of replication? Let’s delve into the meaning behind the cosine similarity.
Matching the Neural Network’s ‘Fingerprint’: What Are LayerNorm Weights?
At the center of the Upstage Solar-Open-100B controversy lies a technical metric: LayerNorm (Layer Normalization) parameter weights.
In large language models, LayerNorm normalizes the activations at each network layer across the feature dimension, stabilizing training. The learnable LayerNorm parameters act as a unique 'fingerprint' that the model acquires over the course of training: even with the same architecture, different datasets and training runs will lead these parameters to converge on markedly different values.
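To make this concrete, here is a minimal numpy sketch of layer normalization. The gamma and beta arrays are the learnable parameters in question; the names, shapes, and values below are illustrative, not taken from either model's code:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each activation vector to zero mean and unit variance,
    then apply the learnable scale (gamma) and shift (beta).

    After training, gamma and beta hold the trained values that act as
    the model 'fingerprint' discussed above.
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy example: a batch of 2 activation vectors of width 4
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 2.0, 2.0, 2.0]])
gamma = np.ones(4)   # stand-in for trained scale parameters
beta = np.zeros(4)   # stand-in for trained shift parameters
y = layer_norm(x, gamma, beta)
print(y.mean(axis=-1))  # each row is normalized to roughly zero mean
```

In a real transformer every layer carries its own gamma and beta, and it is these trained vectors, concatenated across layers, that the similarity analysis compared.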
98.9% Match—The Statistical Odds Against Chance
The analysis presented by Ko Seok-hyun, CEO of Sionic AI, sent shockwaves through the industry. He revealed that the cosine similarity of the LayerNorm weights between Upstage's Solar-Open-100B and the Chinese Zhipu AI's GLM-4.5-Air reached an astonishing 0.989.
Cosine similarity is a mathematical measure of how closely two vectors point in the same direction, ranging from -1 to 1 (with 1 meaning identical direction). A value of 0.989 means the LayerNorm parameters of the two models are nearly identical. Considering the randomness inherent in neural network training, including random initialization and stochastic gradient descent (SGD), the chance of such a high degree of similarity occurring by coincidence is statistically minuscule.
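The arithmetic behind the headline number is easy to reproduce. The sketch below uses invented stand-in vectors to show why a similarity near 1 is notable: independently drawn high-dimensional weight vectors land near 0, while a lightly perturbed copy of a vector stays near 1:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two weight vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
w_a = rng.normal(size=4096)             # stand-in for one model's LayerNorm weights
w_independent = rng.normal(size=4096)   # independently drawn vector
w_derived = w_a + 0.01 * rng.normal(size=4096)  # small perturbation of w_a

print(cosine_similarity(w_a, w_independent))  # near 0 for unrelated vectors
print(cosine_similarity(w_a, w_derived))      # near 1 for a lightly perturbed copy
```

This is only an analogy for the reported analysis: real models also differ in initialization schemes and data, which is precisely why near-identical trained parameters are treated as a red flag.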
Beyond Coincidence: A Web of Technical Evidence
What makes this more compelling is that the similarity in LayerNorm weights is not an isolated piece of evidence. The suspicions surrounding Upstage's Solar-Open-100B rest on multiple layers of evidence:
Code Trace Evidence: Debugging code from GLM’s development team was found within Solar-Open-100B’s codebase—a concrete trace that’s hard to attribute to coincidence.
License Label Modifications: License declarations were altered immediately after the controversy came to light, raising eyebrows. In an ecosystem that prizes transparency, such changes demand further explanation.
Parameter-Level Match: The 98.9% similarity transcends mere architectural inspiration—it points to actual trained parameter overlap, a deeply significant technical indicator.
Diverging Interpretations: Experts at Odds
Intriguingly, experts are divided in interpreting these technical findings. Kevin Ko, a Korean open-source AI specialist, dismissed the allegations, calling the analysis a “misinterpretation of statistical metrics.”
This highlights that the Upstage Solar-Open-100B controversy isn’t just about black-and-white truth—it underscores the need for rigorous discourse on statistical interpretation methods. The same figures can tell very different stories depending on one’s perspective.
A Call for Proof of Technical Originality
Ultimately, the implication of a 98.9% LayerNorm similarity demands clarity. If the model was trained independently, its journey should be traceable through intermediate checkpoints and training logs. Conversely, if it is a fine-tuned version of a pre-existing model, that fact must be transparently disclosed.
The Upstage Solar-Open-100B debate transcends the technical originality of a single company—it challenges the entire domestic AI industry to establish robust standards for technology verification.
National Budget-Backed AI: A Crisis of Trust?
A government-supported independent AI project has been engulfed in allegations of copying a Chinese model. It's time to weigh the responsibility and gravity of developing technology funded by taxpayers.
Why Government-Supported Projects Matter
The core issue behind the Upstage Solar-Open-100B controversy goes beyond mere technical suspicion. This model is the centerpiece of the government's "Independent AI Foundation Model Project." Five companies, including Naver Cloud, SK Telecom, NC AI, and LG AI Research, are involved, and given the project's national significance, substantial resources have been invested.
From GPU infrastructure and training data to research talent—every resource funded by taxpayers was directed toward one objective: reducing overseas dependency and securing our own large-scale language model. Because of this strategic importance, the Upstage Solar-Open-100B issue is not just a corporate matter but questions the credibility of the entire national AI policy.
The Basis of Suspicion: What Technical Analysis Reveals
Critics base their claims on concrete technical analyses. According to the cosine similarity analysis presented by Sionic AI CEO Ko Seok-hyun, the LayerNorm parameter weights of Solar-Open-100B and the Chinese Zhipu AI's 'GLM-4.5-Air' model reached an astonishing 0.989 (98.9%) similarity.
What does this figure imply? Statistically, the chance that two independently trained neural networks' weights would coincidentally show such high similarity is virtually zero. Furthermore, there are claims that debugging code from the GLM development team was found within Solar-Open-100B's codebase. Combined, this technical evidence makes it difficult to dismiss suspicions that the model might have been fine-tuned from a Chinese model.
It’s especially noteworthy that license markings were altered immediately after these allegations surfaced. In tech communities, this is often interpreted as a sign of “after-the-fact adjustments.”
Upstage’s Stance: Asserting From-Scratch Development
Upstage responded swiftly to refute the claims. CEO Kim Sung-hoon emphasized that the assertion that "Solar-Open-100B is a copied and fine-tuned version of a Chinese model" is not true. Instead, Upstage claims the model was developed "from scratch."
What does “from scratch” mean? It implies that every stage—from data collection and model architecture design to training and tuning—was independently completed from the ground up. If true, the high similarity observed in technical analysis might be explained by other factors—for example, optimized neural architectures yielding similar parameter distributions.
The Need for Objective Verification
To resolve this controversy, Upstage has proposed public verification measures:
- Disclosure of training checkpoints at various stages
- Full public release of ‘wandb’ logs
- Objective verification of the training pathway and independence
Through this transparency, external experts can conclusively determine whether the model was truly trained from scratch or fine-tuned at some point based on an existing external model. Since checkpoints are footprints of the training process, they hold great potential to uncover the technical truth.
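As a hypothetical illustration of what checkpoint-based verification could look like (the parameter name, file layout, and simulated weights below are all invented), a reviewer could compare the same named parameter between each released checkpoint and the external model. In a genuine from-scratch run, no checkpoint, from random initialization onward, should be near-identical to an unrelated model:

```python
import numpy as np

def param_cosine(ckpt_a, ckpt_b, key):
    """Cosine similarity between the same named parameter in two checkpoints."""
    a, b = ckpt_a[key].ravel(), ckpt_b[key].ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
KEY = "layers.0.norm.weight"  # hypothetical parameter name

# Simulated from-scratch run: weights drift away from a random initialization
init = rng.normal(size=256)
run = [{KEY: init + 0.2 * step * rng.normal(size=256)} for step in range(4)]

# An unrelated external model's weights
external = {KEY: rng.normal(size=256)}

for step, ckpt in enumerate(run):
    sim = param_cosine(ckpt, external, KEY)
    print(f"step {step}: similarity to external model = {sim:.3f}")
# A from-scratch run stays dissimilar to an unrelated external model at
# every checkpoint; a similarity near 0.99 at any stage would be a red flag.
```

Real verification would of course load actual checkpoint files and sweep every parameter tensor, but the comparison itself reduces to exactly this kind of per-parameter check.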
However, expert opinions are not unanimous. Domestic open-source AI expert Kevin Ko dismissed the analysis as a “misinterpretation of statistical indicators.” This indicates that interpretative differences continue to exist within the tech community.
Questioning the Validation Standards of National AI Projects
The most significant implication of the Upstage Solar-Open-100B controversy is that it can serve as a catalyst to raise the bar for validation standards across all independent AI foundation model projects, not just for a single company.
If national budgets are involved, participating firms must adopt stricter accountability for explaining the origin of training data and developmental processes during model disclosure. This isn’t merely a matter of ethics—it directly impacts public trust and the credibility of future government AI investment policies.
At the same time, Upstage aims to go public in the second half of 2026, targeting a corporate valuation of 1 trillion KRW. At this critical juncture, it faces the urgent challenge of proving both technical originality and trustworthiness. The results of public verification could not only restore its reputation but also influence the company's future valuation.
Can AI developed with national funds truly be trusted? The search for that answer has just begun.
Upstage's Rebuttal and the Decisive Public Verification: From Scratch Development vs. Chinese Model Derivation
“From Scratch Development” vs. “Chinese Model Derivation” – as these two claims collide, the crucial showdown to uncover the truth behind Upstage’s Solar-Open-100B controversy is approaching. Whose claim will stand? Let’s delve into the real story that the soon-to-be-revealed 'checkpoints' and 'training logs' will unveil.
Upstage’s Strong Rebuttal: “This Is Not True”
Immediately after the controversy erupted, Upstage launched an emphatic rebuttal. CEO Kim Sung-hoon drew a clear line regarding Sionic AI's allegations, firmly stating that "the claim that Solar Open 100B is a result of copying a Chinese model followed by fine-tuning is untrue."
Upstage’s core argument is crystal clear: the model was developed “from scratch.” This means the entire process—from data collection to model architecture design, training, and tuning—was conducted independently by Upstage, far beyond mere fine-tuning or slight modifications.
Developing a Large Language Model from scratch is the most fundamental yet challenging methodology in LLM development. It signifies a fully independent process from start to finish without relying on existing models. If true, this stands as the strongest evidence proving Upstage’s technical capability and credibility amid the Solar-Open-100B controversy.
The Crux of the Suspicion: 98.9% Layer Normalization Similarity
However, the basis for the suspicion is not insignificant. Sionic AI CEO Ko Seok-hyun presented very specific technical evidence: a cosine similarity of 0.989 (98.9%) between the LayerNorm parameter weights of the two models.
This figure carries deep implications. Statistically, it is extremely rare for two entirely independent models to possess such highly similar parameters. Moreover, the discovery of GLM developers’ debugging code within Solar’s codebase and the post-controversy license attribution changes have also been pointed out as circumstantial evidence intensifying the doubts.
A Reversal in Perspectives: Experts Diverge on Technical Interpretation
Interestingly, experts disagree on the technical interpretations. Kevin Ko, a domestic open-source AI expert, dismissed the controversy by calling the analysis a “misinterpretation of statistical indicators.” His stance suggests that a 98.9% similarity does not necessarily equate to model copying.
Such technical disagreements present a challenging gray area for the general public. High cosine similarity alone may not definitively prove model derivation. This nuance complicates the resolution of the Upstage Solar-Open-100B controversy even further.
Public Verification: The Decisive Moment to Reveal the Truth
A key to breaking this deadlock is now set: Upstage plans a public verification including:
1. Disclosure of ‘Checkpoint’ snapshots used during training
- These snapshots capture the model’s state at various training stages, objectively showing when and how the model evolved.
- If there is a sudden jump in performance at a certain point, it might hint at fine-tuning based on an external model.
2. Full release of ‘Wandb’ training logs
- Wandb (Weights & Biases) records and monitors every detail of a machine learning training run.
- Transparency in learning rates, loss values, accuracy, and training curves over time will allow verification of whether the model was genuinely trained from scratch.
3. Objective verification of the training path and independence
- Comprehensive data confirming that the entire training process was conducted independently.
- This will play a critical role in validating Upstage’s claim of from-scratch development.
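As a toy illustration of the kind of sanity check point 1 describes (the loss values and the threshold below are invented for demonstration), a reviewer might scan a released loss curve for abrupt single-step drops, which a smooth from-scratch run would not normally show:

```python
import numpy as np

def suspicious_drops(losses, threshold=0.5):
    """Return the step indices where loss fell by more than `threshold` in one step."""
    deltas = np.diff(losses)
    return [i + 1 for i, d in enumerate(deltas) if d < -threshold]

smooth = np.array([4.0, 3.6, 3.3, 3.1, 3.0, 2.95])  # gradual decline
jumpy = np.array([4.0, 3.8, 3.7, 1.2, 1.1, 1.0])    # abrupt drop at step 3

print(suspicious_drops(smooth))  # no flags for a gradual curve
print(suspicious_drops(jumpy))   # flags the abrupt drop
```

A flagged step is not proof of wrongdoing on its own (learning-rate changes or curriculum switches can also cause drops), which is why the logs would need to be read alongside the checkpoints themselves.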
Raising the Bar for Industry-wide Verification Standards
The lesson from the Upstage Solar-Open-100B controversy goes beyond a single company’s credibility. Industry insiders see this as a chance to “raise the verification standards across the entire domestic AI foundation model sector.”
Given that this project received government funding, other participant firms are expected to face increased accountability regarding training origins and development processes in their model disclosures. Transparency and verifiability are becoming no longer optional but essential requirements.
A Crossroads for Technical Originality and Trustworthiness
At the same time, Upstage is aiming to go public in late 2026, on the verge of reaching a 1 trillion KRW company valuation. In this context, the controversy has become a critical crossroads where Upstage must prove its technological originality and reliability.
Whether the checkpoints and training logs are transparently disclosed will likely determine Upstage’s future credibility and market standing. Providing real proof of from-scratch development could set a benchmark for domestic AI self-reliance. On the other hand, confirming the suspicions would raise profound questions about the transparency and verification frameworks of government-backed projects.
The moment of truth is fast approaching.
The Crossroads of AI Technological Originality and Lessons for the Future
The Upstage Solar-Open-100B controversy goes beyond merely questioning the technical credibility of a single company. It poses a fundamental question facing the domestic AI ecosystem. How do we define and verify technological originality? The answer to this question could determine the future of Korea’s AI industry.
The Standard for AI Technological Originality: From Ambiguity to Clarity
Until now, the concept of "originality" in the Korean AI industry has been somewhat vague. Before the Upstage Solar-Open-100B controversy erupted, many companies fine-tuned open-source base models as a matter of course. However, government-supported projects, especially those initiated under the explicit goal of an "independent AI foundation model," require a different standard.
At the heart of the controversy lies a 98.9% cosine similarity, an exceedingly high correspondence that cannot be attributed to chance. This raises strong suspicions of technical imitation rather than mere “fine-tuning.” On the other hand, Upstage’s claim of developing the model “from scratch” implies that every step—from data collection and model architecture design to training—was carried out independently.
At this juncture where these two claims clash, Korea’s AI industry verification standards are facing a pivotal transformation.
Government-Supported Projects and the Responsibility of Transparency
The Independent AI Foundation Model Project, involving Upstage, Naver Cloud, SK Telecom, NC AI, and LG AI Research, is a public endeavor funded by taxpayers. This demands a level of accountability beyond that of private companies' independent R&D efforts.
When public funds are invested, transparency and originality in the development process are not optional—they are imperative. Upstage’s decision to fully disclose training checkpoints and wandb logs is a step toward fulfilling this responsibility. Once these datasets are made public, the timeline and independence of the model’s training can be traced objectively.
The lesson for society from this controversy is clear: AI projects funded by public money must adopt transparency and verification standards commensurate with that investment.
Divergent Expert Opinions and the Importance of Technical Interpretation
Interestingly, Kevin Ko, a domestic open-source AI expert, pointed out that this analysis could be a "misinterpretation of statistical indicators." This highlights that interpretations may vary even among specialists reviewing the same data.
Such divergent opinions underscore the need for a more sophisticated verification system in our industry. Simple statistical metrics alone are insufficient; comprehensive technical analysis and independent expert validation are essential. In particular, the training logs and checkpoints that Upstage will release are key materials enabling multi-faceted verification.
The Link Between Corporate Credibility and Ecosystem Health
Upstage aims to go public in the second half of 2026 with a target valuation of 1 trillion won. At this critical moment, proving technological originality and credibility is not simply about corporate reputation—it affects the trustworthiness of Korea’s entire AI startup ecosystem.
If the Solar-Open-100B controversy remains unresolved or if transparency shortcomings emerge during the process, the international credibility of domestic AI companies could suffer a serious blow. Conversely, establishing transparent verification procedures as a result of this controversy could propel Korea’s AI industry to become a more globally trusted ecosystem.
Elevating Industry-Wide Verification Standards
Industry insiders see this controversy as an opportunity to raise verification standards across the entire Independent AI Foundation Model initiative. This is a very positive signal.
Moving forward, other participating companies will likely be required to:
- Clearly specify sources and composition of training data
- Provide technical evidence for each stage of model development
- Conduct and disclose similarity checks with external models
- Accept independent expert validation
While these heightened standards may impose short-term burdens on companies, they will ultimately represent an investment that elevates Korea’s AI industry credibility to a world-class level.
The Critical Choice We Face
At this very moment, Upstage and the entire Korean AI industry stand at a crucial crossroads. Will they embrace transparency and rigorous verification, or maintain the existing ambiguous standards?
This is not merely a corporate decision. It is a defining juncture that will determine whether Korea’s global AI competitiveness will be shaped by technological trustworthiness—or remain stuck in a cycle of imitation and doubt. How transparently and convincingly Upstage conducts its public verification will serve as the industry’s answer to this pivotal question.