The rise of a new AI chatbot from a little-known Chinese company has been accompanied by tumbling stock market values and bold claims. What makes it so different?
The reason for this turmoil? The “large language model” (LLM) that powers the app is reportedly far cheaper to train and run, yet has reasoning capabilities comparable to those of US models such as OpenAI’s o1.
Analysis
Dr Andrew Duncan is the director of science and innovation for fundamental AI at the Alan Turing Institute in London, UK.
DeepSeek claims to have achieved this by deploying a number of technical strategies that reduced both the amount of memory needed to store the model and the processing time required to train it (the model is known as R1). According to DeepSeek, cutting these overheads resulted in a dramatic reduction in cost. For example, R1’s underlying base model, V3, reportedly required 2.788 million GPU hours to train (running across many graphics processing units – GPUs), at an estimated cost of under $6 million (£4.8 million), compared with the more than $100 million (£80 million) that OpenAI boss Sam Altman says was needed to train GPT-4.
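As a rough sanity check on those figures, the short Python sketch below multiplies the reported GPU hours by an assumed hourly rental rate. The $2-per-GPU-hour rate is purely illustrative – it is my assumption, not a figure reported by DeepSeek or quoted in this article.

```python
# Back-of-the-envelope check of the reported training figures (illustrative only).
gpu_hours = 2_788_000            # reported training time for V3, in GPU hours
assumed_cost_per_gpu_hour = 2.0  # hypothetical rental rate in US dollars (assumption)

estimated_cost = gpu_hours * assumed_cost_per_gpu_hour
print(f"${estimated_cost:,.0f}")  # ≈ $5,576,000 – in the ballpark of the quoted $6m
```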
Despite the hit taken to Nvidia’s market value, the DeepSeek models were trained on around 2,000 Nvidia H800 GPUs, according to a research paper released by the company. These chips are a modified version of the widely used H100 chip, built to comply with export rules for China. They were likely stockpiled before the Biden administration further tightened the restrictions in October 2023, effectively banning Nvidia from exporting the H800s to China. Working within these constraints, it is likely that DeepSeek has been forced to find innovative ways to make the most effective use of the resources it has at its disposal.
Reducing the computational costs of training and running models may also address concerns about AI’s environmental impact. The data centres they run on have huge electricity and water demands, largely to keep the servers from overheating. While most tech companies do not disclose the carbon footprint of their models, a recent estimate puts ChatGPT’s monthly carbon dioxide emissions at the equivalent of 260 flights from London to New York. Making AI models more efficient would therefore, from an environmental perspective, seem to be good news for the industry.
Of course, whether DeepSeek’s models actually deliver real-world savings in energy remains to be seen, and it is also unclear whether cheaper, more efficient AI could lead to more people using the models, and so an increase in overall energy consumption.
If nothing else, it could help to push sustainable AI up the agenda at the upcoming Paris AI Action Summit, so that the AI tools we use in the future are also kinder to the planet.
What has surprised many people is how quickly DeepSeek appeared on the scene with such a competitive large language model – the company was only founded by Liang Wenfeng in 2023, and he is now being hailed in China as something of an “AI hero”.
The latest DeepSeek model is also notable in that its “weights” – the numerical parameters of the model obtained from the training process – have been openly released, along with a technical paper describing the model’s development process. This enables other groups to run the model on their own hardware and adapt it to other tasks.
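As a rough illustration of what an open weights release makes possible, the sketch below loads such a model with the Hugging Face transformers library. The repository name, generation settings and hardware comments are my assumptions rather than anything taken from DeepSeek’s documentation, and running a full-size model this way requires very substantial GPU memory.

```python
# A minimal sketch of running openly released weights on your own hardware.
# Requires the `transformers`, `torch` and `accelerate` packages; the repo id
# below is an assumed example, and smaller distilled variants would be far
# less demanding to run than the full-size model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load the weights in their released precision
    device_map="auto",    # spread layers across whatever GPUs are available
    trust_remote_code=True,
)

prompt = "Explain why the sky is blue in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```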
In contrast to OpenAI’s o1 and o3, which are effectively black boxes, researchers around the world can now look under the model’s bonnet to find out what makes it tick. Some details remain incomplete, however, such as the datasets and code used to train the models, but groups of researchers are now trying to piece these together.
Not all of DeepSeek’s cost-cutting techniques are new either – some have been used in other LLMs. In 2023, Mistral AI openly released its Mixtral 8x7B model, which was on a par with the advanced models of the time. Both Mixtral and the DeepSeek models take advantage of the “mixture of experts” technique, in which the model is built from a group of much smaller models, each having expertise in specific domains. Given a task, the mixture model assigns it to the most qualified “expert”, as the sketch below illustrates.
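The toy Python sketch below shows the general routing idea; it is not DeepSeek’s or Mistral’s implementation, just a minimal picture of a gating function sending each input to a single small “expert”.

```python
# A toy illustration of the "mixture of experts" idea: a gating function scores
# the experts for a given input and only the top-scoring expert does any work.
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, DIM = 4, 8

# Each "expert" is stood in for here by a tiny linear layer with its own weights.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]

# The gating network scores how suitable each expert is for a given input.
gate_weights = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route the input to the single highest-scoring expert (top-1 routing)."""
    scores = x @ gate_weights        # one score per expert
    chosen = int(np.argmax(scores))  # pick the most qualified expert
    return experts[chosen] @ x       # only that expert is evaluated

token = rng.standard_normal(DIM)
output = moe_forward(token)
print(output.shape)  # (8,) – three of the four experts did no work at all
```

The practical appeal of this design is that only the chosen expert’s parameters are used for each input, which is one way of keeping compute costs down even as the total parameter count grows.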
DeepSeek has also shared details of its unsuccessful attempts to improve LLM reasoning through other technical approaches, such as Monte Carlo Tree Search, a method long touted as a potential strategy to guide the reasoning process of an LLM. Researchers will use this information to investigate how the model’s already impressive problem-solving capabilities can be further enhanced – improvements that are likely to end up in the next generation of AI models.
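For readers unfamiliar with the technique, here is a generic Monte Carlo Tree Search skeleton applied to a made-up toy problem. It shows the selection, expansion, simulation and backpropagation steps the method is built on; it is not DeepSeek’s experiment and contains nothing LLM-specific.

```python
# A generic Monte Carlo Tree Search skeleton on a toy problem: build a short
# sequence of "steps" that maximises an arbitrary reward function.
import math
import random

STEPS = ["a", "b", "c"]  # hypothetical candidate steps (purely illustrative)
MAX_DEPTH = 4

def score(sequence):
    """Toy reward: prefer sequences with many 'a's that end in 'c'."""
    return sequence.count("a") + (1.0 if sequence and sequence[-1] == "c" else 0.0)

class Node:
    def __init__(self, sequence, parent=None):
        self.sequence, self.parent = sequence, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def expand(self):
        for step in STEPS:
            self.children.append(Node(self.sequence + [step], parent=self))

    def best_child(self, c=1.4):
        # UCT rule: balance average value (exploitation) against exploration.
        return max(
            self.children,
            key=lambda n: n.value / (n.visits + 1e-9)
            + c * math.sqrt(math.log(self.visits + 1) / (n.visits + 1e-9)),
        )

def mcts(iterations=500):
    root = Node([])
    for _ in range(iterations):
        # 1. Selection: walk down the tree using UCT until reaching a leaf.
        node = root
        while node.children:
            node = node.best_child()
        # 2. Expansion: grow the leaf if it is not yet at maximum depth.
        if len(node.sequence) < MAX_DEPTH:
            node.expand()
            node = random.choice(node.children)
        # 3. Simulation: finish the sequence with random steps and score it.
        rollout = list(node.sequence)
        while len(rollout) < MAX_DEPTH:
            rollout.append(random.choice(STEPS))
        reward = score(rollout)
        # 4. Backpropagation: push the reward back up towards the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).sequence

print(mcts())  # the most promising first step found by the search
```

In an LLM setting, the “steps” would be candidate reasoning moves and the reward would come from some evaluation of the resulting answer, but the search loop itself has the same shape.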
What does all this mean for the future of the AI industry?
DeepSeek may be showing that you don’t need vast resources to build sophisticated AI models. I’d venture that we will see increasingly capable AI models being built with ever fewer resources, as companies find ways to make model training and operation more efficient.
Up until now, the AI landscape has been dominated by “Big Tech” companies in the US – Donald Trump has called the rise of DeepSeek “a wake-up call” for the US tech industry. However, this development may not necessarily be bad news for the likes of Nvidia in the long term: as the financial cost of developing AI falls, businesses and governments will be able to adopt the technology more readily. That will in turn drive demand for new products, and for the chips that power them – and so the cycle continues.
It seems likely that smaller companies such as DeepSeek will play a growing role in creating AI tools that have the potential to make our lives easier. It would be a mistake to underestimate that.