DeepSeek-affiliated Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd. today filed a patent for a new web data collection system designed to improve efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on websites. It assesses content that has already been downloaded to predict the quality of links not yet visited, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek’s proposed system aims to solve these issues by optimizing data allocation and maintaining metadata accuracy. [iThome, in Chinese]
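
The patent text is not quoted here, so the following is only a minimal sketch of the general idea described above, not DeepSeek's actual method. It assumes a link inherits the quality score of the page it was found on, and it uses a hypothetical `Frontier` class, a hypothetical `per_host_budget` parameter to cap traffic per website, and placeholder `example` URLs.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlparse


class Frontier:
    """Toy crawl frontier (hypothetical; not the patented system)."""

    def __init__(self, per_host_budget=2):
        self.per_host_budget = per_host_budget   # assumed cap on downloads per host
        self.host_downloads = defaultdict(int)   # downloads handed out so far, per host
        self.seen = set()                        # URLs already queued, to skip duplicates
        self.heap = []                           # entries: (-predicted_quality, order, url)
        self.counter = 0                         # tie-breaker that keeps insertion order

    def add_links(self, links, parent_quality):
        """Queue new links, assuming each is about as good as its parent page."""
        for url in links:
            if url in self.seen:
                continue  # already queued once: avoid a redundant download
            self.seen.add(url)
            heapq.heappush(self.heap, (-parent_quality, self.counter, url))
            self.counter += 1

    def next_url(self):
        """Return the best-scored URL whose host still has budget, or None."""
        while self.heap:
            _, _, url = heapq.heappop(self.heap)
            host = urlparse(url).netloc
            if self.host_downloads[host] < self.per_host_budget:
                self.host_downloads[host] += 1
                return url
        return None


frontier = Frontier(per_host_budget=1)
frontier.add_links(["https://a.example/1", "https://a.example/2"], parent_quality=0.9)
frontier.add_links(["https://b.example/1", "https://a.example/1"], parent_quality=0.4)
order = [frontier.next_url() for _ in range(3)]
print(order)
```

In this sketch, links found on a high-scoring page are fetched first, a repeated URL is queued only once, and a host that has used up its budget is skipped rather than hit again. A real system would need a far richer quality predictor than a score copied from the parent page.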