What's a deepseek? Sounds like a search engine?
Deepseek is a Chinese AI company that released Deepseek R1, a direct competitor to ChatGPT.
Nice! What are they competing for? I'm new to this AI business thing.
So far, they are training models extremely efficiently, even with the US gatekeeping their GPUs and doing everything it can to slow their progress. Any innovation that makes models cheaper to train and operate is great for the accessibility of the technology and for reducing the environmental impact of this (so far) very wasteful tech.
Deepseek is a Chinese AI company
Oh. So, military, then.
You can say the same thing about any US AI company. Of course the local terrorists want in
DeepSeek collects and processes all the data you send to their LLM, even from API calls. That's a no-go for most business applications. For example, OpenAI and Anthropic don't collect or process data sent via the API at all, and there's an opt-out toggle in their settings to avoid processing of data sent via the UI.
You can run 'em locally, tho, if their gh page is to be believed. And this way you can make sure nothing even gets sent to their servers, and not just believe nothing is processed.
I got it running with ollama locally, works as advertised
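For anyone who wants to try it, this is roughly all it takes. A minimal sketch using the ollama Python client; the model tag here (deepseek-r1:7b) is just an assumption, use whichever distilled size you actually pulled:

```python
# Minimal sketch: querying a locally pulled DeepSeek R1 model via the ollama Python client.
# Assumes you've already run `ollama pull deepseek-r1:7b` (the tag may differ on your machine).
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",  # assumed tag; swap in the distill you actually downloaded
    messages=[{"role": "user", "content": "Explain what a mixture-of-experts model is."}],
)
print(response["message"]["content"])
```

Nothing leaves your machine; the client only talks to the local ollama server.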
DeepSeek is an open source project that anybody can run, and it's performant enough that even running the full model is affordable for any company to do.
Since it's open source, is there a way for companies to adjust it so it doesn't intentionally avoid saying anything bad about China?
It should be repeated: no American corporation is going to let their employees put data into DeepSeek.
Accept this truth. The LLM you can download and run locally is not the same as what you’re getting on their site. If it is, it’s shit, because I’ve been testing r1 in ollama and it’s trash.
It should be repeated: anybody can run DeepSeek themselves on premise. You have absolutely no clue what you're talking about. Keep on coping there though, it's pretty adorable.
Ok, I still don't trust them... especially when they have a former NSA chief on their board of directors.
why are you so heavily and openly advertising Deepseek?
Because it's an open source project that's destroying the whole closed source subscription AI model.
I don't think you or that Medium writer understand what "open source" means. Being able to run a local stripped down version for free puts it on par with Llama, a Meta product. Privacy-first indeed. Unless you can train your own from scratch, it's not open source.
Here's the OSI's helpful definition for your reference https://opensource.org/ai/open-source-ai-definition
You can run the full version if you have the hardware, the weights are published, and importantly the research behind it is published as well. Go troll somewhere else.
All that is true of Meta's products too. It doesn't make them open source.
Do you disagree with the OSI?
What part of OSI are you claiming DeepSeek doesn't satisfy specifically?
The data part. ie the very first part of the OSI's definition.
It's not available from their articles https://arxiv.org/html/2501.12948v1 https://arxiv.org/html/2401.02954v1
Nor on their github https://github.com/deepseek-ai/DeepSeek-LLM
Note that the OSI only asks for transparency about what the dataset was - a name and the fee paid will do - not that full access to it be free and Free.
It's worth mentioning too that they've used the MIT license for the "code" included with the model (a few YAML files to feed it to software), but they've created their own unrecognised non-free license for the model itself. Why they have this misleading label on their github page would only be speculation.
Without making the dataset available, nobody can accurately recreate, modify, or learn from the model they've released. This is the only sane definition of open source available for an LLM model, since it is not in itself code with a "source".
Uh yeah, that's because people publish data to huggingface. GitHub isn't made for huge data files in case you weren't aware. You can scroll down to datasets here https://huggingface.co/deepseek-ai
That's the "prover" dataset, ie the evaluation dataset mentioned in the articles I linked you to. It's for checking the output, it is not the training output.
It's also 20mb, which is miniscule not just for a training dataset but even as what you seem to think is a "huge data file" in general.
You really need to stop digging and admit this is one more thing you have surface-level understanding of.
Do show me a published data set of the kind you're demanding.
Since you're definitely asking this in good faith and not just downvoting and making nonsense sealion requests in an attempt to make me shut up, sure! Here's three.
https://github.com/togethercomputer/RedPajama-Data
https://huggingface.co/datasets/legacy-datasets/wikipedia/tree/main/
Oh, and it's not me demanding. It's the OSI defining what an open source AI model is. I'm sure once you've asked all your questions you'll circle back around to whether you disagree with their definition or not.
So you found a legacy data set that's been released nearly a year ago as your best example. Thanks for proving my point. And since you obviously know what you're talking about, do explain to the class what stops people from using these data sets to train a DeepSeek model?
The most recent crawl is from December 15th
https://commoncrawl.org/blog/december-2024-crawl-archive-now-available
You don't know, and can't know, when DeepSeek's dataset is from. Thanks for proving my point.
What I do know is that you can take DeepSeek model and train it on this open crawl to get a fully open model. I love how you ignored this part in your reply being the clown that you are.
I ignored the bit you edited in after I replied? And you're complaining about ignoring questions in general? Do you disagree with the OSI definition Yogsy? You feel ready for that question yet?
What on earth do you even mean "take a model and train it on this open crawl to get a fully open model"? This sentence doesn't even make sense. Never mind that that's not how training a model works - let's pretend it is. You understand that adding open source data to closed source data wouldn't make the closed source data less closed source, right?.. Right?
Thank fuck you're not paid real money for this Yiggly because they'd be looking for their dollars back
Why would you lie about something with timestamps. I edited 18 min ago, and you replied 17 min ago. 🤡
Do you disagree with the OSI definition Yogsy? You feel ready for that question yet?
I already answered this question earlier in the thread, but clearly your reading comprehension needs some work.
What on earth do you even mean "take a model and train it on this open crawl to get a fully open model"?
I'm talking about taking the code that DeepSeek released publicly, and training it on the open source data that's available. That's what model training is. The fact that this needs to be spelled out for you is amazing.
You understand that adding open source data to closed source data wouldn’t make the closed source data less closed source, right?.. Right?
What closed source data are you talking about? Nobody is suggesting this.
Thank fuck you’re not paid real money for this Yiggly because they’d be looking for their dollars back
You sound upset there little buddy. I guess misspelling my handle was the peak insult you could muster. Really showing your intellectual prowess there champ.
I take more than a minute on my replies, Autocorrect Disaster. You asked for information and I treated your request as genuine, because it just leads to more hilarity, like you describing a model as "code".
The only hilarity here is you exposing yourself as being utterly clueless on the subject you're attempting to debate. A model is a deep neural network that's generated by code through reinforcement training on the data. Evidently you don't understand this, leading you to make absurd statements. I asked you for information because I knew you were a troll, and now you've confirmed it.
I understand it completely, insofar as it's nonsensically irrelevant - the model is what you're calling open source, and the model is not open source because the data set is not published or recreatable. They can open source any training code they want - I genuinely haven't even checked - but the model is not open source. Which is my point from about 20 comments ago. Unless you disagree with the OSI's definition, which is a valid and interesting opinion. If that's the case you could have just said so. OSI are just a bunch of dudes. They have plenty of critics in the Free/Open communities. Hey, they're probably American too if you want to throw in some downfall-of-The-West classic hits!
If a troll is "not letting you pretend you have a clue what you're talking about because you managed to get ollama to run a model locally and think it's neat", cool, I'm owning that. You could also just try owning that you think it's neat. It is. It's not an open source model though. You can run Meta's model with the same level of privacy (offline) and with the same level of ability to adapt or recreate it (you can't, since you don't have the full data set or the steps to recreate it).
I never disagreed that you can run Meta's model with the same level of privacy, so I don't know why you keep bringing that up as some sort of gotcha. The point about DeepSeek is its efficiency. The OSI definition of open source is good, and it does look like you're right that the full data set is not available. However, the real question is why you'd be so hung up on that.
Given that the code for training a new model is released and can be applied to open data sets, it's perfectly possible to make a version trained on open data, which would check off the final requirement you keep bringing up. Also, adapting it does not require the original training set, since that's done by tuning the weights in the network itself. Go read up on how LoRA works, for example; there's a sketch below.
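To illustrate what I mean by tuning the weights without the original data, here's a minimal LoRA setup sketch using HuggingFace's peft library. The checkpoint name and target module names are assumptions (adjust for whichever distill you're actually adapting), and you'd still supply your own fine-tuning data and training loop:

```python
# Minimal sketch: attaching LoRA adapters to a downloaded model with peft.
# You adapt the released weights directly; the original training corpus is never needed.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed checkpoint name for illustration; pick the model you actually want to tune.
base = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # assumed attention projection names for this architecture
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()         # only the small adapter matrices get trained
```

You then train just those adapter weights on whatever data you have, which is exactly why having the original training set isn't a prerequisite for adapting the model.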
So... as far as I understand from this thread, it's basically a finished model (llama or qwen) which is then fine tuned using an unknown dataset? That'd explain the claimed 6M training cost, hiding the fact that the heavy lifting has been made by others (US of A's Meta in this case). Nothing revolutionary to see here, I guess. Small improvements are nice to have, though. I wonder how their smallest models perform, are they any better than llama3.2:8b?
What's revolutionary here is the use of the mixture-of-experts approach to get far better performance. While it has 671 billion parameters overall, it only uses 37 billion at a time, making it very efficient. For comparison, Meta's Llama 3.1 405B uses all 405 billion parameters at once. It does as well as GPT-4o in the benchmarks, and excels at advanced mathematics and code generation. It also has a 128K token context window, which means it can process and understand very long documents, and it processes text at 60 tokens per second, twice as fast as GPT-4o.
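If you're wondering what "only uses 37 billion at a time" means mechanically, here's a toy sketch of top-k expert routing in plain numpy. All sizes are made up for illustration; this shows the routing idea, not DeepSeek's actual architecture:

```python
# Toy sketch of top-k mixture-of-experts routing (illustrative only).
# A router scores every expert per token, but only the top-k experts run,
# so compute per token scales with k, not with the total number of experts.
import numpy as np

rng = np.random.default_rng(0)

num_experts, top_k, d_model = 8, 2, 16                      # toy sizes; R1 uses far more experts
router_w = rng.normal(size=(d_model, num_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]

def moe_layer(x):
    """x: (d_model,) vector for a single token."""
    scores = x @ router_w                                    # one score per expert
    top = np.argsort(scores)[-top_k:]                        # indices of the k best experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over the chosen experts
    # Only the selected experts do any work; the other 6 are never evaluated for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)                                # (16,) - same output size, ~k/num_experts of the compute
```

Scale that idea up and you get a huge total parameter count with a much smaller active parameter count per token.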
I think DeepSeek opens up new, efficient ways to train LLMs, which in turn increases competition.
Paid influencers are subtle.