Underthinking of o1-like LLMs
20 years in jail or a $1 million fine for downloading Chinese models proposed in Congress
Higher Parameter Count at Lower Quantization: Is It Better?
DeepSeek-R1 fails every safety test. It exhibits a 100% attack success rate, meaning it failed to block a single harmful prompt.
GPU pricing is spiking as people rush to self-host DeepSeek
Mistral Small 3 24B GGUF quantization evaluation results
Mistral Small 3 knows the truth
How can MoEs outperform dense models when only 1/16th of their parameters are active?
Please explain what Mark Chen meant by "misalignment" from supervising CoTs. What am I losing by supervising R1's CoTs?
DeepSeek's AI breakthrough bypasses Nvidia's industry-standard CUDA, uses assembly-like PTX programming instead
Meta is reportedly scrambling multiple ‘war rooms’ of engineers to figure out how DeepSeek’s AI is beating everyone else at a fraction of the price
Trump to impose 25% to 100% tariffs on Taiwan-made chips, impacting TSMC
Financial Times: "DeepSeek shocked Silicon Valley"
7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
How much better is DeepSeek R1 compared to Llama 3? Both are open source, right?
Model training data density
R1 R1 R1 R1 R1 R1 R1 R1 R1 R1 R1 R1 R1
8xB200 - Fully Idle for the Next Few Weeks - What Should I Run on It?
How is DeepSeek chat free?
For those planning to get one: what's your plan if you can't get a 5000-series GPU?
NotebookLM's Deep Dive podcasts are refreshingly uncensored and capable of a surprisingly wide variety of sounds.
Llama 4 is going to be SOTA
Meta panicked by DeepSeek
We need to be able to train models on consumer-grade hardware