Underthinking of o1-like LLMs
20 years in jail or a $1 million fine for downloading Chinese models proposed in Congress
Higher Parameter Count at Lower Quantization: Is It Better?
DeepSeek-R1 fails every safety test. It exhibits a 100% attack success rate, meaning it failed to block a single harmful prompt.
GPU pricing is spiking as people rush to self-host DeepSeek
Mistral Small 3 24B GGUF quantization evaluation results
Mistral Small 3 knows the truth
How can MoEs outperform dense models when only 1/16th of their parameters are active?
Please explain what Mark Chen meant by "misalignment" from supervising CoTs. What am I losing by supervising R1's CoTs?
DeepSeek's AI breakthrough bypasses Nvidia's industry-standard CUDA, uses assembly-like PTX programming instead
Meta is reportedly scrambling multiple ‘war rooms’ of engineers to figure out how DeepSeek’s AI is beating everyone else at a fraction of the price
Trump to impose 25% to 100% tariffs on Taiwan-made chips, impacting TSMC
Financial Times: "DeepSeek shocked Silicon Valley"
7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
How much better is DeepSeek R1 compared to Llama 3? Both are open source, right?
Model training data density
R1 R1 R1 R1 R1 R1 R1 R1 R1 R1 R1 R1 R1
8xB200 - Fully Idle for the Next Few Weeks - What Should I Run on It?
How is DeepSeek chat free?
For those planning to get one: what's your plan if you can't get a 5000-series GPU?
NotebookLM's Deep Dive podcasts are refreshingly uncensored and capable of a surprisingly wide variety of sounds.
Llama 4 is going to be SOTA
Meta panicked by DeepSeek
We need to be able to train models on consumer-grade hardware