Leigh-Anne Pinnock on fake news: “Do your own research”

February 8, 2026 · Wu Peng · Source: dev资讯


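"In the past, people thought that if you told the AI it was a math professor, for example, it would actually be more accurate when answering math questions," said Sander Schulhoff, an entrepreneur and researcher who has popularized the idea of "prompt engineering." But Schulhoff and others say that when you are looking for information or asking a question with only one correct answer, role-playing actually lowers the AI model's accuracy.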


Initially I aimed to test at least 10 formulas per model for each of the SAT and UNSAT cases, but this turned out to be more expensive than I expected, so I tested about 5 formulas for each case and model. At first I used the openrouter API to automate the process, but responses were cut off midway because of the long reasoning process, so I went back to using the chat interface (I don't know whether this was a problem with the model provider or an openrouter issue). For this reason I don't have outputs in a standard format for every test, but I have linked to the output for each case I mention in the results.
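The text does not say how a model's SAT/UNSAT answer was checked. As a minimal sketch, assuming formulas are given in CNF with DIMACS-style integer literals, a claimed assignment could be verified like this. The function names `satisfies` and `brute_force_sat` are hypothetical, and the brute-force search is only practical for small formulas.

```python
from itertools import product


def satisfies(formula, assignment):
    """Return True if every clause contains at least one true literal.

    formula: list of clauses, each a list of non-zero ints
             (3 means x3, -3 means NOT x3).
    assignment: dict mapping variable number to bool.
    Assumption: variables missing from the assignment count as False.
    """
    return all(
        any(assignment.get(abs(lit), False) == (lit > 0) for lit in clause)
        for clause in formula
    )


def brute_force_sat(formula):
    """Return some satisfying assignment, or None if the formula is UNSAT.

    Tries all 2^n assignments, so it is only a reference check for
    small formulas, not a real SAT solver.
    """
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(formula, assignment):
            return assignment
    return None


# Hypothetical example: (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
formula = [[1, -2], [2, 3], [-1, -3]]
claimed = {1: True, 2: True, 3: False}  # e.g. an assignment a model returned
print(satisfies(formula, claimed))
print(brute_force_sat([[1], [-1]]))
```

A check of this kind only confirms a SAT answer that comes with an assignment. A model's UNSAT claim still has to be compared against a solver or an exhaustive search.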

At some point I realized I could run tests forever. I had already done that last year and written it up in blog posts (one and two), so doing it again here didn’t seem especially valuable. So I pivoted to a “how to” page. In redesign 3 I decided to show the concepts, then a JavaScript implementation using CPU rendering, and then another implementation using GPU rendering. I made new versions of the diagrams: