Daily Productive Sharing 691

One helpful tip per day :)

Jacob Kaplan-Moss explained that often, hard work seems like magic:

Some magic tricks look glamorous, but they take a lot of time and effort to prepare;
The same applies to software development. Although we keep pursuing automation, sometimes only patient, tedious manual work can solve the problem.

The recently popular ChatGPT owes its success to adding reinforcement learning from human feedback (RLHF) on top of the base GPT model. GPT training is not new, and neither is reinforcement learning. What's new is bringing human feedback into reinforcement learning. That feedback takes a lot of manpower: people have to label data and score the outputs the model generates. In this sense too, the grind is what made the miracle.

