Daily Productive Sharing 674 - How Did OpenAI Build GPT Models?

One helpful tip per day:)

In this article, OpenAI explains in plain language how it built the model underlying ChatGPT:

First, they asked annotators to write answers to a set of questions and used these question-answer pairs to fine-tune the GPT-3 model.
Then, they had the model generate several answers to each question and asked annotators to rate these answers. Using this data, they trained a reward model.
Next, they had the fine-tuned model from the first step generate new answers to the questions and used the reward model from the second step to score them. In other words, the reward model stands in for the human annotators, and reinforcement learning uses its scores to further improve the model from the first step.
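
To make the three steps above concrete, here is a minimal toy sketch in Python. It is not OpenAI's implementation: the real pipeline trains large neural networks and, as far as OpenAI has described it, uses PPO for the reinforcement learning step. Everything named here (the `ANSWERS` list, the three functions, the sample data) is hypothetical, the policy is just one logit per candidate answer to a single question, annotator ratings are assumed to arrive as (preferred, rejected) pairs, and the last step uses a plain exact policy gradient instead of PPO.

```python
import math

# Hypothetical candidate answers to a single question (illustration only).
ANSWERS = ["helpful", "vague", "rude"]

def softmax(logits):
    """Turn logits into a probability distribution over ANSWERS."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def supervised_finetune(demonstrations, steps=200, lr=0.5):
    """Step 1: fit the policy to annotator-written answers by maximum likelihood."""
    logits = [0.0] * len(ANSWERS)
    freqs = [demonstrations.count(a) / len(demonstrations) for a in ANSWERS]
    for _ in range(steps):
        probs = softmax(logits)
        # Gradient of the mean log-likelihood w.r.t. each logit: frequency - probability.
        for i in range(len(logits)):
            logits[i] += lr * (freqs[i] - probs[i])
    return logits

def train_reward_model(comparisons, steps=500, lr=0.1):
    """Step 2: learn one scalar reward per answer from (preferred, rejected) pairs."""
    reward = {a: 0.0 for a in ANSWERS}
    for _ in range(steps):
        for preferred, rejected in comparisons:
            # Pairwise logistic loss: -log sigmoid(r_preferred - r_rejected).
            p_correct = 1 / (1 + math.exp(-(reward[preferred] - reward[rejected])))
            reward[preferred] += lr * (1 - p_correct)
            reward[rejected] -= lr * (1 - p_correct)
    return reward

def rl_finetune(logits, reward, steps=300, lr=0.5):
    """Step 3: raise the expected reward-model score of the policy's answers."""
    logits = list(logits)
    scores = [reward[a] for a in ANSWERS]
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * s for p, s in zip(probs, scores))
        # Exact policy gradient for a softmax policy: p_i * (score_i - expected score).
        for i in range(len(logits)):
            logits[i] += lr * probs[i] * (scores[i] - expected)
    return logits

# Made-up annotator data for the toy run.
demonstrations = ["helpful", "helpful", "vague"]
comparisons = [("helpful", "vague"), ("helpful", "rude"), ("vague", "rude")]

sft_logits = supervised_finetune(demonstrations)
reward = train_reward_model(comparisons)
rl_logits = rl_finetune(sft_logits, reward)

print("after step 1:", dict(zip(ANSWERS, softmax(sft_logits))))
print("after step 3:", dict(zip(ANSWERS, softmax(rl_logits))))
```

In this toy run the reward model never sees the demonstrations and the policy never sees the annotator comparisons directly. The only link between them is the score the reward model assigns, which is what lets the third step continue without asking annotators to judge every new answer.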

If you enjoy today's sharing, why not subscribe?

Need a superb CV? Please try our CV Consultation.
