[Paper Reading] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Keywords: LLM
Year: 2023
Authors: Rafael Rafailov et al.
Venue: arXiv
Memo: DPO
Category: Research
Status: DONE
Created: 2023-11-19, 5:54 PM
Last edited: 2023-11-20, 12:08 PM

BibTeX (in progress):
@article{Rafailov2023DirectPO,
  title={Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author={Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano..