
Multi-Task Learning for Knowledge Graph Completion with Pre-trained Language Models

Keywords	Completion, Graph, KG, KGC, PLM
Year	2020
Authors	Bosung Kim et al.
Venue	COLING 2020
Memo	LP-RP-RR. Adds multi-task learning to KG-BERT.
Category	Research
DONE
Created	@November 27, 2023 4:09 AM
Last edited	@November 27, 2023 1:05 PM
Working

@inproceedings{Kim2020MultiTaskLF,
  title={Multi-Task Learning for Knowledge Graph Completion with Pre-trained Language Models},
  author={Bosung Kim and Taesuk Hong and Youngjoong Ko and Jungyun Seo},
  booktitle={International Conference on Computational Linguistics},
  year={2020},
  url={https://api.semanticscholar.org/CorpusID:227231134}
}

This is one of the PLM-based KGC models used as a baseline in PKGC.

Introduction

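Problems with KG-BERT

It misses much of the relation information in KGs, because it is trained only with a binary cross-entropy loss.

It has difficulty picking the correct answer among lexically similar candidates.
Example) Given head, relation : (take a breather, derivationally related form, _ )
Correct tail : breathing time
KG-BERT's predicted tails : snorkel breather, breath
Reason : these candidates are lexically similar to the head entity (they share "breath").

To address these problems, the authors adopt multi-task learning.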

Methodology

Multi-task learning for KGC

The model follows the multi-task learning framework of MT-DNN and uses pre-trained BERT as the shared layers. Three tasks are combined: link prediction, relation prediction, and relevance ranking.

Each task has its own classification layer $W\in \mathbb{R}^{K\times H}$.

$K$ : number of labels

$H$ : hidden size of BERT

Each input sequence starts with a $[\texttt{CLS}]$ token, and $[\texttt{SEP}]$ tokens are used as separators.

$S$ : Training set of triples

$S^{'}$ : Negative triple set

$x$ : Input. Text sequence of $(h, r, t)$.

$C$ : final hidden vector of the $[\texttt{CLS}]$ token

$W$ : classification layer

Each entity is represented by its entity name and description.

Example) Triple : (plant tissue, hypernym, plant structure)

input sequence $x$ : $[\texttt{CLS}]$ plant tissue, the tissue of a plant $[\texttt{SEP}]$ hypernym $[\texttt{SEP}]$ plant structure, any part of a plant or fungus $[\texttt{SEP}]$
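The input format above can be sketched as plain string assembly. This is only an illustration: the function name `build_lp_input` and the (name, description) tuple layout are assumptions made here, and in practice a BERT tokenizer would insert the special tokens itself.

```python
def build_lp_input(head, relation, tail):
    """Assemble the text sequence for a triple.

    head and tail are (name, description) pairs; relation is its surface text.
    The special tokens are written out literally for illustration only.
    """
    head_text = f"{head[0]}, {head[1]}"
    tail_text = f"{tail[0]}, {tail[1]}"
    return f"[CLS] {head_text} [SEP] {relation} [SEP] {tail_text} [SEP]"


x = build_lp_input(
    ("plant tissue", "the tissue of a plant"),
    "hypernym",
    ("plant structure", "any part of a plant or fungus"),
)
print(x)
```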

Link Prediction (LP)

The main task.

Training : the model learns to classify whether a given triple is valid or not. (Binary cross entropy)

Negative triples are created by replacing the head or the tail with a random entity.
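A minimal sketch of this corruption step follows. The notes only say that the head or tail is swapped for a random entity; rejecting corrupted triples that already exist in the KG, the 50/50 head/tail choice, and the `max_tries` bound are assumptions added here, as is the function name `corrupt`.

```python
import random


def corrupt(triple, entities, known, rng, max_tries=100):
    """Return a negative triple by replacing the head or the tail.

    known is the set of true triples; filtering against it is an
    assumption of this sketch, not something stated in the notes.
    """
    h, r, t = triple
    for _ in range(max_tries):
        e = rng.choice(entities)
        # Replace the head or the tail with equal probability (assumed).
        cand = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if cand not in known:
            return cand
    raise RuntimeError("could not sample a negative triple")


entities = ["a", "b", "c", "d"]
known = {("a", "r", "b")}
neg = corrupt(("a", "r", "b"), entities, known, random.Random(0))
```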

$$f(x) = \text{softmax}(CW^T_{LP}) = [\hat{y}_0,\hat{y}_1],\\ \mathcal{L}_{LP} = - \sum_{x\in\{S\cup S'\}} \left( y\log{\hat{y}_1} + (1-y) \log{\hat{y}_0} \right)$$

$W_{LP} \in \mathbb{R}^{2 \times H}$ : classification layer for link prediction

$f(x)$ : final output of the model

$y \in \{0,1\}$ : label

$[s_0, s_1] \in \mathbb{R}^2$ : output of $CW^T_{LP}$. $s_1$ is used as the final ranking score at evaluation.
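To make the LP objective concrete, here is a small pure-Python sketch of the two-way softmax and the binary cross-entropy sum. It takes the logits $[s_0, s_1]$ as given (the BERT encoder is left out), and the function names are made up for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def lp_loss(logits_batch, labels):
    """Binary cross entropy over (s0, s1) logit pairs and 0/1 labels."""
    loss = 0.0
    for (s0, s1), y in zip(logits_batch, labels):
        y0_hat, y1_hat = softmax([s0, s1])
        loss -= y * math.log(y1_hat) + (1 - y) * math.log(y0_hat)
    return loss


# One positive triple scored confidently, one negative scored confidently.
print(lp_loss([(-3.0, 3.0), (2.0, -2.0)], [1, 0]))
```

At evaluation time only the raw $s_1$ would be kept per candidate and used to sort the candidates.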

Relation Prediction (RP)

Input : head and tail sequences only (the relation is left out)

$[\texttt{CLS}]$ plant tissue, the tissue of a plant $[\texttt{SEP}]$ plant structure, any part of a plant or fungus $[\texttt{SEP}]$

Training : given the sequence above, the model is trained to predict the relation hypernym. (Cross entropy)

$$g(x) = \text{softmax}(CW^T_{RP}),\\ \mathcal{L}_{RP} = - \sum_{x\in S} y\log g(x)$$

$W_{RP} \in \mathbb{R}^{R \times H}$ : classification layer for relation prediction.

$R$ : number of relations

$g(x)$ : final output of the model

$y\in\mathbb{R}^R$ : class indicator (one-hot over relations)
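Because $y$ is a one-hot indicator, the RP loss reduces to the negative log-probability of the gold relation. A rough sketch, assuming the $R$ logits per example are already available and that gold relations are given as integer indices (both are choices made for this example):

```python
import math


def rp_loss(logits_batch, relation_ids):
    """Cross entropy over R-way relation logits.

    With a one-hot y, -sum(y * log g(x)) equals -log g(x)[gold],
    which is computed here via a stable log-sum-exp.
    """
    loss = 0.0
    for logits, rel in zip(logits_batch, relation_ids):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        loss -= logits[rel] - log_z
    return loss


# Three relations; the gold relation has index 1.
print(rp_loss([[0.2, 2.5, -1.0]], [1]))
```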

Relevance Ranking (RR)

Goal : keep the scores of positive triples higher than those of negative triples.

Training : the input is the same as in LP. (Margin ranking loss)

$$h(x) = \text{sigmoid}(CW^T_{RR}),\\ \mathcal{L}_{RR} = \sum_{x\in S , x'\in S'} \max{\{0, h(x') - h(x) +\lambda\}}$$

$W_{RR} \in \mathbb{R}^{1 \times H}$ : classification layer for relevance ranking.

$h(x)$ : final output of the model

$\lambda$ : margin
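The margin ranking objective can be sketched as a hinge that is minimized: it is zero once a positive score exceeds its negative counterpart by at least $\lambda$. The sketch below assumes positives and negatives are paired one-to-one and that the scalar pre-sigmoid outputs of $CW^T_{RR}$ are given; both are simplifications made here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def rr_loss(pos_logits, neg_logits, margin=0.1):
    """Margin ranking loss over aligned (positive, negative) pairs."""
    loss = 0.0
    for zp, zn in zip(pos_logits, neg_logits):
        # Penalize pairs where the negative is not at least `margin` below.
        loss += max(0.0, sigmoid(zn) - sigmoid(zp) + margin)
    return loss


print(rr_loss([2.0, 0.1], [-1.0, 0.3]))
```

The default `margin=0.1` matches the $\lambda$ listed under Settings.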

During training, mini-batches $D_{LP}, D_{RP}, D_{RR}$ are built for each task and all of them are merged into $D = D_{LP} \cup D_{RP}\cup D_{RR}$. At each training step, a mini-batch is randomly selected from $D$, and the model is updated on the task that the batch belongs to, one batch after another.
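The batch scheduling can be pictured as tagging every mini-batch with its task, merging, and shuffling. This is a sketch of the MT-DNN-style loop as described in these notes; `multitask_schedule`, the placeholder batch strings, and the dispatch comment are all hypothetical.

```python
import random


def multitask_schedule(batches_by_task, rng):
    """Merge per-task mini-batches into one randomly ordered list."""
    merged = [
        (task, batch)
        for task, batches in batches_by_task.items()
        for batch in batches
    ]
    rng.shuffle(merged)
    return merged


schedule = multitask_schedule(
    {"LP": ["lp_batch_0", "lp_batch_1"], "RP": ["rp_batch_0"], "RR": ["rr_batch_0"]},
    random.Random(0),
)
for task, batch in schedule:
    # A real loop would run the shared encoder, then the head and loss
    # that belong to `task`, and take one optimizer step.
    print(task, batch)
```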

Experiments

Datasets

WN18RR (Dettmers et al., 2018)

A subset of WordNet, a lexical database of English.

Entities : words or short phrases. Definition - Synset definition

Relations : 11 types

FB15k-237 (Toutanova and Chen, 2015)

A subset of Freebase (Bollacker et al., 2008), a large-scale graph database of general knowledge.

Entities : more general entities than in WN18RR. Definition - descriptions from Xie et al. (2016)

Relations : longer and more complex than those of WN18RR.

Baseline

KG-BERT (Yao et al., 2019)

TransE (Bordes et al., 2013)

DistMult (Yang et al., 2014)

ComplEx (Trouillon et al., 2016)

ConvE (Dettmers et al., 2018)

RotatE (Sun et al., 2019)

Settings

Pre-trained BERT-base

Fine-tune : 3 epochs on the multi-task architecture

mini-batch size : 32

Adam optimizer (Kingma and Ba, 2014), learning rate : 2e-5

$\lambda$ : 0.1

Evaluation

Mean Rank (MR)

Mean Reciprocal Rank (MRR)

Hits@1, 3, 10
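These metrics are all functions of the rank assigned to the correct entity for each test query (1 is best). A short sketch, with the function name and the sample ranks made up for illustration:

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and Hits@k from 1-based ranks of the gold entities."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                      # lower is better
        "MRR": sum(1.0 / r for r in ranks) / n,    # higher is better
    }
    for k in ks:
        # Fraction of queries whose gold entity is ranked within the top k.
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics


metrics = ranking_metrics([1, 2, 10, 20])
print(metrics)
```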

Main Results

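Unlike KG-BERT, the model finds the correct answer, breathing time.

It achieves SOTA on the other metrics, but RotatE performs best on FB15k-237 Hits@10.

The authors note that FB15k-237 has more relations and a more complex graph structure than WN18RR, and from the results in Table 4 they conjecture that PLMs fail to capture the complex structural information of KGs.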
