Scalar mixing weights: which layers matter most for a task?
Cumulative scoring: how many layers does a task need?
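The two measures above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`scalar_mix`, `differential_scores`, `expected_layer`) are hypothetical, the mixing follows the common "softmax over per-layer weights" form, and the example numbers are made up rather than taken from any experiment.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layers, raw_weights, gamma=1.0):
    """Mix per-layer vectors into one vector.

    layers: list of equal-length vectors, one per layer.
    raw_weights: one unnormalized scalar per layer (assumed learned elsewhere).
    A larger normalized weight suggests that layer matters more for the task.
    """
    if len(layers) != len(raw_weights):
        raise ValueError("need exactly one weight per layer")
    weights = softmax(raw_weights)
    dim = len(layers[0])
    return [
        gamma * sum(w * layer[i] for w, layer in zip(weights, layers))
        for i in range(dim)
    ]


def differential_scores(cumulative_scores):
    """Gain from each added layer: score(layers 0..l) - score(layers 0..l-1)."""
    return [b - a for a, b in zip(cumulative_scores, cumulative_scores[1:])]


def expected_layer(deltas):
    """Gain-weighted average layer index (layers counted from 1)."""
    total = sum(deltas)
    if total == 0:
        raise ValueError("no gain from any layer")
    return sum(l * d for l, d in enumerate(deltas, start=1)) / total


# Made-up example: two layers with 2-dimensional outputs and equal raw weights.
mixed = scalar_mix([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

# Made-up probe scores when using layers 0..0, 0..1, 0..2, 0..3.
deltas = differential_scores([0.5, 0.7, 0.8, 0.8])
center = expected_layer(deltas)
```

With equal raw weights every layer contributes equally to `mixed`. In the score list, the last layer adds nothing, so `center` sits between layers 1 and 2: the task is mostly resolved early.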
Language Models as Knowledge Bases? [EMNLP 2019] Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel.
BERT contains relational knowledge, even without fine-tuning.
But the experiments cannot fully verify this, because Google-RE and T-REx are both derived from Wikipedia, which is part of BERT's training set.
The results may simply reflect co-occurrence patterns.
The higher BERT's output score (its prediction confidence), the more likely the prediction is to be correct.
The Pearson correlation coefficient is used to examine the co-occurrence explanation.
ELMo behaves much like BERT, even though its training set contains no Wikipedia.
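As a rough illustration of the correlation check mentioned in these notes, the sketch below computes a Pearson coefficient between model confidence and correctness. The helper name `pearson` and the five data points are assumptions invented for illustration; they are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy data: model confidence for five cloze queries,
# and whether the top prediction was correct (1) or not (0).
confidence = [0.91, 0.75, 0.40, 0.22, 0.10]
correct = [1, 1, 0, 1, 0]

r = pearson(confidence, correct)
```

A positive `r` would be consistent with the note that higher confidence goes with correct answers. The same function could be applied to other factors, such as how often the subject and object co-occur in the training text, assuming those counts are available.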