Probe Task

BERT Rediscovers the Classical NLP Pipeline [ACL 2019] Ian Tenney, Dipanjan Das, Ellie Pavlick.
- Scalar mixing weights: which layers are most important for each task?
- Cumulative scoring: how many layers does each task need?
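Both analyses reduce to short formulas over per-layer quantities. The sketch below is plain Python and loads no BERT model; the function names are hypothetical, and the layer vectors and scores are toy numbers. It assumes the formulation from the paper as I understand it: the mixed representation is gamma * sum_j s_j * h_j with s = softmax(a), the mixing weights are summarized by their "center of gravity" sum_l l * s_l, and cumulative scoring uses differential scores Delta_l = Score(layers 0..l) - Score(layers 0..l-1) to compute an expected layer sum_l l * Delta_l / sum_l Delta_l.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the per-layer logits a_j."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layers: Sequence[Sequence[float]], logits: Sequence[float],
               gamma: float = 1.0) -> List[float]:
    """Return gamma * sum_j s_j * h_j, where s = softmax(logits) and h_j = layers[j]."""
    weights = softmax(logits)
    mixed = [0.0] * len(layers[0])
    for w, h in zip(weights, layers):
        for i, x in enumerate(h):
            mixed[i] += w * x
    return [gamma * x for x in mixed]


def mixing_center_of_gravity(logits: Sequence[float]) -> float:
    """E_s[l] = sum_l l * s_l: the 'average' layer the mixing weights point to."""
    return sum(l * s for l, s in enumerate(softmax(logits)))


def expected_layer(cumulative: Sequence[float]) -> float:
    """cumulative[l] = probe score using layers 0..l (index 0 = embeddings).

    Differential score Delta_l = cumulative[l] - cumulative[l - 1];
    expected layer = sum_l l * Delta_l / sum_l Delta_l.
    """
    deltas = [(l, cumulative[l] - cumulative[l - 1]) for l in range(1, len(cumulative))]
    total = sum(d for _, d in deltas)
    if total == 0:
        raise ValueError("scores never improve; expected layer is undefined")
    return sum(l * d for l, d in deltas) / total


# Toy usage (made-up numbers, not results from the paper):
print(scalar_mix([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # equal weights -> [2.0, 3.0]
print(expected_layer([0.0, 0.5, 1.0]))                    # -> 1.5
```

A high center of gravity means the probe leans on upper layers; a high expected layer means the score keeps improving as upper layers are added.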
Language Models as Knowledge Bases? [EMNLP 2019] Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel.
- BERT contains relational knowledge, even without fine-tuning.
- However, the experiments cannot fully verify this, because Google-RE and T-REx are both derived from Wikipedia, which is part of BERT's training data.
- The model may be exploiting co-occurrence patterns instead.
- The higher the probability BERT assigns to its prediction, the more likely that prediction is correct.
- The Pearson correlation coefficient is used to analyze how co-occurrence relates to prediction performance.
- ELMo behaves similarly to BERT, even when its training set includes no Wikipedia.
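The correlation analysis above can be sketched as follows. The sketch assumes cloze-style queries are scored by precision at rank 1 (P@1), and that each per-relation (or per-query) statistic, such as a subject/object co-occurrence count or the probability the model gives its top prediction, is already computed. The function names are hypothetical, no language model is queried, and the numbers in the usage line are toy values, not results from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient r between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has undefined correlation")
    return cov / math.sqrt(vx * vy)


def precision_at_1(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Fraction of cloze queries whose top-ranked token equals the gold object."""
    hits = sum(1 for preds, g in zip(ranked, gold) if preds and preds[0] == g)
    return hits / len(gold)


# Toy usage (hypothetical numbers): per-relation P@1 vs. a co-occurrence statistic.
p_at_1 = [0.10, 0.35, 0.60]
cooccurrence = [3.0, 10.0, 25.0]
print(pearson(p_at_1, cooccurrence))
```

A positive r here would only say that the two quantities rise together; it does not by itself separate memorized facts from co-occurrence shortcuts.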
