NLP Processing Stages

Lexical analysis: split the text into tokens.

e.g. “uneasy” can be broken into two sub-word tokens as “un-easy”.
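A minimal sketch of sub-word tokenization, assuming the HuggingFace transformers library and BERT's WordPiece vocabulary (the note does not name a tool); the exact splits depend on the vocabulary used.

```python
# Sub-word tokenization sketch: rare or long words are split into
# word pieces, frequent words stay whole. Output depends on the vocabulary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

for word in ["uneasy", "unhappiness"]:
    pieces = tokenizer.tokenize(word)
    print(word, "->", pieces)
```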

Syntactic analysis: 1. check whether the sentence structure is well-formed; 2. produce a result that captures the syntactic relations between words.

e.g. “The school goes to the boy”
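A small illustration of syntactic analysis, assuming spaCy as the parser (an assumption; the note names no tool): the parse exposes each word's syntactic relation to its head.

```python
# Dependency-parse sketch with spaCy: syntax only checks structure,
# so even an odd sentence like this one gets a well-formed parse.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The school goes to the boy")

for token in doc:
    # token.dep_ is the syntactic relation of the token to its head word
    print(f"{token.text:<8} {token.dep_:<8} head={token.head.text}")
```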

Semantic analysis: check whether the meaning is valid.

A semantic analyzer would reject a sentence like “Hot ice-cream”.

Pragmatic analysis: resolve ambiguity by choosing one meaning among the candidates.

Knowledge Graphs

A way of storing extracted information. The stored structure generally consists of a subject, a predicate and an object.

The following techniques are used to build a knowledge graph:

 sentence segmentation, dependency parsing, parts of speech tagging, and entity recognition.

Entity extraction

Extract the subject and object from each sentence; compound nouns and modifiers need special handling.

Relation extraction

Extract the “main” verb from the sentence.

Once both are done, the knowledge graph can be built. When building it, it is best to build a separate graph for each relation, which keeps the visualization readable.
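A rough sketch of that pipeline, assuming spaCy for parsing and networkx for the graphs (neither is named in the note); the entity expansion below only handles compound nouns and adjectival modifiers, and the example sentences are made up for illustration.

```python
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def expand(token):
    # prepend compound-noun / adjectival modifiers to the head noun
    mods = [c.text for c in token.lefts if c.dep_ in ("compound", "amod")]
    return " ".join(mods + [token.text])

def extract_triple(sentence):
    doc = nlp(sentence)
    root = [t for t in doc if t.dep_ == "ROOT"][0]            # the "main" verb
    subj = next((t for t in root.lefts if "subj" in t.dep_), None)
    obj = next((t for t in root.rights if t.dep_ in ("dobj", "attr")), None)
    if obj is None:                                           # object of a preposition
        for prep in root.rights:
            obj = next((t for t in prep.rights if t.dep_ == "pobj"), obj)
    if subj and obj:
        return expand(subj), root.lemma_, expand(obj)
    return None

sentences = ["Elon Musk founded SpaceX.", "Marie Curie discovered radium."]
triples = [t for t in map(extract_triple, sentences) if t]

# one graph per relation, which keeps the visualization readable
graphs = {}
for subj, rel, obj in triples:
    graphs.setdefault(rel, nx.DiGraph()).add_edge(subj, obj, label=rel)
print(triples)
```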

BERT

Works well on small datasets, for tasks such as question answering and sentiment analysis.

With masking, the model looks at both sides of a word, and the Transformer reads the whole token sequence at once, using the attention mechanism to learn the relationships between words.

Pre-training is semi-supervised (self-supervised), e.g. a cloze task; fine-tuning uses fully labeled datasets.

Input: a classification token [CLS] is placed first, followed by the rest of the word sequence. The representations are passed upward layer by layer: each encoder layer applies attention and a feed-forward network, then hands the result to the next encoder. The classification information can be read from the value corresponding to the [CLS] token in the final output.
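A hedged sketch of reading classification information off the final [CLS] position, assuming the HuggingFace transformers BertModel plus a hypothetical, untrained linear head.

```python
import torch
from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(bert.config.hidden_size, 2)  # hypothetical 2-class head

inputs = tokenizer("The movie was great", return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

cls_vector = outputs.last_hidden_state[:, 0]  # hidden state at the [CLS] position
logits = classifier(cls_vector)               # untrained head, illustrative only
print(logits.shape)                           # (1, 2)
```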

More concretely, the input is the sum of the following three embeddings.

The input representation for BERT: the input embeddings are the sum of the token embeddings, the segmentation embeddings and the position embeddings.

  • Token embeddings: A [CLS] token is added to the input word tokens at the beginning of the first sentence and a [SEP] token is inserted at the end of each sentence.
  • Segment embeddings: A marker indicating Sentence A or Sentence B is added to each token. This allows the encoder to distinguish between sentences.
  • Positional embeddings: A positional embedding is added to each token to indicate its position in the sentence.
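A minimal sketch of this three-way sum, reusing the embedding tables inside a pretrained BertModel (an assumption about the implementation; BERT additionally applies LayerNorm and dropout on top of this sum).

```python
import torch
from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
emb = model.embeddings

enc = tokenizer("The dog barked.", "It was loud.", return_tensors="pt")
input_ids = enc["input_ids"]          # [CLS] sentence A [SEP] sentence B [SEP]
segment_ids = enc["token_type_ids"]   # 0 for sentence A, 1 for sentence B
position_ids = torch.arange(input_ids.size(1)).unsqueeze(0)

total = (emb.word_embeddings(input_ids)
         + emb.token_type_embeddings(segment_ids)
         + emb.position_embeddings(position_ids))
print(total.shape)  # (1, seq_len, 768): one summed vector per input token
```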

Comparison: BERT takes both the previous and next tokens into account at the same time.

A typical contrasting model: ELMo

  • How it works

    1. Train a left-to-right forward LSTM: it reads the sentence and, for each word, produces a representation that contains only the left-side context.

    2. Train a right-to-left backward LSTM: it reads from the end of the sentence and, for each word, produces a representation that contains only the right-side context.

    3. Concatenate: when a word (e.g. “bank”) needs to be processed, the output vectors of the forward LSTM and the backward LSTM at that word are simply concatenated.
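A toy PyTorch sketch of this idea (not the real ELMo architecture): two independent LSTMs, one per direction, whose per-word outputs are concatenated.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 1000, 32, 64
embed = nn.Embedding(vocab_size, emb_dim)
fwd_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
bwd_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, 7))       # a fake 7-word sentence
x = embed(tokens)

fwd_out, _ = fwd_lstm(x)                            # left-to-right context only
bwd_out, _ = bwd_lstm(torch.flip(x, dims=[1]))      # right-to-left context only
bwd_out = torch.flip(bwd_out, dims=[1])             # re-align to word order

contextual = torch.cat([fwd_out, bwd_out], dim=-1)  # concatenate per word
print(contextual.shape)                             # (1, 7, 128)
```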

Context-free models vs. context-based models (unidirectional, bidirectional)

 Context-free models like word2vec generate a single word embedding representation (a vector of numbers) for each word in the vocabulary.

context-based models generate a representation of each word that is based on the other words in the sentence.

In “I accessed the bank account,” a unidirectional contextual model would represent “bank” based on “I accessed the” but not “account.” However, BERT represents “bank” using both its previous and next context: “I accessed the … account”.
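A quick way to check this claim, assuming the HuggingFace transformers library: the vector BERT assigns to “bank” changes with the surrounding sentence, unlike a context-free lookup table.

```python
import torch
from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    enc = tokenizer(sentence, return_tensors="pt")
    bank_id = tokenizer.convert_tokens_to_ids("bank")
    idx = enc["input_ids"][0].tolist().index(bank_id)
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state
    return hidden[0, idx]

v1 = bank_vector("I accessed the bank account.")
v2 = bank_vector("We sat on the bank of the river.")
print(torch.cosine_similarity(v1, v2, dim=0))  # below 1.0: context changes the vector
```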

What the Transformer's attention mechanism does: it understands the relationships between all the words in a sentence (via dot products), instead of relying on their positions alone.

For example, given the sentence, “I arrived at the bank after crossing the river”, to determine that the word “bank” refers to the shore of a river and not a financial institution, the Transformer can learn to immediately pay attention to the word “river” and make this decision in just one step.
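A minimal sketch of the scaled dot-product attention behind this, with random toy vectors standing in for word representations: every word scores every other word directly, so “bank” can attend to “river” in one step regardless of distance.

```python
import torch
import torch.nn.functional as F

seq_len, d = 9, 16               # e.g. "I arrived at the bank after crossing the river"
x = torch.randn(seq_len, d)      # toy word representations
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / d ** 0.5      # dot products between all word pairs
weights = F.softmax(scores, dim=-1)   # attention weights, one row per word
output = weights @ V                  # each word becomes a mix of all the others
print(weights.shape, output.shape)    # (9, 9) (9, 16)
```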

BERT uses the Transformer's encoder. As a language representation model, it turns language into numerical vectors that a computer can process.

Why 15% of the positions are masked at random (that the masking happens at random positions is already known)

To prevent the model from focusing too much on a particular position or tokens that are masked, the researchers randomly masked 15% of the words.

Why the replacement within that 15% is randomized

The masked words were not always replaced by the mask token [MASK], because the [MASK] token would never appear during fine-tuning.

So, the researchers used the following technique:

  • 80% of the time the words were replaced with the masked token [MASK]
  • 10% of the time the words were replaced with random words
  • 10% of the time the words were left unchanged
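A hedged re-implementation of the 15% / 80-10-10 recipe, loosely following the logic of HuggingFace's DataCollatorForLanguageModeling; the sentence is illustrative.

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
input_ids = enc["input_ids"].clone()
labels = input_ids.clone()

# pick 15% of positions (special tokens excluded) as prediction targets
prob = torch.full(labels.shape, 0.15)
special = torch.tensor(tokenizer.get_special_tokens_mask(
    labels[0].tolist(), already_has_special_tokens=True)).bool().unsqueeze(0)
prob.masked_fill_(special, 0.0)
masked = torch.bernoulli(prob).bool()
labels[~masked] = -100                    # only masked positions contribute to the loss

# 80% of the targets: replace with [MASK]
replace = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked
input_ids[replace] = tokenizer.mask_token_id

# 10%: replace with a random word (half of the remaining 20%)
random_word = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked & ~replace
input_ids[random_word] = torch.randint(len(tokenizer), labels.shape)[random_word]

# remaining 10%: keep the original word unchanged
print(tokenizer.convert_ids_to_tokens(input_ids[0]))
```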

BERT in Practice

Processing the input:

1. Add [CLS] at the beginning of the first sentence and insert [SEP] between sentences.

2. Split the sentences into tokens (both steps are shown in the sketch below).

n. Training procedure
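A hedged end-to-end sketch of steps 1-2 plus a single illustrative training step, assuming the HuggingFace transformers API: the tokenizer inserts [CLS]/[SEP] and splits the text into word pieces, then a sequence-classification head is updated once; the sentences, label, and learning rate are made up.

```python
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# steps 1-2: [CLS] ... [SEP] ... [SEP] plus word-piece tokenization
enc = tokenizer("The plot was thin.", "Still, I enjoyed it.", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]))  # first token is [CLS]

# one training step on the (untrained) classification head
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss = model(**enc, labels=torch.tensor([1])).loss   # cross-entropy vs. the label
loss.backward()
optimizer.step()
print(float(loss))
```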