NLP Basics
Core tasks of NLP: understanding and synthesizing

NLP input preprocessing

Tokenization

Case folding
Converts all input to a single case, which reduces memory use and improves efficiency, but it can create ambiguity, so whether to apply it depends on the task. For example, "Green" (a name) and "green" (a colour) mean different things, yet both get the same token once case folding is applied.

Stop word removal
Removes words that carry little meaning, which likewise improves efficiency, but it can leave the meaning incomplete, so again the decision depends on the task. Examples include "a", "the", "of", "an", "this", "that". For a task like topic modelling (identifying topics in text), contextual information matters less than it does for a task like sentiment analysis, where the stop word "not" can change the sentiment completely.

...
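The trade-offs of case folding and stop word removal can be seen in a minimal Python sketch. It is an illustration under stated assumptions, not a recommended pipeline: the regex tokenizer is deliberately naive, the function names (`tokenize`, `case_fold`, `remove_stop_words`) are hypothetical, and the stop word list is just the examples from these notes plus "not".

```python
import re

# Illustrative stop word list: the examples from the notes plus "not".
# Real stop word lists are much longer and vary by library and task.
STOP_WORDS = {"a", "the", "of", "an", "this", "that", "not"}


def tokenize(text):
    """Split text into word tokens with a naive regex (assumed tokenizer)."""
    return re.findall(r"[A-Za-z']+", text)


def case_fold(tokens):
    """Map every token to a single case so variants share one token."""
    return [t.casefold() for t in tokens]


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in stop_words]


# Case folding: the name "Green" and the colour "green" become identical.
folded = case_fold(tokenize("Mr Green painted the green door"))
print(folded)

# Stop word removal: dropping "not" hides the negation.
filtered = remove_stop_words(case_fold(tokenize("This film is not good")))
print(filtered)
```

In the second example the remaining tokens read as a positive review, which is why removing "not" is risky for sentiment analysis but usually harmless for topic modelling.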