NLP
Vocabulary
- Token
- Bag of words
- Stemming and lemmatization
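The three vocabulary terms above can be illustrated with a minimal sketch. Everything here is a toy written for illustration: the whitespace tokenizer, the `crude_stem` suffix list, and the `LEMMAS` table are assumptions, not how any real library does it.

```python
from collections import Counter

def tokenize(text):
    """Naive tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()

def bag_of_words(tokens):
    """Count tokens; word order is discarded."""
    return Counter(tokens)

def crude_stem(token):
    """Toy stemmer: strip one common suffix (real stemmers use many more rules)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

# Hypothetical lemma table; a real lemmatizer uses a dictionary plus POS information.
LEMMAS = {"ran": "run", "better": "good", "mice": "mouse"}

def crude_lemma(token):
    """Toy lemmatizer: look the token up, fall back to the token itself."""
    return LEMMAS.get(token, token)

tokens = tokenize("the cat chased the cats")
bow = bag_of_words(tokens)              # "the" is counted twice
stems = [crude_stem(t) for t in tokens]  # "chased" becomes "chas", "cats" becomes "cat"
```

Note that the stem "chas" is not a dictionary word, while a lemma such as "mouse" for "mice" always is. That is the practical difference between stemming and lemmatization.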
Resources
- Stanford CS224n: Natural Language Processing with Deep Learning (Winter 2019)
- NLTK book
- Jurafsky book
Parts of speech tagging
- POS, word classes, syntactic categories: nouns, verbs, adjectives, conjunctions
- Categorizing and Tagging Words (NLTK book chapter 5)
- Rachiele G. 2018 Medium article
- Three POS algorithms
- Hidden Markov model (HMM) - generative
- Maximum entropy Markov model (MEMM) - discriminative
- Neural language model - uses RNNs
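As a rough sketch of the first approach, an HMM tagger scores a tag sequence by multiplying transition probabilities P(tag | previous tag) and emission probabilities P(word | tag), then picks the best sequence with the Viterbi algorithm. The three-tag set and all probabilities below are made-up values for illustration, not estimates from a corpus.

```python
import math

TAGS = ["DT", "NN", "VBZ"]

# Hypothetical hand-set probabilities (a real tagger estimates these from tagged text).
START = {"DT": 0.8, "NN": 0.1, "VBZ": 0.1}
TRANS = {
    "DT": {"DT": 0.05, "NN": 0.9, "VBZ": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VBZ": 0.6},
    "VBZ": {"DT": 0.5, "NN": 0.4, "VBZ": 0.1},
}
EMIT = {
    "DT": {"the": 0.9, "a": 0.1},
    "NN": {"dog": 0.5, "barks": 0.1, "walk": 0.4},
    "VBZ": {"barks": 0.7, "walks": 0.3},
}
UNSEEN = 1e-6  # small floor for words a tag never emitted

def viterbi(words):
    """Return the most probable tag sequence for words under the toy HMM."""
    if not words:
        return []
    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{}]
    for tag in TAGS:
        score = math.log(START[tag]) + math.log(EMIT[tag].get(words[0], UNSEEN))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in TAGS:
            emit = math.log(EMIT[tag].get(words[i], UNSEEN))
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(TRANS[p][tag])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            best[i][tag] = (score + emit, prev)
    # Follow the backpointers from the best final tag.
    path = [max(TAGS, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

tagged = viterbi(["the", "dog", "barks"])
```

Here "barks" could be a noun or a verb, and the transition from NN to VBZ is what tips the decision toward the verb reading.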
nltk syntax
tokens = nltk.word_tokenize(text) is a more robust alternative to text.split()
nltk.pos_tag(tokens) returns a list of (token, tag) pairs
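To see why plain whitespace splitting falls short, the sketch below compares str.split() with a small regex tokenizer. `regex_tokenize` is a hypothetical stand-in written for this note, not NLTK's implementation; nltk.word_tokenize applies its own rules for cases such as contractions and abbreviations.

```python
import re

def regex_tokenize(text):
    """Split into words (optionally with an internal apostrophe) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

text = "Don't stop, Mr. Smith!"

# Whitespace splitting leaves punctuation glued to the words.
split_tokens = text.split()

# The regex version separates punctuation into its own tokens.
regex_tokens = regex_tokenize(text)
```

With split(), "stop," and "Smith!" keep their punctuation, so they would not match "stop" or "Smith" in a vocabulary lookup. The regex version fixes that but also cuts "Mr." into two tokens, which is the kind of case a dedicated tokenizer is meant to handle.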
NLTK POS tagset (Penn Treebank)
- CC coordinating conjunction
- CD cardinal number
- DT determiner
- EX existential there (like: “there is” … think of it like “there exists”)
- FW foreign word
- IN preposition/subordinating conjunction
- JJ adjective ‘big’
- JJR adjective, comparative ‘bigger’
- JJS adjective, superlative ‘biggest’
- LS list marker 1)
- MD modal could, will
- NN noun, singular ‘desk’
- NNS noun plural ‘desks’
- NNP proper noun, singular ‘Harrison’
- NNPS proper noun, plural ‘Americans’
- PDT predeterminer ‘all the kids’
- POS possessive ending parent’s
- PRP personal pronoun I, he, she
- PRP$ possessive pronoun my, his, her
- RB adverb very, silently
- RBR adverb, comparative better
- RBS adverb, superlative best
- RP particle give "up", turn the paper "over"
- TO to: go ‘to’ the store
- UH interjection ‘errrrrrrrm’
- VB verb, base form take
- phrasal verb - "turn down" "rule out" "find out" "go on"
- VBD verb, past tense took
- VBG verb, gerund/present participle taking
- VBN verb, past participle taken
- VBP verb, non-3rd person sing. present take
- VBZ verb, 3rd person sing. present takes
- WDT wh-determiner which
- WP wh-pronoun who, what
- WP$ possessive wh-pronoun whose
- WRB wh-adverb where, when
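Because related tags share a prefix (NN*, VB*, JJ*, RB*), tagged output can be grouped into coarse word classes with a prefix check. A minimal sketch follows; `coarse_class` is a hypothetical helper, and the `tagged` list is hand-written in the (token, tag) shape that nltk.pos_tag returns, not actual tagger output.

```python
# Prefix-to-class table for the four largest tag families in the list above.
PREFIXES = [("NN", "noun"), ("VB", "verb"), ("JJ", "adjective"), ("RB", "adverb")]

def coarse_class(tag):
    """Map a Penn Treebank tag to a coarse word class by its prefix."""
    for prefix, name in PREFIXES:
        if tag.startswith(prefix):
            return name
    return "other"

# Hand-written example pairs (assumed tags, for illustration only).
tagged = [
    ("The", "DT"),
    ("bigger", "JJR"),
    ("desks", "NNS"),
    ("were", "VBD"),
    ("taken", "VBN"),
    ("silently", "RB"),
]

# Keep only the tokens whose tag belongs to the noun family.
nouns = [token for token, tag in tagged if coarse_class(tag) == "noun"]
verbs = [token for token, tag in tagged if coarse_class(tag) == "verb"]
```

Tags that merely contain one of these strings, such as WRB, fall through to "other" because the check is on the prefix only.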