Tokenization: Segmenting text into words, punctuation marks, etc.
Part-of-speech (POS) Tagging: Assigning word types to tokens, like verb or noun.
Dependency Parsing: Assigning syntactic dependency labels that describe the relations between individual tokens, like subject or object.
Lemmatization: Assigning the base forms of words. For example, the lemma of “was” is “be”, and the lemma of “rats” is “rat”.
Sentence Boundary Detection (SBD): Finding and segmenting individual sentences.
Named Entity Recognition (NER): Labelling named “real-world” objects, like persons, companies or locations.
Entity Linking (EL): Disambiguating textual entities to unique identifiers in a knowledge base.
Similarity: Comparing words, text spans and documents to determine how similar they are to each other.
Text Classification: Assigning categories or labels to a whole document, or parts of a document.
Rule-based Matching: Finding sequences of tokens based on their texts and linguistic annotations, similar to regular expressions.
Training: Updating and improving a statistical model’s predictions.
Serialization: Saving objects to files or byte strings.
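
To make a few of these terms concrete, here is a minimal sketch of tokenization, sentence boundary detection and rule-based matching using only the Python standard library. It is a toy illustration, not the library's API: the function names `tokenize`, `split_sentences` and `match_pattern` are hypothetical, the tokenizer is a plain regular expression, and sentence ends are assumed to be marked by ".", "!" or "?".

```python
import re


def tokenize(text):
    """Segment text into word tokens and single punctuation marks."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def split_sentences(tokens):
    """Group tokens into sentences, ending a sentence at '.', '!' or '?'."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without closing punctuation
        sentences.append(current)
    return sentences


def match_pattern(tokens, pattern):
    """Return (start, end) spans where every predicate accepts its aligned token."""
    n = len(pattern)
    spans = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(pred(tok) for pred, tok in zip(pattern, window)):
            spans.append((start, start + n))
    return spans


tokens = tokenize("Hello, world! Rats were here.")
sentences = split_sentences(tokens)

# Pattern: the word "hello" (any case), then a comma, then any alphabetic token.
pattern = [lambda t: t.lower() == "hello", lambda t: t == ",", str.isalpha]
spans = match_pattern(tokens, pattern)
```

Here each pattern element is a predicate over a single token, which is what makes the matching "similar to regular expressions" but over tokens rather than characters. A real NLP library would also let the predicates inspect linguistic annotations such as POS tags or lemmas, which this sketch does not compute.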

Tags: library, python, nlp

Last modified 03 May 2022