stem/Speech/NLP/NLP.md
Andy Pack abbd7bba68 vault backup: 2023-12-22 16:39:03
Affected files:
.obsidian/community-plugins.json
.obsidian/graph.json
.obsidian/plugins/calendar/data.json
.obsidian/plugins/calendar/main.js
.obsidian/plugins/calendar/manifest.json
.obsidian/plugins/dataview/main.js
.obsidian/plugins/dataview/manifest.json
.obsidian/plugins/dataview/styles.css
.obsidian/workspace.json
Events/Cardiff.md
Events/November 27th Week.md
Events/🪣🪣🪣.md
Food/From Aldi.md
Food/Meal Plans/Meals - 2023-06-18.md
Food/Meal Plans/Meals - 2023-06-24.md
Food/Meal Plans/Meals - 2023-07-30.md
Food/Meal Plans/Meals - 2023-08-06.md
Food/Meal Plans/Meals - 2023-08-13.md
Food/Meal Plans/Meals - 2023-08-20.md
Food/Meal Plans/Meals - 2023-08-27.md
Food/Meal Plans/Meals - 2023-09-03.md
Food/Meal Plans/Meals - 2023-09-10.md
Food/Meal Plans/Meals - 2023-09-17.md
Food/Meal Plans/Meals - 2023-09-25.md
Food/Meal Plans/Meals - 2023-10-02.md
Food/Meal Plans/Meals - 2023-10-14.md
Food/Meal Plans/Meals - 2023-10-22.md
Food/Meal Plans/Meals - 2023-10-30.md
Food/Meal Plans/Meals - 2023-11-05.md
Food/Meal Plans/Meals - 2023-11-14.md
Food/Meal Plans/Meals - 2023-11-20.md
Food/Meal Plans/Meals - 2023-12-03.md
Food/Meal Plans/Meals - 2023-12-11.md
Food/Meal Plans/Meals - 2023-12-16.md
Food/Meals.md
Food/Sauces.md
Lab/DNS.md
Lab/Deleted Packages.md
Lab/Domains.md
Lab/Ebook Laundering.md
Lab/Home.md
Lab/Linux/Alpine.md
Lab/Linux/KDE.md
Lab/Photo Migration.md
Lab/VPN Servers.md
Languages/Arabic.md
Languages/Spanish/Spanish.md
Languages/Spanish/Tenses.md
Languages/Spanish/Verbs.md
Money/Me/Accounts.md
Money/Me/Car.md
Money/Me/Home.md
Money/Me/Income.md
Money/Me/Monthly/23-04.md
Money/Me/Monthly/23-05.md
Money/Me/Monthly/23-06.md
Money/Me/Monthly/23-07.md
Money/Me/Monthly/23-08.md
Money/Me/Monthly/23-09.md
Money/Me/Monthly/23-10.md
Money/Me/Monthly/23-11.md
Money/Me/Monthly/23-12.md
Money/Me/Subs.md
STEM/AI/Classification/Classification.md
STEM/AI/Classification/Decision Trees.md
STEM/AI/Classification/Gradient Boosting Machine.md
STEM/AI/Classification/Logistic Regression.md
STEM/AI/Classification/Random Forest.md
STEM/AI/Classification/Supervised/SVM.md
STEM/AI/Classification/Supervised/Supervised.md
STEM/AI/Ethics.md
STEM/AI/Kalman Filter.md
STEM/AI/Learning.md
STEM/AI/Literature.md
STEM/AI/Neural Networks/Activation Functions.md
STEM/AI/Neural Networks/Architectures.md
STEM/AI/Neural Networks/CNN/CNN.md
STEM/AI/Neural Networks/CNN/Convolutional Layer.md
STEM/AI/Neural Networks/CNN/Examples.md
STEM/AI/Neural Networks/CNN/FCN/FCN.md
STEM/AI/Neural Networks/CNN/FCN/FlowNet.md
STEM/AI/Neural Networks/CNN/FCN/Highway Networks.md
STEM/AI/Neural Networks/CNN/FCN/ResNet.md
STEM/AI/Neural Networks/CNN/FCN/Skip Connections.md
STEM/AI/Neural Networks/CNN/FCN/Super-Resolution.md
STEM/AI/Neural Networks/CNN/GAN/CycleGAN.md
STEM/AI/Neural Networks/CNN/GAN/DC-GAN.md
STEM/AI/Neural Networks/CNN/GAN/GAN.md
STEM/AI/Neural Networks/CNN/GAN/StackGAN.md
STEM/AI/Neural Networks/CNN/GAN/cGAN.md
STEM/AI/Neural Networks/CNN/Inception Layer.md
STEM/AI/Neural Networks/CNN/Interpretation.md
STEM/AI/Neural Networks/CNN/Max Pooling.md
STEM/AI/Neural Networks/CNN/Normalisation.md
STEM/AI/Neural Networks/CNN/UpConv.md
STEM/AI/Neural Networks/CV/Data Manipulations.md
STEM/AI/Neural Networks/CV/Datasets.md
STEM/AI/Neural Networks/CV/Filters.md
STEM/AI/Neural Networks/CV/Layer Structure.md
STEM/AI/Neural Networks/CV/Visual Search/Visual Search.md
STEM/AI/Neural Networks/Deep Learning.md
STEM/AI/Neural Networks/Learning/Boltzmann.md
STEM/AI/Neural Networks/Learning/Competitive Learning.md
STEM/AI/Neural Networks/Learning/Credit-Assignment Problem.md
STEM/AI/Neural Networks/Learning/Hebbian.md
STEM/AI/Neural Networks/Learning/Learning.md
STEM/AI/Neural Networks/Learning/Tasks.md
STEM/AI/Neural Networks/MLP/Back-Propagation.md
STEM/AI/Neural Networks/MLP/Decision Boundary.md
STEM/AI/Neural Networks/MLP/MLP.md
STEM/AI/Neural Networks/Neural Networks.md
STEM/AI/Neural Networks/Properties+Capabilities.md
STEM/AI/Neural Networks/RNN/Autoencoder.md
STEM/AI/Neural Networks/RNN/Deep Image Prior.md
STEM/AI/Neural Networks/RNN/LSTM.md
STEM/AI/Neural Networks/RNN/MoCo.md
STEM/AI/Neural Networks/RNN/RNN.md
STEM/AI/Neural Networks/RNN/Representation Learning.md
STEM/AI/Neural Networks/RNN/SimCLR.md
STEM/AI/Neural Networks/RNN/VQA.md
STEM/AI/Neural Networks/SLP/Least Mean Square.md
STEM/AI/Neural Networks/SLP/Perceptron Convergence.md
STEM/AI/Neural Networks/SLP/SLP.md
STEM/AI/Neural Networks/Training.md
STEM/AI/Neural Networks/Transformers/Attention.md
STEM/AI/Neural Networks/Transformers/LLM.md
STEM/AI/Neural Networks/Transformers/Transformers.md
STEM/AI/Neural Networks/Weight Init.md
STEM/AI/Pattern Matching/Dynamic Time Warping.md
STEM/AI/Pattern Matching/Markov/Markov.md
STEM/AI/Pattern Matching/Pattern Matching.md
STEM/AI/Problem Solving.md
STEM/AI/Properties.md
STEM/AI/Searching/Informed.md
STEM/AI/Searching/Searching.md
STEM/AI/Searching/Uninformed.md
STEM/CS/ABI.md
STEM/CS/Calling Conventions.md
STEM/CS/ISA.md
STEM/CS/Languages/Assembly.md
STEM/CS/Languages/Javascript.md
STEM/CS/Languages/Python.md
STEM/CS/Languages/Rust.md
STEM/CS/Quantum.md
STEM/CS/Resources.md
STEM/IOT/Networking/Networking.md
STEM/Light.md
STEM/Quantum/Confinement.md
STEM/Quantum/Orbitals.md
STEM/Quantum/Schrödinger.md
STEM/Quantum/Standard Model.md
STEM/Quantum/Wave Function.md
STEM/Speech/Linguistics/Consonants.md
STEM/Speech/Linguistics/Language Structure.md
STEM/Speech/Linguistics/Linguistics.md
STEM/Speech/Linguistics/Terms.md
STEM/Speech/Linguistics/Vowels.md
STEM/Speech/Literature.md
STEM/Speech/NLP/NLP.md
STEM/Speech/NLP/Recognition.md
STEM/Speech/Speech Processing/Applications.md
Work/Possible Tasks.md
Work/Tech.md
2023-12-22 16:39:03 +00:00

32 lines
672 B
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
tags:
- ai
---
# Text Normalisation
- Tokenisation
- Labelling parts of sentence
- Usually words
- Can be multiple
- Proper nouns
- New York
- Emoticons
- Hashtags
- May need some named entity recognition
- Penn Treebank standard
- Byte-pair encoding
- Standard cant understand unseen words
- Encode as subwords
- -est, -er
- Lemmatisation
- Determining roots of words
- Verb infinitives
- Find lemma
- Derived forms are inflections or inflected
- Word-forms
- Critical for morphological complex languages
- Arabic
- Stemming
- Simpler than lemmatisation
- Just removing suffixes
- Normalising word formats
- Segmenting sentences