ML + AI Terms

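Machine Learning - A subset of AI in which a model learns patterns from data instead of following hand-written rules. Many classical models can be trained on a CPU with a relatively small data set.
Deep Learning - A subset of Machine Learning based on neural networks with many layers. Usually needs a large data set and, in practice, is trained on a GPU, since CPU training is possible but much slower. Learns by repeatedly measuring its errors and adjusting itself to reduce them.
Artificial Neural Network (ANN) - A computational model loosely based on biological animal brains, the backbone concept behind deep learning systems.
Autoencoder - A type of ANN used for unsupervised learning. It “encodes” its input into a smaller representation and then learns to reconstruct the original input from that representation.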
Tensor - A “tensor” in the context of machine learning is basically a multi-dimensional array of numbers, usually floating-point values. Tensors are used to store and manipulate data during training and inference. Operations like matrix multiplication, convolution, and pooling are performed on tensors to extract features from the data.
Convolution - A mathematical operation used in machine learning to extract features from input data. Convolution involves sliding a filter (also known as a kernel or weight matrix) over the input data, resulting in a feature map of the input data. Convolutional neural networks are often used in computer vision, for tasks such as image classification.
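The sliding-filter idea can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it: the function name convolve2d, the 4x4 "image", and the 2x2 kernel are all made up for the example, and it assumes no padding and a stride of 1. Like most ML frameworks, it does not flip the kernel (strictly speaking this is cross-correlation).

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the products.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 4x4 "image" with a vertical edge between a dark (0) and a bright (9) half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical kernel that responds to left-to-right increases in brightness.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The feature map is non-zero only in the middle column, which is where the edge sits in the input. In a trained network the kernel values are learned rather than written by hand.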
Back Propagation - An algorithm used to train neural networks. The network is given an input and a desired output; it produces an output, which is then compared to the desired output. The difference between the produced output and the desired output is referred to as the “error”. The error is propagated backward through the network, layer by layer, to work out how much each connection weight contributed to it, and the weights are then adjusted to reduce the error. This process of applying the error back to the weights of the connections between neurons in the neural net is referred to as backpropagation.
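To make the "apply the error back to the weights" step concrete, here is a minimal sketch on the smallest network that still has two layers: one input, one hidden neuron with a sigmoid activation, and one output. Everything in it (the function names, the starting weights, the learning rate, the squared-error loss) is an assumption chosen for illustration; real networks do the same thing with many weights at once.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    h = sigmoid(w1 * x)  # hidden neuron activation
    y = w2 * h           # network output
    return h, y


def backward(x, target, w1, w2):
    """Return the loss and its gradients with respect to w1 and w2."""
    h, y = forward(x, w1, w2)
    error = y - target                    # produced output minus desired output
    loss = 0.5 * error ** 2
    grad_w2 = error * h                   # how much w2 contributed to the error
    grad_h = error * w2                   # error passed back to the hidden neuron
    grad_w1 = grad_h * h * (1.0 - h) * x  # back through the sigmoid to w1
    return loss, grad_w1, grad_w2


x, target = 1.5, 0.8        # one training example (made-up values)
w1, w2 = 0.1, 0.1           # arbitrary starting weights
learning_rate = 0.5

initial_loss, _, _ = backward(x, target, w1, w2)
for step in range(200):
    loss, grad_w1, grad_w2 = backward(x, target, w1, w2)
    # Adjust each weight against its gradient to reduce the error.
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
final_loss, _, _ = backward(x, target, w1, w2)

print(initial_loss, final_loss)
```

The final loss should be far smaller than the initial loss, because every step nudges both weights in the direction that shrinks the error.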
Convergence - The point at which a model is basically “done training”. During each training iteration the model is evaluated against a stopping condition, such as a maximum number of epochs/iterations, the model’s performance on a validation set, or the value of an “objective function”. Convergence is most often associated with iterative optimization algorithms (such as gradient descent), but the term also applies to other fitting procedures, such as linear regression or a technique called “expectation maximization (EM)”. When the model’s output is stable and no longer changing significantly, or the stopping condition is otherwise met, further training is no longer needed or beneficial. This is convergence.
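A rough sketch of a training loop with two stopping conditions, assuming a toy objective function (w - 3)^2 and plain gradient descent. The function name train, the tolerance, and the other parameter values are made up for the example. It reports separately whether the loss actually stabilized or whether the epoch budget simply ran out.

```python
def train(learning_rate=0.1, max_epochs=1000, tolerance=1e-9):
    """Minimize (w - 3)^2 and stop once the loss stops changing."""
    w = 0.0
    previous_loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        loss = (w - 3.0) ** 2
        # Stopping condition 1: the loss is no longer changing significantly.
        if abs(previous_loss - loss) < tolerance:
            return w, epoch, True
        previous_loss = loss
        # Gradient descent step: the derivative of (w - 3)^2 is 2 * (w - 3).
        w -= learning_rate * 2.0 * (w - 3.0)
    # Stopping condition 2: the epoch budget is used up.
    return w, max_epochs, False


w, epochs_used, converged = train()
print(w, epochs_used, converged)
```

With these settings the loop should stop well before the 1000-epoch limit, with w very close to 3. Calling train(max_epochs=5) instead shows the other case: training stops because of the budget, not because the loss has settled.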
Text Embedding - The process of taking a text input and converting it to numerical vectors, i.e., the form in which a neural net can store and process it.
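A toy illustration of the idea, assuming a hand-written lookup table with three-dimensional vectors. The table, its values, and the function names are invented for the example; real models learn their embeddings during training and use far more dimensions.

```python
import math

# Hypothetical embedding table: one vector per known word.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for words missing from the table


def embed(text):
    """Return one vector per whitespace-separated word in `text`."""
    return [EMBEDDINGS.get(word, UNKNOWN) for word in text.lower().split()]


def cosine_similarity(a, b):
    """Similarity of two non-zero vectors by the angle between them."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat, dog, car = embed("Cat dog car")
print(cosine_similarity(cat, dog))
print(cosine_similarity(cat, car))
```

Because the "cat" and "dog" vectors point in nearly the same direction, their similarity is much higher than that of "cat" and "car". That is the useful property of embeddings: related text ends up close together numerically.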
Label - A “label” is the expected target output value for a data point in a data set, in the context of supervised learning. Labels are used to guide the model’s learning process during training and to evaluate its predictions. A very simple example: in a system where the goal is to classify emails as spam, the labels would be binary values, such as 0 or 1, where 0 represents not spam and 1 represents spam.
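The spam example might look like this in code. It is a sketch with invented emails, and predict is only a keyword rule standing in for a trained model; the point is how the labels are used to score predictions.

```python
# Each data point pairs an input with its label: 1 = spam, 0 = not spam.
dataset = [
    ("win a free prize now", 1),
    ("meeting moved to 3pm", 0),
    ("free money, click here", 1),
    ("lunch tomorrow?", 0),
    ("you have been selected, claim your reward", 1),
]


def predict(text):
    """Stand-in for a trained model: flags any email containing 'free'."""
    return 1 if "free" in text else 0


# Compare each prediction with the label to evaluate the "model".
correct = sum(1 for text, label in dataset if predict(text) == label)
accuracy = correct / len(dataset)
print(accuracy)  # 0.8
```

The last email is spam according to its label, but the keyword rule misses it, so only four of the five predictions match.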
Inpainting - Basically, using a trained model to fill in missing, damaged, or incorrect areas of an image, including an image that is being generated.
VAE (Variational Autoencoder) - A machine learning architecture that uses two neural networks (an encoder and a decoder) to learn a compact representation of the underlying structure of the input data. The encoder maps the input data to a lower-dimensional latent space (as a probability distribution rather than a single point), and the decoder maps points in the latent space back to the input space.
RLHF - Reinforcement Learning from Human Feedback. Basically a training phase in which humans rate or rank a model’s outputs, and that feedback is used to steer the model toward preferred outputs. For example, to discourage bad/harmful outputs.
Alignment - Refers to the general pursuit of making sure that AI systems act in line with human intentions and values rather than against them. Also, generally speaking, making sure that models don’t produce toxic or NSFW outputs.
Emergence - As systems grow quantitatively, qualitatively new behaviors appear. For example, as the number of parameters in an LLM grows, capabilities like arithmetic and language understanding start to appear without having been explicitly trained for.

Data Structures

This is not necessarily ML-specific, but having a clear understanding of these structures is critical to ML.

Scalar - a single numerical value, zero-dimensional.
Vector - an ordered list of scalars, one-dimensional. e.g., [55, 48, 28]
Matrix - a rectangular array of scalar values, two-dimensional. e.g., [[24, 248], [12, 58473]] or [[12, 24, 448], [448, 293, 347]]. Every row must have the same length. Arrays nested more deeply than two levels are tensors.
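One way to see the difference between these structures is to count their dimensions. The helper below is a small sketch using plain Python lists; the name shape is borrowed from common array-library terminology, but the function itself is written just for this example.

```python
def shape(value):
    """Return the dimensions of a scalar or a rectangular nested list."""
    if not isinstance(value, list):
        return ()  # a scalar has zero dimensions
    if not value:
        return (0,)
    inner = shape(value[0])
    # Every element must have the same shape, otherwise it is not rectangular.
    if any(shape(item) != inner for item in value):
        raise ValueError("not rectangular")
    return (len(value),) + inner


scalar = 55
vector = [55, 48, 28]
matrix = [[24, 248], [12, 58473]]

print(shape(scalar))  # ()
print(shape(vector))  # (3,)
print(shape(matrix))  # (2, 2)
```

A nested list whose rows have different lengths, such as [[1, 2], [3]], raises ValueError here, because it is not rectangular and so is not a valid matrix.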

Last modified: May 10, 2023