Artificial Intelligence

History & Timeline of AI

The entries below walk through the major eras of AI in chronological order. Each era lists the defining ideas, representative systems, and the application scenarios they unlocked. Later sections of this page (roadmap, terminology, CNN, NLP, LLM Engineering, MLOps, Responsible AI) expand on these ideas in practice.

timeline
    title Major Eras of AI
    1943-1956 : Symbolic seeds & the birth of AI
    1956-1974 : Early AI & the first golden age
    1974-1980 : First AI winter
    1980-1987 : Expert systems boom
    1987-1993 : Second AI winter
    1993-2011 : Statistical ML & data-driven AI
    2012-2016 : Deep learning revolution
    2017-2019 : Transformer & pre-training era
    2020-2022 : Foundation models & scaling laws
    2022-2024 : ChatGPT, RAG, agents, multimodal
    2025-2026 : Reasoning models, on-device AI, agentic systems

1943-1956 — Symbolic seeds and the birth of AI

The field grew out of logic, cybernetics, and neuroscience before it had a name. Two threads ran in parallel: symbolic reasoning over rules, and simplified mathematical models of the neuron.

Application scenarios (mostly aspirational at the time): theorem proving, checkers- and chess-playing programs, early natural-language parsers, and pattern recognition for printed characters.

1956-1974 — Early AI and the first golden age

Symbolic AI dominated. Programs manipulated symbols and rules, and early optimism suggested human-level AI was a decade away.

Application scenarios: theorem proving, symbolic math (MACSYMA), toy natural-language interfaces, early game playing, and the first expert-system prototypes such as DENDRAL for chemical structure identification.

1974-1980 — The first AI winter

Overpromising met the reality of combinatorial explosion, limited compute, and the collapse of perceptron hype after Minsky & Papert's 1969 book Perceptrons showed that single-layer networks cannot represent functions that are not linearly separable, such as XOR. DARPA and UK funding dried up, and neural network research went largely dormant.
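To make the XOR limitation concrete, here is a minimal sketch of Rosenblatt's perceptron learning rule in plain Python. It is an illustration rather than code from these notes, and the helper names are hypothetical. The rule converges on AND, which is linearly separable, but it can never reach 100% accuracy on XOR, because no single line separates XOR's positive cases from its negative ones.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic perceptron rule: w += lr * (target - prediction) * x."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def accuracy(samples, w, b):
    hits = sum(
        (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == t
        for (x1, x2), t in samples
    )
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in [("AND", AND), ("XOR", XOR)]:
    w, b = train_perceptron(data)
    print(name, accuracy(data, w, b))
```

Adding a hidden layer removes this limitation, which is why the later revival of multi-layer networks trained with backpropagation mattered so much.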

Even so, important groundwork continued:

Application scenarios of what survived: narrow rule-based systems for scheduling, configuration, and medical triage in research settings.

1980-1987 — Expert systems boom

The commercial breakthrough came from rule-based expert systems that encoded the knowledge of human specialists as if-then rules backed by inference engines.
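The rule-plus-inference-engine idea fits in a few lines. Below is a minimal forward-chaining engine, assuming that rules are (premises, conclusion) pairs over string facts. The triage-style rules are invented for illustration and are not taken from any real expert system.

```python
def forward_chain(facts, rules):
    """Fire every rule whose premises are all known, until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)  # the rule "fires" and asserts a new fact
                changed = True
    return known

# Hypothetical if-then rules in the style of a medical triage system.
RULES = [
    ({"fever", "cough"}, "respiratory_infection_suspected"),
    ({"respiratory_infection_suspected", "low_oxygen"}, "refer_to_physician"),
    ({"rash"}, "dermatology_check"),
]

print(sorted(forward_chain({"fever", "cough", "low_oxygen"}, RULES)))
# ['cough', 'fever', 'low_oxygen', 'refer_to_physician', 'respiratory_infection_suspected']
```

Real expert-system shells layered conflict resolution, backward chaining, and certainty factors on top of a loop like this one. The brittleness described in the next section comes from the same design: every fact and rule has to be written by hand.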

Application scenarios: medical diagnosis, industrial configuration, fault diagnosis, financial credit scoring, and rule-based process automation — the direct ancestors of today's Decision/BPM systems.

1987-1993 — The second AI winter

Specialized Lisp machines could not compete with general-purpose workstations from Sun and Apollo. Expert systems proved brittle: expensive to maintain, weak at common sense, and unable to learn from data. Funding contracted again, and neural networks were still too compute-hungry for most tasks.

Under the surface, the building blocks of modern AI were being laid:

Application scenarios that paid the bills: OCR for postal and banking workflows, speech recognition research, and statistical NLP.

1993-2011 — Statistical ML and data-driven AI

With the web, cheap storage, and growing CPU power, AI shifted from hand-coded rules to learning from data. This is the era that still supplies most production tabular models today.

Application scenarios: search ranking, email spam filtering, recommender systems, credit-risk scoring, fraud detection, statistical machine translation, and the first commercially successful speech assistants. This material maps directly onto Classical Machine Learning and Big Data in this repo.

2012-2016 — The deep learning revolution

GPUs, large labeled datasets, and ReLU-style architectures unlocked deep learning at scale.

Application scenarios: image classification and detection (covered later in CNN, Object Detection, R-CNN), face recognition, speech recognition (DeepSpeech), neural machine translation, style transfer, and the first wave of production self-driving perception stacks.

2017-2019 — The Transformer and the pre-training era

Attention replaced recurrence, and pre-training on massive unlabeled text became the new default.
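The core operation behind "attention replaced recurrence" is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The sketch below is a dependency-free, single-head version in plain Python with no masking and no learned projections. Real Transformers add learned W_Q, W_K, and W_V matrices, multiple heads, and masking. The toy token vectors are made up.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Return (outputs, weights) for lists-of-lists Q (n x d_k), K (m x d_k), V (m x d_v)."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

# Self-attention: queries, keys and values all come from the same 3 tokens.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(X, X, X)
for row in w:
    print([round(p, 3) for p in row])
```

Every token attends to every other token in one step. Recurrence, by contrast, has to pass information along the sequence one position at a time.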

Application scenarios: high-quality neural MT, search ranking upgrades (BERT in Google Search, 2019), semantic similarity and retrieval, sentiment and intent classification, question answering over Wikipedia, code completion prototypes, and the first production embeddings for recommendation. This era is the foundation of everything under ai/transformers/ and ai/llm/.

2020-2022 — Foundation models and scaling laws

Models grew by orders of magnitude, and emergent capabilities — in-context learning, chain-of-thought reasoning, tool use — appeared only at scale.

Application scenarios: code assistants, AI-assisted writing, semantic search with dense embeddings, text-to-image creative tooling, voice clones, and the first serious production use of LLMs behind API keys.

2022-2024 — ChatGPT, RAG, agents, and multimodal

The launch of ChatGPT in November 2022 turned LLMs into a mass-market product and reframed AI as a general-purpose interface.

Application scenarios: conversational assistants, enterprise RAG over private documents, AI coding (Cursor, Copilot, Claude Code), customer support triage, document extraction, contract review, content generation, text-to-speech, speech-to-text at near-human quality, and AI-native search.

2025-2026 — Reasoning models, agentic systems, and on-device AI

The current frontier pushes three directions at once: explicit reasoning at inference time, autonomous agents, and shrinking models that run on the edge.

Application scenarios: autonomous coding agents that plan across repos, end-to-end customer support that can act on APIs, AI analysts for finance and healthcare triage (under human review), on-device assistants for privacy-sensitive data, AI-generated video for marketing and prototyping, and the first production "AI employees" handling narrow back-office workflows.

For a hands-on path through modern models and serving, see llm/index.md, llm/fine-tuning.md, and transformers/index.md.

Concepts → application scenarios cheat sheet

A quick way to map the ideas above to where they show up in production today:

| Concept / era | Core idea | Today's application scenarios |
|---|---|---|
| Expert systems (1980s) | Hand-coded if-then rules + inference engine | Compliance checks, decision tables, BPM, triage rule engines |
| Classical ML (1990s-2010s) | Learn functions from tabular data | Credit scoring, fraud detection, churn, demand forecasting, A/B analysis |
| CNNs (2012+) | Local filters + hierarchy for grid data | Image classification, defect inspection, medical imaging, face/OCR, AV perception |
| RNN / LSTM (2014+) | Recurrence over sequences | Legacy speech-to-text, time-series forecasting, early MT (still useful for small-footprint devices) |
| Word embeddings (2013+) | Words as dense vectors | Semantic search, recommendation, clustering, de-duplication |
| Transformers (2017+) | Self-attention over sequences | Translation, summarization, classification, embeddings, code models |
| Pre-training + fine-tuning (2018+) | Train once on huge corpus, adapt cheaply | Task-specific NLP, domain adapters, LoRA for private data |
| Diffusion models (2022+) | Iterative denoising in latent space | Text-to-image, text-to-video, image editing, design assistants |
| LLMs + prompting (2020+) | Few-shot, chain-of-thought, tool calling | General assistants, writing, coding, data extraction |
| RAG (2023+) | Retrieve → ground → generate | Enterprise Q&A, support copilots, legal/medical assistants, site search |
| Agents (2023+) | Plan + tool use + memory | Coding agents, browsing agents, workflow automation, research assistants |
| Reasoning models (2024+) | Inference-time compute + RL on verifiable tasks | Hard math/code, AI-assisted theorem proving, complex planning, autonomous debugging |
| On-device / edge AI (2024+) | Quantized small models | Private chat, offline voice, industrial edge, IoT copilots (see Jetson Nano) |
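The RAG row (retrieve → ground → generate) is easiest to see in code. The toy sketch below ranks documents by bag-of-words cosine similarity, a stand-in for the dense embeddings a production system would use, and then assembles a grounded prompt. The documents, the prompt template, and the `build_prompt` helper are all hypothetical. The final step, sending the prompt to an LLM, is deliberately left out.

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words vector: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Retrieve: return the k documents most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query, passages):
    """Ground: restrict the model to the retrieved context (hypothetical template)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

DOCS = [
    "Refund requests are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "To request a refund, open a ticket from the billing page.",
]

query = "How do I get a refund?"
print(build_prompt(query, retrieve(query, DOCS)))
```

To turn this into the production shape of the pattern, swap `bow` and `cosine` for an embedding model and a vector index, then pass the prompt to an LLM.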

The rest of this page (roadmap, terminology, CNN, NLP, LLM Engineering, MLOps, Responsible AI) elaborates on each of these rows and links to concrete code and tools.

Learning Roadmap

A practical 6-stage path, distilled from 2026 guidance by Coursera, Dataquest, Fast.ai and DeepLearning.AI. Build intuition, not mastery, at each stage — and ship small projects throughout instead of waiting until the end.

flowchart LR
    Math["Stage 0: Math Foundations"] --> Python["Stage 1: Python & Tooling"]
    Python --> ClassicML["Stage 2: Classical ML"]
    ClassicML --> DL["Stage 3: Deep Learning"]
    DL --> NLP["Stage 4: NLP & Transformers"]
    NLP --> LLM["Stage 5: LLM & AI Engineering"]
    DL --> Specialized["Specialized tracks: CV / Audio / RL"]

Math Foundations

Three pillars cover ~80% of what you need to read papers and implement models.

Curated resources:

Python & ML Tooling

Ship experiments, not just notebooks. Pick one framework and go deep.

Classical Machine Learning

Still the right tool for tabular data, small datasets, and most business problems. Don't skip it.

Curated resources:

Terminology

Neurons

Name Description
Vanilla Basic unit computing a weighted sum of its inputs plus a bias, followed by an activation; used in FFNNs and CNNs
LSTM Advanced neuron with memory and gates for long-term dependencies in sequences
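A vanilla neuron is a weighted sum plus a bias, passed through an activation function. The minimal sketch below uses ReLU as the activation. The weights and inputs are arbitrary illustrative numbers.

```python
def relu(z):
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=relu):
    """Compute activation(w . x + b) for a single vanilla neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

print(neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], bias=0.2))  # z = 0.5 - 0.5 + 0.3 + 0.2
```

A fully connected layer is many of these neurons sharing the same inputs. An LSTM cell combines several such units as gates that control its memory.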

Layers

Name Description
Fully Connected Standard layer where each neuron connects to all inputs
Recurrent Maintains memory across timesteps; used in RNNs, LSTMs
Convolutional Extracts spatial features using filters; used in image data
Attention Computes weighted importance of different inputs; key in Transformers
Pooling Downsamples spatial data; used in CNNs to reduce size and noise
Normalization Stabilizes training by normalizing activations; includes BatchNorm, LayerNorm, GroupNorm
Dropout Randomly deactivates neurons during training to prevent overfitting
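The convolutional and pooling rows are the two layer types that the Keras example at the end of this page stacks (`Conv1D`, `MaxPooling1D`). The plain-Python sketch below shows what they compute on a single channel. It assumes `'same'` zero padding and an odd kernel size, and it leaves out the learned bias and the multiple filters a real layer has. The input signal and the kernel are made up.

```python
def conv1d_same(signal, kernel):
    """Single-channel 1D cross-correlation (what DL libraries call convolution)
    with 'same' zero padding, so the output keeps the input's length.
    Assumes an odd kernel size."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def max_pool1d(signal, pool_size=2, stride=2):
    """Max pooling; a trailing partial window is kept, as padding='same' does here."""
    return [max(signal[i:i + pool_size]) for i in range(0, len(signal), stride)]

x = [1.0, 3.0, 2.0, 0.0, 4.0]
edges = conv1d_same(x, [-1.0, 0.0, 1.0])  # this kernel responds to local increases
print(edges)              # [3.0, 1.0, -3.0, 2.0, 0.0]
print(max_pool1d(edges))  # [3.0, 2.0, 0.0]
```

In a trained network the kernel values are learned rather than hand-picked. Pooling then halves the sequence length, which is why the Keras model alternates the two layer types.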

Activations

Name Description
ReLU max(0, x); fast, widely used in deep networks
Sigmoid S-shaped, output in (0, 1); used in binary classification
Tanh Like sigmoid but centered at 0; output in (-1, 1)
Softmax Outputs a probability distribution; used in final layer of multi-class classification
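The four activations above can be written directly with the standard library. This sketch uses the numerically stable forms, which is a choice on our part: the sigmoid branches on sign and the softmax subtracts the maximum logit, so large inputs do not overflow.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x):
    return math.tanh(x)  # equals 2 * sigmoid(2x) - 1, i.e. a zero-centered sigmoid

def softmax(logits):
    m = max(logits)  # shifting by the max avoids overflow without changing the result
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))     # 0.0 3.0
print(sigmoid(0.0), tanh(0.0))   # 0.5 0.0
print(softmax([2.0, 1.0, 0.1]))  # three probabilities that sum to 1
```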

Networks

Name Description
Feedforward (FFNN) Basic architecture with no loops; used for static input-output tasks
RNN Handles sequential data using recurrence; remembers previous inputs
CNN Uses convolutions to process grid-like data such as images
Transformer Uses self-attention to model sequences without recurrence; state-of-the-art in NLP & beyond
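The feedforward row also resolves the XOR problem from the history section: a single hidden layer is enough. The weights below are hand-picked for illustration, not learned. Step activations keep the arithmetic readable.

```python
def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """Two hidden units (OR and NAND) feeding an AND output unit."""
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)
    h_nand = step(-1.0 * x1 - 1.0 * x2 + 1.5)
    return step(1.0 * h_or + 1.0 * h_nand - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Each hidden unit draws one line through the input space, and the output unit combines the two half-planes. A single-layer network cannot do this.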

Gradient Descent

Optimizer

Variations of gradient descent
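The sketch below runs plain gradient descent next to Adam, the optimizer used by the Keras example at the end of this page. Both minimize the toy one-dimensional loss f(w) = (w - 3)^2, whose gradient is 2(w - 3). The learning rates and step counts are arbitrary choices for the illustration. beta_1 = 0.9 and beta_2 = 0.999 match the Keras example, and eps = 1e-8 follows the original Adam paper.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def gradient_descent(w=0.0, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w

def adam(w=0.0, lr=0.1, steps=500, beta_1=0.9, beta_2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta_1 * m + (1 - beta_1) * g      # running mean of gradients (momentum)
        v = beta_2 * v + (1 - beta_2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta_1 ** t)          # bias correction for the zero init
        v_hat = v / (1 - beta_2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

print(gradient_descent(), adam())  # both approach the minimum at w = 3
```

Plain gradient descent scales its step by the raw gradient. Adam instead rescales each step by a running estimate of the gradient's magnitude, which makes it less sensitive to the choice of learning rate across parameters.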

Hyperparameters

Values set before training, rather than learned, that guide the training process (e.g. learning rate, batch size, number of epochs)

Reinforcement Learning

Updates parameters to maximize rewards

Variants of REINFORCE to improve stability and efficiency
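REINFORCE is the simplest way to "update parameters to maximize rewards". The minimal sketch below runs it on a hypothetical two-armed bandit in which arm 1 pays off more often. It uses a softmax policy over two logits and a running-average baseline, the basic variance-reduction trick that later variants build on. The reward probabilities, learning rate, step count, and seed are illustrative assumptions.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(pay_probs=(0.2, 0.8), steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # policy logits, one per arm
    baseline = 0.0      # running average of rewards
    for _ in range(steps):
        probs = softmax(theta)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < pay_probs[action] else 0.0
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)
        # Gradient of log softmax(theta)[action] w.r.t. theta[i] is 1[i == action] - p_i.
        for i in range(2):
            theta[i] += lr * advantage * ((1.0 if i == action else 0.0) - probs[i])
    return softmax(theta)

print(reinforce_bandit())  # probability mass should concentrate on arm 1
```

Subtracting the baseline leaves the expected gradient unchanged but reduces its variance. Methods such as actor-critic and PPO push the same idea further with learned value estimates and constrained updates.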

Core families to know:

Curated resources:

Dataset

Machine Learning (ML)

ML on embedded devices

ML pipeline

The canonical end-to-end flow. Each stage has its own failure modes — most production issues live at the boundaries.

  1. Data ingestion — pull from warehouses, APIs, logs, labelers.
  2. Cleaning & validation — handle missing values, outliers, schema checks (e.g. Great Expectations, Pandera).
  3. Feature engineering — encoding, scaling, aggregations; track with a feature store (Feast).
  4. Training — model selection, hyperparameter tuning, reproducible runs.
  5. Evaluation — offline metrics, slice-based analysis, calibration, fairness checks.
  6. Deployment — batch, online service, or edge; versioned artifacts.
  7. Monitoring — data/concept drift, performance decay, alerting, feedback loops.
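The seven stages can be compressed into a toy end-to-end run. This sketch is pure Python. The data is made up, and a nearest-centroid classifier stands in for a real model. In practice each function would be replaced by the tools named above: Great Expectations or Pandera for validation, Feast for features, and a proper training framework. Deployment and monitoring are reduced to a closing comment.

```python
import statistics

# 1. Ingestion: hypothetical rows of (feature_a, feature_b, label); None marks a missing value.
RAW = [
    (1.0, 2.0, 0), (1.2, 1.8, 0), (None, 2.1, 0), (0.9, 2.2, 0),
    (3.0, 4.0, 1), (3.2, 3.9, 1), (2.8, 4.1, 1), (3.1, None, 1),
]

def validate(rows):
    """2. Cleaning & validation: drop rows with missing values, check labels."""
    clean = [r for r in rows if None not in r]
    assert all(r[2] in (0, 1) for r in clean), "unexpected label"
    return clean

def fit_scaler(rows):
    """3. Feature engineering: per-feature mean/stdev learned on training data only."""
    cols = list(zip(*[r[:2] for r in rows]))
    return [(statistics.mean(c), statistics.pstdev(c) or 1.0) for c in cols]

def transform(rows, scaler):
    return [tuple((x - m) / s for x, (m, s) in zip(r[:2], scaler)) + (r[2],) for r in rows]

def train(rows):
    """4. Training: nearest-centroid model, one mean vector per class."""
    return {
        label: [statistics.mean(col) for col in zip(*[r[:2] for r in rows if r[2] == label])]
        for label in {r[2] for r in rows}
    }

def predict(model, x):
    return min(model, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(x, model[lbl])))

def evaluate(model, rows):
    """5. Evaluation: plain accuracy (real pipelines add slices, calibration, fairness)."""
    return sum(predict(model, r[:2]) == r[2] for r in rows) / len(rows)

data = validate(RAW)
train_rows, test_rows = data[0::2], data[1::2]  # simple deterministic split
scaler = fit_scaler(train_rows)
model = train(transform(train_rows, scaler))
print("accuracy:", evaluate(model, transform(test_rows, scaler)))
# 6-7. Deployment and monitoring would version `scaler` and `model` together
# and compare incoming feature statistics against `scaler` to detect drift.
```

Note that the scaler is fit on the training split only and then reused unchanged at evaluation time. Mixing up that boundary is one of the classic stage-boundary bugs.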

See the MLOps & Deployment section below for tooling.

Feature Selection and Extraction

Neural Networks and Training

Speech and Language Processing

Model Evaluation, Underfitting, and Overfitting

Anomaly Detection

Convolutional Neural Networks (CNN)

NLP & Sequence Models

The path from bag-of-words to the Transformer is the shortest route to understanding modern LLMs.

Curated resources:

Sample Rate and Bit Depth

Mel-Frequency Cepstral Coefficients (MFCC)

AI Accelerator

Image Classification and Neural Networks

CNN Visualizations and Data Augmentation

Transfer Learning

Object Detection

R-CNN

Advanced Image Processing

LLM Engineering

Shipping applications with pre-trained LLMs is a distinct discipline from training them. Detailed coverage lives in ai/llm/index.md and ai/llm/fine-tuning.md; the short version:

Curated resources:

MLOps & Deployment

Getting a model to production is usually harder than training it.

Curated resources:

Responsible AI

Ship models that work for everyone and that you can explain after the fact.

Curated resources:

Neural network architecture example

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv1D, Flatten, Reshape, MaxPooling1D
from tensorflow.keras.optimizers import Adam

# NOTE: args, train_dataset, validation_dataset, input_length, classes, callbacks,
# BatchLoggerCallback and train_sample_count are not defined in this snippet;
# they are supplied by the surrounding training script that runs it.

EPOCHS = args.epochs or 100
LEARNING_RATE = args.learning_rate or 0.005
# If True, non-deterministic functions (e.g. shuffling batches) are not used.
# This is False by default.
ENSURE_DETERMINISM = args.ensure_determinism
# This controls the batch size; alternatively, manipulate the tf.data.Dataset objects yourself.
BATCH_SIZE = args.batch_size or 32
if not ENSURE_DETERMINISM:
    train_dataset = train_dataset.shuffle(buffer_size=BATCH_SIZE * 4)
train_dataset = train_dataset.batch(BATCH_SIZE, drop_remainder=False)
validation_dataset = validation_dataset.batch(BATCH_SIZE, drop_remainder=False)

# Model architecture: the flat input vector is reshaped into (frames, 13),
# i.e. 13 features per time frame (for example, 13 MFCC coefficients),
# so the 1D convolutions slide along the time axis.
model = Sequential()
model.add(tf.keras.Input(shape=(input_length,)))
model.add(Reshape((input_length // 13, 13)))
model.add(Conv1D(8, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
model.add(Dropout(0.25))
model.add(Conv1D(16, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(classes, name='y_pred', activation='softmax'))

# Adam optimizer; LEARNING_RATE controls the step size.
opt = Adam(learning_rate=LEARNING_RATE, beta_1=0.9, beta_2=0.999)
callbacks.append(BatchLoggerCallback(BATCH_SIZE, train_sample_count, epochs=EPOCHS, ensure_determinism=ENSURE_DETERMINISM))

# Train the neural network (categorical_crossentropy expects one-hot encoded labels).
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
model.fit(train_dataset, epochs=EPOCHS, validation_data=validation_dataset, verbose=2, callbacks=callbacks)

# Use this flag to disable per-channel quantization for a model.
# This can reduce RAM usage for convolutional models, but may have
# an impact on accuracy.
disable_per_channel_quantization = False

  1. big_data
  2. chatgpt
  3. data_analysis
  4. elasticsearch
  5. llm
  6. transformers
