FLOWLYTIX AI SOLUTIONS
STUDENT REFERENCE EDITION
Flowlytix AI Bootcamp Series
Basic Concepts of
Artificial Intelligence
& Machine Learning
A complete study booklet covering foundations, machine learning, neural networks, data quality, model architectures, and responsible AI — with recap notes, Q&A, MCQs, and practical exercises for every module.
6 Modules · 70+ Q&A / MCQs · 1 Final Assessment
Companion booklet to: "Basic Concepts of AI & ML — Full Course" © Flowlytix AI Solutions

About This Booklet

A self-contained companion to the "Basic Concepts of AI & ML" course — built for review, revision, and practice.

Why This Booklet Exists

Slides are great for a live walkthrough, but they compress ideas into short phrases and diagrams. This booklet expands the same six modules from the course into full explanations, worked examples, and practice questions — so you can revisit any concept on your own, at your own pace, without needing the slide deck in front of you.

How This Booklet Is Organized

Each of the six modules follows the same structure, so you always know where to look:

Concept Sections
Full explanations of every idea from the course, with plain-language definitions and real-world examples.
Key Terms
A quick-glance box of the vocabulary introduced in that module.
Check Your Understanding (Q&A)
Short-answer questions with model answers, for active recall.
Multiple-Choice Questions
Exam-style MCQs to test recognition and recall — answer key at the end of the booklet.
Practical / Applied Exercise
A scenario, small calculation, or hands-on style question that mirrors what you'd be asked to do with these concepts in practice — not just define them.

How to Use It

TIP This booklet assumes zero prior background in AI/ML — exactly like the original course. If a term feels unfamiliar, check the Glossary near the end.

Table of Contents

1  Foundations of AI
Definitions, history, and the taxonomy of intelligent systems
2  Machine Learning
How systems learn from data instead of fixed rules
3  Inside Neural Networks
Neurons, weights, activation, and the learning loop
4  Data & Model Quality
Splits, overfitting, bias-variance, evaluation metrics
5  Architectures & Applications
CNNs, RNNs, Transformers, and where AI shows up today
6  Ethics & The Road Ahead
Responsible AI and where the field is going next
Master Cheat Sheet
Every formula and diagram in one place
Glossary
A–Z of terms used across all six modules
Final Comprehensive Assessment
25 mixed questions spanning the entire course
Answer Key
Answers for every MCQ in the booklet
COURSE SNAPSHOT Prepared for the Flowlytix AI Bootcamp Series · 6 Modules · 32 Slides · ~120 Minutes · 0 Prerequisites Needed
MODULE 1 OF 6

Foundations of AI

What we mean when we say "intelligence," and how the field got here.

1.1  What Is Artificial Intelligence?

Artificial Intelligence (AI) is the field of building machines that perform tasks which normally require human intelligence — perceiving the world, reasoning about it, deciding what to do, and learning from experience.

A common misconception is that AI refers to one specific algorithm. It does not. AI is a goal, not a technique. Rule-based systems, statistical methods, search algorithms, and neural networks all qualify as "AI" as long as they pursue that goal of intelligent behaviour.

Perceive
Take in raw signals — pixels, sound waves, text — and turn them into structured information.
Reason
Combine facts and rules to draw conclusions, plan steps, or make a decision.
Learn
Improve performance from experience (data) rather than being explicitly reprogrammed.
AI in Action Today

Common thread: each system takes messy real-world input and produces a useful decision — without a human hand-coding every possible case.

1.2  A Brief History of AI

AI is often talked about as a "new" technology, but the field is seven decades old, and its progress has never been a straight line — long stretches of hype have repeatedly been followed by disappointment ("AI winters"), and then by genuine breakthroughs that restart momentum.

Year Milestone
1950 Alan Turing asks "Can machines think?" and proposes the Turing Test.
1956 The Dartmouth Workshop coins the term "Artificial Intelligence."
1980s Expert systems boom, then a second "AI winter" (the first came in the 1970s) as their limits show.
1997 IBM's Deep Blue defeats world chess champion Garry Kasparov.
2012 Deep learning (AlexNet) triggers a leap in image recognition accuracy.
2017 The Transformer architecture is introduced — it now powers modern LLMs.
2022+ ChatGPT and generative AI bring AI into everyday consumer use.
PATTERN TO NOTICE Progress in AI has never been a straight line. Hype cycles are followed by "AI winters," and then real breakthroughs restart momentum — this cycle has repeated at least three times since 1956.

1.3  Types of AI, by Capability

This is a progression — a way of describing how far AI has come, and how far it still has to go.

Type Status Description
Narrow AI (ANI) Exists today Excels at one specific task — image recognition, translation, playing chess. Cannot generalize beyond its training. Every AI system in use today is Narrow AI.
General AI (AGI) Not yet achieved Hypothetical — would match human-level reasoning across any task, transferring knowledge between domains the way people do.
Superintelligence (ASI) Speculative Theoretical — would exceed human intelligence across every domain. Purely speculative, and discussed mainly in AI safety research.

1.4  Types of AI, by Functionality

A second, independent way to classify AI systems: by how much memory and self-awareness they have.

Reactive Machines
No memory of the past — reacts only to the current input. Example: Deep Blue evaluating a chess board.
Limited Memory
Uses recent past data to inform decisions. Example: a self-driving car tracking nearby vehicle speed.
Theory of Mind
Would understand the beliefs, intentions, and emotions of others. Still an active research goal — not deployed.
Self-Aware AI
Would possess consciousness and self-understanding. Entirely theoretical — no working example exists.

1.5  AI vs. Machine Learning vs. Deep Learning

These three terms are often used interchangeably in casual conversation, but they describe nested ideas, not three separate things.

Term What it is
Artificial Intelligence The goal — building machines that act intelligently. Includes hand-coded rule systems too, not just learning-based ones.
Machine Learning One approach to AI: instead of a human hand-coding the rules, the system learns the rules itself from data (often, but not always, labeled examples).
Deep Learning One technique within ML — uses multi-layer ("deep") neural networks, especially powerful for images, audio, and language.
KEY TAKEAWAY Every deep learning system is machine learning. Every machine learning system is AI. The reverse is not true — not all AI uses machine learning, and not all machine learning uses deep neural networks.
KEY TERMS — MODULE 1
Artificial Intelligence: machines performing tasks that normally need human intelligence.
Narrow AI (ANI): AI specialized for one task; all AI in use today.
AGI: hypothetical human-level general intelligence.
ASI: theoretical superhuman intelligence.
Reactive Machine: AI with no memory of the past.
Machine Learning: systems that learn patterns from data.
Deep Learning: ML using multi-layer neural networks.
Turing Test: a proposed test of whether a machine's behaviour is indistinguishable from a human's.
CHECK YOUR UNDERSTANDING
Q1. Why is it inaccurate to describe AI as "one algorithm"?
A: Because AI is a goal (building machines that behave intelligently), not a single method. Rule-based systems, search, statistics, and neural networks can all count as AI if they pursue that goal.
Q2. What is the key difference between Narrow AI and General AI?
A: Narrow AI excels at one specific task and cannot generalize beyond its training; General AI (still hypothetical) would apply human-level reasoning across any task and transfer knowledge between domains.
Q3. How are AI, Machine Learning, and Deep Learning related to one another?
A: They are nested, not separate. Deep Learning is a technique inside Machine Learning, and Machine Learning is one approach to achieving AI. Every DL system is ML, and every ML system is AI — but not the reverse.
Q4. Why did the field experience "AI winters"?
A: Periods of high hype and investment were followed by disappointment when the technology of the time hit real limits (e.g., expert systems in the 1980s) — funding and enthusiasm dried up until the next genuine breakthrough revived the field.
MULTIPLE-CHOICE QUESTIONS
1. Which of the following best defines Artificial Intelligence?
  • A. A single fixed algorithm used in all smart devices
  • B. The field of building machines that perform tasks normally requiring human intelligence
  • C. Any computer program that runs faster than a human
  • D. A type of database used for storing large files
2. The Dartmouth Workshop of 1956 is significant because it:
  • A. Built the first neural network
  • B. Coined the term "Artificial Intelligence"
  • C. Introduced the Transformer architecture
  • D. Defeated a chess champion
3. Which type of AI is currently deployed in every real-world AI product?
  • A. Artificial General Intelligence (AGI)
  • B. Artificial Superintelligence (ASI)
  • C. Narrow AI (ANI)
  • D. Theory of Mind AI
4. A self-driving car that tracks nearby vehicle speeds over the last few seconds is an example of:
  • A. A Reactive Machine
  • B. Limited Memory AI
  • C. Theory of Mind AI
  • D. Self-Aware AI
5. Which statement correctly describes the relationship between AI, ML, and DL?
  • A. They are three completely unrelated fields
  • B. ML is a subset of DL, which is a subset of AI
  • C. AI is the broadest goal; ML is one approach to it; DL is one technique within ML
  • D. DL is broader than AI
6. IBM's Deep Blue defeating Garry Kasparov in 1997 is an example of:
  • A. Deep learning outperforming humans at image recognition
  • B. A Narrow AI system mastering one specific task (chess)
  • C. The first working example of AGI
  • D. A Transformer-based model
7. "Theory of Mind" AI, as a category, refers to a system that would be able to:
  • A. React only to its current input with no memory
  • B. Store the last few seconds of sensor data
  • C. Understand the beliefs, intentions, and emotions of others
  • D. Run only on quantum hardware
8. Superintelligence (ASI) is best described as:
  • A. Already deployed in most smartphones
  • B. A purely theoretical AI that would exceed human intelligence in every domain
  • C. A synonym for Narrow AI
  • D. The AI used in 1980s expert systems
PRACTICAL / APPLIED EXERCISE

Scenario: You are shown three systems. For each, decide whether it is (a) Reactive Machine, (b) Limited Memory, or (c) Theory of Mind / Self-Aware, and briefly justify your answer in one sentence.

  1. A thermostat that turns on the heater only when the current room temperature drops below a set point.
  2. A fraud-detection system that flags a transaction as suspicious by comparing it to the user's spending pattern over the last 30 days.
  3. A hypothetical customer-support AI that infers a customer is frustrated (not just angry-sounding) and adapts its tone because it models what the customer is likely thinking.

Model answers are provided in the Answer Key at the end of this booklet.

MODULE 2 OF 6

Machine Learning

How a system learns rules from examples instead of being told the rules.

2.1  What Is Machine Learning?

The core shift behind Machine Learning is simple to state but powerful in consequence: instead of a person writing explicit "if this, then that" logic, the system is shown examples of inputs paired with correct outputs, and it infers the underlying rule for itself.

Traditional Programming
Data + Rules → Output. A human writes explicit logic; the program applies it.
Machine Learning
Data + Output → Rules. The system is shown inputs and correct outputs, and infers the rule.
FORMAL DEFINITION (TOM MITCHELL, 1997) "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Worked Example — Spam Filter

T (Task) = classify emails as spam / not-spam  ·  E (Experience) = a labeled set of past emails  ·  P (Performance) = % classified correctly
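To make the contrast concrete, here is a minimal Python sketch of the spam-filter example. The emails, the hand-picked keywords, and every function name are made up for illustration; real spam filters use far richer features and models.

```python
# Traditional programming: a human writes the rule by hand.
def rule_based_is_spam(text):
    # Hypothetical keywords chosen by a person, not learned from data.
    return any(word in text.lower() for word in ("free", "winner"))


# Machine learning: the rule (a set of suspicious words) is inferred from labeled examples.
def learn_spam_words(examples):
    """Return words seen in spam examples but never in non-spam ones."""
    spam_words, ham_words = set(), set()
    for text, is_spam in examples:
        words = set(text.lower().split())
        (spam_words if is_spam else ham_words).update(words)
    return spam_words - ham_words


def learned_is_spam(text, spam_words):
    return any(word in spam_words for word in text.lower().split())


def accuracy(predict, examples):
    """P in Mitchell's definition: the fraction classified correctly."""
    correct = sum(predict(text) == label for text, label in examples)
    return correct / len(examples)


# E (experience): a tiny, made-up labeled set of past emails.
training_emails = [
    ("claim your prize now", True),
    ("urgent prize waiting", True),
    ("meeting moved to monday", False),
    ("lunch now or later", False),
]
spam_words = learn_spam_words(training_emails)

# T (task): classify emails the system has not seen before.
new_emails = [("your prize is ready", True), ("see you monday", False)]
print("learned rule:", accuracy(lambda t: learned_is_spam(t, spam_words), new_emails))
print("hand-written rule:", accuracy(rule_based_is_spam, new_emails))
```

On this toy data the learned rule gets both new emails right, while the hand-written rule misses the first one because nobody added "prize" to its keyword list.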

2.2  The Machine Learning Pipeline

Most ML projects, whatever the domain, move through the same six stages:

# Stage What Happens
1 Collect Data Gather raw examples relevant to the task.
2 Prepare Data Clean, label, and split into train / validation / test sets.
3 Choose Features Select the signals the model should look at.
4 Train Model The model adjusts itself to fit the training data.
5 Evaluate Test on unseen data to measure real performance.
6 Deploy & Monitor Ship the model, and watch for drift over time.
IT'S ITERATIVE, NOT LINEAR Poor evaluation results usually send you back to feature selection or data collection. Real ML work loops through these stages many times before a model is ready to ship.
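The six stages can be sketched end to end in a few lines of Python. Everything here is an assumption made for illustration: the data is synthetic (hours studied versus pass/fail), the "model" is a single cut-off value, and the function names are hypothetical.

```python
import random


def collect_data(n=200, seed=0):
    """Stage 1: synthetic rows standing in for real data collection."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hours = rng.uniform(0, 10)
        passed = hours + rng.gauss(0, 1) > 5   # noisy ground truth
        rows.append({"hours": hours, "passed": passed})
    return rows


def prepare_data(rows, train_fraction=0.8, seed=0):
    """Stage 2: shuffle, then split into training and held-out rows."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def choose_features(row):
    """Stage 3: pick the signal the model looks at (here, a single number)."""
    return row["hours"]


def accuracy(threshold, rows):
    hits = sum((choose_features(r) > threshold) == r["passed"] for r in rows)
    return hits / len(rows)


def train_model(train_rows):
    """Stage 4: choose the cut-off that scores best on the training rows."""
    candidates = [i / 2 for i in range(21)]   # 0.0, 0.5, ..., 10.0
    return max(candidates, key=lambda t: accuracy(t, train_rows))


train_rows, test_rows = prepare_data(collect_data())
threshold = train_model(train_rows)
test_accuracy = accuracy(threshold, test_rows)   # Stage 5: evaluate on unseen rows
print(threshold, test_accuracy)
# Stage 6 (deploy and monitor) would repeat this evaluation on fresh data over time.
```

If the test accuracy were disappointing, the natural next move would be to revisit choose_features or collect_data, which is the loop described above.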

2.3  Three Ways Machines Learn

Most ML techniques fall into one of three broad families, distinguished by the kind of feedback the system receives during training.

Paradigm How It Learns Examples
Supervised Learning Learns from labeled examples — input paired with the correct answer. Email spam detection, house price prediction
Unsupervised Learning Finds hidden structure in unlabeled data — no "correct answer" given. Customer segmentation, anomaly detection
Reinforcement Learning Learns by trial and error, receiving rewards or penalties for actions. Game-playing agents, robotics, resource routing

Supervised Learning, In Detail

Supervised learning splits into two problem types, based on what you are predicting:

Regression
Predicts a continuous numeric value. Example: predicting a house's sale price from its size, location, and age.
Classification
Predicts a discrete category or class — binary (Spam vs. Not Spam) or multi-class (Cat vs. Dog vs. Bird).
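The difference between the two problem types shows up clearly in code. The sketch below uses made-up numbers and two deliberately simple methods (a least-squares line and a nearest-centroid rule) chosen for readability; they are stand-ins, not the only way to do regression or classification.

```python
def fit_line(xs, ys):
    """Regression: least-squares line through the points, returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_centroid(examples_by_label, value):
    """Classification: return the label whose average feature value is closest."""
    centroids = {label: sum(vals) / len(vals) for label, vals in examples_by_label.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Regression: house size (square metres) -> price (hypothetical units).
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept      # a continuous number
print(predicted_price)

# Classification: number of exclamation marks in an email -> category.
exclamation_counts = {"spam": [5, 7, 6], "not spam": [0, 1, 2]}
predicted_label = nearest_centroid(exclamation_counts, 4)   # a discrete class
print(predicted_label)
```

The regression output can be any number on a continuous scale, while the classifier can only ever answer with one of the labels it was given.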

Unsupervised Learning, In Detail

There is no label to predict — the algorithm finds structure in the data on its own.

Clustering
Groups similar data points together based on shared characteristics, with no predefined categories. Example: grouping customers by purchase behaviour for targeted marketing.
Dimensionality Reduction
Compresses many features into fewer, information-dense ones. Example: compressing hundreds of product attributes into 2 dimensions to plot and explore visually (e.g., PCA).
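As an illustration of clustering, here is a minimal one-dimensional k-means sketch. The spending figures and starting centroids are invented, and k-means is just one of several clustering algorithms; the point is that no labels are supplied anywhere.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical monthly spend per customer; note there is no "correct answer" column.
monthly_spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(monthly_spend, centroids=[0.0, 50.0])
print(centroids)
print(clusters)
```

The algorithm separates the low spenders from the high spenders on its own; deciding what to call each group is left to a human.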

Reinforcement Learning, In Detail

An agent learns by acting in an environment, observing the outcome, and adjusting — much like training a pet with rewards.

Term Meaning
State What the agent currently observes about its situation.
Action A choice the agent can make from its current state.
Reward Feedback signal — positive for good outcomes, negative for bad.
Policy The strategy the agent learns for picking actions.
Example An agent learning to play a video game tries moves, gets points (reward) for good ones, and gradually learns a winning strategy (policy).
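The four terms above can be seen working together in a small sketch. It uses tabular Q-learning, which is one common RL algorithm (the text above does not name a specific one), on an invented five-cell corridor where the only reward sits in the rightmost cell. All names and parameter values are assumptions for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Environment: returns (next_state, reward) for an action taken in a state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def choose_action(q, state, epsilon, rng):
    """Mostly pick the best-known action, sometimes explore at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted future value.
            target = reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Policy: the action with the highest learned value in each non-goal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the learned policy should be to move right (+1) from every cell, even though the agent was never told where the goal is.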
KEY TERMS — MODULE 2
Machine Learning: systems that learn rules from data rather than being told them.
Feature: an input signal the model uses to make predictions.
Label: the correct answer provided in supervised learning.
Regression: predicting a continuous numeric value.
Classification: predicting a discrete category.
Clustering: grouping unlabeled data by similarity.
Dimensionality Reduction: compressing many features into fewer ones.
Policy: the strategy an RL agent learns for choosing actions.
CHECK YOUR UNDERSTANDING
Q1. In Tom Mitchell's formal definition of learning, what do T, E, and P stand for?
A: T = Task (what the program is trying to do), E = Experience (the data it learns from), and P = Performance measure (how you score success). A program "learns" if its performance at T, as measured by P, improves with more experience E.
Q2. Why is the ML pipeline described as iterative rather than linear?
A: Because a poor result at the Evaluate stage typically means you need to go back and choose different features, or collect more/better data — teams cycle through the six stages multiple times before a model is production-ready.
Q3. What distinguishes supervised, unsupervised, and reinforcement learning from one another?
A: Supervised learning uses labeled input-output pairs; unsupervised learning finds structure in unlabeled data with no "correct answer" given; reinforcement learning learns from trial-and-error feedback (rewards/penalties) rather than fixed labels.
Q4. Give one real-world example each of regression and classification.
A: Regression — predicting a house's sale price (a continuous number). Classification — deciding whether an email is spam or not-spam (a discrete category).
MULTIPLE-CHOICE QUESTIONS
1. In traditional programming vs. machine learning, what is different about the "rules"?
  • A. Traditional programming has no rules at all
  • B. In ML, the system infers the rules from data instead of a human writing them
  • C. ML always produces slower rules than traditional programming
  • D. There is no difference
2. Predicting a house's exact sale price in dollars is an example of:
  • A. Classification
  • B. Clustering
  • C. Regression
  • D. Reinforcement learning
3. Grouping customers into segments with no predefined categories is an example of:
  • A. Supervised classification
  • B. Regression
  • C. Reinforcement learning
  • D. Clustering (unsupervised learning)
4. In reinforcement learning, the "policy" refers to:
  • A. The dataset used for training
  • B. The strategy the agent learns for choosing actions
  • C. The final test accuracy of the model
  • D. A rule written by a human programmer
5. Which ML pipeline stage comes immediately after "Train Model"?
  • A. Collect Data
  • B. Choose Features
  • C. Evaluate
  • D. Deploy & Monitor
6. A "label," in the context of supervised learning, is:
  • A. An input feature used to make predictions
  • B. The correct answer paired with an input example
  • C. A synonym for "reward" in reinforcement learning
  • D. The name of the dataset file
7. Compressing hundreds of product attributes down to 2 dimensions for visualization (e.g., PCA) is an example of:
  • A. Classification
  • B. Reinforcement learning
  • C. Dimensionality reduction
  • D. Regression
PRACTICAL / APPLIED EXERCISE

Scenario: A retail company wants to build three different ML solutions. For each one below, name (a) the learning paradigm (supervised / unsupervised / reinforcement) and (b) whether it's regression, classification, clustering, or none of these.

  1. Predicting next month's total revenue in rupees based on the last 24 months of sales.
  2. Grouping the store's 50,000 customers into "types" of shoppers with no predefined labels, purely from their purchase history.
  3. Training a warehouse robot to pick the fastest route between shelves by trying different paths and being "rewarded" for shorter delivery times.

Model answers are provided in the Answer Key at the end of this booklet.

MODULE 3 OF 6

What Neural Networks Are Made Of

Zooming into the building blocks that power modern deep learning.

3.1  Anatomy of a Neural Network

A neural network is built from layers of connected nodes ("neurons") that transform input into output, one layer at a time.

Layer Role
Input Layer Raw features go in — pixels, words, numbers.
Hidden Layer 1 Detects simple patterns.
Hidden Layer 2 Combines those patterns into higher-level concepts.
Output Layer Produces the final prediction.
WHY "DEEP" LEARNING? "Deep" simply means many hidden layers stacked between the input and output. Each additional layer learns increasingly abstract features — early layers might detect edges, later ones detect shapes, and the deepest ones detect entire objects.

3.2  What's Inside a Single Neuron?

Every network — no matter how large — is built from this one basic unit, repeated millions of times.

The Neuron's Math

z = (w₁x₁ + w₂x₂ + w₃x₃ + ... + b)  →  output = activation(z)
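The formula translates almost word for word into Python. The inputs, weights, and bias below are arbitrary example values, and ReLU (max(0, z)) is used as one possible activation.

```python
def relu(z):
    """One possible activation: pass positive values through, clamp negatives to 0."""
    return max(0.0, z)


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias, followed by the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return z, activation(z)


z, output = neuron(
    inputs=[1.0, 2.0, 4.0],
    weights=[0.5, -0.25, 0.75],
    bias=0.5,
    activation=relu,
)
print(z, output)   # 0.5 - 0.5 + 3.0 + 0.5 = 3.5, and ReLU leaves 3.5 unchanged
```

A full network repeats this same computation for every neuron in every layer, with each layer's outputs becoming the next layer's inputs.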

3.3  Activation Functions

Activation functions are small formulas that decide whether — and how strongly — a neuron "fires."

Function Typically Used For
Sigmoid Output layer for binary classification (squashes values into a 0–1 range).
ReLU The most common choice in hidden layers: fast to compute, and it helps mitigate the vanishing-gradient problem.
Tanh Hidden layers when a zero-centered output helps training.
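Each of the three functions in the table is only a line or two of code. The sketch below uses Python's standard math module and prints their outputs for a few sample inputs so the different output ranges are visible.

```python
import math


def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Returns z for positive inputs and 0 otherwise."""
    return max(0.0, z)


def tanh(z):
    """Squashes any real number into the range (-1, 1), centered on zero."""
    return math.tanh(z)


for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  relu={relu(z):.1f}  tanh={tanh(z):+.3f}")
```

Note that sigmoid maps 0 to 0.5 while tanh maps 0 to 0, which is what "zero-centered" refers to.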

3.4  How a Neural Network Learns

Training is a four-step loop, repeated thousands (sometimes billions) of times over the course of a training run:

# Step What Happens
1 Forward Pass Input flows through the network layer by layer, producing a prediction.
2 Compute Loss Compare the prediction to the true answer — a loss function scores how wrong it was.
3 Backpropagation The error is sent backward through the network to find each weight's contribution to the mistake.
4 Gradient Descent Every weight is nudged slightly in the direction that reduces the loss.
VOCABULARY One full pass through this loop for a batch of data is called a training step. One pass through the entire dataset is an epoch.
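The four steps can be traced in a tiny example: a "network" with a single weight and a single bias learning the made-up relationship y = 2x + 1. The data, learning rate, and epoch count are assumptions chosen to keep the sketch short, and the gradients are written out by hand rather than computed by a deep learning library.

```python
# Made-up training data that follows y = 2x + 1 exactly.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0          # the model starts out knowing nothing
learning_rate = 0.05

for epoch in range(2000):                 # one epoch = one pass over all the data
    for x, y in data:                     # batch size 1: each sample is one training step
        prediction = w * x + b            # 1. forward pass
        loss = (prediction - y) ** 2      # 2. compute loss (squared error)
        grad_w = 2 * (prediction - y) * x # 3. backpropagation: chain rule gives each
        grad_b = 2 * (prediction - y)     #    parameter's contribution to the error
        w -= learning_rate * grad_w       # 4. gradient descent: nudge against the gradient
        b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```

After training, w and b should sit very close to 2 and 1, the values used to generate the data.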

3.5  Gradient Descent, Visualized

Think of the loss (error) as a landscape, with weight values along one axis and error along the other. Training is like walking downhill toward the lowest point — each training step is one small step down the slope.

The learning rate controls how big each downhill step is:

Learning Rate Effect
Too high Overshoots the minimum — loss bounces around or even diverges.
Just right Steadily converges to the minimum in a reasonable amount of time.
Too low Crawls toward the minimum — training takes far too long.
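The three rows of the table can be reproduced on the simplest possible loss landscape, loss(w) = w², whose minimum is at w = 0. The specific learning rates and step count below are arbitrary values picked to make each behaviour obvious.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Minimise loss(w) = w**2 by repeated gradient steps; returns the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w               # derivative of w**2
        w -= learning_rate * gradient  # one downhill step
    return w


for label, lr in [("too high", 1.1), ("just right", 0.1), ("too low", 0.001)]:
    print(f"{label:>10}: w after 20 steps = {descend(lr):.4f}")
```

Starting from w = 1, the high rate flips sign and grows on every step, the middle rate lands near 0, and the low rate has barely moved.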
KEY TERMS — MODULE 3
Weight: a learned value controlling how much an input matters.
Bias: a learned constant that shifts a neuron's output.
Activation Function: adds non-linearity to a neuron's output.
Loss Function: measures how wrong a prediction is.
Backpropagation: sends error backward to update weights.
Gradient Descent: the algorithm that nudges weights to reduce loss.
Learning Rate: how big each gradient descent step is.
Epoch: one full pass through the entire training dataset.
CHECK YOUR UNDERSTANDING
Q1. What role does the activation function play inside a neuron?
A: It introduces non-linearity into the neuron's output. Without it, a stack of layers would collapse into a single linear operation, and the network could not learn complex, non-linear patterns.
Q2. What is the difference between a training step and an epoch?
A: A training step is one forward-and-backward pass for a single batch of data. An epoch is one complete pass through the entire training dataset — which usually consists of many training steps.
Q3. Explain, in your own words, what backpropagation does.
A: After the loss is computed, backpropagation works backward through the network to calculate how much each individual weight contributed to the error, so gradient descent knows which direction to adjust each weight.
Q4. Why can a learning rate that is too high cause training to fail?
A: Because each weight update overshoots the minimum of the loss landscape: rather than steadily descending, the loss bounces around or even keeps growing (diverges).
MULTIPLE-CHOICE QUESTIONS
1. In the formula z = (w₁x₁ + w₂x₂ + ... + b), what does "b" represent?
  • A. The batch size
  • B. The bias — a constant that shifts the result
  • C. The backpropagation rate
  • D. The number of hidden layers
2. Which activation function is most commonly used in hidden layers because it is fast to compute and helps mitigate vanishing gradients?
  • A. Sigmoid
  • B. ReLU
  • C. Softmax
  • D. Linear
3. What is the correct order of the neural network training loop?
  • A. Gradient Descent → Loss → Forward Pass → Backpropagation
  • B. Forward Pass → Compute Loss → Backpropagation → Gradient Descent
  • C. Backpropagation → Forward Pass → Gradient Descent → Loss
  • D. Compute Loss → Gradient Descent → Forward Pass → Backpropagation
4. A learning rate that is too low will most likely cause:
  • A. The loss to diverge wildly
  • B. Training to take far too long, crawling toward the minimum
  • C. Instant convergence in one step
  • D. No effect on training speed
5. "Deep" in deep learning refers to:
  • A. The size of the dataset
  • B. How long training takes
  • C. Many hidden layers stacked between input and output
  • D. The number of output classes
6. Which layer of a neural network detects the simplest patterns, closest to the raw input?
  • A. Output Layer
  • B. Hidden Layer 1 (first hidden layer)
  • C. The final hidden layer
  • D. None — all layers detect the same level of pattern
PRACTICAL / APPLIED EXERCISE

Calculation: A neuron receives two inputs, x₁ = 2 and x₂ = 3, with weights w₁ = 0.5 and w₂ = -1, and a bias b = 1.

  1. Calculate z = (w₁x₁ + w₂x₂ + b).
  2. If the activation function is ReLU (which outputs max(0, z)), what is the neuron's final output?
  3. Would the output change if a Sigmoid activation were used instead? Explain, in one sentence, what range Sigmoid would squash the result into.

Model answers are provided in the Answer Key at the end of this booklet.

MODULE 4 OF 6

Data & Model Quality

Why the data you feed a model matters as much as the model itself.

4.1  Data: The Fuel of Every Model

Term Meaning
Feature An input variable the model uses — e.g., square footage, age, pixel value.
Label The correct answer for supervised learning — e.g., the actual sale price.
Sample / Row One complete example: a full set of features (plus a label, if supervised).

Splitting the Dataset

Train — 70%
Fits the model.
Validation — 15%
Tunes it.
Test — 15%
Gives the final, honest score.
WHY NOT TRAIN ON EVERYTHING? If a model is evaluated on the same data it trained on, it can simply memorize the answers — giving a misleadingly perfect score. Holding out unseen test data simulates the real world: how will the model perform on examples it has never encountered?
Analogy Training = studying practice questions. Validation = a mock exam to check your prep. Test = the real, unseen final exam.
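A 70 / 15 / 15 split like the one above can be sketched with the standard library alone. The function name and the fixed seed are assumptions for illustration, and the proportions are a common convention rather than a fixed rule.

```python
import random


def split_dataset(rows, train=0.70, validation=0.15, seed=0):
    """Shuffle the rows, then cut them into train / validation / test portions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # shuffle so each split is a fair sample
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validation)
    train_rows = rows[:n_train]
    val_rows = rows[n_train:n_train + n_val]
    test_rows = rows[n_train + n_val:]  # whatever remains is held out for the final score
    return train_rows, val_rows, test_rows


train_rows, val_rows, test_rows = split_dataset(range(100))
print(len(train_rows), len(val_rows), len(test_rows))
```

Every row lands in exactly one of the three sets, so the test rows are never seen while the model is being fitted or tuned.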

4.2  Overfitting vs. Underfitting

These are two opposite ways a model can fail to generalize to new data — with a sweet spot in between.

Failure Mode Description Symptom
Underfitting Too simple — misses the underlying pattern entirely. High error on both training and test data.
Good Fit Captures the true trend without memorizing noise. Low error on training and test data.
Overfitting Memorizes noise instead of the pattern. Near-zero training error, but high test error.
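The three rows of the table can be reproduced with three toy models on made-up data. The true trend here is y = 2x with alternating noise of +1 and -1; the constant model, the straight line, and the "memorizer" (which returns the label of the closest training point) are simplified stand-ins for too-simple, well-matched, and too-flexible models.

```python
def mse(model, points):
    """Mean squared error of a model over a list of (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


def fit_constant(points):
    """Too simple: always predicts the average training label."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y


def fit_line(points):
    """Matches the true trend: a least-squares straight line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return lambda x: mean_y + slope * (x - mean_x)


def fit_memorizer(points):
    """Too flexible: returns the label of the closest training point, noise included."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


train = [(1, 3), (2, 3), (3, 7), (4, 7), (5, 11), (6, 11)]
test = [(1.4, 1.8), (2.4, 5.8), (3.4, 5.8), (4.4, 9.8), (5.4, 9.8)]

results = {}
for name, fit in [("underfit", fit_constant), ("good fit", fit_line), ("overfit", fit_memorizer)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, test))
    print(f"{name:>8}: train error {results[name][0]:.2f}, test error {results[name][1]:.2f}")
```

The constant model has high error on both sets, the line has low error on both, and the memorizer scores zero on the training points but does clearly worse than the line on the unseen ones.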

4.3  The Bias-Variance Tradeoff

Underfitting and overfitting are really two ends of the same dial — model complexity.

High Bias (→ Underfitting)
The model makes strong, oversimplified assumptions — it systematically misses the true pattern, no matter how much data it sees.
High Variance (→ Overfitting)
The model is overly sensitive to the specific training data, including its noise. Small data changes swing predictions wildly.

4.4  Evaluating a Model's Performance

The confusion matrix is the source for nearly every classification metric. Example: a medical test predicting disease, where "Positive" = predicted disease present.

Predicted Positive Predicted Negative
Actual Positive True Positive (TP) False Negative (FN)
Actual Negative False Positive (FP) True Negative (TN)
Metric Formula Question It Answers
Accuracy (TP + TN) / Total Overall, how often is the model correct?
Precision TP / (TP + FP) Of predicted positives, how many were right?
Recall TP / (TP + FN) Of actual positives, how many did we catch?
F1 Score 2 × Precision × Recall / (Precision + Recall) One balanced number for both (the harmonic mean of Precision and Recall).
A TRADEOFF, NOT A FREE LUNCH Precision vs. Recall is a tradeoff: optimizing one often costs the other. In disease screening, you may accept lower precision (more false alarms) to get higher recall (missing fewer real cases). The right balance depends on the problem.
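All four metrics follow directly from the four confusion-matrix counts. The helper below is a minimal sketch with a hypothetical name and made-up counts; it assumes the denominators are non-zero, which a production version would need to check.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the table's metrics from confusion-matrix counts (non-zero denominators assumed)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up counts for a disease test run on 100 patients.
metrics = classification_metrics(tp=30, fp=20, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, recall (0.75) is higher than precision (0.60): the test catches most real cases but raises a fair number of false alarms, the pattern described for disease screening.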
KEY TERMS — MODULE 4
Feature: an input variable used by the model.
Label: the correct answer for a training example.
Overfitting: memorizing training data instead of learning the pattern.
Underfitting: a model too simple to capture the pattern.
Bias (statistical): systematic error from oversimplified assumptions.
Variance (statistical): sensitivity to fluctuations in training data.
Confusion Matrix: a table of TP, FP, TN, FN counts.
F1 Score: the harmonic mean of precision and recall.
CHECK YOUR UNDERSTANDING
Q1. Why do we split data into train, validation, and test sets instead of using it all for training?
A: Training fits the model; validation tunes choices like hyperparameters; the test set — never touched during training or tuning — gives an honest estimate of how the model will perform on genuinely unseen data.
Q2. How can you tell, from training and test error alone, whether a model is overfitting?
A: Overfitting shows near-zero training error but noticeably high test error — the gap between the two is the giveaway, since the model has memorized the training set rather than learning a pattern that generalizes.
Q3. In a medical screening context, would you prioritize precision or recall, and why?
A: Usually recall, because missing an actual positive case (a false negative) can be far more costly than a false alarm — you would rather flag some healthy patients for a follow-up test than miss someone who is actually sick.
Q4. What is the relationship between bias/variance and underfitting/overfitting?
A: High bias corresponds to underfitting (oversimplified, systematically wrong assumptions), while high variance corresponds to overfitting (overly sensitive to training data's specific noise). They are two ends of the same model-complexity dial.
MULTIPLE-CHOICE QUESTIONS
1. Which data split gives the "final, honest score" of a model?
  • A. Training set
  • B. Validation set
  • C. Test set
  • D. The full original dataset
2. Near-zero training error but high test error is the classic signature of:
  • A. Underfitting
  • B. Overfitting
  • C. A perfectly balanced model
  • D. Low bias and low variance
3. Precision is calculated as:
  • A. TP / (TP + FN)
  • B. TP / (TP + FP)
  • C. (TP + TN) / Total
  • D. FP / (FP + TN)
4. High bias is most closely associated with:
  • A. Overfitting
  • B. Underfitting
  • C. A perfect model
  • D. The test set only
5. The F1 score is best described as:
  • A. The sum of precision and recall
  • B. The harmonic mean of precision and recall, balancing both
  • C. Another name for accuracy
  • D. The number of false positives
6. A "Sample" or "Row" in a dataset refers to:
  • A. A single feature value
  • B. One complete example — a full set of features (plus label, if supervised)
  • C. The entire training set
  • D. The model's final prediction
PRACTICAL / APPLIED EXERCISE

Scenario: A spam classifier is tested on 100 emails. The confusion matrix results are: TP = 40, FP = 10, FN = 5, TN = 45.

  1. Calculate the model's Accuracy.
  2. Calculate the model's Precision.
  3. Calculate the model's Recall.
  4. Based on these numbers, is the model more likely to let spam through, or to wrongly flag a real email as spam? Justify your answer using the numbers above.

Model answers are provided in the Answer Key at the end of this booklet.

MODULE 5 OF 6

Architectures & Applications

The major model families in use today, and where they show up in the real world.

5.1  Key Architectures at a Glance

Different data shapes call for different network designs. The three families below cover most of the deep learning systems in production today.

Architecture | Full Name | How It Works | Used For
CNN | Convolutional Neural Network | Scans images in small patches to detect edges, then shapes, then objects, layer by layer. | Image recognition, medical scans, self-driving perception
RNN | Recurrent Neural Network | Processes sequences step by step, carrying a "memory" of what came before. | Time series, older speech & text models
Transformer | Attention-Based Architecture | Looks at an entire sequence at once, weighing how much each part should "attend to" every other part. | LLMs (GPT, Claude), translation, modern NLP
WHY THIS MATTERS Choosing the right architecture for the shape of your data is one of the most consequential decisions in an ML project — a CNN built for images will not work well on sequential text data, and vice versa.
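
To make "scans images in small patches" concrete, here is a minimal sketch in plain Python. The 4×4 image, the hand-picked 2×2 kernel, and the function name convolve2d are illustrative assumptions, not part of the course; a real CNN learns its kernel values during training and stacks many such layers.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Strictly this is a cross-correlation, which is what deep learning
    libraries usually compute under the name "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the current patch by the kernel element-wise and sum.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row prints [0, 2, 0]
```

The feature map is non-zero only in the middle column, where the patch straddles the dark-to-bright boundary. That is the "detect edges first" step; deeper layers would combine such edge responses into shapes and objects.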

5.2  Where AI & ML Show Up Today

The same core techniques from Modules 2–4 are applied across very different industries:

Industry | Applications
Healthcare | Diagnostic imaging, drug discovery, patient risk scoring
Finance | Fraud detection, credit scoring, algorithmic trading
Retail | Recommendation engines, demand forecasting, pricing
Automotive | Autonomous driving, predictive maintenance
Customer Service | Chatbots, sentiment analysis, ticket routing
Voice & Audio | Speech-to-text, voice assistants, transcription
Computer Vision | Facial recognition, quality inspection, satellite imagery
Education | Personalized learning paths, automated grading
KEY TERMS — MODULE 5
CNN: a network specialized for grid-like data such as images.
RNN: a network that processes sequences step by step with memory.
Transformer: an attention-based architecture that processes a whole sequence at once.
Attention: a mechanism for weighing how much each part of the input matters to every other part.
LLM: Large Language Model — a Transformer trained on huge amounts of text.
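
The difference between "step by step with memory" and "the whole sequence at once" can be sketched in a few lines of plain Python. Everything here is a simplifying assumption for illustration: the function names, the fixed weights 0.5, and the tiny input vectors are made up, and the attention function uses the inputs directly as queries, keys, and values, whereas a real Transformer first applies learned projections.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_read(sequence, w_in=0.5, w_mem=0.5):
    """RNN-style: visit items one at a time, updating a single memory value."""
    memory = 0.0
    for x in sequence:
        memory = math.tanh(w_in * x + w_mem * memory)
    return memory


def self_attention(vectors):
    """Attention-style: every position scores every other position in one sweep."""
    dim = len(vectors[0])
    outputs, weight_rows = [], []
    for query in vectors:
        # Scaled dot product between this position and every position.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Output is a weighted blend of all positions.
        blended = [
            sum(w * value[j] for w, value in zip(weights, vectors))
            for j in range(dim)
        ]
        outputs.append(blended)
        weight_rows.append(weights)
    return outputs, weight_rows


# Hypothetical 3-item sequence of 2-dimensional vectors; items 0 and 2 are identical.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weight_rows = self_attention(sequence)
print("attention weights for item 0:", [round(w, 3) for w in weight_rows[0]])

# The RNN's final memory depends on the order in which items were read.
forward = rnn_read([1.0, 2.0, 3.0])
backward = rnn_read([3.0, 2.0, 1.0])
print("RNN memory, forward vs reversed:", round(forward, 3), round(backward, 3))
```

In this sketch, item 0 gives equal weight to itself and to the identical item 2, and less weight to the dissimilar item 1, regardless of how far apart they sit in the sequence. The RNN, by contrast, can only reach early items through the memory value it carried forward.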
CHECK YOUR UNDERSTANDING
Q1. Why is a CNN a natural fit for image data specifically?
A: Because CNNs scan the input in small local patches to detect edges first, then shapes, then whole objects — exploiting the grid-like, spatially local structure of pixels layer by layer.
Q2. What core idea distinguishes a Transformer from an RNN when processing a sequence?
A: An RNN processes a sequence step by step, carrying forward a memory of prior steps. A Transformer instead looks at the entire sequence at once, using attention to weigh how much every part should attend to every other part — without processing strictly in order.
Q3. Name two industries and one specific ML application in each.
A: Example: Healthcare — diagnostic imaging; Finance — fraud detection. (Any two industry/application pairs from the table are acceptable.)
MULTIPLE-CHOICE QUESTIONS
1. Which architecture underlies modern LLMs such as GPT and Claude?
  • A. CNN
  • B. RNN
  • C. Transformer
  • D. Decision Tree
2. An architecture that carries a "memory" of what came before while processing a sequence step by step is a:
  • A. CNN
  • B. RNN
  • C. Transformer
  • D. None of the above
3. CNNs are especially well-suited to:
  • A. Time-series forecasting only
  • B. Image recognition and computer vision tasks
  • C. Reinforcement learning exclusively
  • D. Tabular spreadsheet data only
4. Fraud detection and algorithmic trading are common AI applications in which industry?
  • A. Healthcare
  • B. Finance
  • C. Education
  • D. Automotive
5. The "attention" mechanism in a Transformer is used to:
  • A. Randomly drop out neurons during training
  • B. Weigh how much each part of a sequence should relate to every other part
  • C. Compress an image into fewer pixels
  • D. Split data into train/test sets
PRACTICAL / APPLIED EXERCISE

Scenario: Match each real-world project below to the single best-fit architecture (CNN, RNN, or Transformer), and justify your choice in one sentence.

  1. Detecting tumors in X-ray images.
  2. Powering a customer-support chatbot that must understand a full paragraph of context before replying.
  3. A legacy speech-to-text system built before Transformers became mainstream, processing audio one time-step at a time.

Model answers are provided in the Answer Key at the end of this booklet.

MODULE 6 OF 6

Ethics & Responsible AI

The responsibility that comes with building systems that make decisions at scale.

Powerful systems trained on human data inherit human problems — and can add new ones of their own. As AI moves from research labs into hiring, lending, healthcare, and law enforcement, four themes come up again and again:

Bias & Fairness
Models trained on historical data can learn and amplify existing societal biases (e.g., in hiring or lending decisions).
Privacy
Large datasets often contain personal information; models can sometimes memorize and leak training data.
Transparency
Complex models (especially deep learning) can be "black boxes" — hard to explain why a particular decision was made.
Accountability
When an AI system causes harm, who is responsible — the developer, the deployer, or the data provider?
WHY THIS ISN'T OPTIONAL A model can be statistically accurate and still be unfair, unsafe, or non-compliant with regulation. Responsible AI practice treats fairness, privacy, transparency, and accountability as first-class requirements — not afterthoughts bolted on after deployment.

Course Recap — What We Covered

NEXT STEP Pick one small project — a classifier, a chatbot, a prediction model — and build it end-to-end using the six-stage pipeline from Module 2.
KEY TERMS — MODULE 6
Bias (ethical): systematic unfairness in a model's outputs toward a group.
Fairness: ensuring a model's outcomes do not unjustly disadvantage groups.
Privacy: protecting personal data used in training and inference.
Transparency / Explainability: the ability to explain why a model made a decision.
Accountability: clarity on who is responsible when an AI system causes harm.
CHECK YOUR UNDERSTANDING
Q1. How can a model trained on historical data end up perpetuating bias, even without anyone intending it to?
A: If the historical data reflects past discriminatory patterns (e.g., in hiring or lending), the model learns those patterns as "normal," and reproduces — or even amplifies — them in its future predictions.
Q2. Why are deep learning models often called "black boxes," and why does that matter?
A: Because their decisions emerge from millions of learned weights across many layers, making it hard for even the model's own creators to explain why a specific decision was made. This matters for accountability, trust, and regulatory compliance in high-stakes settings like healthcare or lending.
Q3. Who might be considered responsible when an AI system causes real-world harm?
A: There is no single universal answer — responsibility could fall on the developer who built the model, the organization that deployed it, or the party that supplied the (possibly biased) training data. This is an active area of policy and legal debate.
MULTIPLE-CHOICE QUESTIONS
1. A hiring model that unintentionally favors one demographic over another because of patterns in past hiring data is primarily an issue of:
  • A. Overfitting
  • B. Bias & Fairness
  • C. Learning rate tuning
  • D. Activation function choice
2. A model that can be shown to sometimes reproduce snippets of its training data is raising a concern about:
  • A. Privacy
  • B. Gradient descent
  • C. Regression
  • D. Clustering
3. "Black box" is a term used to describe models that are:
  • A. Extremely fast to train
  • B. Hard to explain — it's unclear why they made a given decision
  • C. Always more accurate than simpler models
  • D. Only used in reinforcement learning
4. Which of these is NOT one of the four ethical themes highlighted in this module?
  • A. Bias & Fairness
  • B. Privacy
  • C. Transparency
  • D. Learning Rate
PRACTICAL / APPLIED EXERCISE

Case Study: A bank deploys a credit-scoring model. After launch, an audit finds the model approves loans for one neighborhood at a much lower rate than another, even after controlling for income.

  1. Which ethical theme from this module does this scenario primarily illustrate?
  2. Name one step the bank could take, using ideas from this course, to investigate whether the model is genuinely biased.
  3. Why might "the model is just following the data" not be an acceptable explanation to regulators or affected customers?

Model answers are provided in the Answer Key at the end of this booklet.

QUICK REVISION

Master Cheat Sheet

Every core formula and framework from the course, on one page.

The Neuron

Weighted sum: z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
Neuron output: output = activation(z)

Classification Metrics (from the Confusion Matrix)

Metric | Formula
Accuracy | (TP + TN) / Total
Precision | TP / (TP + FP)
Recall | TP / (TP + FN)
F1 Score | 2 × (Precision × Recall) / (Precision + Recall), i.e. the harmonic mean of Precision and Recall

Dataset Split (Typical Ratio)

Train | 70% | Fits the model
Validation | 15% | Tunes the model
Test | 15% | Final honest score
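
As a rough illustration of the 70/15/15 split, the sketch below shuffles a dataset once and slices it into three non-overlapping parts. The function name split_dataset, the seed value, and the 100 placeholder rows are assumptions made for this example; libraries such as scikit-learn provide their own splitting utilities.

```python
import random


def split_dataset(rows, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle a copy of `rows`, then cut it into train / validation / test."""
    rows = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)   # fixed seed keeps the split reproducible
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]       # whatever remains is the test set
    return train, val, test


# Hypothetical dataset of 100 rows, represented here by their row numbers.
rows = list(range(100))
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before slicing matters: if the rows were sorted (for example by date or by label), a plain slice would give the three sets very different contents.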

The Neural Network Training Loop

Forward Pass → Compute Loss → Backpropagation → Gradient Descent → (repeat)
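
The four steps of the loop can be sketched on the smallest possible "network": a single neuron with one weight, one bias, and no activation function. The toy data (points on the line y = 2x + 1), the learning rate of 0.05, and the 2000 epochs are assumptions chosen so the example converges. With only one layer, "backpropagation" reduces to applying the chain rule once.

```python
# Hypothetical toy data that follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0          # start with an untrained neuron
learning_rate = 0.05     # assumed value; too large would make the loss diverge
losses = []

for epoch in range(2000):
    # 1. Forward pass: compute predictions with the current weight and bias.
    preds = [w * x + b for x in xs]

    # 2. Compute loss: mean squared error between predictions and labels.
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    losses.append(loss)

    # 3. Backpropagation: gradient of the loss with respect to w and b.
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(2 * e for e in errors) / len(xs)

    # 4. Gradient descent: step each parameter against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w = {w:.3f}, b = {b:.3f}")
print(f"first loss = {losses[0]:.3f}, last loss = {losses[-1]:.6f}")
```

After enough repeats the neuron recovers a weight close to 2 and a bias close to 1, and the recorded loss falls from its starting value toward zero.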

The Machine Learning Pipeline

Collect Data → Prepare Data → Choose Features → Train Model → Evaluate → Deploy & Monitor

Three Learning Paradigms

Supervised: Labeled data → Regression / Classification
Unsupervised: Unlabeled data → Clustering / Dimensionality Reduction
Reinforcement: Reward signal → Policy learning

Architecture Cheat Sheet

CNN: Images, spatial data
RNN: Sequences, time series (with memory)
Transformer: Sequences (attention-based, whole context at once); powers LLMs

Bias-Variance, In One Line

High Bias = Underfitting (too simple)  |  High Variance = Overfitting (too sensitive to training data)
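
The one-liner above can be seen in numbers with three deliberately different models on the same toy data. This is a hand-made illustration, not a benchmark: the data (y = 2x with an alternating +1/-1 disturbance), the model names, and the nearest-point "memorizer" are all assumptions chosen to make the contrast visible.

```python
def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hand-made toy data: the trend y = 2x plus alternating +1 / -1 "noise".
train_x = [float(i) for i in range(10)]
train_y = [2 * x + n for x, n in zip(train_x, [1, -1] * 5)]
test_x = [i + 0.25 for i in range(9)]
test_y = [2 * x + n for x, n in zip(test_x, [-1, 1] * 4 + [-1])]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)


def too_simple(x):
    """High bias: ignores x and always predicts the average training label."""
    return mean_y


# Least-squares straight line: flexible enough for the trend, not for the noise.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x


def balanced(x):
    return slope * x + intercept


def memorizer(x):
    """High variance: returns the label of the nearest training point, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


for name, model in [("too simple", too_simple),
                    ("balanced", balanced),
                    ("memorizer", memorizer)]:
    print(f"{name:>10}: train MSE = {mse(model, train_x, train_y):6.2f}, "
          f"test MSE = {mse(model, test_x, test_y):6.2f}")
```

On this data the too-simple model is poor on both sets (roughly 32 and 28), the memorizer is perfect on training data but about four times worse than the straight line on the test points, and the straight line scores close to 1 on both. A large gap between training and test error is the signal to look for.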

A–Z REFERENCE

Glossary

Every key term used across all six modules, in one alphabetical list.

Accuracy — (TP+TN)/Total; the overall proportion of correct predictions.
Activation Function — a formula (e.g., ReLU, Sigmoid, Tanh) that adds non-linearity to a neuron's output.
AGI (Artificial General Intelligence) — hypothetical AI matching human-level reasoning across any task.
AI (Artificial Intelligence) — the field of building machines that perform tasks normally requiring human intelligence.
ASI (Artificial Superintelligence) — theoretical AI exceeding human intelligence in every domain.
Backpropagation — the process of sending prediction error backward through a network to compute each weight's contribution to the mistake.
Bias (model parameter) — a learned constant added to a neuron's weighted sum.
Bias (statistical/ethical) — systematic error, either from oversimplified model assumptions or unfair treatment of a group.
Classification — predicting a discrete category or class.
Clustering — grouping unlabeled data points by shared characteristics.
CNN (Convolutional Neural Network) — a network architecture specialized for image and grid-like data.
Confusion Matrix — a table of True/False Positive/Negative counts used to derive classification metrics.
Deep Learning — machine learning using neural networks with many hidden layers.
Dimensionality Reduction — compressing many features into fewer, information-dense ones.
Epoch — one complete pass through the entire training dataset.
F1 Score — the harmonic mean of precision and recall.
Feature — an input variable used by a model to make predictions.
Gradient Descent — the algorithm that adjusts weights step-by-step to reduce loss.
Label — the correct answer paired with an input in supervised learning.
Learning Rate — a value controlling how large each gradient descent step is.
Loss Function — a function scoring how wrong a model's prediction was.
Machine Learning (ML) — systems that learn patterns and rules from data rather than being explicitly programmed.
Narrow AI (ANI) — AI specialized for one task; the only type of AI that exists in deployment today.
Overfitting — a model that memorizes training data (including noise) rather than learning the general pattern.
Policy — the strategy a reinforcement learning agent learns for choosing actions.
Precision — TP/(TP+FP); of predicted positives, how many were actually correct.
Recall — TP/(TP+FN); of actual positives, how many the model correctly caught.
Regression — predicting a continuous numeric value.
Reinforcement Learning — learning by trial and error via rewards and penalties.
RNN (Recurrent Neural Network) — a network that processes sequences step by step, retaining memory of prior steps.
Supervised Learning — learning from labeled input-output pairs.
Transformer — an attention-based architecture that processes an entire sequence at once; powers modern LLMs.
Turing Test — a proposed test of whether a machine's behaviour is indistinguishable from a human's.
Underfitting — a model too simple to capture the true underlying pattern in the data.
Unsupervised Learning — learning structure from unlabeled data, with no correct answer given.
Weight — a learned value controlling how much an input contributes to a neuron's output.
FINAL ASSESSMENT

Final Comprehensive Assessment

25 mixed questions spanning all six modules. Attempt every question before checking the Answer Key.

1. AI is best described as:
  • A. A single algorithm
  • B. A goal pursued by many different techniques
  • C. Only neural networks
  • D. A database technology
2. Which term coined at the 1956 Dartmouth Workshop is still used today?
  • A. Machine Learning
  • B. Artificial Intelligence
  • C. Deep Learning
  • D. Neural Network
3. All AI systems deployed in the real world today are examples of:
  • A. AGI
  • B. ASI
  • C. Narrow AI
  • D. Self-Aware AI
4. Which statement about AI, ML, and DL is correct?
  • A. AI ⊂ ML ⊂ DL
  • B. DL ⊂ ML ⊂ AI
  • C. They are unrelated fields
  • D. ML ⊂ AI, but DL is unrelated to either
5. In Tom Mitchell's formal definition of learning, "P" stands for:
  • A. Parameters
  • B. Performance measure
  • C. Pipeline
  • D. Policy
6. Which ML pipeline stage involves cleaning and splitting data into train/test sets?
  • A. Collect Data
  • B. Prepare Data
  • C. Deploy & Monitor
  • D. Evaluate
7. Predicting whether a tumor is malignant or benign is an example of:
  • A. Regression
  • B. Binary classification
  • C. Clustering
  • D. Dimensionality reduction
8. A recommendation engine that groups shoppers with similar behaviour, without any predefined categories, is using:
  • A. Supervised classification
  • B. Reinforcement learning
  • C. Unsupervised clustering
  • D. Backpropagation
9. An RL agent's "reward" signal is:
  • A. The training dataset
  • B. Feedback that is positive for good outcomes and negative for bad ones
  • C. A synonym for "label"
  • D. The learning rate
10. Weights and bias inside a neuron are combined, before activation, as:
  • A. A product of all inputs
  • B. A weighted sum plus a bias term
  • C. The average of all previous outputs
  • D. The loss function directly
11. Which activation function is most associated with avoiding vanishing gradients in hidden layers?
  • A. Sigmoid
  • B. ReLU
  • C. Step function
  • D. Identity function
12. The correct order of the neural network training loop is:
  • A. Loss → Gradient Descent → Forward Pass → Backprop
  • B. Forward Pass → Loss → Backprop → Gradient Descent
  • C. Backprop → Loss → Gradient Descent → Forward Pass
  • D. Gradient Descent → Forward Pass → Loss → Backprop
13. A learning rate set far too high typically causes:
  • A. Instant, perfect convergence
  • B. The loss to overshoot the minimum and bounce or diverge
  • C. No change in training behaviour
  • D. Underfitting only
14. One full pass through the entire training dataset is called:
  • A. A batch
  • B. A training step
  • C. An epoch
  • D. A gradient
15. Which data split is used only for the final, honest performance score?
  • A. Training set
  • B. Validation set
  • C. Test set
  • D. Deployment logs
16. A model with near-zero training error but high test error is:
  • A. Underfitting
  • B. Overfitting
  • C. Perfectly generalizing
  • D. Using too high a bias
17. High bias in a model corresponds to:
  • A. Overfitting
  • B. Underfitting
  • C. Perfect accuracy
  • D. High recall always
18. Recall is calculated as:
  • A. TP / (TP + FP)
  • B. TP / (TP + FN)
  • C. (TP + TN) / Total
  • D. FN / (FN + TP)
19. A model scanning small patches of an image to detect edges, then shapes, then objects is a:
  • A. CNN
  • B. RNN
  • C. Transformer
  • D. Decision tree
20. Transformers process a sequence by:
  • A. Reading it strictly one token at a time with no memory
  • B. Looking at the entire sequence at once, using attention
  • C. Ignoring word order completely
  • D. Converting it into an image first
21. Autonomous driving and predictive maintenance are common AI applications in:
  • A. Education
  • B. Automotive
  • C. Retail
  • D. Customer Service
22. A hiring algorithm that systematically disadvantages a demographic group due to historical data patterns is primarily an issue of:
  • A. Overfitting
  • B. Bias & Fairness
  • C. Learning rate
  • D. Dimensionality reduction
23. A model that is difficult to explain, even by its own creators, is referred to as a:
  • A. Reactive Machine
  • B. Black box
  • C. Confusion matrix
  • D. Feature vector
24. Scenario: A retail company wants to forecast next quarter's exact revenue in dollars using 3 years of historical sales figures. Which combination best fits this task?
  • A. Unsupervised clustering
  • B. Supervised regression
  • C. Reinforcement learning
  • D. Supervised classification
25. Scenario: A spam filter achieves 98% accuracy on the training set but only 62% accuracy on new, unseen emails. What is most likely happening, and what data-split concept would have caught this earlier?
  • A. Underfitting; more training data was needed
  • B. Overfitting; a held-out validation/test set would have revealed the gap earlier
  • C. The model is using reinforcement learning
  • D. This is expected behaviour and not a concern
SCORING GUIDE
  • 22–25 correct: Excellent. Ready for hands-on projects.
  • 17–21 correct: Solid. Revisit the modules for missed questions.
  • Below 17: Re-read the relevant modules and retry the module-level MCQs before returning here.
ANSWER KEY

Answer Key — MCQs

Answers for every multiple-choice question in this booklet, by module.

Module 1 — Foundations of AI

1 — B
2 — B
3 — C
4 — B
5 — C
6 — B
7 — C
8 — B

Module 2 — Machine Learning

1 — B
2 — C
3 — D
4 — B
5 — C
6 — B
7 — C

Module 3 — Inside Neural Networks

1 — B
2 — B
3 — B
4 — B
5 — C
6 — B

Module 4 — Data & Model Quality

1 — C
2 — B
3 — B
4 — B
5 — B
6 — B

Module 5 — Architectures & Applications

1 — C
2 — B
3 — B
4 — B
5 — B

Module 6 — Ethics & Responsible AI

1 — B
2 — A
3 — B
4 — D

Final Comprehensive Assessment

1 — B
2 — B
3 — C
4 — B
5 — B
6 — B
7 — B
8 — C
9 — B
10 — B
11 — B
12 — B
13 — B
14 — C
15 — C
16 — B
17 — B
18 — B
19 — A
20 — B
21 — B
22 — B
23 — B
24 — B
25 — B

Answer Key — Practical Exercises

Module 1 — AI Functionality Classification

  1. Thermostat → Reactive Machine (reacts to current temperature only, no memory).
  2. Fraud detection using 30-day history → Limited Memory (uses recent past data).
  3. Support AI modeling customer's mental state → Theory of Mind (hypothetical: infers beliefs/intentions, not yet deployed in practice).

Module 2 — Matching Business Problems to Paradigms

  1. Revenue forecasting → Supervised / Regression (predicting a continuous number from historical data).
  2. Customer segmentation with no labels → Unsupervised / Clustering.
  3. Warehouse robot learning from trial-and-error rewards → Reinforcement Learning (no fixed regression/classification category; it learns a policy).

Module 3 — Neuron Calculation

z = (0.5 × 2) + (-1 × 3) + 1 = 1 - 3 + 1 = -1. With ReLU, output = max(0, -1) = 0 (the neuron does not fire). With Sigmoid instead, the same z = -1 would be squashed into the 0–1 range rather than clamped to exactly 0: sigmoid(-1) = 1 / (1 + e) ≈ 0.27, which is below 0.5 because z is negative. Sigmoid never outputs a hard zero.
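
The same arithmetic can be checked with a short Python sketch. It assumes each product in the answer is read as weight × input (weights 0.5 and -1, inputs 2 and 3, bias 1), following the order used in the cheat sheet; the function names are illustrative only.

```python
import math


def relu(z):
    """ReLU: pass positive values through, clamp negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid: squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_sum(inputs, weights, bias):
    """z = w1*x1 + w2*x2 + ... + wn*xn + b"""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Values from the exercise, read as weight x input (an assumption about which is which).
inputs = [2.0, 3.0]
weights = [0.5, -1.0]
bias = 1.0

z = weighted_sum(inputs, weights, bias)
relu_out = relu(z)
sigmoid_out = sigmoid(z)

print("z =", z)
print("ReLU output =", relu_out)
print("Sigmoid output =", round(sigmoid_out, 4))
```

Swapping which numbers are called weights and which are called inputs does not change z, since each term is a simple product.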

Module 4 — Confusion Matrix Calculation

TP=40, FP=10, FN=5, TN=45, Total=100.
Accuracy = (40+45)/100 = 0.85 (85%).
Precision = 40/(40+10) = 0.80 (80%).
Recall = 40/(40+5) ≈ 0.89 (≈89%).
With Recall (89%) higher than Precision (80%), and FN (5) lower than FP (10), the model is more likely to wrongly flag a real email as spam (more false positives than false negatives) than to let spam through undetected.
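
For readers who want to verify these figures, the following sketch computes them directly from the four confusion-matrix counts. The function name classification_metrics is an illustrative choice, and the code assumes none of the denominators is zero. It also reports the F1 score, which the exercise does not ask for.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive the cheat-sheet metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # of predicted positives, how many were right
    recall = tp / (tp + fn)      # of actual positives, how many were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts from the spam-classifier scenario.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The F1 score comes out at about 0.842, between the precision of 0.80 and the recall of roughly 0.89, as expected for a harmonic mean.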

Module 5 — Architecture Matching

  1. Tumor detection in X-rays → CNN (image data, spatial patterns).
  2. Chatbot understanding a full paragraph of context → Transformer (needs whole-sequence attention, powers modern conversational AI).
  3. Legacy step-by-step speech-to-text system → RNN (sequential processing with memory, pre-dates widespread Transformer adoption).

Module 6 — Credit-Scoring Case Study

  1. This scenario primarily illustrates Bias & Fairness (a disparate outcome across groups even after controlling for income).
  2. The bank could run a structured fairness audit, e.g., comparing approval rates and model errors (using confusion-matrix-style metrics) across demographic/neighborhood groups, and testing whether removing/adjusting proxy variables changes the disparity.
  3. "The model is just following the data" is not sufficient because the training data itself can encode historical discrimination. A model that faithfully reproduces biased patterns is still causing unfair, and potentially unlawful, harm; accountability rests with the organization deploying the model, not just the data.
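
A first pass at the audit described in this answer could look like the sketch below. The ten records are entirely made up for illustration, the field order (neighborhood, model decision, whether the applicant was in fact creditworthy) is an assumption, and a real audit would need far more data plus legal and statistical review.

```python
from collections import defaultdict

# Hypothetical audit records: (neighborhood, model_approved, actually_creditworthy)
records = [
    ("A", True, True), ("A", True, True), ("A", True, False),
    ("A", False, True), ("A", False, False),
    ("B", True, True), ("B", False, True), ("B", False, True),
    ("B", False, False), ("B", False, False),
]


def audit_by_group(records):
    """Compare approval rate and false-negative rate across groups."""
    counts = defaultdict(lambda: {"n": 0, "approved": 0, "worthy": 0, "worthy_denied": 0})
    for group, approved, worthy in records:
        c = counts[group]
        c["n"] += 1
        if approved:
            c["approved"] += 1
        if worthy:
            c["worthy"] += 1
            if not approved:
                c["worthy_denied"] += 1   # a false negative: good applicant rejected
    report = {}
    for group, c in counts.items():
        report[group] = {
            "approval_rate": c["approved"] / c["n"],
            "false_negative_rate": c["worthy_denied"] / c["worthy"],
        }
    return report


report = audit_by_group(records)
for group, stats in sorted(report.items()):
    print(group, {k: round(v, 2) for k, v in stats.items()})
```

In this invented sample, neighborhood B is approved far less often and its creditworthy applicants are rejected twice as often as those in A. A gap like that would not prove bias on its own, but it would justify a closer look at proxy variables.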

Flowlytix AI Solutions

Building practical, human-centered AI education — from first principles to real-world systems. This booklet is a companion to our "Basic Concepts of AI & ML" course, part of the Flowlytix AI Bootcamp Series.

Thank you for learning with us. Keep building.