FLOWLYTIX AI SOLUTIONS

STUDENT REFERENCE EDITION

Flowlytix AI Bootcamp Series

Basic Concepts of Artificial Intelligence & Machine Learning

A complete study booklet covering foundations, machine learning, neural networks, data quality, model architectures, and responsible AI — with recap notes, Q&A, MCQs, and practical exercises for every module.

6 Modules · 70+ Q&A / MCQs · 1 Final Assessment

Companion booklet to: "Basic Concepts of AI & ML — Full Course" © Flowlytix AI Solutions

About This Booklet

A self-contained companion to the "Basic Concepts of AI & ML" course — built for review, revision, and practice.

Why This Booklet Exists

Slides are great for a live walkthrough, but they compress ideas into short phrases and diagrams. This booklet expands the same six modules from the course into full explanations, worked examples, and practice questions — so you can revisit any concept on your own, at your own pace, without needing the slide deck in front of you.

How This Booklet Is Organized

Each of the six modules follows the same structure, so you always know where to look:

Concept Sections

Full explanations of every idea from the course, with plain-language definitions and real-world examples.

Key Terms

A quick-glance box of the vocabulary introduced in that module.

Check Your Understanding (Q&A)

Short-answer questions with model answers, for active recall.

Multiple-Choice Questions

Exam-style MCQs to test recognition and recall — answer key at the end of the booklet.

Practical / Applied Exercise

A scenario, small calculation, or hands-on style question that mirrors what you'd be asked to do with these concepts in practice — not just define them.

How to Use It

Read a module's concept sections first, ideally alongside the original slides.
Cover the answers and attempt the Q&A and MCQs from memory before checking the key.
Attempt every practical exercise on paper — these mirror how the concepts are actually tested and applied.
Use the Master Cheat Sheet and Glossary near the end for quick revision before an exam or interview.
Finish with the Final Comprehensive Assessment, which mixes questions from all six modules.

TIP This booklet assumes zero prior background in AI/ML — exactly like the original course. If a term feels unfamiliar, check the Glossary near the end.

MODULE 1 OF 6

Foundations of AI

What we mean when we say "intelligence," and how the field got here.

1.1 What Is Artificial Intelligence?

Artificial Intelligence (AI) is the field of building machines that perform tasks which normally require human intelligence — perceiving the world, reasoning about it, deciding what to do, and learning from experience.

A common misconception is that AI refers to one specific algorithm. It does not. AI is a goal, not a technique. Rule-based systems, statistical methods, search algorithms, and neural networks all qualify as "AI" as long as they pursue that goal of intelligent behaviour.

Perceive

Take in raw signals — pixels, sound waves, text — and turn them into structured information.

Reason

Combine facts and rules to draw conclusions, plan steps, or make a decision.

Learn

Improve performance from experience (data) rather than being explicitly reprogrammed.

AI in Action Today

A chatbot answering a support question in natural language
A phone camera recognizing faces to unlock the screen
A driver-assist system detecting lanes and obstacles
A bank flagging an unusual transaction as possible fraud

Common thread: each system takes messy real-world input and produces a useful decision — without a human hand-coding every possible case.

1.2 A Brief History of AI

AI is often talked about as a "new" technology, but the field is seven decades old, and its progress has never been a straight line — long stretches of hype have repeatedly been followed by disappointment ("AI winters"), and then by genuine breakthroughs that restart momentum.

Year	Milestone
1950	Alan Turing asks "Can machines think?" and proposes the Turing Test.
1956	The Dartmouth Workshop coins the term "Artificial Intelligence."
1980s	Expert systems boom, then a second "AI winter" sets in as their limits show (the first came in the 1970s).
1997	IBM's Deep Blue defeats world chess champion Garry Kasparov.
2012	Deep learning (AlexNet) triggers a leap in image recognition accuracy.
2017	The Transformer architecture is introduced — it now powers modern LLMs.
2022+	ChatGPT and generative AI bring AI into everyday consumer use.

PATTERN TO NOTICE Progress in AI has never been a straight line. Hype cycles are followed by "AI winters," and then real breakthroughs restart momentum. This boom-and-bust cycle has repeated more than once since 1956.

1.3 Types of AI, by Capability

This is a progression — a way of describing how far AI has come, and how far it still has to go.

Type	Status	Description
Narrow AI (ANI)	Exists today	Excels at one specific task — image recognition, translation, playing chess. Cannot generalize beyond its training. Every AI system in use today is Narrow AI.
General AI (AGI)	Not yet achieved	Hypothetical — would match human-level reasoning across any task, transferring knowledge between domains the way people do.
Superintelligence (ASI)	Speculative	Theoretical — would exceed human intelligence across every domain. Purely speculative, and discussed mainly in AI safety research.

1.4 Types of AI, by Functionality

A second, independent way to classify AI systems: by how much memory and self-awareness they have.

Reactive Machines

No memory of the past — reacts only to the current input. Example: Deep Blue evaluating a chess board.

Limited Memory

Uses recent past data to inform decisions. Example: a self-driving car tracking nearby vehicle speed.

Theory of Mind

Would understand the beliefs, intentions, and emotions of others. Still an active research goal — not deployed.

Self-Aware AI

Would possess consciousness and self-understanding. Entirely theoretical — no working example exists.

1.5 AI vs. Machine Learning vs. Deep Learning

These three terms are often used interchangeably in casual conversation, but they describe nested ideas, not three separate things.

Term	What it is
Artificial Intelligence	The goal — building machines that act intelligently. Includes hand-coded rule systems too, not just learning-based ones.
Machine Learning	One approach to AI: instead of a human hand-coding the rules, the system learns the rules itself from data (labeled examples, unlabeled data, or trial-and-error feedback).
Deep Learning	One technique within ML — uses multi-layer ("deep") neural networks, especially powerful for images, audio, and language.

KEY TAKEAWAY Every deep learning system is machine learning. Every machine learning system is AI. The reverse is not true — not all AI uses machine learning, and not all machine learning uses deep neural networks.

KEY TERMS — MODULE 1

Artificial Intelligence: machines performing tasks that normally need human intelligence.

Narrow AI (ANI): AI specialized for one task; all AI in use today.

AGI: hypothetical human-level general intelligence.

ASI: theoretical superhuman intelligence.

Reactive Machine: AI with no memory of the past.

Machine Learning: systems that learn patterns from data.

Deep Learning: ML using multi-layer neural networks.

Turing Test: a proposed test of whether a machine's behaviour is indistinguishable from a human's.

CHECK YOUR UNDERSTANDING

Q1. Why is it inaccurate to describe AI as "one algorithm"?

A: Because AI is a goal (building machines that behave intelligently), not a single method. Rule-based systems, search, statistics, and neural networks can all count as AI if they pursue that goal.

Q2. What is the key difference between Narrow AI and General AI?

A: Narrow AI excels at one specific task and cannot generalize beyond its training; General AI (still hypothetical) would apply human-level reasoning across any task and transfer knowledge between domains.

Q3. How are AI, Machine Learning, and Deep Learning related to one another?

A: They are nested, not separate. Deep Learning is a technique inside Machine Learning, and Machine Learning is one approach to achieving AI. Every DL system is ML, and every ML system is AI — but not the reverse.

Q4. Why did the field experience "AI winters"?

A: Periods of high hype and investment were followed by disappointment when the technology of the time hit real limits (e.g., expert systems in the 1980s) — funding and enthusiasm dried up until the next genuine breakthrough revived the field.

MULTIPLE-CHOICE QUESTIONS

1. Which of the following best defines Artificial Intelligence?

A. A single fixed algorithm used in all smart devices
B. The field of building machines that perform tasks normally requiring human intelligence
C. Any computer program that runs faster than a human
D. A type of database used for storing large files

2. The Dartmouth Workshop of 1956 is significant because it:

A. Built the first neural network
B. Coined the term "Artificial Intelligence"
C. Introduced the Transformer architecture
D. Defeated a chess champion

3. Which type of AI is currently deployed in every real-world AI product?

A. Artificial General Intelligence (AGI)
B. Artificial Superintelligence (ASI)
C. Narrow AI (ANI)
D. Theory of Mind AI

4. A self-driving car that tracks nearby vehicle speeds over the last few seconds is an example of:

A. A Reactive Machine
B. Limited Memory AI
C. Theory of Mind AI
D. Self-Aware AI

5. Which statement correctly describes the relationship between AI, ML, and DL?

A. They are three completely unrelated fields
B. ML is a subset of DL, which is a subset of AI
C. AI is the broadest goal; ML is one approach to it; DL is one technique within ML
D. DL is broader than AI

6. IBM's Deep Blue defeating Garry Kasparov in 1997 is an example of:

A. Deep learning outperforming humans at image recognition
B. A Narrow AI system mastering one specific task (chess)
C. The first working example of AGI
D. A Transformer-based model

7. "Theory of Mind" AI, as a category, refers to a system that would be able to:

A. React only to its current input with no memory
B. Store the last few seconds of sensor data
C. Understand the beliefs, intentions, and emotions of others
D. Run only on quantum hardware

8. Superintelligence (ASI) is best described as:

A. Already deployed in most smartphones
B. A purely theoretical AI that would exceed human intelligence in every domain
C. A synonym for Narrow AI
D. The AI used in 1980s expert systems

PRACTICAL / APPLIED EXERCISE

Scenario: You are shown three systems. For each, decide whether it is (a) Reactive Machine, (b) Limited Memory, or (c) Theory of Mind / Self-Aware, and briefly justify your answer in one sentence.

A thermostat that turns on the heater only when the current room temperature drops below a set point.
A fraud-detection system that flags a transaction as suspicious by comparing it to the user's spending pattern over the last 30 days.
A hypothetical customer-support AI that infers a customer is frustrated (not just angry-sounding) and adapts its tone because it models what the customer is likely thinking.

Model answers are provided in the Answer Key at the end of this booklet.

MODULE 2 OF 6

Machine Learning

How a system learns rules from examples instead of being told the rules.

2.1 What Is Machine Learning?

The core shift behind Machine Learning is simple to state but powerful in consequence: instead of a person writing explicit "if this, then that" logic, the system is shown examples of inputs paired with correct outputs, and it infers the underlying rule for itself.

Traditional Programming	Machine Learning
Data + Rules → Output. A human writes explicit logic; the program applies it.	Data + Output → Rules. The system is shown inputs and correct outputs, and infers the rule.

FORMAL DEFINITION (TOM MITCHELL, 1997) "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Worked Example — Spam Filter

T (Task) = classify emails as spam / not-spam · E (Experience) = a labeled set of past emails · P (Performance) = % classified correctly
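
To make the contrast between written rules and learned rules concrete, here is a minimal sketch in Python. The single feature (the number of exclamation marks in an email), the six-email dataset, and the function names are all hypothetical, chosen only for illustration. A real spam filter uses far richer features and models.

```python
# Hypothetical toy data: (number of "!" in the email, is_spam label).
labeled_emails = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]

def handwritten_rule(exclamations):
    """Traditional programming: a human picks the threshold."""
    return exclamations > 3

def accuracy(threshold, examples):
    """P in Mitchell's definition: the fraction classified correctly."""
    correct = sum((x > threshold) == label for x, label in examples)
    return correct / len(examples)

def learn_threshold(examples):
    """Machine learning: pick the threshold that scores best on the examples (E)."""
    candidates = sorted({x for x, _ in examples})
    return max(candidates, key=lambda t: accuracy(t, examples))

learned = learn_threshold(labeled_emails)
print(learned, accuracy(learned, labeled_emails))  # prints: 2 1.0
```

Both functions end up applying a threshold. The difference is where the threshold comes from: a person in the first case, the labeled examples in the second.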

2.2 The Machine Learning Pipeline

Most ML projects, whatever the domain, move through the same six stages:

#	Stage	What Happens
1	Collect Data	Gather raw examples relevant to the task.
2	Prepare Data	Clean, label, and split into train / validation / test sets.
3	Choose Features	Select the signals the model should look at.
4	Train Model	The model adjusts itself to fit the training data.
5	Evaluate	Test on unseen data to measure real performance.
6	Deploy & Monitor	Ship the model, and watch for drift over time.

IT'S ITERATIVE, NOT LINEAR Poor evaluation results usually send you back to feature selection or data collection. Real ML work loops through these stages many times before a model is ready to ship.

2.3 Three Ways Machines Learn

Most ML techniques fall into one of three main families, distinguished by what kind of feedback the system receives during training.

Paradigm	How It Learns	Examples
Supervised Learning	Learns from labeled examples — input paired with the correct answer.	Email spam detection, house price prediction
Unsupervised Learning	Finds hidden structure in unlabeled data — no "correct answer" given.	Customer segmentation, anomaly detection
Reinforcement Learning	Learns by trial and error, receiving rewards or penalties for actions.	Game-playing agents, robotics, resource routing

Supervised Learning, In Detail

Supervised learning splits into two problem types, based on what you are predicting:

Regression

Predicts a continuous numeric value. Example: predicting a house's sale price from its size, location, and age.

Classification

Predicts a discrete category or class — binary (Spam vs. Not Spam) or multi-class (Cat vs. Dog vs. Bird).
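
The difference between the two problem types shows up in what the model returns. The sketch below uses a least-squares line for regression and a nearest-centroid rule for classification. Both are stand-ins picked for brevity, and the data values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def nearest_centroid(x, centroids):
    """Return the class whose typical feature value is closest to x (classification)."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Hypothetical data: house size (hundreds of sq ft) and price (thousands).
sizes, prices = [10, 15, 20, 25], [200, 300, 400, 500]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 18 + intercept            # a continuous number

# Hypothetical class averages: typical "!" count per email class.
centroids = {"not spam": 1.0, "spam": 7.0}
predicted_class = nearest_centroid(5, centroids)    # a discrete category

print(predicted_price, predicted_class)  # prints: 360.0 spam
```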

Unsupervised Learning, In Detail

There is no label to predict — the algorithm finds structure in the data on its own.

Clustering

Groups similar data points together based on shared characteristics, with no predefined categories. Example: grouping customers by purchase behaviour for targeted marketing.

Dimensionality Reduction

Compresses many features into fewer, information-dense ones. Example: compressing hundreds of product attributes into 2 dimensions to plot and explore visually (e.g., PCA).
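
The clustering idea described above can be sketched with a stripped-down k-means on one-dimensional data. The spend figures and starting centers are hypothetical, and this is a teaching sketch rather than a production implementation (which would handle multiple dimensions, random initialization, and convergence checks).

```python
def kmeans_1d(values, centers, iterations=10):
    """Minimal k-means on 1-D data: assign to the nearest center, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer; no labels are given.
spend = [12, 15, 14, 90, 95, 88]
centers, clusters = kmeans_1d(spend, centers=[0.0, 100.0])
print(clusters)  # prints: [[12, 15, 14], [90, 95, 88]]
```

Nobody told the algorithm that "low spenders" and "high spenders" exist. The two groups emerge from the distances between the data points.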

Reinforcement Learning, In Detail

An agent learns by acting in an environment, observing the outcome, and adjusting — much like training a pet with rewards.

Term	Meaning
State	What the agent currently observes about its situation.
Action	A choice the agent can make from its current state.
Reward	Feedback signal — positive for good outcomes, negative for bad.
Policy	The strategy the agent learns for picking actions.

Example An agent learning to play a video game tries moves, gets points (reward) for good ones, and gradually learns a winning strategy (policy).
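
The state, action, reward, and policy vocabulary can be seen working together in a small tabular Q-learning sketch. Q-learning is one common reinforcement learning algorithm, used here as an illustration. The five-cell corridor environment and the hyperparameter values are hypothetical.

```python
import random

N_STATES, GOAL = 5, 4                    # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)                       # index 0 = step left, index 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2    # hypothetical hyperparameters

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # q[state][action] = estimated value
    for _ in range(episodes):
        state = 0
        for _ in range(200):                     # cap on steps per episode
            if rng.random() < EPSILON:
                action = rng.randrange(2)        # explore: try a random action
            else:
                best = max(q[state])             # exploit, breaking ties at random
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            next_state = min(max(state + ACTIONS[action], 0), GOAL)
            reward = 1.0 if next_state == GOAL else 0.0
            future = 0.0 if next_state == GOAL else max(q[next_state])
            # Move the estimate toward reward + discounted future value.
            q[state][action] += ALPHA * (reward + GAMMA * future - q[state][action])
            state = next_state
            if state == GOAL:
                break
    return q

q_table = train()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:GOAL]]
print(policy)  # prints: ['right', 'right', 'right', 'right']
```

The agent is never told that "right" is correct. It discovers that policy because only moves that eventually reach the goal earn a reward.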

KEY TERMS — MODULE 2

Machine Learning: systems that learn rules from data rather than being told them.

Feature: an input signal the model uses to make predictions.

Label: the correct answer provided in supervised learning.

Regression: predicting a continuous numeric value.

Classification: predicting a discrete category.

Clustering: grouping unlabeled data by similarity.

Dimensionality Reduction: compressing many features into fewer ones.

Policy: the strategy an RL agent learns for choosing actions.

CHECK YOUR UNDERSTANDING

Q1. In Tom Mitchell's formal definition of learning, what do T, E, and P stand for?

A: T = Task (what the program is trying to do), E = Experience (the data it learns from), and P = Performance measure (how you score success). A program "learns" if its performance at T, as measured by P, improves with more experience E.

Q2. Why is the ML pipeline described as iterative rather than linear?

A: Because a poor result at the Evaluate stage typically means you need to go back and choose different features, or collect more/better data — teams cycle through the six stages multiple times before a model is production-ready.

Q3. What distinguishes supervised, unsupervised, and reinforcement learning from one another?

A: Supervised learning uses labeled input-output pairs; unsupervised learning finds structure in unlabeled data with no "correct answer" given; reinforcement learning learns from trial-and-error feedback (rewards/penalties) rather than fixed labels.

Q4. Give one real-world example each of regression and classification.

A: Regression — predicting a house's sale price (a continuous number). Classification — deciding whether an email is spam or not-spam (a discrete category).

MULTIPLE-CHOICE QUESTIONS

1. In traditional programming vs. machine learning, what is different about the "rules"?

A. Traditional programming has no rules at all
B. In ML, the system infers the rules from data instead of a human writing them
C. ML always produces slower rules than traditional programming
D. There is no difference

2. Predicting a house's exact sale price in dollars is an example of:

A. Classification
B. Clustering
C. Regression
D. Reinforcement learning

3. Grouping customers into segments with no predefined categories is an example of:

A. Supervised classification
B. Regression
C. Reinforcement learning
D. Clustering (unsupervised learning)

4. In reinforcement learning, the "policy" refers to:

A. The dataset used for training
B. The strategy the agent learns for choosing actions
C. The final test accuracy of the model
D. A rule written by a human programmer

5. Which ML pipeline stage comes immediately after "Train Model"?

A. Collect Data
B. Choose Features
C. Evaluate
D. Deploy & Monitor

6. A "label," in the context of supervised learning, is:

A. An input feature used to make predictions
B. The correct answer paired with an input example
C. A synonym for "reward" in reinforcement learning
D. The name of the dataset file

7. Compressing hundreds of product attributes down to 2 dimensions for visualization (e.g., PCA) is an example of:

A. Classification
B. Reinforcement learning
C. Dimensionality reduction
D. Regression

PRACTICAL / APPLIED EXERCISE

Scenario: A retail company wants to build three different ML solutions. For each one below, name (a) the learning paradigm (supervised / unsupervised / reinforcement) and (b) whether it's regression, classification, clustering, or none of these.

Predicting next month's total revenue in rupees based on the last 24 months of sales.
Grouping the store's 50,000 customers into "types" of shoppers with no predefined labels, purely from their purchase history.
Training a warehouse robot to pick the fastest route between shelves by trying different paths and being "rewarded" for shorter delivery times.

Model answers are provided in the Answer Key at the end of this booklet.

MODULE 3 OF 6

What Neural Networks Are Made Of

Zooming into the building blocks that power modern deep learning.

3.1 Anatomy of a Neural Network

A neural network is built from layers of connected nodes ("neurons") that transform input into output, one layer at a time.

Layer	Role
Input Layer	Raw features go in — pixels, words, numbers.
Hidden Layer 1	Detects simple patterns.
Hidden Layer 2	Combines those patterns into higher-level concepts.
Output Layer	Produces the final prediction.

WHY "DEEP" LEARNING? "Deep" simply means many hidden layers stacked between the input and output. Each additional layer learns increasingly abstract features — early layers might detect edges, later ones detect shapes, and the deepest ones detect entire objects.

3.2 What's Inside a Single Neuron?

Every network, no matter how large, is built from this one basic unit, repeated many times over (millions of times in large models).

The Neuron's Math

z = (w₁x₁ + w₂x₂ + w₃x₃ + ... + b) → output = activation(z)

Inputs (x): the values flowing in from the previous layer.
Weights (w): control how much each input matters — learned during training.
Bias (b): a constant that shifts the result up or down.
Activation function: adds non-linearity, so the network can learn more than straight lines.
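
The neuron's math translates almost line for line into code. The sketch below is a minimal version with hypothetical input, weight, and bias values. The function names are illustrative and not taken from any library.

```python
def relu(z):
    """ReLU activation: pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values chosen only for illustration.
output = neuron(inputs=[1.0, 2.0], weights=[0.5, 0.25], bias=-0.5, activation=relu)
print(output)  # z = 0.5*1.0 + 0.25*2.0 - 0.5 = 0.5, and relu(0.5) = 0.5
```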

3.3 Activation Functions

Activation functions are small formulas that decide whether — and how strongly — a neuron "fires."

Function	Typically Used For
Sigmoid	Output layer for binary classification (squashes values into a 0–1 range).
ReLU	The most common choice in hidden layers: fast to compute, and it helps mitigate vanishing gradients.
Tanh	Hidden layers when a zero-centered output helps training.
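
Each of the three functions in the table is a one-line formula. The sketch below uses only Python's standard `math` module and prints each function's output for a few sample inputs, so the different output ranges are easy to compare.

```python
import math

def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Outputs z when z is positive, otherwise 0."""
    return max(0.0, z)

def tanh(z):
    """Squashes any real number into the range (-1, 1), centered on zero."""
    return math.tanh(z)

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), relu(z), round(tanh(z), 3))
```

Note that Sigmoid returns 0.5 for an input of 0, while Tanh returns 0. That is what "zero-centered" means in the table.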

3.4 How a Neural Network Learns

Training is a four-step loop, repeated thousands (sometimes millions) of times over the course of a training run:

#	Step	What Happens
1	Forward Pass	Input flows through the network layer by layer, producing a prediction.
2	Compute Loss	Compare the prediction to the true answer — a loss function scores how wrong it was.
3	Backpropagation	The error is sent backward through the network to find each weight's contribution to the mistake.
4	Gradient Descent	Every weight is nudged slightly in the direction that reduces the loss.

VOCABULARY One full pass through this loop for a batch of data is called a training step. One pass through the entire dataset is an epoch.
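
The four steps can be traced in a minimal sketch that trains a single linear neuron (no activation) on hypothetical data drawn from y = 2x + 1. With only one neuron, the "backpropagation" step reduces to applying the chain rule by hand. In a real multi-layer network a framework automates this across every layer. The learning rate and iteration count are assumed values.

```python
# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0           # the model's two learnable parameters
learning_rate = 0.05      # assumed value

for epoch in range(2000):
    # 1. Forward pass: predict with the current parameters.
    preds = [w * x + b for x in xs]
    # 2. Compute loss: mean squared error between predictions and targets.
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    # 3. Backpropagation (by hand here): gradient of the loss w.r.t. w and b.
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(2 * e for e in errors) / len(xs)
    # 4. Gradient descent: nudge each parameter against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))  # w approaches 2 and b approaches 1
```

Because the whole dataset is used in every iteration, each pass through the loop here is both one training step and one epoch. With mini-batches, one epoch would contain many steps.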

3.5 Gradient Descent, Visualized

Think of the loss (error) as a landscape, with weight values along one axis and error along the other. Training is like walking downhill toward the lowest point — each training step is one small step down the slope.

The learning rate controls how big each downhill step is:

Learning Rate	Effect
Too high	Overshoots the minimum — loss bounces around or even diverges.
Just right	Steadily converges to the minimum in a reasonable amount of time.
Too low	Crawls toward the minimum — training takes far too long.
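
The three rows of the table can be reproduced on a toy loss, loss(w) = (w - 3)^2, whose minimum is at w = 3. The loss function, the step count, and the three learning rates are hypothetical choices made for illustration. Which values count as "too high" or "too low" depends entirely on the problem.

```python
def descend(learning_rate, steps=50, start=0.0):
    """Minimize loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * (w - 3)
    return w

# Hypothetical "too high", "just right", and "too low" settings.
for lr in (1.1, 0.1, 0.001):
    print(lr, descend(lr))
```

With a rate of 1.1 every step overshoots and lands farther from 3 than before, so w diverges. With 0.1 it settles very close to 3. With 0.001 it has barely moved after 50 steps.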

KEY TERMS — MODULE 3

Weight: a learned value controlling how much an input matters.

Bias: a learned constant that shifts a neuron's output.

Activation Function: adds non-linearity to a neuron's output.

Loss Function: measures how wrong a prediction is.

Backpropagation: sends error backward to compute how much each weight contributed to the loss.

Gradient Descent: the algorithm that nudges weights to reduce loss.

Learning Rate: how big each gradient descent step is.

Epoch: one full pass through the entire training dataset.

CHECK YOUR UNDERSTANDING

Q1. What role does the activation function play inside a neuron?

A: It introduces non-linearity into the neuron's output. Without it, stacking layers would just be repeated linear operations, and the network could not learn complex, non-linear patterns.

Q2. What is the difference between a training step and an epoch?

A: A training step is one forward-and-backward pass for a single batch of data. An epoch is one complete pass through the entire training dataset — which usually consists of many training steps.

Q3. Explain, in your own words, what backpropagation does.

A: After the loss is computed, backpropagation works backward through the network to calculate how much each individual weight contributed to the error, so gradient descent knows which direction to adjust each weight.

Q4. Why can a learning rate that is too high cause training to fail?

A: Because each update to the weights overshoots the minimum of the loss landscape. Rather than steadily descending and converging, the loss can bounce around or even increase (diverge).

MULTIPLE-CHOICE QUESTIONS

1. In the formula z = (w₁x₁ + w₂x₂ + ... + b), what does "b" represent?

A. The batch size
B. The bias — a constant that shifts the result
C. The backpropagation rate
D. The number of hidden layers

2. Which activation function is most commonly used in hidden layers because it is fast and helps mitigate vanishing gradients?

A. Sigmoid
B. ReLU
C. Softmax
D. Linear

3. What is the correct order of the neural network training loop?

A. Gradient Descent → Loss → Forward Pass → Backpropagation
B. Forward Pass → Compute Loss → Backpropagation → Gradient Descent
C. Backpropagation → Forward Pass → Gradient Descent → Loss
D. Compute Loss → Gradient Descent → Forward Pass → Backpropagation

4. A learning rate that is too low will most likely cause:

A. The loss to diverge wildly
B. Training to take far too long, crawling toward the minimum
C. Instant convergence in one step
D. No effect on training speed

5. "Deep" in deep learning refers to:

A. The size of the dataset
B. How long training takes
C. Many hidden layers stacked between input and output
D. The number of output classes

6. Which layer of a neural network detects the simplest patterns, closest to the raw input?

A. Output Layer
B. Hidden Layer 1 (first hidden layer)
C. The final hidden layer
D. None — all layers detect the same level of pattern

PRACTICAL / APPLIED EXERCISE

Calculation: A neuron receives two inputs, x₁ = 2 and x₂ = 3, with weights w₁ = 0.5 and w₂ = -1, and a bias b = 1.

Calculate z = (w₁x₁ + w₂x₂ + b).
If the activation function is ReLU (which outputs max(0, z)), what is the neuron's final output?
Would the output change if a Sigmoid activation were used instead? Explain, in one sentence, what range Sigmoid would squash the result into.

Model answers are provided in the Answer Key at the end of this booklet.

MODULE 4 OF 6

Data & Model Quality

Why the data you feed a model matters as much as the model itself.

4.1 Data: The Fuel of Every Model

Term	Meaning
Feature	An input variable the model uses — e.g., square footage, age, pixel value.
Label	The correct answer for supervised learning — e.g., the actual sale price.
Sample / Row	One complete example: a full set of features (plus a label, if supervised).

Splitting the Dataset

Train — 70%

Fits the model.

Validation — 15%

Tunes it.

Test — 15%

Gives the final, honest score.

WHY NOT TRAIN ON EVERYTHING? If a model is evaluated on the same data it trained on, it can simply memorize the answers — giving a misleadingly perfect score. Holding out unseen test data simulates the real world: how will the model perform on examples it has never encountered?

Analogy Training = studying practice questions. Validation = a mock exam to check your prep. Test = the real, unseen final exam.
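
A 70 / 15 / 15 split takes only a few lines using Python's standard `random` module. The function name and the fixed seed below are illustrative assumptions. Libraries such as scikit-learn offer their own splitting helpers, which are not used here.

```python
import random

def split_dataset(rows, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle a copy of the rows, then cut it into train / validation / test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains
    return train, validation, test

train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))  # prints: 70 15 15
```

Shuffling before cutting matters: if the rows were sorted (say, by date or by label), slicing without shuffling would give the three sets very different contents.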

4.2 Overfitting vs. Underfitting

These are two opposite ways a model can fail to generalize to new data — with a sweet spot in between.

Failure Mode	Description	Symptom
Underfitting	Too simple — misses the underlying pattern entirely.	High error on both training and test data.
Good Fit	Captures the true trend without memorizing noise.	Low error on training and test data.
Overfitting	Memorizes noise instead of the pattern.	Near-zero training error, but high test error.
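
The symptoms in the table can be turned into a rough diagnostic check. The sketch below is a rule of thumb only: the two thresholds are hypothetical, and what counts as "high" error or a "large" gap depends on the task.

```python
def diagnose(train_error, test_error, high=0.20, gap=0.10):
    """Rough rule of thumb; the thresholds are hypothetical, not standard values."""
    if train_error > high:
        return "underfitting"      # cannot even fit the training data
    if test_error - train_error > gap:
        return "overfitting"       # fits the training data but fails on new data
    return "good fit"

print(diagnose(0.35, 0.37))   # prints: underfitting
print(diagnose(0.01, 0.30))   # prints: overfitting
print(diagnose(0.05, 0.07))   # prints: good fit
```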

4.3 The Bias-Variance Tradeoff

Underfitting and overfitting are really two ends of the same dial — model complexity.

High Bias (→ Underfitting)

The model makes strong, oversimplified assumptions — it systematically misses the true pattern, no matter how much data it sees.

High Variance (→ Overfitting)

The model is overly sensitive to the specific training data, including its noise. Small data changes swing predictions wildly.

4.4 Evaluating a Model's Performance

The confusion matrix is the source for nearly every classification metric. Example: a medical test predicting disease, where "Positive" = predicted disease present.

	Predicted Positive	Predicted Negative
Actual Positive	True Positive (TP)	False Negative (FN)
Actual Negative	False Positive (FP)	True Negative (TN)

Metric	Formula	Question It Answers
Accuracy	(TP + TN) / Total	Overall, how often is the model correct?
Precision	TP / (TP + FP)	Of predicted positives, how many were right?
Recall	TP / (TP + FN)	Of actual positives, how many did we catch?
F1 Score	2 × Precision × Recall / (Precision + Recall)	The harmonic mean of the two: one balanced number for both.

A TRADEOFF, NOT A FREE LUNCH Precision vs. Recall is a tradeoff: optimizing one often costs the other. In disease screening, you may accept lower precision (more false alarms) to get higher recall (fewer missed cases). The right balance depends on the problem.
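
All four metrics follow directly from the four confusion-matrix counts. The sketch below applies the formulas from the table to hypothetical counts. It does not guard against division by zero, which a production version would need when there are no predicted or actual positives.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts chosen only for illustration.
accuracy, precision, recall, f1 = classification_metrics(tp=30, fp=20, fn=10, tn=40)
print(accuracy, precision, recall, round(f1, 3))  # prints: 0.7 0.6 0.75 0.667
```

Here recall (0.75) is higher than precision (0.6): the model catches most real positives, but a fair share of its positive predictions are false alarms.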

KEY TERMS — MODULE 4

Feature: an input variable used by the model.

Label: the correct answer for a training example.

Overfitting: memorizing training data instead of learning the pattern.

Underfitting: a model too simple to capture the pattern.

Bias (statistical): systematic error from oversimplified assumptions.

Variance (statistical): sensitivity to fluctuations in training data.

Confusion Matrix: a table of TP, FP, TN, FN counts.

F1 Score: the harmonic mean of precision and recall.

CHECK YOUR UNDERSTANDING

Q1. Why do we split data into train, validation, and test sets instead of using it all for training?

A: Training fits the model; validation tunes choices like hyperparameters; the test set — never touched during training or tuning — gives an honest estimate of how the model will perform on genuinely unseen data.

Q2. How can you tell, from training and test error alone, whether a model is overfitting?

A: Overfitting shows near-zero training error but noticeably high test error — the gap between the two is the giveaway, since the model has memorized the training set rather than learning a pattern that generalizes.

Q3. In a medical screening context, would you prioritize precision or recall, and why?

A: Usually recall, because missing an actual positive case (a false negative) can be far more costly than a false alarm — you would rather flag some healthy patients for a follow-up test than miss someone who is actually sick.

Q4. What is the relationship between bias/variance and underfitting/overfitting?

A: High bias corresponds to underfitting (oversimplified, systematically wrong assumptions), while high variance corresponds to overfitting (overly sensitive to training data's specific noise). They are two ends of the same model-complexity dial.

MULTIPLE-CHOICE QUESTIONS

1. Which data split gives the "final, honest score" of a model?

A. Training set
B. Validation set
C. Test set
D. The full original dataset

2. Near-zero training error but high test error is the classic signature of:

A. Underfitting
B. Overfitting
C. A perfectly balanced model
D. Low bias and low variance

3. Precision is calculated as:

A. TP / (TP + FN)
B. TP / (TP + FP)
C. (TP + TN) / Total
D. FP / (FP + TN)

4. High bias is most closely associated with:

A. Overfitting
B. Underfitting
C. A perfect model
D. The test set only

5. The F1 score is best described as:

A. The sum of precision and recall
B. The harmonic mean of precision and recall, balancing both
C. Another name for accuracy
D. The number of false positives

6. A "Sample" or "Row" in a dataset refers to:

A. A single feature value
B. One complete example — a full set of features (plus label, if supervised)
C. The entire training set
D. The model's final prediction

PRACTICAL / APPLIED EXERCISE

Scenario: A spam classifier is tested on 100 emails. The confusion matrix results are: TP = 40, FP = 10, FN = 5, TN = 45.

Calculate the model's Accuracy.
Calculate the model's Precision.
Calculate the model's Recall.
Based on these numbers, is the model more likely to let spam through, or to wrongly flag a real email as spam? Justify your answer using the numbers above.

Model answers are provided in the Answer Key at the end of this booklet.

MODULE 5 OF 6

Architectures & Applications

The major model families in use today, and where they show up in the real world.

5.1 Key Architectures at a Glance

Different data shapes call for different network designs. The three families below cover most of the deep learning systems in production today.

Architecture	Full Name	How It Works	Used For
CNN	Convolutional Neural Network	Scans images in small patches to detect edges, then shapes, then objects — layer by layer.	Image recognition, medical scans, self-driving perception
RNN	Recurrent Neural Network	Processes sequences step by step, carrying a "memory" of what came before.	Time series, older speech & text models
Transformer	Attention-Based Architecture	Looks at an entire sequence at once, weighing how much each part should "attend to" every other part.	LLMs (GPT, Claude), translation, modern NLP

WHY THIS MATTERS Choosing the right architecture for the shape of your data is one of the most consequential decisions in an ML project: an architecture designed for one data shape (for example, a CNN built for images) is usually a poor fit for another (such as sequential text), and vice versa.
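To make the Transformer row of the table concrete, here is a minimal sketch of attention in plain Python. It assumes the common "scaled dot-product" form and, for simplicity, uses the raw token vectors as queries, keys, and values; real Transformers first pass tokens through learned projections. The token vectors and function names are hypothetical, chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence at once."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this position against EVERY position in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is a weighted blend of all value vectors.
        blended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 3-token sequence, each token a 2-number vector (assumption).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)

for position, row in enumerate(weights):
    print(position, [round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence, including itself. Nothing in the loop depends on the result for an earlier position, which is why this style of model does not have to read strictly in order.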

5.2 Where AI & ML Show Up Today

The same core techniques from Modules 2–4 are applied across very different industries:

Industry	Applications
Healthcare	Diagnostic imaging, drug discovery, patient risk scoring
Finance	Fraud detection, credit scoring, algorithmic trading
Retail	Recommendation engines, demand forecasting, pricing
Automotive	Autonomous driving, predictive maintenance
Customer Service	Chatbots, sentiment analysis, ticket routing
Voice & Audio	Speech-to-text, voice assistants, transcription
Computer Vision	Facial recognition, quality inspection, satellite imagery
Education	Personalized learning paths, automated grading

KEY TERMS — MODULE 5

CNN: a network specialized for grid-like data such as images.

RNN: a network that processes sequences step by step with memory.

Transformer: an attention-based architecture that processes a whole sequence at once.

Attention: a mechanism for weighing how much each part of the input matters to every other part.

LLM: Large Language Model — a Transformer trained on huge amounts of text.

CHECK YOUR UNDERSTANDING

Q1.Why is a CNN a natural fit for image data specifically?

A: Because CNNs scan the input in small local patches to detect edges first, then shapes, then whole objects — exploiting the grid-like, spatially local structure of pixels layer by layer.
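The patch-scanning idea in this answer can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how production CNNs are written: the image, the hand-picked vertical-edge kernel, and the function name are hypothetical, and a real CNN learns its kernel values during training. The operation shown is technically cross-correlation, which is what deep learning libraries usually mean by "convolution".

```python
def convolve2d(image, kernel):
    """Slide a small kernel over the image and record one score per patch."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            # Multiply the patch by the kernel, element by element, and sum.
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x5 grayscale image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-picked kernel that responds to dark-to-bright vertical edges (assumption).
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The high values mark patches that contain the edge, and the zeros mark the uniformly bright region. Stacking further layers on top of such feature maps is what lets a CNN move from edges to shapes to objects.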

Q2.What core idea distinguishes a Transformer from an RNN when processing a sequence?

A: An RNN processes a sequence step by step, carrying forward a memory of prior steps. A Transformer instead looks at the entire sequence at once, using attention to weigh how much every part should attend to every other part — without processing strictly in order.

Q3.Name two industries and one specific ML application in each.

A: Example: Healthcare — diagnostic imaging; Finance — fraud detection. (Any two industry/application pairs from the table are acceptable.)

MULTIPLE-CHOICE QUESTIONS

1.Which architecture underlies modern LLMs such as GPT and Claude?

A. CNN
B. RNN
C. Transformer
D. Decision Tree

2.An architecture that carries a "memory" of what came before while processing a sequence step by step is a:

A. CNN
B. RNN
C. Transformer
D. None of the above

3.CNNs are especially well-suited to:

A. Time-series forecasting only
B. Image recognition and computer vision tasks
C. Reinforcement learning exclusively
D. Tabular spreadsheet data only

4.Fraud detection and algorithmic trading are common AI applications in which industry?

A. Healthcare
B. Finance
C. Education
D. Automotive

5.The "attention" mechanism in a Transformer is used to:

A. Randomly drop out neurons during training
B. Weigh how much each part of a sequence should relate to every other part
C. Compress an image into fewer pixels
D. Split data into train/test sets

PRACTICAL / APPLIED EXERCISE

Scenario: Match each real-world project below to the single best-fit architecture (CNN, RNN, or Transformer), and justify your choice in one sentence.

Detecting tumors in X-ray images.
Powering a customer-support chatbot that must understand a full paragraph of context before replying.
A legacy speech-to-text system built before Transformers became mainstream, processing audio one time-step at a time.

Model answers are provided in the Answer Key at the end of this booklet.

MODULE 6 OF 6

Ethics & Responsible AI

The responsibility that comes with building systems that make decisions at scale.

Powerful systems trained on human data inherit human problems — and can add new ones of their own. As AI moves from research labs into hiring, lending, healthcare, and law enforcement, four themes come up again and again:

Bias & Fairness

Models trained on historical data can learn and amplify existing societal biases (e.g., in hiring or lending decisions).

Privacy

Large datasets often contain personal information; models can sometimes memorize and leak training data.

Transparency

Complex models (especially deep learning) can be "black boxes" — hard to explain why a particular decision was made.

Accountability

When an AI system causes harm, who is responsible — the developer, the deployer, or the data provider?

WHY THIS ISN'T OPTIONAL A model can be statistically accurate and still be unfair, unsafe, or non-compliant with regulation. Responsible AI practice treats fairness, privacy, transparency, and accountability as first-class requirements — not afterthoughts bolted on after deployment.

Course Recap — What We Covered

AI is the goal, ML is an approach, DL is a technique — nested, not separate.
ML learns rules from data through supervised, unsupervised, or reinforcement learning.
Neural networks are built from neurons: weighted sums passed through activation functions.
Training = forward pass → loss → backpropagation → gradient descent, repeated.
Good models balance bias and variance, validated on data they've never seen.
CNNs, RNNs, and Transformers each suit different kinds of data.
Every powerful system carries a responsibility to be fair, private, and transparent.

NEXT STEP Pick one small project — a classifier, a chatbot, a prediction model — and build it end-to-end using the six-stage pipeline from Module 2.

KEY TERMS — MODULE 6

Bias (ethical): systematic unfairness in a model's outputs toward a group.

Fairness: ensuring a model's outcomes do not unjustly disadvantage groups.

Privacy: protecting personal data used in training and inference.

Transparency / Explainability: the ability to explain why a model made a decision.

Accountability: clarity on who is responsible when an AI system causes harm.

CHECK YOUR UNDERSTANDING

Q1.How can a model trained on historical data end up perpetuating bias, even without anyone intending it to?

A: If the historical data reflects past discriminatory patterns (e.g., in hiring or lending), the model learns those patterns as "normal," and reproduces — or even amplifies — them in its future predictions.

Q2.Why are deep learning models often called "black boxes," and why does that matter?

A: Because their decisions emerge from millions of learned weights across many layers, making it hard for even the model's own creators to explain why a specific decision was made. This matters for accountability, trust, and regulatory compliance in high-stakes settings like healthcare or lending.

Q3.Who might be considered responsible when an AI system causes real-world harm?

A: There is no single universal answer — responsibility could fall on the developer who built the model, the organization that deployed it, or the party that supplied the (possibly biased) training data. This is an active area of policy and legal debate.

MULTIPLE-CHOICE QUESTIONS

1.A hiring model that unintentionally favors one demographic over another because of patterns in past hiring data is primarily an issue of:

A. Overfitting
B. Bias & Fairness
C. Learning rate tuning
D. Activation function choice

2.A model that can be shown to sometimes reproduce snippets of its training data is raising a concern about:

A. Privacy
B. Gradient descent
C. Regression
D. Clustering

3."Black box" is a term used to describe models that are:

A. Extremely fast to train
B. Hard to explain — it's unclear why they made a given decision
C. Always more accurate than simpler models
D. Only used in reinforcement learning

4.Which of these is NOT one of the four ethical themes highlighted in this module?

A. Bias & Fairness
B. Privacy
C. Transparency
D. Learning Rate

PRACTICAL / APPLIED EXERCISE

Case Study: A bank deploys a credit-scoring model. After launch, an audit finds the model approves loans for one neighborhood at a much lower rate than another, even after controlling for income.

Which ethical theme from this module does this scenario primarily illustrate?
Name one step the bank could take, using ideas from this course, to investigate whether the model is genuinely biased.
Why might "the model is just following the data" not be an acceptable explanation to regulators or affected customers?

Model answers are provided in the Answer Key at the end of this booklet.

QUICK REVISION

Master Cheat Sheet

Every core formula and framework from the course, on one page.

The Neuron

Weighted sum	z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
Neuron output	output = activation(z)

Classification Metrics (from the Confusion Matrix)

Metric	Formula
Accuracy	(TP + TN) / Total
Precision	TP / (TP + FP)
Recall	TP / (TP + FN)
F1 Score	2 × (Precision × Recall) / (Precision + Recall), i.e., the harmonic mean of Precision and Recall

Dataset Split (Typical Ratio)

Train	70%	Fits the model
Validation	15%	Tunes the model
Test	15%	Final honest score
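The 70/15/15 split above can be sketched with Python's standard library alone. The function name, the fixed seed, and the stand-in dataset of 100 row indices are assumptions made for illustration; in practice a library helper would usually do this job.

```python
import random

def split_dataset(rows, train=0.70, val=0.15, seed=42):
    """Shuffle the rows, then cut them into train / validation / test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is repeatable
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * val)
    train_set = rows[:n_train]
    val_set = rows[n_train:n_train + n_val]
    test_set = rows[n_train + n_val:]  # whatever remains (about 15%)
    return train_set, val_set, test_set

# Hypothetical dataset: 100 row indices standing in for real samples.
train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

Shuffling before cutting matters: if the rows were sorted (for example, by date or by label), an unshuffled split could give the model a test set that looks nothing like its training data.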

The Neural Network Training Loop

Forward Pass → Compute Loss → Backpropagation → Gradient Descent → (repeat)
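The loop above can be sketched for the simplest possible case: a single neuron with one weight, one bias, and no activation function, trained with mean squared error. The toy data, learning rate, and epoch count are assumptions chosen for illustration. With only one neuron, "backpropagation" reduces to applying the chain rule once; deep networks repeat that step layer by layer.

```python
# Hypothetical toy data generated from y = 2x + 1 (assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def train(xs, ys, learning_rate=0.05, epochs=2000):
    w, b = 0.0, 0.0  # start from arbitrary parameter values
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # 1. Forward pass: compute predictions with the current parameters.
        predictions = [w * x + b for x in xs]

        # 2. Compute loss: mean squared error between predictions and labels.
        errors = [p - y for p, y in zip(predictions, ys)]
        loss = sum(e * e for e in errors) / n
        losses.append(loss)

        # 3. Backpropagation: how much did w and b each contribute to the loss?
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)

        # 4. Gradient descent: step each parameter against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses

w, b, losses = train(xs, ys)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

Each pass through the `for` loop here uses the whole dataset, so one iteration is also one epoch. Raising `learning_rate` too far on this data makes the loss grow instead of shrink, which is the overshooting behaviour described in Module 3.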

The Machine Learning Pipeline

Collect Data → Prepare Data → Choose Features → Train Model → Evaluate → Deploy & Monitor

Three Learning Paradigms

Supervised	Labeled data → Regression / Classification
Unsupervised	Unlabeled data → Clustering / Dimensionality Reduction
Reinforcement	Reward signal → Policy learning

Architecture Cheat Sheet

CNN	Images, spatial data
RNN	Sequences, time series (with memory)
Transformer	Sequences (attention-based, whole context at once) — powers LLMs

Bias-Variance, In One Line

High Bias = Underfitting (too simple) | High Variance = Overfitting (too sensitive to training data)

A–Z REFERENCE

Glossary

Every key term used across all six modules, in one alphabetical list.

Accuracy — (TP+TN)/Total; the overall proportion of correct predictions.

Activation Function — a formula (e.g., ReLU, Sigmoid, Tanh) that adds non-linearity to a neuron's output.

AGI (Artificial General Intelligence) — hypothetical AI matching human-level reasoning across any task.

AI (Artificial Intelligence) — the field of building machines that perform tasks normally requiring human intelligence.

ASI (Artificial Superintelligence) — theoretical AI exceeding human intelligence in every domain.

Backpropagation — the process of sending prediction error backward through a network to compute each weight's contribution to the mistake.

Bias (model parameter) — a learned constant added to a neuron's weighted sum.

Bias (statistical/ethical) — systematic error, either from oversimplified model assumptions or unfair treatment of a group.

Classification — predicting a discrete category or class.

Clustering — grouping unlabeled data points by shared characteristics.

CNN (Convolutional Neural Network) — a network architecture specialized for image and grid-like data.

Confusion Matrix — a table of True/False Positive/Negative counts used to derive classification metrics.

Deep Learning — machine learning using neural networks with many hidden layers.

Dimensionality Reduction — compressing many features into fewer, information-dense ones.

Epoch — one complete pass through the entire training dataset.

F1 Score — the harmonic mean of precision and recall.

Feature — an input variable used by a model to make predictions.

Gradient Descent — the algorithm that adjusts weights step-by-step to reduce loss.

Label — the correct answer paired with an input in supervised learning.

Learning Rate — a value controlling how large each gradient descent step is.

Loss Function — a function scoring how wrong a model's prediction was.

Machine Learning (ML) — systems that learn patterns and rules from data rather than being explicitly programmed.

Narrow AI (ANI) — AI specialized for one task; the only type of AI that exists in deployment today.

Overfitting — a model that memorizes training data (including noise) rather than learning the general pattern.

Policy — the strategy a reinforcement learning agent learns for choosing actions.

Precision — TP/(TP+FP); of predicted positives, how many were actually correct.

Recall — TP/(TP+FN); of actual positives, how many the model correctly caught.

Regression — predicting a continuous numeric value.

Reinforcement Learning — learning by trial and error via rewards and penalties.

RNN (Recurrent Neural Network) — a network that processes sequences step by step, retaining memory of prior steps.

Supervised Learning — learning from labeled input-output pairs.

Transformer — an attention-based architecture that processes an entire sequence at once; powers modern LLMs.

Turing Test — a proposed test of whether a machine's behaviour is indistinguishable from a human's.

Underfitting — a model too simple to capture the true underlying pattern in the data.

Unsupervised Learning — learning structure from unlabeled data, with no correct answer given.

Weight — a learned value controlling how much an input contributes to a neuron's output.

FINAL ASSESSMENT

Final Comprehensive Assessment

25 mixed questions spanning all six modules. Attempt every question before checking the Answer Key.

1.AI is best described as:

A. A single algorithm
B. A goal pursued by many different techniques
C. Only neural networks
D. A database technology

2.Which term coined at the 1956 Dartmouth Workshop is still used today?

A. Machine Learning
B. Artificial Intelligence
C. Deep Learning
D. Neural Network

3.All AI systems deployed in the real world today are examples of:

A. AGI
B. ASI
C. Narrow AI
D. Self-Aware AI

4.Which statement about AI, ML, and DL is correct?

A. AI ⊂ ML ⊂ DL
B. DL ⊂ ML ⊂ AI
C. They are unrelated fields
D. ML ⊂ AI, but DL is unrelated to either

5.In Tom Mitchell's formal definition of learning, "P" stands for:

A. Parameters
B. Performance measure
C. Pipeline
D. Policy

6.Which ML pipeline stage involves cleaning and splitting data into train/test sets?

A. Collect Data
B. Prepare Data
C. Deploy & Monitor
D. Evaluate

7.Predicting whether a tumor is malignant or benign is an example of:

A. Regression
B. Binary classification
C. Clustering
D. Dimensionality reduction

8.A recommendation engine that groups shoppers with similar behaviour, without any predefined categories, is using:

A. Supervised classification
B. Reinforcement learning
C. Unsupervised clustering
D. Backpropagation

9.An RL agent's "reward" signal is:

A. The training dataset
B. Feedback that is positive for good outcomes and negative for bad ones
C. A synonym for "label"
D. The learning rate

10.Weights and bias inside a neuron are combined, before activation, as:

A. A product of all inputs
B. A weighted sum plus a bias term
C. The average of all previous outputs
D. The loss function directly

11.Which activation function is most associated with mitigating vanishing gradients in hidden layers?

A. Sigmoid
B. ReLU
C. Step function
D. Identity function

12.The correct order of the neural network training loop is:

A. Loss → Gradient Descent → Forward Pass → Backprop
B. Forward Pass → Loss → Backprop → Gradient Descent
C. Backprop → Loss → Gradient Descent → Forward Pass
D. Gradient Descent → Forward Pass → Loss → Backprop

13.A learning rate set far too high typically causes:

A. Instant, perfect convergence
B. The loss to overshoot the minimum and bounce or diverge
C. No change in training behaviour
D. Underfitting only

14.One full pass through the entire training dataset is called:

A. A batch
B. A training step
C. An epoch
D. A gradient

15.Which data split is used only for the final, honest performance score?

A. Training set
B. Validation set
C. Test set
D. Deployment logs

16.A model with near-zero training error but high test error is:

A. Underfitting
B. Overfitting
C. Perfectly generalizing
D. Using too high a bias

17.High bias in a model corresponds to:

A. Overfitting
B. Underfitting
C. Perfect accuracy
D. High recall always

18.Recall is calculated as:

A. TP / (TP + FP)
B. TP / (TP + FN)
C. (TP + TN) / Total
D. FN / (FN + TP)

19.A model scanning small patches of an image to detect edges, then shapes, then objects is a:

A. CNN
B. RNN
C. Transformer
D. Decision tree

20.Transformers process a sequence by:

A. Reading it strictly one token at a time with no memory
B. Looking at the entire sequence at once, using attention
C. Ignoring word order completely
D. Converting it into an image first

21.Autonomous driving and predictive maintenance are common AI applications in:

A. Education
B. Automotive
C. Retail
D. Customer Service

22.A hiring algorithm that systematically disadvantages a demographic group due to historical data patterns is primarily an issue of:

A. Overfitting
B. Bias & Fairness
C. Learning rate
D. Dimensionality reduction

23.A model that is difficult to explain, even by its own creators, is referred to as a:

A. Reactive Machine
B. Black box
C. Confusion matrix
D. Feature vector

24.Scenario: A retail company wants to forecast next quarter's exact revenue in dollars using 3 years of historical sales figures. Which combination best fits this task?

A. Unsupervised clustering
B. Supervised regression
C. Reinforcement learning
D. Supervised classification

25.Scenario: A spam filter achieves 98% accuracy on the training set but only 62% accuracy on new, unseen emails. What is most likely happening, and what data-split concept would have caught this earlier?

A. Underfitting; more training data was needed
B. Overfitting; a held-out validation/test set would have revealed the gap earlier
C. The model is using reinforcement learning
D. This is expected behaviour and not a concern

SCORING GUIDE 22–25 correct: Excellent — ready for hands-on projects. 17–21: Solid — revisit the modules for missed questions. Below 17: Re-read the relevant modules and retry the module-level MCQs before returning here.

ANSWER KEY

Answer Key — MCQs

Answers for every multiple-choice question in this booklet, by module.

Module 1 — Foundations of AI

1 — B

2 — B

3 — C

4 — B

5 — C

6 — B

7 — C

8 — B

Module 2 — Machine Learning

1 — B

2 — C

3 — D

4 — B

5 — C

6 — B

7 — C

Module 3 — Inside Neural Networks

1 — B

2 — B

3 — B

4 — B

5 — C

6 — B

Module 4 — Data & Model Quality

1 — C

2 — B

3 — B

4 — B

5 — B

6 — B

Module 5 — Architectures & Applications

1 — C

2 — B

3 — B

4 — B

5 — B

Module 6 — Ethics & Responsible AI

1 — B

2 — A

3 — B

4 — D

Final Comprehensive Assessment

1 — B

2 — B

3 — C

4 — B

5 — B

6 — B

7 — B

8 — C

9 — B

10 — B

11 — B

12 — B

13 — B

14 — C

15 — C

16 — B

17 — B

18 — B

19 — A

20 — B

21 — B

22 — B

23 — B

24 — B

25 — B

Answer Key — Practical Exercises

Module 1 — AI Functionality Classification

1. Thermostat → Reactive Machine (reacts to current temperature only, no memory).
2. Fraud detection using 30-day history → Limited Memory (uses recent past data).
3. Support AI modeling customer's mental state → Theory of Mind (hypothetical: infers beliefs/intentions, not yet deployed in practice).

Module 2 — Matching Business Problems to Paradigms

1. Revenue forecasting → Supervised / Regression (predicting a continuous number from historical data).
2. Customer segmentation with no labels → Unsupervised / Clustering.
3. Warehouse robot learning from trial-and-error rewards → Reinforcement Learning (no fixed regression/classification category; it learns a policy).

Module 3 — Neuron Calculation

z = (0.5 × 2) + (−1 × 3) + 1 = 1 − 3 + 1 = −1. With ReLU, output = max(0, −1) = 0 (the neuron does not fire). With Sigmoid instead, the same z = −1 would be squashed into the 0–1 range (to roughly 0.27, below 0.5 because z is negative) rather than clamped to exactly 0. Sigmoid never outputs a hard zero.
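The same calculation can be checked in a few lines of Python. This sketch assumes the weights are 0.5 and -1, the inputs are 2 and 3, and the bias is 1; swapping which pair is called "weights" and which "inputs" would not change z. The function names are illustrative only.

```python
import math

def weighted_sum(weights, inputs, bias):
    """z = w1*x1 + w2*x2 + ... + wn*xn + b"""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def relu(z):
    """Pass positive values through; clamp negatives to 0."""
    return max(0.0, z)

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Assumed values, taken from the arithmetic in the model answer.
z = weighted_sum(weights=[0.5, -1.0], inputs=[2.0, 3.0], bias=1.0)

print(z)                     # -1.0
print(relu(z))               # 0.0
print(round(sigmoid(z), 3))  # 0.269
```

The two activations disagree on what to do with a negative z: ReLU silences the neuron entirely, while Sigmoid still passes along a small positive signal.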

Module 4 — Confusion Matrix Calculation

TP=40, FP=10, FN=5, TN=45, Total=100.
Accuracy = (40+45)/100 = 0.85 (85%).
Precision = 40/(40+10) = 0.80 (80%).
Recall = 40/(40+5) = 0.89 (≈89%).
With Recall (89%) higher than Precision (80%), and FN (5) lower than FP (10), the model is more likely to wrongly flag a real email as spam (more false positives than false negatives) than to let spam through undetected.
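These results can be reproduced with a short Python sketch that applies the cheat-sheet formulas directly. The function name is hypothetical, and for brevity it does not guard against division by zero (for example, a model that never predicts the positive class). F1 is included for completeness even though the exercise does not ask for it.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive the standard metrics from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Counts from the spam-classifier scenario in Module 4.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# accuracy: 0.85
# precision: 0.80
# recall: 0.89
# f1: 0.84
```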

Module 5 — Architecture Matching

1. Tumor detection in X-rays → CNN (image data, spatial patterns).
2. Chatbot understanding a full paragraph of context → Transformer (needs whole-sequence attention, powers modern conversational AI).
3. Legacy step-by-step speech-to-text system → RNN (sequential processing with memory, pre-dates widespread Transformer adoption).

Module 6 — Credit-Scoring Case Study

1. This scenario primarily illustrates Bias & Fairness (a disparate outcome across groups even after controlling for income).
2. The bank could run a structured fairness audit, e.g., comparing approval rates and model errors (using confusion-matrix-style metrics) across demographic/neighborhood groups, and testing whether removing/adjusting proxy variables changes the disparity.
3. "The model is just following the data" is not sufficient because the training data itself can encode historical discrimination. A model that faithfully reproduces biased patterns is still causing unfair, and potentially unlawful, harm; accountability rests with the organization deploying the model, not just the data.
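As a first step in the kind of audit described in this answer, approval rates can be compared across groups. The sketch below uses made-up decisions for two hypothetical neighborhoods, "A" and "B"; the data, group labels, and function name are all assumptions for illustration, not figures from the case study.

```python
from collections import defaultdict

def approval_rates(decisions):
    """Return the share of approved applications for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, approved in decisions:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    return {group: approved / total for group, (approved, total) in counts.items()}

# Hypothetical audit sample: (neighborhood, loan approved?) pairs.
decisions = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 4 + [("B", False)] * 6
)

rates = approval_rates(decisions)
gap = rates["A"] - rates["B"]

print(rates)          # {'A': 0.8, 'B': 0.4}
print(round(gap, 2))  # 0.4
```

A gap like this does not prove the model is biased on its own. It tells the bank where to look next, for example by repeating the comparison among applicants with similar income, or by comparing error rates rather than raw approval rates.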

Flowlytix AI Solutions

Building practical, human-centered AI education — from first principles to real-world systems. This booklet is a companion to our "Basic Concepts of AI & ML" course, part of the Flowlytix AI Bootcamp Series.

Thank you for learning with us. Keep building.
