Reinforcement Learning (RL) – Exam Oriented Short Notes
Platform: TechAmbitionX
1. Definition
Reinforcement Learning is a type of machine learning in which an agent learns optimal behavior by interacting with an environment and maximizing cumulative reward.
2. Key Components
- Agent: Learner or decision-maker
- Environment: External system the agent interacts with
- State (S): Current situation of the agent
- Action (A): Possible moves by the agent
- Reward (R): Feedback from environment
- Policy (π): Strategy followed by the agent
3. Working of Reinforcement Learning
Observe State → Take Action → Receive Reward → Move to New State → Update Policy → Repeat
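In code, one pass through this loop can be sketched as below. The tiny ToyEnv environment and the random action choice are illustrative placeholders (not part of these notes); a learning agent would update its policy where indicated.

```python
import random

# Hypothetical one-state toy environment, just to make the loop runnable:
# action 1 pays reward +1, action 0 pays 0; an episode lasts 5 steps.
class ToyEnv:
    def reset(self):
        self.t = 0
        return 0                                   # initial state

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 5
        return 0, reward, done                     # next state, reward, done

env = ToyEnv()
state = env.reset()                                # Observe State
done = False
while not done:
    action = random.choice([0, 1])                 # Take Action (random policy here)
    next_state, reward, done = env.step(action)    # Receive Reward, new state
    # (Update Policy would happen here in a learning agent)
    state = next_state                             # Move to New State → Repeat
```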
4. Reward Concept
- Positive reward → Encourages action
- Negative reward → Discourages action
- Goal is to maximize total reward over time
5. Exploration vs Exploitation
- Exploration: Trying new actions
- Exploitation: Using known best actions
- RL must balance both (see the ε-greedy sketch below)
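A standard way to balance the two is ε-greedy action selection. A minimal sketch, assuming Q-values are stored in a dict keyed by (state, action):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """With probability epsilon explore (random action); otherwise
    exploit the action with the highest known Q-value."""
    if random.random() < epsilon:
        return random.choice(actions)                  # Exploration
    return max(actions, key=lambda a: Q[(state, a)])   # Exploitation
```

Larger ε means more exploration; in practice ε is often decayed over training so the agent exploits more as its estimates improve.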
6. Markov Decision Process (MDP)
Reinforcement Learning problems are modeled using MDP.
MDP = (S, A, P, R, γ)
- S → Set of states
- A → Set of actions
- P → Transition probability function
- R → Reward function
- γ → Discount factor
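To make the tuple concrete, here is a tiny hypothetical two-state MDP written out as plain Python data (all names and numbers are illustrative, not from the notes):

```python
S = ["s0", "s1"]          # Set of states
A = ["stay", "move"]      # Set of actions
gamma = 0.9               # Discount factor

# Transition probabilities: P[(s, a)] = {next_state: probability}
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "move"): {"s1": 1.0},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 1.0},
}

# Reward function: R[(s, a)] = immediate reward
R = {
    ("s0", "stay"): 0.0,
    ("s0", "move"): 1.0,   # moving toward s1 pays a little
    ("s1", "stay"): 2.0,   # staying in s1 pays the most
    ("s1", "move"): 0.0,
}
```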
7. Value Functions
- V(s): Value of being in state s
- Q(s,a): Value of taking action a in state s
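The two are linked: under a greedy policy, the value of a state is the value of its best action, V(s) = max over a of Q(s, a). A one-line sketch, assuming the same (state, action)-keyed Q-table as in the ε-greedy example:

```python
def state_value(Q, state, actions):
    # V(s) = max_a Q(s, a) under a greedy policy
    return max(Q[(state, a)] for a in actions)
```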
8. Q-Learning
Q-Learning is a model-free, off-policy reinforcement learning algorithm that learns the optimal action-value function Q*(s, a).
Q-Learning Update Equation
Q(s, a) ← Q(s, a) + α [ r + γ max_a′ Q(s′, a′) − Q(s, a) ]
Where:
- Q(s, a) → Current Q-value
- α (alpha) → Learning rate (step size), 0 < α ≤ 1
- r → Immediate reward after taking an action
- γ (gamma) → Discount factor; weights the importance of future rewards relative to immediate ones
- max_a′ Q(s′, a′) → Best Q-value over actions a′ in the next state
- s′ → Next state
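Putting the pieces together, below is a minimal tabular Q-learning sketch. It reuses the toy MDP (S, A, P, R) from Section 6 and ε-greedy exploration from Section 5; the hyperparameters are illustrative, not prescriptive.

```python
import random

alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in S for a in A}       # Q-table initialised to zero

def sample_next_state(s, a):
    # Draw s' from the transition distribution P[(s, a)]
    next_states, probs = zip(*P[(s, a)].items())
    return random.choices(next_states, probs)[0]

state = "s0"
for _ in range(10_000):
    # ε-greedy action selection (Section 5)
    if random.random() < epsilon:
        action = random.choice(A)
    else:
        action = max(A, key=lambda a: Q[(state, a)])

    reward = R[(state, action)]
    next_state = sample_next_state(state, action)

    # Q-learning update (equation above)
    best_next = max(Q[(next_state, a)] for a in A)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

    state = next_state

print(Q)   # learned action values
```

With these rewards, Q("s1", "stay") should converge toward the highest value (about 2/(1 − γ) = 20), since staying in s1 earns the largest reward indefinitely.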
9. Types of Reinforcement Learning
- Model-Free RL: Q-Learning, SARSA
- Model-Based RL: Learns a model of the environment (transitions and rewards) and plans with it
- Deep RL: Approximates value functions or policies with neural networks (e.g., DQN)
10. Applications of Reinforcement Learning
- Game playing (e.g., Atari, Go)
- Robotics
- Self-driving cars
- Resource scheduling
11. Advantages
- No labeled data required
- Learns optimal behavior
- Suitable for dynamic environments
12. Limitations
- Requires long training time and large amounts of experience
- Reward design is difficult
- Computationally expensive
13. RL vs Other Learning Types
- Supervised: Uses labeled data
- Unsupervised: Finds hidden patterns
- Reinforcement: Uses reward signals
Focus: Exams • Concepts • Quick Revision