easydel

PPO Trainer

ppo_trainer

reinforcement_learning.trainer.ppo_trainer

Erfan Zare Chavoshi-easydel
