NevarokML: UNevarokMLBaseAlgorithm API

The UNevarokMLBaseAlgorithm class is the base class for reinforcement learning algorithm configurations in NevarokML. Its static factory functions create configured instances of PPO, A2C, DDPG, DQN, SAC, and TD3.


Properties

  • _algorithm (ENevarokMLAlgorithm): The type of algorithm.
  • _policy (ENevarokMLPolicy): The policy used by the algorithm.
  • _learningRate (float): The learning rate for the algorithm.
  • _nSteps (int32): The number of steps per batch.
  • _batchSize (int32): The batch size.
  • _nEpochs (int32): The number of training epochs.
  • _gamma (float): The discount factor for future rewards.
  • _entCoef (float): The coefficient for the entropy bonus.
  • _vfCoef (float): The coefficient for the value function loss.
  • _clipRange (float): The clipping range for the policy loss.
  • _maxGradNorm (float): The maximum gradient norm for gradient clipping.
  • _verbose (int): The verbosity level for logging.
  • _gaeLambda (float): The lambda parameter for generalized advantage estimation.
  • _useSde (bool): Whether to use state-dependent exploration.
  • _sdeSampleFreq (int): The frequency of sampling for state-dependent exploration.
  • _rmsPropEps (float): The epsilon value for RMSprop optimizer.
  • _useRmsProp (bool): Whether to use RMSprop optimizer.
  • _normalizeAdvantage (bool): Whether to normalize advantages.
  • _bufferSize (int): The size of the replay buffer.
  • _learningStarts (int): The number of steps before starting to learn.
  • _tau (float): The soft update coefficient for target networks.
  • _gradientSteps (int): The number of gradient steps per update.
  • _optimizeMemoryUsage (bool): Whether to optimize memory usage.
  • _targetUpdateInterval (int): The interval for updating target networks.
  • _explorationFraction (float): The fraction of the training period over which the exploration rate is annealed from its initial to its final value.
  • _explorationInitialEps (float): The initial value for exploration epsilon.
  • _explorationFinalEps (float): The final value for exploration epsilon.
  • _useSdeAtWarmup (bool): Whether to use state-dependent exploration during warm-up phase.
  • _policyDelay (int): The number of steps to delay policy updates.
  • _targetPolicyNoise (float): The noise added to the target policy (TD3).
  • _targetNoiseClip (float): The clipping range for the target policy noise (TD3).
  • _trainFreq (int): The number of environment steps between training updates.
  • _entCoefAuto (bool): Whether to automatically adjust the entropy coefficient.
  • _targetEntropyAuto (bool): Whether to automatically adjust the target entropy.
  • _targetEntropy (float): The target entropy for the SAC algorithm.

Methods

PPO

UFUNCTION(BlueprintPure, Category = "NevarokML|BaseAlgorithm")
static UNevarokMLBaseAlgorithm* PPO(UObject* owner, const ENevarokMLPolicy policy = ENevarokMLPolicy::MLP_POLICY,
                                    const float learningRate = 3e-4, const int nSteps = 2048,
                                    const int batchSize = 64, const int nEpochs = 10, const float gamma = 0.99,
                                    const float gaeLambda = 0.95, const float clipRange = 0.2,
                                    const float entCoef = 0.0, const float vfCoef = 0.5,
                                    const float maxGradNorm = 0.5, const bool useSde = false,
                                    const int sdeSampleFreq = -1, const int verbose = 1);
Creates a PPO (Proximal Policy Optimization) algorithm instance with the specified parameters.
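
A minimal C++ usage sketch follows; the header path, the SetupPPO wrapper, and the chosen hyperparameter values are illustrative assumptions, while the factory call itself follows the signature above.

#include "NevarokMLBaseAlgorithm.h" // header path is an assumption

void SetupPPO(UObject* Owner)
{
    // PPO with a shorter rollout (nSteps = 1024); all remaining parameters
    // (gamma, gaeLambda, clipRange, ...) keep the defaults shown above.
    UNevarokMLBaseAlgorithm* Algorithm = UNevarokMLBaseAlgorithm::PPO(
        Owner, ENevarokMLPolicy::MLP_POLICY,
        /*learningRate*/ 3e-4f, /*nSteps*/ 1024, /*batchSize*/ 64);
}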

A2C

UFUNCTION(BlueprintPure, Category = "NevarokML|BaseAlgorithm")
static UNevarokMLBaseAlgorithm* A2C(UObject* owner, const ENevarokMLPolicy policy = ENevarokMLPolicy::MLP_POLICY,
                                    const float learningRate = 7e-4, const int nSteps = 5,
                                    const float gamma = 0.99, const float gaeLambda = 1.0,
                                    const float entCoef = 0.0, const float vfCoef = 0.5,
                                    const float maxGradNorm = 0.5, const float rmsPropEps = 1e-5,
                                    const bool useRmsProp = true, const bool useSde = false,
                                    const int sdeSampleFreq = -1, const bool normalizeAdvantage = false,
                                    const int verbose = 1);
Creates an A2C (Advantage Actor-Critic) algorithm instance with the specified parameters.
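
As with PPO, the instance can be created from C++; in this sketch the header path, wrapper function, and parameter values are assumptions, and only the leading arguments are overridden.

#include "NevarokMLBaseAlgorithm.h" // header path is an assumption

void SetupA2C(UObject* Owner)
{
    // A2C with a slightly longer rollout (nSteps = 8); RMSprop and the
    // other defaults from the signature above are left unchanged.
    UNevarokMLBaseAlgorithm* Algorithm = UNevarokMLBaseAlgorithm::A2C(
        Owner, ENevarokMLPolicy::MLP_POLICY,
        /*learningRate*/ 7e-4f, /*nSteps*/ 8);
}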

DDPG

UFUNCTION(BlueprintPure, Category = "NevarokML|BaseAlgorithm")
static UNevarokMLBaseAlgorithm* DDPG(UObject* owner, const ENevarokMLPolicy policy = ENevarokMLPolicy::MLP_POLICY,
                                     const float learningRate = 1e-3, const int bufferSize = 1000000,
                                     const int learningStarts = 100, const int batchSize = 100,
                                     const float tau = 0.005, const float gamma = 0.99,
                                     const int trainFreq = 1, const int gradientSteps = -1,
                                     const bool optimizeMemoryUsage = false, const int verbose = 1);
Creates a DDPG (Deep Deterministic Policy Gradient) algorithm instance with the specified parameters.
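
A hedged C++ sketch (header path, wrapper, and values assumed) showing an off-policy setup with a smaller replay buffer and a longer warm-up before learning starts:

#include "NevarokMLBaseAlgorithm.h" // header path is an assumption

void SetupDDPG(UObject* Owner)
{
    // DDPG with a 200k-transition replay buffer and 1000 warm-up steps;
    // tau, gamma, trainFreq, and gradientSteps keep their defaults.
    UNevarokMLBaseAlgorithm* Algorithm = UNevarokMLBaseAlgorithm::DDPG(
        Owner, ENevarokMLPolicy::MLP_POLICY,
        /*learningRate*/ 1e-3f, /*bufferSize*/ 200000,
        /*learningStarts*/ 1000, /*batchSize*/ 100);
}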

DQN

UFUNCTION(BlueprintPure, Category = "NevarokML|BaseAlgorithm")
static UNevarokMLBaseAlgorithm* DQN(UObject* owner, const ENevarokMLPolicy policy = ENevarokMLPolicy::MLP_POLICY,
                                    const float learningRate = 1e-4, const int bufferSize = 1000000,
                                    const int learningStarts = 50000, const int batchSize = 32,
                                    const float tau = 1.0, const float gamma = 0.99,
                                    const int trainFreq = 4, const int gradientSteps = 1,
                                    const bool optimizeMemoryUsage = false,
                                    const int targetUpdateInterval = 10000,
                                    const float explorationFraction = 0.1,
                                    const float explorationInitialEps = 1.0,
                                    const float explorationFinalEps = 0.05,
                                    const float maxGradNorm = 10, const int verbose = 1);
Creates a DQN (Deep Q-Network) algorithm instance with the specified parameters.
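
The exploration schedule is the part most often tuned. The sketch below (header path, wrapper, and values assumed) anneals epsilon from 1.0 to 0.05 over the first 20% of training and uses a smaller replay buffer than the default.

#include "NevarokMLBaseAlgorithm.h" // header path is an assumption

void SetupDQN(UObject* Owner)
{
    // DQN with a 100k replay buffer, a short warm-up, and a slower epsilon decay.
    UNevarokMLBaseAlgorithm* Algorithm = UNevarokMLBaseAlgorithm::DQN(
        Owner, ENevarokMLPolicy::MLP_POLICY,
        /*learningRate*/ 1e-4f, /*bufferSize*/ 100000,
        /*learningStarts*/ 1000, /*batchSize*/ 32,
        /*tau*/ 1.0f, /*gamma*/ 0.99f,
        /*trainFreq*/ 4, /*gradientSteps*/ 1,
        /*optimizeMemoryUsage*/ false,
        /*targetUpdateInterval*/ 1000,
        /*explorationFraction*/ 0.2f,
        /*explorationInitialEps*/ 1.0f,
        /*explorationFinalEps*/ 0.05f);
}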

SAC

UFUNCTION(BlueprintPure, Category = "NevarokML|BaseAlgorithm")
static UNevarokMLBaseAlgorithm* SAC(UObject* owner, const ENevarokMLPolicy policy = ENevarokMLPolicy::MLP_POLICY,
                                    const float learningRate = 3e-4, const int bufferSize = 1000000,
                                    const int learningStarts = 100, const int batchSize = 256,
                                    const float tau = 0.005, const float gamma = 0.99,
                                    const int trainFreq = 1, const int gradientSteps = 1,
                                    const bool optimizeMemoryUsage = false, const bool entCoefAuto = true,
                                    const float entCoef = 0.0, const int targetUpdateInterval = 1,
                                    const bool targetEntropyAuto = true, const float targetEntropy = 0.0,
                                    const bool useSde = false, const int sdeSampleFreq = -1,
                                    const bool useSdeAtWarmup = false, const int verbose = 1);
Creates a SAC (Soft Actor-Critic) algorithm instance with the specified parameters.
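
A C++ sketch (header path, wrapper, and values assumed) that leaves automatic entropy tuning enabled, since entCoefAuto and targetEntropyAuto default to true:

#include "NevarokMLBaseAlgorithm.h" // header path is an assumption

void SetupSAC(UObject* Owner)
{
    // SAC with a 500k replay buffer and a short warm-up; entCoefAuto and
    // targetEntropyAuto keep their default of true, so entropy tuning is automatic.
    UNevarokMLBaseAlgorithm* Algorithm = UNevarokMLBaseAlgorithm::SAC(
        Owner, ENevarokMLPolicy::MLP_POLICY,
        /*learningRate*/ 3e-4f, /*bufferSize*/ 500000,
        /*learningStarts*/ 1000, /*batchSize*/ 256);
}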

TD3

UFUNCTION(BlueprintPure, Category = "NevarokML|BaseAlgorithm")
static UNevarokMLBaseAlgorithm* TD3(UObject* owner, const ENevarokMLPolicy policy = ENevarokMLPolicy::MLP_POLICY,
                                    const float learningRate = 1e-3, const int bufferSize = 1000000,
                                    const int learningStarts = 100, const int batchSize = 100,
                                    const float tau = 0.005, const float gamma = 0.99,
                                    const int trainFreq = 1, const int gradientSteps = -1,
                                    const bool optimizeMemoryUsage = false, const int policyDelay = 2,
                                    const float targetPolicyNoise = 0.2,
                                    const float targetNoiseClip = 0.5, const int verbose = 1);
Creates a TD3 (Twin Delayed Deep Deterministic Policy Gradient) algorithm instance with the specified parameters.
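
The distinguishing TD3 parameters are policyDelay, targetPolicyNoise, and targetNoiseClip; the sketch below (header path and wrapper assumed) passes them explicitly at their default values for clarity.

#include "NevarokMLBaseAlgorithm.h" // header path is an assumption

void SetupTD3(UObject* Owner)
{
    // TD3: the policy is updated every second critic update (policyDelay = 2),
    // and target policy noise of 0.2 is clipped to the +/- 0.5 range.
    UNevarokMLBaseAlgorithm* Algorithm = UNevarokMLBaseAlgorithm::TD3(
        Owner, ENevarokMLPolicy::MLP_POLICY,
        /*learningRate*/ 1e-3f, /*bufferSize*/ 1000000,
        /*learningStarts*/ 100, /*batchSize*/ 100,
        /*tau*/ 0.005f, /*gamma*/ 0.99f,
        /*trainFreq*/ 1, /*gradientSteps*/ -1,
        /*optimizeMemoryUsage*/ false,
        /*policyDelay*/ 2,
        /*targetPolicyNoise*/ 0.2f,
        /*targetNoiseClip*/ 0.5f);
}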

GetAlgorithmType

ENevarokMLAlgorithm GetAlgorithmType() const;
Returns the type of algorithm.

GetPolicyType

ENevarokMLPolicy GetPolicyType() const;
Returns the policy used by the algorithm.

ToJson

TSharedPtr<FJsonObject> ToJson() const;
Converts the algorithm settings to a JSON object.
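
The returned object can be written to a string with Unreal's Json module; a minimal sketch is shown below (the function name is illustrative, and the NevarokML algorithm header must also be included from wherever the plugin installs it).

#include "Dom/JsonObject.h"
#include "Serialization/JsonWriter.h"
#include "Serialization/JsonSerializer.h"

FString AlgorithmSettingsToString(const UNevarokMLBaseAlgorithm* Algorithm)
{
    // Convert the algorithm settings to JSON, then serialize them to a string.
    const TSharedPtr<FJsonObject> Json = Algorithm->ToJson();

    FString Output;
    const TSharedRef<TJsonWriter<>> Writer = TJsonWriterFactory<>::Create(&Output);
    FJsonSerializer::Serialize(Json.ToSharedRef(), Writer);
    return Output;
}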