nevo.core.basal_ganglia

Basal Ganglia Operator Selection

This module implements neuromorphic operator selection using basal ganglia circuits in Nengo, with pluggable Temporal Difference learning for adaptive value estimation.

class UtilityFunction(name, function, initial_weight=1.0)[source]

Bases: object

Defines the utility function for an operator.

Maps state features to a scalar utility value. A higher utility means the operator is more suitable for the current state.

__init__(name, function, initial_weight=1.0)[source]
Parameters:
  • name (str) – Operator name

  • function (Callable) – Function mapping state features [diversity, improvement_rate, convergence] → utility

  • initial_weight (float) – Initial adaptive weight

compute(features)[source]

Compute weighted utility.

Parameters:

features (np.ndarray) – State features [diversity, improvement_rate, convergence]

Returns:

utility – Weighted utility value

Return type:

float
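
A minimal sketch of how compute might behave, assuming the weighted utility is the adaptive weight multiplied by the raw function value (the source does not state the formula). UtilityFunctionSketch and not_stuck_yet are hypothetical names, and plain sequences stand in for np.ndarray:

```python
from typing import Callable, Sequence


class UtilityFunctionSketch:
    """Hypothetical stand-in for UtilityFunction; not the library's code."""

    def __init__(self, name: str,
                 function: Callable[[Sequence[float]], float],
                 initial_weight: float = 1.0) -> None:
        self.name = name
        self.function = function
        self.weight = initial_weight

    def compute(self, features: Sequence[float]) -> float:
        # Assumption: "weighted utility" means weight * raw utility.
        return self.weight * float(self.function(features))


def not_stuck_yet(x: Sequence[float]) -> float:
    # Illustrative formula only: large when improvement and convergence are low.
    diversity, improvement_rate, convergence = x
    return (1.0 - improvement_rate) * (1.0 - convergence)


levy = UtilityFunctionSketch("LevyFlight", not_stuck_yet, initial_weight=2.0)
value = levy.compute([0.5, 0.0, 0.5])  # 2.0 * (1.0 * 0.5)
```

Keeping the weight separate from the function lets the raw utility shape stay fixed while only the scalar weight adapts.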

update_weight(reward, lr=0.1)[source]

Update weight based on operator performance.

Parameters:
  • reward (float) – Performance reward (positive = good, negative = bad)

  • lr (float) – Learning rate for weight update
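
The update rule itself is not documented here. One plausible form, shown purely as an assumption, is an additive step of lr * reward with a small positive floor so the weight cannot reach zero or go negative. The function name update_weight_sketch and the min_weight floor are hypothetical:

```python
def update_weight_sketch(weight: float, reward: float, lr: float = 0.1,
                         min_weight: float = 0.01) -> float:
    """Hypothetical additive rule: shift the weight by lr * reward, keep it positive."""
    return max(min_weight, weight + lr * reward)


w_start = 1.0
w_after_good = update_weight_sketch(w_start, reward=1.0)        # positive reward raises the weight
w_after_bad = update_weight_sketch(w_after_good, reward=-0.5)   # negative reward lowers it
w_floored = update_weight_sketch(0.02, reward=-10.0)            # clamped at min_weight
```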

utility_levy_flight(x)[source]

LevyFlight utility: high when stuck and not converged.

Input: [diversity, improvement_rate, convergence]

Return type:

float

utility_differential_evolution(x)[source]

DifferentialEvolution utility: high when diversity exists.

Input: [diversity, improvement_rate, convergence]

Return type:

float

utility_particle_swarm(x)[source]

ParticleSwarm utility: high when improving and converging.

Input: [diversity, improvement_rate, convergence]

Return type:

float

utility_spiral(x)[source]

SpiralOptimisation utility: high when highly converged.

Input: [diversity, improvement_rate, convergence]

Return type:

float

RandomSearch utility: high when stuck, baseline exploration.

Input: [diversity, improvement_rate, convergence]

Return type:

float

utility_local_random_walk(x)[source]

LocalRandomWalk utility: high when converging and local refinement is needed.

Input: [diversity, improvement_rate, convergence]

Return type:

float

GravitationalSearch utility: high when diversity exists and directed exploration is needed.

Input: [diversity, improvement_rate, convergence]

Return type:

float

utility_firefly(x)[source]

FireflyAlgorithm utility: high at moderate convergence, when attraction is needed.

Input: [diversity, improvement_rate, convergence]

Return type:

float

utility_central_force(x)[source]

CentralForce utility: high when a strong directional bias is needed.

Input: [diversity, improvement_rate, convergence]

Return type:

float

utility_genetic_crossover(x)[source]

GeneticCrossover utility: high when diversity exists and recombination is wanted.

Input: [diversity, improvement_rate, convergence]

Return type:

float

utility_genetic_mutation(x)[source]

GeneticMutation utility: high when converging too fast and more diversity is needed.

Input: [diversity, improvement_rate, convergence]

Return type:

float

utility_simulated_annealing(x)[source]

SimulatedAnnealing utility: high when controlled exploitation is needed.

Input: [diversity, improvement_rate, convergence]

Return type:

float

HarmonySearch utility: high when memory-guided search is needed.

Input: [diversity, improvement_rate, convergence]

Return type:

float

TabuSearch utility: high when stuck in local optima.

Input: [diversity, improvement_rate, convergence]

Return type:

float

utility_neuromorphic_exploration(x)[source]

Neuromorphic exploration utility: prefer when progress is low or diversity drops.

Return type:

float

utility_neuromorphic_exploitation(x)[source]

Neuromorphic exploitation utility: prefer when converged and still improving.

Return type:

float
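
To illustrate how a set of such utility functions can rank operators for a given state, here is a small sketch. The three formulas are assumptions chosen only to match the one-line descriptions above (the actual formulas are not given in this reference), and rank_operators is a hypothetical helper:

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]  # [diversity, improvement_rate, convergence]


def levy_flight_like(x: Features) -> float:
    # Assumed shape: high when stuck (low improvement) and not converged.
    diversity, improvement_rate, convergence = x
    return (1.0 - improvement_rate) * (1.0 - convergence)


def differential_evolution_like(x: Features) -> float:
    # Assumed shape: high when diversity exists.
    return x[0]


def spiral_like(x: Features) -> float:
    # Assumed shape: high when highly converged.
    return x[2]


utilities: Dict[str, Callable[[Features], float]] = {
    "LevyFlight": levy_flight_like,
    "DifferentialEvolution": differential_evolution_like,
    "SpiralOptimisation": spiral_like,
}


def rank_operators(features: Features) -> List[str]:
    """Return operator names sorted from highest to lowest utility."""
    scores = {name: fn(features) for name, fn in utilities.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


early = rank_operators([0.9, 0.5, 0.1])   # diverse, barely converged population
late = rank_operators([0.1, 0.3, 0.95])   # low diversity, highly converged
```

With these assumed formulas the diverse early state favours the diversity-driven operator, while the converged late state favours the convergence-driven one.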

class BasalGangliaSelector(operators, utility_functions=None, neurons_per_ensemble=100, epsilon=0.1, learning_rate=0.1, gamma=0.99, lambda_coeff=0.0, learning_rule=None, value_model=None, td_enabled=True)[source]

Bases: object

Basal ganglia-based operator selection network with modular TD learning.

Implements Winner-Take-All (WTA) selection of operators based on state-dependent utility functions and learned value estimates via Temporal Difference learning (TD(0) or TD(λ)).

__init__(operators, utility_functions=None, neurons_per_ensemble=100, epsilon=0.1, learning_rate=0.1, gamma=0.99, lambda_coeff=0.0, learning_rule=None, value_model=None, td_enabled=True)[source]
Parameters:
  • operators (List[Operator]) – List of available operators

  • utility_functions (Dict[str, Callable], optional) – Custom utility functions (uses defaults if None)

  • neurons_per_ensemble (int) – Neurons per ensemble in basal ganglia

  • epsilon (float) – Epsilon-greedy exploration rate (0.0-1.0)

  • learning_rate (float) – TD learning rate α

  • gamma (float) – Discount factor for future rewards

  • lambda_coeff (float) – TD(λ) trace-decay parameter λ (0.0 = TD(0), 1.0 = Monte Carlo)

  • learning_rule (LearningRule, optional) – Pluggable learning rule (default: SimpleTDRule)

  • value_model (ValueModel, optional) – Pluggable value model (default: LinearValueModel)

  • td_enabled (bool) – Enable TD learning (vs. basic utility weight adaptation)

build_network(model, state_ensemble)[source]

Build basal ganglia selection network.

Parameters:
  • model (nengo.Network) – Parent Nengo network

  • state_ensemble (nengo.Ensemble) – State feature ensemble (3D: diversity, improvement_rate, convergence)

Returns:

selected_operator_ens – One-hot encoding of selected operator

Return type:

nengo.Ensemble

select_operator(operator_selection, current_best_fitness)[source]

Select operator using Nengo basal ganglia output + epsilon-greedy policy.

When td_enabled=True, TD(0)/TD(λ) value estimates bias the utility weights that feed the Nengo BG network, so learned knowledge flows back through the neuromorphic circuit rather than bypassing it.

Decision flow

  1. Compute reward from fitness improvement (after last operator executed).

  2. Update TD values for the last operator using reward + bootstrap.

  3. Add TD value bias to utility weights (scales Nengo BG input).

  4. Read Nengo thalamus output (operator_selection) as action scores.

  5. Epsilon-greedy: random with prob ε, else argmax of BG output.
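
The five steps above can be sketched in plain Python as follows. This is an illustrative walk-through, not the library's implementation: the class SelectionFlowSketch, the reward definition (previous best fitness minus current best, which assumes minimisation), the bootstrap from the maximum value estimate, and the bias form 1 + value are all assumptions.

```python
import random
from typing import List, Optional, Sequence


class SelectionFlowSketch:
    """Hypothetical walk-through of the five-step decision flow."""

    def __init__(self, n_operators: int, epsilon: float = 0.1,
                 learning_rate: float = 0.1, gamma: float = 0.99,
                 seed: int = 0) -> None:
        self.values: List[float] = [0.0] * n_operators   # TD value estimates
        self.weights: List[float] = [1.0] * n_operators  # utility weights
        self.epsilon = epsilon
        self.alpha = learning_rate
        self.gamma = gamma
        self.last_index: Optional[int] = None
        self.last_fitness: Optional[float] = None
        self.rng = random.Random(seed)

    def select(self, operator_selection: Sequence[float],
               current_best_fitness: float) -> int:
        if self.last_index is not None and self.last_fitness is not None:
            # 1. Reward from fitness improvement (assumes minimisation).
            reward = self.last_fitness - current_best_fitness
            # 2. TD(0) update for the last operator, bootstrapping from
            #    the best current value estimate (an assumption).
            target = reward + self.gamma * max(self.values)
            self.values[self.last_index] += self.alpha * (
                target - self.values[self.last_index])
            # 3. Bias the utility weights with the learned values; a real
            #    network would use them to scale its next input.
            self.weights = [1.0 + v for v in self.values]
        # 4. Treat the thalamus output as action scores.
        scores = list(operator_selection)
        # 5. Epsilon-greedy: random with probability epsilon, else argmax.
        if self.rng.random() < self.epsilon:
            index = self.rng.randrange(len(scores))
        else:
            index = max(range(len(scores)), key=scores.__getitem__)
        self.last_index = index
        self.last_fitness = current_best_fitness
        return index


flow = SelectionFlowSketch(n_operators=3, epsilon=0.0)
first = flow.select([0.1, 0.8, 0.2], current_best_fitness=10.0)
second = flow.select([0.1, 0.8, 0.2], current_best_fitness=8.0)
```

With epsilon set to 0.0 the choice is purely greedy, so both calls pick the operator with the highest score, and the second call credits that operator with the improvement from 10.0 to 8.0.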

Parameters:
  • operator_selection (np.ndarray) – Thalamus output from Nengo BG network (n_operators,)

  • current_best_fitness (float) – Current best fitness value

Returns:

operator – Selected operator

Return type:

Operator

begin_episode()[source]

Reset for new episode.

Call this at the start of optimisation to initialise TD learning.

end_episode()[source]

Finalise episode learning.

Call this at the end of an optimisation run.

set_td_lambda(lambda_coeff)[source]

Adjust TD(λ) parameter dynamically.

Parameters:

lambda_coeff (float) – New λ value (0.0 = TD(0), 1.0 = Monte Carlo)
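
To show what λ controls, here is a sketch of a tabular TD(λ) step with accumulating eligibility traces. It is a textbook formulation rather than the module's own learning rule, and td_lambda_step and run are hypothetical names. With λ = 0.0 only the most recent operator is updated (TD(0)); larger λ spreads credit to operators used earlier.

```python
from typing import List


def td_lambda_step(values: List[float], traces: List[float], action: int,
                   reward: float, next_value: float, alpha: float = 0.1,
                   gamma: float = 0.99, lambda_coeff: float = 0.0) -> float:
    """Apply one TD(lambda) update in place and return the TD error."""
    # TD error for the operator that was just executed.
    delta = reward + gamma * next_value - values[action]
    # Decay every trace, then mark the executed operator as eligible.
    for i in range(len(traces)):
        traces[i] *= gamma * lambda_coeff
    traces[action] += 1.0
    # Update each value in proportion to its eligibility.
    for i in range(len(values)):
        values[i] += alpha * delta * traces[i]
    return delta


def run(lambda_coeff: float) -> List[float]:
    """Use operator 0 (no reward), then operator 1 (reward 1.0)."""
    values = [0.0, 0.0]
    traces = [0.0, 0.0]
    td_lambda_step(values, traces, action=0, reward=0.0, next_value=0.0,
                   lambda_coeff=lambda_coeff)
    td_lambda_step(values, traces, action=1, reward=1.0, next_value=0.0,
                   lambda_coeff=lambda_coeff)
    return values


td0_values = run(0.0)      # only operator 1 receives credit
td_half_values = run(0.5)  # operator 0 receives decayed credit as well
```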

set_learning_rule(learning_rule)[source]

Replace learning rule on the fly.

Parameters:

learning_rule (LearningRule) – New learning rule instance

set_value_model(value_model)[source]

Replace value model on the fly.

Parameters:

value_model (ValueModel) – New value model instance

get_utility_weights()[source]

Get current utility weights.

Returns:

weights – Mapping of operator names to their current weights

Return type:

Dict[str, float]

get_td_values()[source]

Get current TD-learned values for all operators.

Returns:

values – Value estimates for all operators (available only when TD learning is enabled)

Return type:

np.ndarray

get_td_statistics()[source]

Get TD learning statistics.

Returns:

stats – Statistics about TD learning process

Return type:

Dict[str, Any]

reset_td_learning()[source]

Reset all TD value estimates.