Michael Walton, Benjamin Migliori, John Reeder A system is provided for performing a predetermined function within a total area of operation, wherein the system includes a plurality of autonomous agents. Each autonomous agent is able to detect respective local parameters. Each autonomous agent uses a Kalman filter component to establish an environment state based a plurality of state measurements over time. The output of the Kalman filter component within a respective agent is applied to reinforcement learning by an actor-critic task controller, within the respective agent, to determine a subsequent action to be performed by the respective agent in accordance with a reward function. Each agent includes a Kalman consensus filter that addresses errors of the plurality of state measurements over time.