Memetic Agents

A novel framework for creating AI agents capable of social learning and knowledge evolution through memetic principles, featuring multi-tiered memory architecture and inter-agent communication for persistent knowledge development.

March 9, 2025

Abstract

The Memetic Agents Project represents an original approach to creating AI agents capable of social learning and evolution. The system implements a multi-tiered memory architecture, dynamic prompt modification, and inter-agent communication protocols that enable agents to learn from experience and share knowledge. The project demonstrates how AI agents can maintain persistent knowledge, engage in meaningful interactions, and evolve their capabilities through social learning mechanisms inspired by memetic theory.

Introduction

This project introduces Memetic Agents, an innovative approach to creating AI agents that combine dynamic memory management, social learning, and memetic evolution. Built on large language models, these agents implement a multi-tiered memory architecture that enables both short-term recall and long-term knowledge consolidation, mimicking human cognitive processes. The framework includes mechanisms for agent-to-agent communication, experiential learning, feedback, and the transmission of prompts that can be replicated and modified across the agent network.

The implementation demonstrates how Memetic Agents can maintain persistent knowledge, engage in meaningful multi-agent interactions, and evolve their capabilities through social learning. The system incorporates a layered approach to memory consolidation, converting ephemeral conversations and feedback into structured long-term memories and reflections, while also enabling agents to share and build upon each other's experiences through a conversational knowledge transfer system.

The field of multi-agent systems has seen significant advances in recent months, particularly in areas of agent communication and learning. This work extends these foundations by introducing a memetic approach to agent evolution, where prompts are treated as units of cultural transmission that can be shared and modified through social interaction. While primarily focused on memetic evolution, the project also aims to align with the IEEE P3394 Draft standard for Large Language Model Agent Interface¹.

The project is still under active development and this is an initial release of work-in-progress research. The focus is on architectural design, implementation and the development and incorporation of new ideas. Future work will include quantitative evaluation and benchmarking. No formal empirical evaluation has been conducted yet.

¹ This article solely represents the views of the author, and does not necessarily represent a position of either the IEEE P3394 Working Group, the IEEE C/AISC - Artificial Intelligence Standards Committee, IEEE, or the IEEE Standards Association.

Why Memetic Agents?

The theoretical foundation for this project emerged from observations about the nature of prompt engineering and its similarities to memetic evolution. After experimenting with agents that could modify their own system prompts, it became apparent that prompts share many characteristics with memes: not internet memes, but memes in the sense coined by Richard Dawkins in his 1976 book 'The Selfish Gene'. In the book's final chapter, Dawkins coins the term "meme" (from the Greek word mimeme, meaning "imitated thing") to describe a unit of cultural transmission or imitation, analogous to a gene in biological evolution, and uses it to explain how ideas, behaviours, and cultural phenomena can replicate, evolve, and spread across populations. Examples of memes include melodies, catchphrases, fashion trends, and religious beliefs. Through prompt engineering and prompt templates, the AI community has been refining and replicating successful prompts: populations of agents and agentic systems use variations of Tree of Thoughts, ReAct, Reflexion, and so on. Memetic Agents takes this a step further, allowing agents to share their prompts with each other directly, without human intervention.

Evolution of the Architecture

Early experiments with agents that could modify their own system prompts revealed a critical challenge: agents given the ability to modify their entire system prompt would often overwrite it with simplistic variations. This led to the development of a modular prompt architecture with three distinct components:

  1. Editable sections that can be modified by the agent
  2. Procedurally generated sections determined by the agent's capabilities (e.g. the tool list)
  3. Immutable sections that need to maintain a strict format for optimal agent operation (e.g. JSON output schema)

This separation of concerns prevented destructive prompt modifications while enabling targeted improvements. The procedural generation of a tool list ensured that agents maintain access to their toolkit even as they evolve. Dynamic prompt assembly is now used in multiple areas of the Memetic Agent architecture, enabling self-reflection and social learning without prompts devolving over time.
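As a rough illustration, the three-part split can be sketched as a modular prompt object. The section names, class shape, and method names below are hypothetical, not the project's actual API:

```python
# Sketch of a modular prompt with editable, procedural, and immutable sections.
# All names here are illustrative assumptions, not the project's real code.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class PromptSection:
    name: str
    content: str

@dataclass
class ModularPrompt:
    editable: List[PromptSection] = field(default_factory=list)      # agent may rewrite these
    procedural: List[Callable[[], str]] = field(default_factory=list)  # e.g. tool-list generator
    immutable: List[str] = field(default_factory=list)               # e.g. JSON output schema

    def assemble(self) -> str:
        parts = [s.content for s in self.editable]
        parts += [gen() for gen in self.procedural]   # regenerated on every assembly
        parts += self.immutable                       # never touched by the agent
        return "\n\n".join(parts)

    def edit_section(self, name: str, new_content: str) -> None:
        # Only editable sections can be modified; anything else raises.
        for s in self.editable:
            if s.name == name:
                s.content = new_content
                return
        raise KeyError(f"No editable section named {name!r}")

tools = ["search_directory", "send_message"]
prompt = ModularPrompt(
    editable=[PromptSection("persona", "You are a helpful agent.")],
    procedural=[lambda: "Available tools: " + ", ".join(tools)],
    immutable=['Respond with JSON: {"thought": str, "action": str}'],
)
print(prompt.assemble())
```

Because the tool list is a generator rather than stored text, adding a tool to `tools` changes the assembled prompt immediately, while the agent can only ever rewrite the `persona` section.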

Autonoms and the Autonomic LLM System

As mentioned in the last section, successful agents often utilise multiple types of LLM calls with different system prompts to achieve a cohesive and autonomous, or near-autonomous, self. Not all of these LLM calls are agentic in nature. These helper functions or subroutines are often referred to as "sub-agents" or "agents" (both lexical choices that imply they have some level of agency); other designations such as "module" and "plugin" don't adequately reflect that there is an LLM call at their core. This project proposes the term "Autonom" (derived from the autonomic nervous system) to describe non-agentic subroutines that make LLM calls to perform tasks but have little or no agency of their own. Unlike tools or skills, which are actively selected or invoked by an agent based on its decision-making process, autonoms are triggered procedurally and share some characteristics with their namesake, the autonomic nervous system.

An agent's autonoms are the constituent parts of its Autonomic LLM System - an agentic workflow that exists outside of its awareness (aka context window), but is as intrinsic to its operation as the main reasoning loop.

Some of the Autonoms in Memetic Agents include:

  • Thought Loop: Decides when to exit the reasoning loop and when to continue.
  • Give Feedback: Provides feedback to another agent after an interaction.
  • Reflect on Memories: Reflects on the agent's conversations and feedback, looking for learning opportunities.
  • Transfer to Long Term Memory: Extracts, categorises and tags short term memories for storage in long term memory.
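The autonom concept can be sketched as a thin wrapper around a single, procedurally triggered LLM call. The `Autonom` class and the `fake_llm` stand-in below are illustrative assumptions, not the project's implementation:

```python
# Sketch of an "autonom": a non-agentic subroutine wrapping one LLM call,
# triggered procedurally rather than chosen by the agent's reasoning loop.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Autonom:
    name: str
    system_prompt: str
    llm_call: Callable[[str, str], str]  # (system_prompt, user_content) -> completion

    def run(self, content: str) -> str:
        # No tool selection, no reasoning loop: one prompt in, one completion out.
        return self.llm_call(self.system_prompt, content)

# A fake LLM client so the sketch runs without an API key.
def fake_llm(system: str, content: str) -> str:
    return f"[{system[:20]}...] processed {len(content)} chars"

give_feedback = Autonom(
    name="give_feedback",
    system_prompt="Score the other agent's replies and explain what went well.",
    llm_call=fake_llm,
)
print(give_feedback.run("transcript of the conversation..."))
```

The point of the wrapper is what it lacks: no tool registry and no loop, which is exactly the distinction the text draws between autonoms and agents.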

State Management

A state management system has been implemented using a finite state machine that controls agent behaviour and availability. Each agent can be in one of several states: Available, Learning, Memorising, Socialising, Sleeping or Shutting Down. The state determines which system prompts are active and what types of interactions are permitted. This state-based approach allows for clear behavioural boundaries and helps coordinate multi-agent interactions by making each agent's current status accessible to others in the network. The Available state acts as a resting state that the agents will return to after a conversation or task is complete. State transitions can be triggered by API calls and transitioning to a new state will trigger the autonomic subroutine associated with the new state.

Currently, state transitions are all triggered by API calls, which works well for testing purposes. In a future release this will be handled by an autonom or centralised algorithm that will regulate these states and provide the agents with a circadian-like rhythm.
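The state machine described above might be sketched as follows. The transition table (every working state returning to Available) is inferred from the description, and the per-state autonom is stubbed as a log entry:

```python
# Minimal finite state machine sketch for the agent states described above.
# The transition rules and the logging stub are assumptions for illustration.
from enum import Enum, auto

class AgentState(Enum):
    AVAILABLE = auto()
    LEARNING = auto()
    MEMORISING = auto()
    SOCIALISING = auto()
    SLEEPING = auto()
    SHUTTING_DOWN = auto()

# Assumed rules: AVAILABLE is the resting state all working states return to;
# SHUTTING_DOWN is terminal.
TRANSITIONS = {
    AgentState.AVAILABLE: {AgentState.LEARNING, AgentState.MEMORISING,
                           AgentState.SOCIALISING, AgentState.SLEEPING,
                           AgentState.SHUTTING_DOWN},
    AgentState.LEARNING: {AgentState.AVAILABLE},
    AgentState.MEMORISING: {AgentState.AVAILABLE},
    AgentState.SOCIALISING: {AgentState.AVAILABLE},
    AgentState.SLEEPING: {AgentState.AVAILABLE},
    AgentState.SHUTTING_DOWN: set(),
}

class AgentStateMachine:
    def __init__(self):
        self.state = AgentState.AVAILABLE
        self.log = []

    def transition(self, new_state: AgentState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"Illegal transition {self.state} -> {new_state}")
        self.state = new_state
        # Entering a state triggers its associated autonom (stubbed as a log entry).
        self.log.append(f"autonom for {new_state.name} triggered")

fsm = AgentStateMachine()
fsm.transition(AgentState.MEMORISING)
fsm.transition(AgentState.AVAILABLE)
print(fsm.log)
```

Keeping the rules in a plain table means a future regulator (the circadian-rhythm autonom) only has to call `transition`, with illegal moves rejected in one place.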

State Description

To better understand the state management system, the following table outlines the different states and their transitions:

State | Description
----- | -----------
Available | Default resting state where the agent can receive and respond to standard messages. Agents return to this state after completing other tasks.
Memorising | Agent processes short-term memories and feedback into long-term memories and reflections through memory consolidation.
Learning | Agent reviews its reflections and potentially updates its prompts based on past experiences and feedback.
Socialising | Agent actively seeks out and engages with other agents to share and compare prompts, enabling social learning.
Sleeping | Inactive state where the agent is unavailable for communication. The agent will also transfer its working memory to short-term memory and reset its message queue.
Shutting Down | Transitional state where the agent converts working memory to short-term memory before going offline.

Inter-Agent Communication

The Memetic Agents project implements a distributed-style communication system where each agent operates independently while maintaining the ability to interact with other agents through a structured messaging protocol. Although agents are run locally in the current implementation, each agent listens on a separate port and communicates via API calls to simulate a distributed architecture. This design choice allows for future expansion to truly distributed deployments while maintaining consistent behaviour during development and testing.

Agent Discovery and Availability

Agents maintain awareness of their peers through a centralised directory service. Using a dedicated tool, agents can query this directory to:

  • Discover other active agents in the network
  • Retrieve agent specialisations and capabilities
  • Check agent availability status
  • Obtain connection details (ports/endpoints)
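A toy version of the directory lookup could look like this. The record fields (specialisation, status, port) mirror the bullet points above, but the class and method names are assumptions:

```python
# In-memory sketch of the centralised directory service agents query to
# discover peers. Field and method names are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AgentRecord:
    name: str
    specialisation: str
    status: str      # e.g. "Available", "Socialising"
    port: int        # connection detail for the simulated distributed setup

class DirectoryService:
    def __init__(self):
        self._registry = {}  # name -> AgentRecord

    def register(self, record: AgentRecord) -> None:
        self._registry[record.name] = record

    def find(self, status: Optional[str] = None) -> List[AgentRecord]:
        # Discover active agents, optionally filtered by availability status.
        return [r for r in self._registry.values()
                if status is None or r.status == status]

directory = DirectoryService()
directory.register(AgentRecord("ada", "mathematics", "Available", 8001))
directory.register(AgentRecord("alan", "cryptography", "Socialising", 8002))
peers = directory.find(status="Socialising")
print([(p.name, p.port) for p in peers])   # → [('alan', 8002)]
```

In the real system this lookup sits behind a tool call and the ports belong to locally running agent servers, but the query shape is the same.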

Message Types

The framework supports three distinct types of messages, each serving a specific purpose in agent interactions:

  1. Standard Messages

    • Primary method for general communication between agents
    • Used for task-related discussions and information exchange
    • Maintains conversation context through unique conversation IDs
    • Supports both synchronous and asynchronous communication patterns
  2. Feedback Messages

    • Automatically generated after standard message exchanges
    • Contains both numerical scoring and qualitative feedback
    • Helps agents evaluate and improve their responses
    • Stored separately from standard messages for targeted retrieval during learning phases
    • Links to original conversations through conversation IDs
  3. Social Messages

    • Specialised message type for agent-to-agent learning
    • Carries prompt information in its payload using PromptModel format
    • Enables direct sharing of reasoning techniques and behavioural patterns
    • Includes metadata for prompt evaluation and comparison
    • Facilitates the memetic evolution of agent capabilities
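The three message types might be modelled as simple data classes. Only the fields named in the text (conversation IDs, scores and comments, prompt payloads) come from the article; everything else is an illustrative assumption:

```python
# Sketch of the three message types. Field names beyond those described in
# the text are assumptions, and PromptModel is stood in by two plain fields.
from dataclasses import dataclass, field
import uuid

@dataclass
class StandardMessage:
    sender: str
    recipient: str
    content: str
    # Unique conversation ID maintains context across exchanges.
    conversation_id: str = field(default_factory=lambda: str(uuid.uuid4()))

@dataclass
class FeedbackMessage:
    sender: str
    recipient: str
    conversation_id: str   # links back to the original conversation
    score: int             # numerical scoring
    comment: str           # qualitative feedback

@dataclass
class SocialMessage:
    sender: str
    recipient: str
    prompt_name: str       # which prompt is being shared
    prompt_content: str    # the prompt payload itself
    evaluation: dict = field(default_factory=dict)  # metadata for comparison

msg = StandardMessage("ada", "alan", "Shall we compare notes on ciphers?")
fb = FeedbackMessage("alan", "ada", msg.conversation_id, score=8,
                     comment="Clear question; could include more context.")
print(fb.conversation_id == msg.conversation_id)   # → True
```

Storing feedback as its own type with a shared `conversation_id`, rather than inline in the conversation, is what makes the targeted retrieval during learning phases possible.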

Communication Flow

A typical interaction between agents follows this pattern (illustrated in Figure 1):

  1. The initiating agent queries the directory to find suitable conversation partners
  2. Upon selecting a partner, the agent sends a standard message to initiate conversation
  3. The receiving agent processes the message and responds
  4. An autonom automatically generates and sends feedback about the interaction
  5. Feedback is stored in its own memory collection
  6. The conversation continues until naturally concluded
  7. Messages are stored in working memory and are then transferred to short term memory in the next 'Sleeping' phase

Figure 1: Agent Communication Architecture

Social Messaging Flow

During social learning phases, agents use SocialMessages to share and evaluate prompts:

  1. Agents in 'Socialising' state discover each other
  2. They exchange prompts through SocialMessages
  3. Receiving agents evaluate received prompts against their own
  4. Agents may choose to incorporate beneficial prompt elements

Message Persistence

All communication is persisted through the memory system:

  • Active conversations remain in working memory
  • Completed conversations move to short-term memory
  • During memory consolidation, conversations are analysed and converted to atomic memories
  • Feedback is stored separately but linked to original conversations
  • Social interactions receive special treatment during memory consolidation to capture learning opportunities

Future Alignment with IEEE P3394 draft standard

A secondary objective of this project is to align with the IEEE P3394 Draft standard for Large Language Model Agent Interface. This proposed standard, still under development, defines an interface and protocol for agent-to-agent interoperability and communication. The Memetic Agents project aims to become a proof-of-concept implementation of the draft standard. Alignment will strengthen the project's communication framework and provide a testing ground for the draft standard as it is developed.

Types of Memory

Building upon the communication framework, the system implements a sophisticated memory architecture that enables both short-term recall and long-term knowledge consolidation:

  • Working Memory - Includes conversations the agent is having in the current session and is non-persistent (it gets wiped when the agent shuts down). While a conversation is in working memory, new messages are simply added to the end of the conversation. Each conversation has its own unique conversation id.
  • Short Term Memory - Each time the agent goes to sleep or is shut down, its working memory is converted to vector embeddings and stored in a vector database. The conversations are stored as-is with no loss of information. Once a conversation is in short term memory, if the agent receives new messages related to it, the previous messages are retrieved from the vector database, converted back into the LLM's native message format, and the conversation is recreated in working memory. After the agent shuts down, the new messages are stored as a new short term memory (related to the previous short term memory via the conversation id).
  • Feedback - During conversations agents will receive feedback from other agents that gets stored in its own vector database. The feedback consists of both a score and a comment on what they did well and where they could have done better. The conversation id is stored as metadata so that the feedback can be associated with the correct conversation.
  • Long Term Memory - When agents are placed into the 'memorising' state, an autonom converts their short term memories (and the feedback that accompanies them) into long term memories, breaking down conversations and feedback into "atomic memories" – discrete pieces of information with clear relationships and tags for easy retrieval. This process is described in more detail in the next section on memory consolidation. Once a conversation has been converted to long term memories it is no longer retrievable in full; the agent must search for the relevant memories when it needs them.
  • Reflections - Reflections are created at the same time as long term memories. Each reflection consists of a lesson, importance, category and the autonom's thoughts.
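The working-memory to short-term-memory handover can be sketched with plain Python containers standing in for the vector database; embedding and similarity search are deliberately out of scope here:

```python
# Toy sketch of the working-memory -> short-term-memory transfer that happens
# when an agent sleeps. A list stands in for the vector store; real embedding
# and retrieval are assumptions left out of this sketch.
class MemorySystem:
    def __init__(self):
        self.working = {}        # conversation_id -> list of messages (non-persistent)
        self.short_term = []     # persisted records (stand-in for a vector DB)

    def add_message(self, conversation_id: str, message: str) -> None:
        self.working.setdefault(conversation_id, []).append(message)

    def sleep(self) -> None:
        # Conversations are stored as-is, tagged with their conversation id,
        # then working memory is wiped.
        for cid, messages in self.working.items():
            self.short_term.append({"conversation_id": cid,
                                    "messages": list(messages)})
        self.working.clear()

    def recall(self, conversation_id: str):
        # Rebuild a conversation in working memory from short-term records,
        # so new related messages can continue the thread.
        messages = [m for rec in self.short_term
                    if rec["conversation_id"] == conversation_id
                    for m in rec["messages"]]
        self.working[conversation_id] = messages
        return messages

mem = MemorySystem()
mem.add_message("c1", "hello")
mem.add_message("c1", "hi there")
mem.sleep()
assert mem.working == {}
print(mem.recall("c1"))   # → ['hello', 'hi there']
```

Because records carry the conversation id, a later sleep appends a second record for "c1" rather than overwriting the first, matching the linked short-term memories described above.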

Memory Consolidation

The memory consolidation process first involves retrieving all short term memories and feedback associated with a conversation. The chunks of conversation are reassembled in chronological order and the feedback is appended. This new 'message' is then fed to an autonom designed to break down conversations and feedback into "atomic memories" – discrete pieces of information with clear relationships and tags for easy retrieval. The autonom uses structured output to return the atomic memories as a list of JSON objects ready to be stored in the vector database as long term memories. Each atomic memory has a subject, object and predicate, along with a series of semantic metatags that are included as part of the vector embedding. By keeping each embedding small but enriched with semantic metadata, the system can target specific memories for retrieval and application, keeping token usage low and including only what is relevant as extra context.
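The atomic-memory shape the consolidation autonom produces might look like the following sketch; the example triples and tag names are invented for illustration:

```python
# Sketch of the atomic-memory structure: subject/predicate/object triples
# plus semantic metatags, serialised as JSON ready for a vector store.
# The example records are invented, not real consolidation output.
import json
from dataclasses import dataclass, asdict
from typing import List

@dataclass
class AtomicMemory:
    subject: str
    predicate: str
    object: str
    tags: List[str]   # semantic metatags included in the embedding text

    def embedding_text(self) -> str:
        # Small embedding input enriched with metadata, as described above.
        return f"{self.subject} {self.predicate} {self.object} " + " ".join(self.tags)

# What the autonom's structured output might look like for one conversation:
memories = [
    AtomicMemory("agent_alan", "specialises_in", "cryptography",
                 ["expertise", "peer_knowledge"]),
    AtomicMemory("conversation_c1", "received_feedback_score", "8",
                 ["feedback", "self_evaluation"]),
]
payload = json.dumps([asdict(m) for m in memories])
print(payload[:60])
```

Each record embeds as only a handful of tokens, which is what lets retrieval pull in a few precise facts rather than a whole conversation transcript.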

Self-Reflection

When reflections are formed during the memory consolidation process, each is given a category related to an aspect of the agent's makeup/architecture. Most of these relate back to system or autonom prompts (there are also placeholder categories for 'tools', 'agentic structure' and 'insight' that have yet to be implemented). When an agent is in the 'learning' state it will find the category with the highest combined reflection score and will first assemble a system prompt using the self-improvement prompt stub and the prompt to be reflected on. The output schema is kept separate and is currently static, as its structure needs to remain intact and in the same format in order for the structured data to be extracted.

The next step is to retrieve all of the reflections associated with the category. These are assembled into a single message and, along with the system prompt created in the previous step, fed to an autonom that responds with structured output containing a new version of the prompt being reflected on. Structured output is used to ensure that any deliberations or thoughts of the autonom are separated from the prompt. The agent's existing prompt is then updated with the new version.

The system employs a scoring mechanism to evaluate reflections and determine when prompt updates are warranted. This scoring helps ensure that prompts are only modified when there is sufficient evidence for improvement, based on both the quantity and quality of accumulated reflections. Even with this controlled approach to updates, the system maintains a degree of randomness in the learning process, which serves as an important mitigant against a prompt monoculture forming - preventing all agents from converging on identical prompts when they are sharing their prompts with each other.
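A minimal sketch of the category-selection step, assuming a simple summed-score threshold and a small random perturbation as the monoculture mitigant (the threshold and noise scale are both assumptions):

```python
# Sketch of the learning-state category selection: pick the category with the
# highest combined reflection score, perturbed slightly so identical agents
# don't always converge. Threshold and noise values are assumptions.
import random
from collections import defaultdict

def select_category(reflections, threshold=5.0, noise=0.5, rng=None):
    """reflections: list of (category, importance_score) tuples."""
    rng = rng or random.Random(0)   # seeded here so the sketch is deterministic
    totals = defaultdict(float)
    for category, score in reflections:
        totals[category] += score
    # Perturb the combined scores slightly to discourage a prompt monoculture.
    perturbed = {c: s + rng.uniform(-noise, noise) for c, s in totals.items()}
    best = max(perturbed, key=perturbed.get)
    # Only warrant a prompt update once enough evidence has accumulated.
    return best if totals[best] >= threshold else None

reflections = [("system_prompt", 3.0), ("give_feedback", 2.5),
               ("system_prompt", 4.0), ("tools", 1.0)]
print(select_category(reflections))   # system_prompt: total 7.0, over threshold
```

The threshold encodes "quantity and quality of accumulated reflections" as a single combined score; a single low-importance reflection never triggers a rewrite on its own.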

Evolving Prompts

At the core of the Memetic Agents project is the idea that the agents will evolve through social interaction and experience. When the memetic agents are in the 'socialising' state they will search for other agents in the same state and will initiate conversations with them. Unlike standard messages, social messages can include prompts and prompt evaluations as part of the message payload. In the current build the process of sending and receiving social messages is formalised with the agents exchanging a predetermined sequence of messages.

As illustrated in Figure 2, the initiating agent finds its prompt with the lowest confidence score, searches the agent directory for agents in the socialising state, and picks one of them at random. Each agent sends its prompt to the other twice in an interleaved sequence. After the first exchange, the agents evaluate each other's prompt and provide a score and written feedback. Both agents then use the feedback to update their prompts and send them back for a final evaluation, which is used to calculate the new confidence score.

Figure 2: Social Learning Process
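The interleaved exchange can be sketched end to end; the length-based `evaluate` heuristic and the append-style revision step below are toy stand-ins for the LLM-based evaluation and rewrite described above:

```python
# Sketch of the formalised social exchange: two agents swap a prompt twice,
# scoring between exchanges. The evaluation heuristic and revision step are
# toy assumptions standing in for LLM calls.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SharedPrompt:
    name: str
    content: str
    confidence: float   # the lowest-confidence prompt is chosen for socialising

def evaluate(prompt: SharedPrompt) -> Tuple[float, str]:
    # Stand-in for LLM-based evaluation: score by length as a toy heuristic.
    score = min(10.0, len(prompt.content) / 10)
    feedback = "Consider being more specific." if score < 5 else "Looks solid."
    return score, feedback

def social_exchange(a: SharedPrompt, b: SharedPrompt) -> None:
    # Round 1: exchange prompts and evaluate each other's.
    _, feedback_a = evaluate(a)   # b's feedback on a's prompt
    _, feedback_b = evaluate(b)   # a's feedback on b's prompt
    # Each agent revises its prompt using the feedback (stubbed as appending).
    a.content += f" [revised after: {feedback_a}]"
    b.content += f" [revised after: {feedback_b}]"
    # Round 2: re-evaluate; the final score becomes the new confidence.
    a.confidence, _ = evaluate(a)
    b.confidence, _ = evaluate(b)

p1 = SharedPrompt("reasoning", "Think step by step.", confidence=2.0)
p2 = SharedPrompt("reasoning", "Break the problem into parts and verify each.",
                  confidence=3.0)
social_exchange(p1, p2)
print(p1.confidence, p2.confidence)
```

Selecting the lowest-confidence prompt for socialising means exchanges are spent where an agent is weakest, and the post-revision score feeds straight back into that selection criterion.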

Potential Risks

While the memetic evolution of prompts offers powerful learning capabilities, it also introduces important considerations regarding system safety. Agents that can update and share their own system prompts introduce a new variation of prompt attack, where a malicious prompt can infect and spread among a population of agents. However, such a system also opens up the opportunity to research this phenomenon in a closed environment with controlled parameters, in a manner that is not practicable or ethical in the real world.

Current Status and Future Research

This article presents the architectural design and initial implementation of the Memetic Agents system. While the core components are functional, there is still more to build, and formal empirical evaluation and benchmarking have not yet been conducted. The current release focuses on sharing the project and its implementation details with the research community.

Future releases will include:

  • Goal Setting, Project Planning & Task Management: Allowing the agents to set their own goals will greatly increase agent autonomy and allow the system to be run and observed for longer periods of time without human intervention.
  • Artificial Circadian Rhythm: This is the second component required to allow the agents to be left to their own devices for extended periods of time.
  • Alignment with the IEEE P3394 draft Standard for Large Language Model Agent Interface: This will allow the project to be used as a testbed for the draft standard as it is developed.
  • Conflicting Memory Resolution & Agent Bias: This will allow the observation of how a population of agents behaves when subsets of agents have conflicting memories, ideas or goals.
  • Quantitative evaluation of agent learning and evolution: Once the system can be run autonomously it will be possible to setup identical populations of agents with different circadian rhythms (disabling the learning and socialising phases) and observe how the system evolves over time.

Conclusion

The Memetic Agents project introduces several novel contributions to the field of multi-agent systems and social learning. By implementing a biomimetic approach that draws parallels between cultural evolution and artificial intelligence, this work demonstrates how LLM-based agents can evolve through social interaction and experience. The project's key innovations include a sophisticated memory architecture mimicking human cognitive processes, the introduction of "Autonoms" as non-agentic LLM-driven subroutines, and an original implementation of memetic evolution allowing agents to share and modify their prompts through social interaction.

While this implementation is still in its early stages, it provides a promising model for exploring how AI systems might develop and adapt through social learning. The current architecture demonstrates important capabilities in dynamic prompt evolution, multi-layered memory management, self-reflection, and structured inter-agent communication. Future research could explore quantitative evaluation of agent evolution, more sophisticated learning mechanisms, and emergent behaviours in larger agent populations.

As an open-source project, Memetic Agents serves as both a practical implementation and a theoretical basis for researchers exploring social learning in AI systems. The code and documentation are available on GitHub, and the research community is invited to build upon this foundation. Through collaborative development and research, this project aims to contribute to the understanding of how AI systems can learn, adapt, and evolve through social interaction, while maintaining alignment with human values and goals.

This article is licensed under CC BY-SA 4.0.