CodeAct Agent Framework
This folder implements OpenHands's main agent, the CodeAct Agent. It is based on CodeAct (tweet), the idea of consolidating LLM agents' actions into a unified code action space for both simplicity and performance.
Overview
The CodeAct agent operates through a function calling interface. At each turn, the agent can:
- Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
- CodeAct: Execute actions through a set of well-defined tools:
  - Execute Linux bash commands with execute_bash
  - Run Python code in an IPython environment with execute_ipython_cell
  - Interact with web browsers using browser and web_read
  - Edit files using str_replace_editor or edit_file
Built-in Tools
The agent provides several built-in tools:
1. execute_bash
- Execute any valid Linux bash command
- Handles long-running commands by running them in background with output redirection
- Supports interactive processes with STDIN input and process interruption
- Handles command timeouts with automatic retry in background mode
2. execute_ipython_cell
- Run Python code in an IPython environment
- Supports magic commands like %pip
- Variables are scoped to the IPython environment
- Requires defining variables and importing packages before use
3. web_read and browser
- web_read: Read and convert webpage content to markdown
- browser: Interact with webpages through Python code
  - Supports common browser actions like navigation, clicking, form filling, scrolling
  - Handles file uploads and drag-and-drop operations
4. str_replace_editor
- View, create and edit files through string replacement
- Persistent state across command calls
- File viewing with line numbers
- String replacement with exact matching
- Undo functionality for edits
5. edit_file (LLM-based)
- Edit files using LLM-based content generation
- Support for partial file edits with line ranges
- Handles large files by editing specific sections
- Append mode for adding content to files
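The str_replace_editor behaviour listed above (line-numbered viewing, exact-match replacement, undo) can be illustrated with a small in-memory sketch. The class name StrReplaceSketch and its methods are hypothetical and are not the tool's actual implementation; requiring the old string to match exactly once is also an assumption made here so that an edit is never ambiguous.

```python
class StrReplaceSketch:
    """Hypothetical in-memory stand-in for a string-replacement editor."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._history: list[str] = []  # previous versions, newest last

    def view(self) -> str:
        """Return the content with 1-based line numbers."""
        lines = self.text.splitlines()
        return "\n".join(f"{i:>4}  {line}" for i, line in enumerate(lines, start=1))

    def str_replace(self, old: str, new: str) -> None:
        """Replace `old` with `new`; `old` must occur exactly once (assumption)."""
        count = self.text.count(old)
        if count != 1:
            raise ValueError(f"expected exactly one match for {old!r}, found {count}")
        self._history.append(self.text)  # remember the pre-edit state for undo
        self.text = self.text.replace(old, new, 1)

    def undo(self) -> None:
        """Restore the content from before the most recent edit."""
        if not self._history:
            raise ValueError("nothing to undo")
        self.text = self._history.pop()


editor = StrReplaceSketch("x = 1\ny = 2\n")
editor.str_replace("y = 2", "y = 3")
edited = editor.text
editor.undo()
restored = editor.text
```

Rejecting zero or multiple matches keeps every edit deterministic: the caller has to supply enough surrounding context to identify a single location.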
Configuration
Tools can be enabled/disabled through configuration parameters:
- codeact_enable_browsing: Enable browser interaction tools
- codeact_enable_jupyter: Enable IPython code execution
- codeact_enable_llm_editor: Enable LLM-based file editing (falls back to string replacement editor if disabled)
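How these flags might translate into a tool list can be sketched as follows. ToolFlags and select_tool_names are hypothetical names, the default values are placeholders rather than the project's real defaults, and treating execute_bash as always available is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ToolFlags:
    """Hypothetical container for the three flags; defaults are placeholders."""
    codeact_enable_browsing: bool = True
    codeact_enable_jupyter: bool = True
    codeact_enable_llm_editor: bool = False


def select_tool_names(flags: ToolFlags) -> list[str]:
    """Return the names of the tools that the given flags would enable."""
    names = ["execute_bash"]  # assumed to be always available
    if flags.codeact_enable_browsing:
        names += ["web_read", "browser"]
    if flags.codeact_enable_jupyter:
        names.append("execute_ipython_cell")
    # The LLM-based editor falls back to the string replacement editor.
    names.append("edit_file" if flags.codeact_enable_llm_editor else "str_replace_editor")
    return names


minimal = select_tool_names(
    ToolFlags(codeact_enable_browsing=False, codeact_enable_jupyter=False)
)
```

With browsing and Jupyter disabled, only the shell tool and one file editor remain, which is the smallest tool set this sketch can produce.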
Micro-agents
The agent includes specialized micro-agents for specific tasks:
- npm: Handles npm package installation with non-interactive shell workarounds
- github: Manages GitHub operations with API token support and PR creation guidelines
- flarglebargle: Easter egg response handler
Adding New Tools
The CodeAct agent uses a function calling interface based on litellm's ChatCompletionToolParam. To add a new tool:
- Define the tool in function_calling.py:
    from litellm import ChatCompletionToolParam, ChatCompletionToolParamFunctionChunk

    # Tool definition: a name, a description for the LLM, and a JSON Schema
    # describing the arguments the tool accepts.
    MyTool = ChatCompletionToolParam(
        type='function',
        function=ChatCompletionToolParamFunctionChunk(
            name='my_tool',
            description='Description of what the tool does and how to use it',
            parameters={
                'type': 'object',
                'properties': {
                    'param1': {
                        'type': 'string',
                        'description': 'Description of parameter 1',
                    },
                    'param2': {
                        'type': 'integer',
                        'description': 'Description of parameter 2',
                    },
                },
                'required': ['param1'],  # List required parameters here
            },
        ),
    )
- Add the tool to get_tools() in function_calling.py
- Implement the corresponding action handler in the agent class
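As a rough illustration of the last step, a handler usually starts by decoding and checking the arguments the model supplied. In the OpenAI-style function calling format those arguments arrive as a JSON string. The sketch below validates them against the same schema as the my_tool example; parse_my_tool_arguments is a hypothetical helper, not part of the agent class, and it only covers the two JSON types used in that example.

```python
import json

# Same schema as the my_tool example, without the descriptions.
MY_TOOL_PARAMETERS = {
    "type": "object",
    "properties": {
        "param1": {"type": "string"},
        "param2": {"type": "integer"},
    },
    "required": ["param1"],
}

# Only the JSON types that appear in the schema above.
_JSON_TYPES = {"string": str, "integer": int}


def parse_my_tool_arguments(raw_arguments: str) -> dict:
    """Decode a JSON argument string and check it against MY_TOOL_PARAMETERS."""
    args = json.loads(raw_arguments)
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    for name in MY_TOOL_PARAMETERS["required"]:
        if name not in args:
            raise ValueError(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = MY_TOOL_PARAMETERS["properties"].get(name)
        if spec is None:
            raise ValueError(f"unknown parameter: {name}")
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"parameter {name} must be of type {spec['type']}")
    return args


parsed = parse_my_tool_arguments('{"param1": "hello", "param2": 3}')
```

Failing early with a descriptive error gives the agent something concrete to report back to the model when a call is malformed.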
Implementation Details
The agent is implemented in two main files:
- codeact_agent.py: Core agent implementation with:
  - Message history management
  - Tool execution handling
  - State management
  - Action/observation processing
- function_calling.py: Tool definitions and function calling interface with:
  - Tool parameter specifications
  - Tool descriptions and examples
  - Function calling response parsing
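Function calling response parsing can be pictured with the sketch below, which turns one model response into either a conversational message or a list of tool actions. It works on a plain dict shaped like an OpenAI-style chat completion; SketchAction and response_to_actions are hypothetical names, and the real parsing code in function_calling.py may be organized differently.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SketchAction:
    """Hypothetical action record: kind is 'message' or a tool name."""
    kind: str
    payload: dict = field(default_factory=dict)


def response_to_actions(response: dict) -> list[SketchAction]:
    """Map one chat-completion-style response to a list of actions."""
    message = response["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []
    if not tool_calls:
        # No tool call: the agent is conversing in natural language.
        return [SketchAction("message", {"content": message.get("content") or ""})]
    actions = []
    for call in tool_calls:
        function = call["function"]
        # Arguments arrive as a JSON string and must be decoded.
        actions.append(SketchAction(function["name"], json.loads(function["arguments"])))
    return actions


tool_response = {
    "choices": [{
        "message": {
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "function": {"name": "execute_bash", "arguments": '{"command": "ls"}'},
            }],
        }
    }]
}
actions = response_to_actions(tool_response)
```

A response without tool calls falls through to a single message action, mirroring the Converse and CodeAct options described in the overview.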