# CodeAct Agent Framework
This folder implements OpenHands's main agent, the CodeAct Agent. It is based on [CodeAct](https://arxiv.org/abs/2402.01030) ([tweet](https://twitter.com/xingyaow_/status/1754556835703751087)), the idea of consolidating LLM agents' **act**ions into a unified **code** action space for both *simplicity* and *performance*.
## Overview
The CodeAct agent operates through a function calling interface. At each turn, the agent can:
1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
2. **CodeAct**: Execute actions through a set of well-defined tools:
   - Execute Linux `bash` commands with `execute_bash`
   - Run Python code in an [IPython](https://ipython.org/) environment with `execute_ipython_cell`
   - Interact with web browsers using `browser` and `web_read`
   - Edit files using `str_replace_editor` or `edit_file`

## Built-in Tools
The agent provides several built-in tools:
### 1. `execute_bash`
- Execute any valid Linux bash command
- Handles long-running commands by running them in the background with output redirection
- Supports interactive processes with STDIN input and process interruption
- Handles command timeouts by automatically retrying in background mode
### 2. `execute_ipython_cell`
- Run Python code in an IPython environment
- Supports magic commands like `%pip`
- Variables are scoped to the IPython environment
- Requires defining variables and importing packages before use
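
The last two points can be illustrated with a small stand-in. This sketch uses plain `exec` with one shared namespace rather than a real IPython kernel, so magic commands such as `%pip` are not supported; `CellRunnerSketch` and `run_cell` are hypothetical names, not part of the agent.

```python
class CellRunnerSketch:
    """Stand-in for an IPython session: every cell shares one namespace."""

    def __init__(self) -> None:
        # Names defined by earlier cells live here and stay visible to later cells.
        self.namespace: dict = {}

    def run_cell(self, code: str) -> None:
        # Plain exec, not IPython: no magics, no rich output.
        exec(code, self.namespace)


runner = CellRunnerSketch()
runner.run_cell("import math\nx = math.sqrt(16)")
runner.run_cell("y = x + 1")  # works: `x` was defined by the previous cell

try:
    runner.run_cell("z = undefined_name + 1")
except NameError:
    # A name that no earlier cell defined or imported is not available.
    pass
```

The same rule applies to imports: a package has to be imported in some earlier cell before later cells can use it.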
### 3. `web_read` and `browser`
- `web_read`: Read and convert webpage content to markdown
- `browser`: Interact with webpages through Python code
- Supports common browser actions like navigation, clicking, form filling, scrolling
- Handles file uploads and drag-and-drop operations
### 4. `str_replace_editor`
- View, create and edit files through string replacement
- Persistent state across command calls
- File viewing with line numbers
- String replacement with exact matching
- Undo functionality for edits
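
To make the editing model concrete, here is a minimal sketch of exact-match replacement with undo and numbered viewing. It is not the tool's actual implementation: `StrReplaceSketch` is a hypothetical class, and requiring exactly one match is an assumption made here so that each edit is unambiguous.

```python
from pathlib import Path


class StrReplaceSketch:
    """Hypothetical minimal editor: exact-match replacement plus undo."""

    def __init__(self) -> None:
        # Previous file contents per path, so edits can be undone in order.
        self._history: dict = {}

    def view(self, path: Path) -> str:
        # Return the file with 1-based line numbers in front of each line.
        lines = path.read_text().splitlines()
        return "\n".join(f"{i:>6}\t{line}" for i, line in enumerate(lines, start=1))

    def str_replace(self, path: Path, old: str, new: str) -> None:
        text = path.read_text()
        count = text.count(old)
        if count != 1:
            # Assumption: refuse ambiguous or missing matches instead of guessing.
            raise ValueError(f"expected exactly 1 match, found {count}")
        self._history.setdefault(path, []).append(text)
        path.write_text(text.replace(old, new, 1))

    def undo(self, path: Path) -> None:
        history = self._history.get(path)
        if not history:
            raise ValueError("no edits to undo")
        # Restore the contents saved just before the most recent edit.
        path.write_text(history.pop())
```

Matching is literal, so whitespace and indentation in `old` must agree with the file character for character.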
### 5. `edit_file` (LLM-based)
- Edit files using LLM-based content generation
- Support for partial file edits with line ranges
- Handles large files by editing specific sections
- Append mode for adding content to files
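
The line-range and append behaviour can be sketched without the LLM step, which is left out here. `splice_lines` is a hypothetical helper; its parameter names, the 1-indexed inclusive range, and the use of `-1` for "end of file" or "append" are assumptions for illustration, not the tool's documented interface.

```python
def splice_lines(original: str, new_content: str, start: int = 1, end: int = -1) -> str:
    """Replace lines start..end (1-indexed, inclusive) of `original` with `new_content`.

    Assumed conventions: end == -1 means "through the last line",
    and start == -1 means "append after the last line".
    """
    lines = original.splitlines(keepends=True)
    new_lines = new_content.splitlines(keepends=True)
    if new_lines and not new_lines[-1].endswith("\n"):
        new_lines[-1] += "\n"  # keep the result newline-terminated

    if start == -1:
        # Append mode: keep the whole file and add the new content at the end.
        if lines and not lines[-1].endswith("\n"):
            lines[-1] += "\n"
        return "".join(lines + new_lines)

    stop = len(lines) if end == -1 else end
    if start < 1 or start > len(lines) + 1 or stop < start - 1 or stop > len(lines):
        raise ValueError(f"invalid line range {start}-{end} for {len(lines)} lines")
    # Only the selected range is replaced; everything outside it is untouched.
    return "".join(lines[: start - 1] + new_lines + lines[stop:])
```

For example, `splice_lines("a\nb\nc\n", "B\n", start=2, end=2)` rewrites only the second line, which is how editing a specific section keeps the rest of a large file intact.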
## Configuration
Tools can be enabled/disabled through configuration parameters:
- `codeact_enable_browsing`: Enable browser interaction tools
- `codeact_enable_jupyter`: Enable IPython code execution
- `codeact_enable_llm_editor`: Enable LLM-based file editing (falls back to string replacement editor if disabled)
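
As a rough sketch of how these flags could map to the exposed tool set: the field names below mirror the parameters above, while `ToolFlags`, `select_tool_names`, the default values, and the choice to always include `execute_bash` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ToolFlags:
    # Names follow the configuration parameters; defaults are illustrative only.
    codeact_enable_browsing: bool = True
    codeact_enable_jupyter: bool = True
    codeact_enable_llm_editor: bool = False


def select_tool_names(flags: ToolFlags) -> list:
    # Assumption: the bash tool has no flag of its own and is always available.
    names = ["execute_bash"]
    if flags.codeact_enable_browsing:
        names += ["web_read", "browser"]
    if flags.codeact_enable_jupyter:
        names.append("execute_ipython_cell")
    # Fall back to the string replacement editor when the LLM editor is disabled.
    names.append("edit_file" if flags.codeact_enable_llm_editor else "str_replace_editor")
    return names
```

With browsing and Jupyter disabled and the LLM editor enabled, this sketch would expose only `execute_bash` and `edit_file`.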
## Micro-agents
The agent includes specialized micro-agents for specific tasks:
1. **npm**: Handles npm package installation with non-interactive shell workarounds
2. **github**: Manages GitHub operations with API token support and PR creation guidelines
3. **flarglebargle**: Easter egg response handler
## Adding New Tools
The CodeAct agent uses a function calling interface based on `litellm`'s `ChatCompletionToolParam`. To add a new tool:
1. Define the tool in `function_calling.py`:
```python
from litellm import ChatCompletionToolParam, ChatCompletionToolParamFunctionChunk

# Tool definition passed to the LLM through the function calling interface
MyTool = ChatCompletionToolParam(
    type='function',
    function=ChatCompletionToolParamFunctionChunk(
        name='my_tool',
        description='Description of what the tool does and how to use it',
        # JSON Schema describing the arguments the model may pass
        parameters={
            'type': 'object',
            'properties': {
                'param1': {
                    'type': 'string',
                    'description': 'Description of parameter 1',
                },
                'param2': {
                    'type': 'integer',
                    'description': 'Description of parameter 2',
                },
            },
            'required': ['param1'],  # List required parameters here
        },
    ),
)
```
2. Add the tool to `get_tools()` in `function_calling.py`
3. Implement the corresponding action handler in the agent class
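
Step 3 might look roughly like the sketch below, which turns a `my_tool` call into an action object. `MyToolAction` and `my_tool_call_to_action` are hypothetical names, not part of the agent; the sketch relies only on the fact that function-calling responses carry the tool name plus its arguments as a JSON string.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class MyToolAction:
    """Hypothetical action produced when the model calls `my_tool`."""
    param1: str
    param2: Optional[int] = None


def my_tool_call_to_action(name: str, arguments_json: str) -> MyToolAction:
    if name != "my_tool":
        raise ValueError(f"unexpected tool: {name}")
    # Tool call arguments arrive as a JSON-encoded string.
    args = json.loads(arguments_json)
    if "param1" not in args:
        # Mirrors the 'required' list in the tool definition.
        raise ValueError("missing required parameter: param1")
    return MyToolAction(param1=args["param1"], param2=args.get("param2"))
```

Checking required parameters in the handler is a design choice of this sketch: the schema tells the model what to send, but the response still has to be validated before it is acted on.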
## Implementation Details
The agent is implemented in two main files:
1. `codeact_agent.py`: Core agent implementation with:
   - Message history management
   - Tool execution handling
   - State management
   - Action/observation processing
2. `function_calling.py`: Tool definitions and function calling interface with:
   - Tool parameter specifications
   - Tool descriptions and examples
   - Function calling response parsing