Fara-7B is Microsoft’s first agentic small language model (SLM) designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact computer use agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.
Try Fara-7B locally as follows (see Installation for detailed instructions):
# 1. Clone repository
git clone https://github.com/microsoft/fara.git
cd fara
# 2. Setup environment
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
playwright install
Then in one process, host the model:
vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
You can then run an interactive query with:
fara-cli --task "whats the weather in new york now"
Hint: if you run out of memory, add `--tensor-parallel-size 2` to the vllm command.
What makes Fara-7B unique?
Unlike traditional chat models that generate text-based responses, Fara-7B operates the computer interface (mouse and keyboard) to perform multi-step tasks on behalf of users. The model:
- Is visually driven: it perceives webpages and acts by scrolling, typing, and clicking directly on predicted coordinates
- Interacts with computers the same way humans do: no accessibility trees or separate parsing models are needed
- Enables on-device deployment: its compact 7B parameter size reduces latency and improves privacy, since user data stays local
- Completes tasks efficiently: only ~16 steps on average per task, compared to ~41 for comparable models
Fara-7B is trained using a novel synthetic data generation pipeline built on the Magentic-One multi-agent framework, with 145K trajectories spanning diverse websites, task types, and difficulty levels. The model is based on Qwen2.5-VL-7B and trained with supervised fine-tuning.
Fara-7B can automate everyday web tasks including:
- Searching for information and summarizing results
- Filling out forms and managing accounts
- Booking travel, movie tickets, and restaurant reservations
- Shopping and comparing prices between retailers
- Finding job postings and real estate listings
Fara-7B achieves state-of-the-art results on several web agent benchmarks, outperforming both comparably sized models and larger systems:
| Agent | Parameters | WebVoyager | Online-Mind2Web | DeepShop | WebTailBench |
|---|---|---|---|---|---|
| *SoM agents* | | | | | |
| SoM Agent (GPT-4o-0513) | - | 90.6 | 57.7 | 49.1 | 60.4 |
| SoM Agent (o3-mini) | - | 79.3 | 55.4 | 49.7 | 52.7 |
| SoM Agent (GPT-4o) | - | 65.1 | 34.6 | 16.0 | 30.8 |
| GLM-4.1V-9B-Thinking | 9B | 66.8 | 33.9 | 32.0 | 22.4 |
| *Computer-use models* | | | | | |
| OpenAI computer-use-preview | - | 70.9 | 42.9 | 24.7 | 25.7 |
| UI-TARS-1.5-7B | 7B | 66.4 | 31.3 | 11.6 | 19.5 |
| Fara-7B | 7B | 73.5 | 34.1 | 26.2 | 38.4 |
Table: Online agent evaluation results showing success rates (%) on four web benchmarks. Results are averaged over 3 runs.
WebTailBench: a new benchmark for real-world web tasks

We are releasing WebTailBench, a new evaluation benchmark focused on 11 real-world task types that are underrepresented or missing in existing benchmarks. The benchmark consists of 609 tasks across these categories; the first 8 segments test single skills or objectives (usually on a single website), while the remaining 3 evaluate harder multi-step or cross-site tasks.
WebTailBench detailed results
| Task Segment | # Tasks | SoM GPT-4o-0513 | SoM o3-mini | SoM GPT-4o | GLM-4.1V-9B | OAI comp-use | UI-TARS-1.5 | Fara-7B |
|---|---|---|---|---|---|---|---|---|
| *Single-site tasks* | | | | | | | | |
| Shopping | 56 | 62.5 | 71.4 | 38.1 | 31.0 | 42.3 | 41.1 | 52.4 |
| Flights | 51 | 60.1 | 39.2 | 11.1 | 10.5 | 17.6 | 10.5 | 37.9 |
| Hotels | 52 | 68.6 | 56.4 | 31.4 | 19.9 | 26.9 | 35.3 | 53.8 |
| Restaurants | 52 | 67.9 | 59.6 | 47.4 | 32.1 | 35.9 | 22.4 | 47.4 |
| Activities | 80 | 70.4 | 62.9 | 41.7 | 26.3 | 30.4 | 9.6 | 36.3 |
| Ticketing | 57 | 58.5 | 56.7 | 37.4 | 35.7 | 49.7 | 30.4 | 38.6 |
| Real Estate | 48 | 34.0 | 17.4 | 20.1 | 16.0 | 9.0 | 9.7 | 23.6 |
| Jobs/Careers | 50 | 49.3 | 44.0 | 32.7 | 22.7 | 20.7 | 20.7 | 28.0 |
| *Multi-step tasks* | | | | | | | | |
| Shopping List (2 items) | 51 | 66.0 | 62.7 | 17.0 | 7.8 | 34.0 | 20.9 | 49.0 |
| Comparison Shopping | 57 | 67.3 | 59.1 | 27.5 | 22.8 | 1.2 | 8.8 | 32.7 |
| Compositional Tasks | 55 | 51.5 | 39.4 | 26.7 | 17.0 | 10.3 | 9.1 | 23.0 |
| *Overall* | | | | | | | | |
| Macro Average | 609 | 59.7 | 51.7 | 30.1 | 22.0 | 25.3 | 19.9 | 38.4 |
| Micro Average | 609 | 60.4 | 52.7 | 30.8 | 22.4 | 25.7 | 19.5 | 38.4 |
Table: WebTailBench detailed results across all 11 segments. Success rates (%) are averaged over 3 independent runs. Fara-7B achieves the highest performance among computer-use models in every task category.
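The table reports both a macro and a micro average; the distinction is easy to sketch in Python (toy numbers below, not the table's data):

```python
# Macro vs. micro averaging of per-segment success rates.
def macro_average(segments):
    """Unweighted mean of per-segment success rates."""
    return sum(rate for _, rate in segments) / len(segments)

def micro_average(segments):
    """Task-weighted mean: total successes over total tasks."""
    total_tasks = sum(n for n, _ in segments)
    total_successes = sum(n * rate for n, rate in segments)
    return total_successes / total_tasks

# (n_tasks, success_rate) per segment -- hypothetical values
segments = [(50, 0.60), (100, 0.30), (150, 0.90)]
print(macro_average(segments))  # 0.6
print(micro_average(segments))  # 0.65
```

Because segment sizes range from 48 to 80 tasks, the two averages differ slightly, which is why the table lists both.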
Coming soon:
- Task verification pipeline for LLM-as-a-judge evaluation
- Official human annotations for WebTailBench (in partnership with Browserbase)
Evaluation infrastructure
Our evaluation setup leverages:
- Playwright – a cross-browser automation framework that drives the browser environment
- An abstract web agent interface – allows plugging any model from any source into the evaluation environment
- The fara-agent class – the reference implementation for running Fara models
Note: Fara-7B is an experimental release designed to invite practical exploration and feedback from the community. We recommend running it in a sandboxed environment, monitoring its execution, and avoiding sensitive or high-risk domains.
Install the package with pip (`pip install -e .`) or uv (`uv pip install -e .`), then install the Playwright browsers with `playwright install`.
Recommended: the easiest way to get started is Azure Foundry hosting, which requires no GPU hardware or model downloads. Alternatively, you can self-host with vLLM if you have GPU resources available.
Azure Foundry Hosting (recommended)
Deploy Fara-7B on Azure Foundry without the need to download weights or manage GPU infrastructure.
To set up:
- Deploy the Fara-7B model on Azure Foundry and get your endpoint URL and API key
- Add your endpoint details to the existing `endpoint_configs/` directory (example configurations are already provided):
# Edit one of the existing config files or create a new one
# endpoint_configs/fara-7b-hosting-ansrz.json (example format):
{
"model": "Fara-7B",
"base_url": "https://your-endpoint.inference.ml.azure.com/",
"api_key": "YOUR_API_KEY_HERE"
}
- Run Fara Agent:
fara-cli --task "how many pages does wikipedia have" --start_page "https://www.bing.com"
That’s it! No GPU or model downloads required.
If you have access to GPU resources, you can self-host Fara-7B using vLLM. This requires a GPU machine with sufficient VRAM.
To start the vLLM server, simply run:
vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto
Run the test script to see Fara in action:
fara-cli --task "how many pages does wikipedia have" --start_page "https://www.bing.com" --endpoint_config endpoint_configs/azure_foundry_config.json [--headful] [--downloads_folder "/path/to/downloads"] [--save_screenshots] [--max_rounds 100] [--browserbase]
When self-hosting, point `--endpoint_config` at `endpoint_configs/vllm_config.json`, which targets the vLLM server above.
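For illustration, an endpoint config file like the example above could be loaded with a few lines of Python (a hypothetical helper; the actual fara-cli parses these files internally):

```python
import json
from pathlib import Path

def load_endpoint_config(path):
    """Read an endpoint config JSON and return the fields Fara needs.
    Hypothetical helper for illustration; field names follow the example
    config shown in this README."""
    cfg = json.loads(Path(path).read_text())
    for key in ("model", "base_url", "api_key"):
        if key not in cfg:
            raise KeyError(f"missing required field: {key}")
    return cfg["model"], cfg["base_url"], cfg["api_key"]
```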
If you pass `--browserbase`, export your Browserbase API key and project ID as environment variables.
Initializing Browser...
Browser Running... Starting Fara Agent...
##########################################
Task: how many pages does wikipedia have
##########################################
Running Fara...
Thought #1: To find the current number of Wikipedia pages, I'll search for the latest Wikipedia page count statistics.
Action #1: executing tool 'web_search' with arguments {"action": "web_search", "query": "Wikipedia total number of articles"}
Observation#1: I typed 'Wikipedia total number of articles' into the browser search bar.
Thought #2: Wikipedia currently has 7,095,446 articles.
Action #2: executing tool 'terminate' with arguments {"action": "terminate", "status": "success"}
Observation#2: Wikipedia currently has 7,095,446 articles.
Final Answer: Wikipedia currently has 7,095,446 articles.
Enter another task (or press Enter to exit):
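The transcript above follows a simple think-act-observe loop. A minimal sketch of that control flow (names and signatures are hypothetical; the reference implementation is the fara-agent class):

```python
def agent_loop(model, tools, task, max_rounds=100):
    """Minimal think-act-observe loop.
    model(history) -> (thought, tool_name, args); tools maps tool names
    to callables. Both are stand-ins, not Fara's real interfaces."""
    history = [("task", task)]
    for _ in range(max_rounds):
        thought, tool_name, args = model(history)
        if tool_name == "terminate":      # agent decides it is done
            return history
        observation = tools[tool_name](**args)
        history.append((thought, tool_name, observation))
    return history                         # round budget exhausted
```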
We provide a harness in `webeval/` to reproduce our results on WebVoyager and OnlineMind2Web. Agent evaluations on live websites present unique challenges due to day-to-day changes, so we implement a number of measures to ensure reliable and comparable evaluations:
Browserbase integration
We use Browserbase to host browser sessions, enabling reliable browser instance management.
Time-sensitive task updates

Tasks in benchmarks like WebVoyager may become stale or impossible over time. We:
- Removed ~48 impossible tasks from the original WebVoyager benchmark
- Updated ~50 tasks with future dates to keep them attainable
  - Example: “Find a hotel in Bali from January 1st to January 4th, 2024” becomes “Find a hotel in Bali from January 1st to January 4th, 2026”
- Our updated WebVoyager benchmark is available at `webeval/data/webvoyager/WebVoyager_data_08312025.jsonl`
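The date updates amount to rewriting stale years in task strings; a rough sketch of such a rewrite (our own illustration, not the script used to produce the benchmark file):

```python
import re

def bump_stale_years(task, min_year=2026):
    """Rewrite 4-digit years earlier than min_year so dated tasks stay
    attainable. Sketch only: real updates may also need day/month fixes."""
    return re.sub(
        r"\b(19|20)\d{2}\b",
        lambda m: str(min_year) if int(m.group()) < min_year else m.group(),
        task,
    )

print(bump_stale_years("Find a hotel in Bali from January 1st to January 4th, 2024"))
# Find a hotel in Bali from January 1st to January 4th, 2026
```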
Environmental error handling

Browser errors (connection drops, page timeouts) are handled gracefully:
- Trajectories are retried up to 5 times when environmental errors occur
- Completed but incorrect trajectories are never retried
- Each retry starts with a fresh browser session, with no retained state
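This retry policy can be sketched as follows (illustrative names, not the webeval implementation):

```python
class BrowserEnvError(Exception):
    """Stand-in for environmental failures (connection drops, timeouts)."""

def run_with_retries(run_trajectory, max_retries=5):
    """Retry a trajectory only on environmental errors. run_trajectory is
    assumed to open a fresh browser session on every call, so no state
    carries over between attempts."""
    for _ in range(max_retries):
        try:
            # A normal return -- even a wrong answer -- is final and is
            # never retried; only environmental errors trigger a retry.
            return run_trajectory()
        except BrowserEnvError:
            continue
    return None  # every attempt hit an environmental error
```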
Step budget

Each trajectory is limited to a maximum of 100 actions across all online benchmarks. Trajectories that exceed this budget without terminating are marked incorrect.
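Enforcing the step budget is a simple bounded loop; a sketch with stand-in `policy` and `env` objects (hypothetical names, not webeval's API):

```python
def run_with_step_budget(policy, env, max_steps=100):
    """Return True if the agent terminates within the budget, False
    otherwise (over-budget trajectories are scored incorrect).
    policy and env are stand-ins for the model and the browser."""
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        if action == "terminate":
            return True            # completed within budget
        obs = env.step(action)
    return False                   # budget exceeded -> incorrect
```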
WebEval Package Installation
conda create --name fara_webeval python=3.12
conda activate fara_webeval
# Install fara package
pip install -e .
# Install autogen submodule
git submodule update --init --recursive
cd autogen/python/packages
pip install -e autogen-core
pip install -e autogen-ext
# Install webeval
cd webeval
pip install -e .
# Install playwright
playwright install
Go to the scripts directory:

Make sure you set a valid OpenAI GPT-4o endpoint in `endpoint_configs_gpt4o/dev` to run the WebVoyager LLM-as-a-judge!
Option 1: Self-hosted vLLM
python webvoyager.py --model_url /path/where/you/want/to/download/model/ --model_port 5000 --eval_oai_config ../endpoint_configs_gpt4o/dev/ --out_url /path/to/save/eval/files --device_id 0,1 --processes 1 --run_id 1 --max_rounds 100
Option 2: Azure Foundry deployment

Deploy Fara-7B to a Foundry endpoint, put the endpoint URL and key into a JSON file under `endpoint_configs/`, and run:
python webvoyager.py --model_endpoint ../../endpoint_configs/ --eval_oai_config ../endpoint_configs_gpt4o/dev/ --out_url /path/to/save/eval/files --processes 1 --run_id 1_endpoint --max_rounds 100
- We use the same LLM-as-a-judge prompt and model (GPT-4o) as WebVoyager, hence the `--eval_oai_config` flag
- Set `--browserbase` for browser session management (requires exporting the Browserbase API key and project ID as environment variables)
- Avoid overloading a single vLLM deployment with more than ~10 concurrent processes due to known issues
- Debugging output is written to `fara/webeval/scripts/stdout.txt`
Analysis of evaluation results
Evaluation output structure

The evaluation results are stored under `--out_url` in folders organized by:
- model name
- dataset
- username
- run id
Example path:
/runs/WebSurfer-fara-100-max_n_images-3/fara-7b//WebVoyager_WebVoyager_data_08312025.jsonl/
Each assessment folder contains:
- `gpt_eval/` – LLM-as-a-judge evaluation results
- `traj/` – per-task trajectory subdirectories, each including:
  - `final_answer.json` (e.g. `Amazon--1_final_answer.json`); a missing final answer indicates an abort or step-budget exceedance
  - `scores/gpt_eval.json` – the LLM judge score
  - `web_surfer.log` – history of actions and errors
  - `screenshot_X.png` – screenshots captured before each action
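For example, per-task judge scores could be collected by walking this layout (a sketch assuming the directory and file names listed above):

```python
import json
from pathlib import Path

def collect_scores(traj_root):
    """Walk a traj/ directory and gather per-task judge scores from
    scores/gpt_eval.json files. Sketch only; the analysis notebook in
    the repo is the authoritative implementation."""
    scores = {}
    for task_dir in Path(traj_root).iterdir():
        score_file = task_dir / "scores" / "gpt_eval.json"
        if score_file.exists():
            scores[task_dir.name] = json.loads(score_file.read_text())["score"]
    return scores
```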
Use the Analysis notebook to calculate metrics:
cd webeval/scripts/analyze_eval_results/
jupyter notebook analyze.ipynb
The notebook:
- Identifies trajectories that were aborted mid-run and the reasons for the aborts
- Calculates the average score over non-aborted trajectories
- Distinguishes aborted trajectories (errors during sampling) from completed trajectories (with terminate() calls or step budget exceeded)

To re-run failed tasks, execute the evaluation script again with the same `run_id` and username; it will skip non-aborted tasks.
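Averaging over non-aborted trajectories reduces to the following (a sketch; here `None` stands for an aborted trajectory):

```python
def mean_score(results):
    """Average judge score over non-aborted trajectories only.
    results maps task name -> score, or None for an aborted trajectory."""
    finished = [s for s in results.values() if s is not None]
    return sum(finished) / len(finished) if finished else 0.0
```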
Example WebVoyager GPT Eval Results
{
"score": 1.0,
"gpt_response_text": "To evaluate the task, we need to verify if the criteria have been met:\n\n1. **Recipe Requirement**: A vegetarian lasagna recipe with zucchini and at least a four-star rating.\n\n2. **Search and Results**:\n - The screenshots show that the search term used was \"vegetarian lasagna zucchini.\"\n - Among the search results, \"Debbie's Vegetable Lasagna\" is prominently featured.\n \n3. **Evaluation of the Recipe**:\n - Rating: \"Debbie's Vegetable Lasagna\" has a rating of 4.7, which satisfies the requirement of being at least four stars.\n - The presence of zucchini in the recipe is implied through the search conducted, though the screenshots do not explicitly show the ingredients list. However, the result response confirms the match to the criteria.\n\nGiven the information provided, the task seems to have fulfilled the requirement of finding a vegetarian lasagna recipe with zucchini and a four-star rating or higher. \n\n**Verdict: SUCCESS**"
}
If you use Fara in your research, please cite our work: