GitHub – microsoft/fara


Fara-7B is Microsoft's first agentic small language model (SLM), designed specifically for computer use. With only 7 billion parameters, Fara-7B is an ultra-compact computer use agent (CUA) that achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems.

Try Fara-7B locally as follows (see Installation for detailed instructions):

# 1. Clone repository
git clone https://github.com/microsoft/fara.git
cd fara

# 2. Setup environment
python3 -m venv .venv 
source .venv/bin/activate
pip install -e .
playwright install

Then in one process, host the model:

vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto 

You can then do an iterative query with:

fara-cli --task "whats the weather in new york now"

Hint: you may need to add --tensor-parallel-size 2 to the vllm command above if you run out of memory.
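`vllm serve` exposes an OpenAI-compatible HTTP API, so the hosted model can also be queried directly. A minimal sketch of building such a request (the endpoint path and payload shape follow the OpenAI chat convention; fara-cli layers the agent's own prompt and tool schema on top of this):

```python
import json

# Hypothetical sketch: build a raw OpenAI-style chat request for the model
# served by `vllm serve "microsoft/Fara-7B" --port 5000` above.
def build_chat_request(task: str, model: str = "microsoft/Fara-7B") -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": task}],
        "max_tokens": 512,
    }

payload = build_chat_request("whats the weather in new york now")
body = json.dumps(payload)
# POST `body` to http://localhost:5000/v1/chat/completions with a
# Content-Type: application/json header to receive a completion.
```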

What makes Fara-7B unique?

Unlike traditional chat models that generate text-based responses, Fara-7B operates the computer interface, mouse and keyboard, to perform multi-step tasks on behalf of users. The model:

  • is visually driven – it views webpages and performs actions such as scrolling, typing, and clicking directly on predicted coordinates
  • interacts the way humans do – no accessibility trees or separate parsing models are needed
  • enables on-device deployment – its compact 7B parameter size reduces latency and improves privacy, since user data stays local
  • completes tasks efficiently – only ~16 steps on average per task compared to ~41 for comparable models
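Coordinate-based actions of this kind could be dispatched to Playwright (which the repo already installs) roughly as follows; the action dicts here are illustrative, not Fara-7B's actual output schema:

```python
# Illustrative sketch: execute a CUA-style, coordinate-based action using
# Playwright's mouse/keyboard API on a `page` object.
def execute_action(page, action: dict) -> str:
    kind = action["kind"]
    if kind == "click":
        # Click directly at the model's predicted pixel coordinates.
        page.mouse.click(action["x"], action["y"])
        return f"clicked ({action['x']}, {action['y']})"
    if kind == "type":
        # Type into whatever element currently has focus, like a user would.
        page.keyboard.type(action["text"])
        return f"typed {action['text']!r}"
    if kind == "scroll":
        # Positive dy scrolls down; no accessibility tree is consulted.
        page.mouse.wheel(0, action["dy"])
        return f"scrolled {action['dy']}px"
    raise ValueError(f"unknown action kind: {kind}")
```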

Fara-7B is trained using a novel synthetic data generation pipeline built on the Magentic-One multi-agent framework, with 145K trajectories covering different websites, task types, and difficulty levels. The model is based on Qwen2.5-VL-7B and trained with supervised fine-tuning.

Fara-7B can automate everyday web tasks including:

  • Searching for information and summarizing results
  • Filling out forms and managing accounts
  • Booking travel, movie tickets, and restaurant reservations
  • Shopping and comparing prices between retailers
  • Finding job postings and real estate listings

Fara-7B achieves state-of-the-art results on several web agent benchmarks, outperforming both comparably sized models and larger systems:

| Model | Params | WebVoyager | Online-M2W | DeepShop | WebTailBench |
|---|---|---|---|---|---|
| **SoM Agents** | | | | | |
| SoM Agent (GPT-4o-0513) | – | 90.6 | 57.7 | 49.1 | 60.4 |
| SoM Agent (o3-mini) | – | 79.3 | 55.4 | 49.7 | 52.7 |
| SoM Agent (GPT-4o) | – | 65.1 | 34.6 | 16.0 | 30.8 |
| GLM-4.1V-9B-Thinking | 9B | 66.8 | 33.9 | 32.0 | 22.4 |
| **Computer Use Models** | | | | | |
| OpenAI computer-use-preview | – | 70.9 | 42.9 | 24.7 | 25.7 |
| UI-TARS-1.5-7B | 7B | 66.4 | 31.3 | 11.6 | 19.5 |
| Fara-7B | 7B | 73.5 | 34.1 | 26.2 | 38.4 |

Table: Online agent evaluation results showing success rates (%) across four web benchmarks. Results are averaged over 3 runs.

WebTailBench: a new benchmark for real-world web tasks

We are releasing WebTailBench, a new evaluation benchmark focused on 11 real-world task types that are underrepresented or missing in existing benchmarks. The benchmark consists of 609 tasks across these categories; the first 8 segments test single skills or objectives (usually on the same website), while the remaining 3 evaluate more difficult multi-step or cross-site tasks.

WebTailBench detailed results

| Task Segment | # Tasks | SoM GPT-4o-0513 | SoM o3-mini | SoM GPT-4o | GLM-4.1V-9B | OAI comp-use | UI-TARS-1.5 | Fara-7B |
|---|---|---|---|---|---|---|---|---|
| **Single-site tasks** | | | | | | | | |
| Shopping | 56 | 62.5 | 71.4 | 38.1 | 31.0 | 42.3 | 41.1 | 52.4 |
| Flights | 51 | 60.1 | 39.2 | 11.1 | 10.5 | 17.6 | 10.5 | 37.9 |
| Hotels | 52 | 68.6 | 56.4 | 31.4 | 19.9 | 26.9 | 35.3 | 53.8 |
| Restaurants | 52 | 67.9 | 59.6 | 47.4 | 32.1 | 35.9 | 22.4 | 47.4 |
| Activities | 80 | 70.4 | 62.9 | 41.7 | 26.3 | 30.4 | 9.6 | 36.3 |
| Ticketing | 57 | 58.5 | 56.7 | 37.4 | 35.7 | 49.7 | 30.4 | 38.6 |
| Real Estate | 48 | 34.0 | 17.4 | 20.1 | 16.0 | 9.0 | 9.7 | 23.6 |
| Jobs/Careers | 50 | 49.3 | 44.0 | 32.7 | 22.7 | 20.7 | 20.7 | 28.0 |
| **Multi-step tasks** | | | | | | | | |
| Shopping List (2 items) | 51 | 66.0 | 62.7 | 17.0 | 7.8 | 34.0 | 20.9 | 49.0 |
| Comparison Shopping | 57 | 67.3 | 59.1 | 27.5 | 22.8 | 1.2 | 8.8 | 32.7 |
| Compositional Tasks | 55 | 51.5 | 39.4 | 26.7 | 17.0 | 10.3 | 9.1 | 23.0 |
| **Overall** | | | | | | | | |
| Macro Average | 609 | 59.7 | 51.7 | 30.1 | 22.0 | 25.3 | 19.9 | 38.4 |
| Micro Average | 609 | 60.4 | 52.7 | 30.8 | 22.4 | 25.7 | 19.5 | 38.4 |

Table: Detailed WebTailBench results across all 11 segments. Success rates (%) are averaged over 3 independent runs. Fara-7B achieves the highest overall performance among computer-use models.
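For reference, the macro average in the table weights each segment equally, while the micro average weights each task equally, so large segments count more. A toy illustration of the difference (made-up numbers, not the benchmark's):

```python
# Toy segments: (success rate in %, number of tasks in the segment).
segments = [(60.0, 50), (30.0, 100)]

# Macro average: mean of per-segment rates; each segment counts once.
macro = sum(rate for rate, _ in segments) / len(segments)

# Micro average: task-weighted mean; each task counts once.
micro = sum(rate * n for rate, n in segments) / sum(n for _, n in segments)

print(macro)  # 45.0
print(micro)  # 40.0
```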

Coming soon:

  • Task verification pipeline for LLM-as-judge evaluation
  • Official human annotations for WebTailBench (in partnership with Browserbase)

Evaluation infrastructure

Our evaluation setup leverages:

  1. Playwright – a cross-browser automation framework that provides the browser environment
  2. Abstract web agent interface – allows plugging a model from any source into the evaluation environment
  3. Fara agent class – the reference implementation for running Fara models
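The abstract interface can be pictured as a minimal contract between the harness and any model; a hypothetical sketch (class and method names are illustrative, and may differ from the repo's actual code):

```python
from abc import ABC, abstractmethod

class WebAgent(ABC):
    """Hypothetical contract: the eval harness feeds an observation
    (screenshot, URL, page metadata) and receives the next action."""

    @abstractmethod
    def next_action(self, observation: dict) -> dict:
        ...

class EchoAgent(WebAgent):
    # Trivial stand-in model: terminates immediately, useful for
    # smoke-testing the harness without a real model endpoint.
    def next_action(self, observation: dict) -> dict:
        return {"action": "terminate", "status": "success"}

agent = EchoAgent()
print(agent.next_action({"url": "https://www.bing.com"}))
# {'action': 'terminate', 'status': 'success'}
```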

Note: Fara-7B is an experimental release designed to invite practical exploration and feedback from the community. We recommend running it in a sandboxed environment, monitoring its execution, and avoiding domains with sensitive data or high risk.


Install packages using uv or pip:

pip install -e .

or

uv pip install -e .

Then install the Playwright browser:

playwright install


Recommended: the easiest way to get started is to use Azure Foundry hosting, which doesn't require any GPU hardware or model downloads. Alternatively, you can self-host with vLLM if you have the GPU resources available.

Azure Foundry Hosting (recommended)

Deploy Fara-7B on Azure Foundry without the need to download weights or manage GPU infrastructure.

To set up:

  1. Deploy the Fara-7B model on Azure Foundry and get your endpoint URL and API key
  2. Add your endpoint details to a file in the existing endpoint_configs/ directory (example configurations are already provided):
# Edit one of the existing config files or create a new one
# endpoint_configs/fara-7b-hosting-ansrz.json (example format):
{
    "model": "Fara-7B",
    "base_url": "https://your-endpoint.inference.ml.azure.com/",
    "api_key": "YOUR_API_KEY_HERE"
}
  3. Run the Fara agent:
fara-cli --task "how many pages does wikipedia have" --start_page "https://www.bing.com"

That’s it! No GPU or model downloads required.
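A sketch of how such a config file might be loaded by a client script (field names match the example above; the helper itself is illustrative, not the repo's actual loader):

```python
import json
from pathlib import Path

def load_endpoint_config(path: str) -> dict:
    # Read the JSON config and sanity-check the fields shown above.
    cfg = json.loads(Path(path).read_text())
    for key in ("model", "base_url", "api_key"):
        if key not in cfg:
            raise KeyError(f"missing required field: {key}")
    return cfg
```

A client would then use `base_url` and `api_key` to talk to the deployed endpoint.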

If you have access to GPU resources, you can self-host Fara-7B using vLLM. This requires a GPU machine with sufficient VRAM.

To start the vLLM server, simply run:

vllm serve "microsoft/Fara-7B" --port 5000 --dtype auto 

Run the test script to see Fara in action:

fara-cli --task "how many pages does wikipedia have" --start_page "https://www.bing.com" --endpoint_config endpoint_configs/azure_foundry_config.json [--headful] [--downloads_folder "/path/to/downloads"] [--save_screenshots] [--max_rounds 100] [--browserbase]

In a self-hosting scenario, point --endpoint_config at endpoint_configs/vllm_config.json, which references the vLLM server started above.

If you set --browserbase, export your Browserbase API key and project ID as environment variables.

Initializing Browser...
Browser Running... Starting Fara Agent...
##########################################
Task: how many pages does wikipedia have
##########################################
Running Fara...


Thought #1: To find the current number of Wikipedia pages, I'll search for the latest Wikipedia page count statistics.
Action #1: executing tool 'web_search' with arguments {"action": "web_search", "query": "Wikipedia total number of articles"}
Observation#1: I typed 'Wikipedia total number of articles' into the browser search bar.

Thought #2: Wikipedia currently has 7,095,446 articles.
Action #2: executing tool 'terminate' with arguments {"action": "terminate", "status": "success"}
Observation#2: Wikipedia currently has 7,095,446 articles.

Final Answer: Wikipedia currently has 7,095,446 articles.

Enter another task (or press Enter to exit): 
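The transcript above follows a thought/action/observation loop that ends on a terminate action. A schematic sketch of that control flow (function and field names are illustrative, not the fara package's API):

```python
def run_agent(policy, tools, task: str, max_rounds: int = 100):
    """Ask the policy for an action, execute it, feed the observation back,
    and repeat until the policy terminates or the step budget runs out."""
    history = [{"role": "task", "content": task}]
    for step in range(1, max_rounds + 1):
        action = policy(history)                       # Thought + Action
        if action["action"] == "terminate":
            return action.get("final_answer"), step    # Final Answer
        observation = tools[action["action"]](action)  # Observation
        history.append({"role": "observation", "content": observation})
    return None, max_rounds  # step budget exhausted without terminating
```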

We provide a harness in webeval/ to reproduce our results on WebVoyager and OnlineMind2Web. Agent evaluations on live websites present unique challenges due to day-to-day changes. We implement a number of measures to ensure reliable and comparable evaluations:

Browserbase integration
We use Browserbase to manage browser session hosting, enabling reliable browser instance management.

Time-sensitive task updates
Tasks in benchmarks like WebVoyager may become stale or impossible. We:

  • Removed ~48 impossible tasks from the original WebVoyager benchmark
  • Updated ~50 tasks with future dates to keep them achievable
  • Example: “Find a hotel in Bali from January 1st to January 4th, 2024” becomes “Find a hotel in Bali from January 1st to January 4th, 2026”
  • Our updated WebVoyager benchmark is available at webeval/data/webvoyager/WebVoyager_data_08312025.jsonl
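The date refresh can be pictured as a simple rewrite; a toy sketch (the actual benchmark updates were curated per task, not applied blindly):

```python
import re

# Push an explicit past year in a task string forward so the task stays
# achievable. The year values here are just the example's.
def refresh_year(task: str, old: int = 2024, new: int = 2026) -> str:
    # \b guards keep longer numbers (e.g. "20245") untouched.
    return re.sub(rf"\b{old}\b", str(new), task)

print(refresh_year("Find a hotel in Bali from January 1st to January 4th, 2024"))
# Find a hotel in Bali from January 1st to January 4th, 2026
```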

Environmental error handling
Browser errors (connection drops, page timeouts) are handled carefully:

  • Trajectories are retried up to 5 times when environmental errors occur
  • Completed but incorrect trajectories are never retried
  • Each retry starts with a fresh browser session, with no retained state

Step budget
Each trajectory is limited to a maximum of 100 actions in all online benchmarks. Trajectories that exceed this budget without terminating are counted as incorrect.
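The retry policy above can be summarized in code; a sketch under the stated rules (the exception and result types are illustrative):

```python
class BrowserEnvError(Exception):
    """Illustrative stand-in for environmental failures (timeouts, drops)."""

def evaluate_task(run_trajectory, max_retries: int = 5):
    # Retry only on environmental errors; a completed-but-incorrect
    # trajectory is a valid sample and is never rerun.
    for _ in range(max_retries):
        try:
            # Each attempt is assumed to get a fresh browser session.
            return run_trajectory()  # final, whether success or failure
        except BrowserEnvError:
            continue
    return {"score": 0.0, "aborted": True}  # all retries failed
```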

WebEval Package Installation

conda create --name fara_webeval python=3.12
conda activate fara_webeval

# Install fara package
pip install -e .

# Install autogen submodule
git submodule update --init --recursive
cd autogen/python/packages
pip install -e autogen-core
pip install -e autogen-ext

# Install webeval
cd webeval
pip install -e .

# Install playwright
playwright install

Go to the scripts directory:

cd webeval/scripts

Make sure you set a valid OpenAI GPT-4o endpoint in endpoint_configs_gpt4o/dev to run the WebVoyager LLM-as-judge!

Option 1: Self-hosted vLLM

python webvoyager.py --model_url /path/where/you/want/to/download/model/ --model_port 5000 --eval_oai_config ../endpoint_configs_gpt4o/dev/ --out_url /path/to/save/eval/files --device_id 0,1 --processes 1 --run_id 1 --max_rounds 100

Option 2: Azure Foundry deployment

Deploy Fara-7B to a Foundry endpoint, then put the endpoint URL and key into a JSON file under endpoint_configs/:

python webvoyager.py --model_endpoint ../../endpoint_configs/ --eval_oai_config ../endpoint_configs_gpt4o/dev/ --out_url /path/to/save/eval/files --processes 1 --run_id 1_endpoint --max_rounds 100
  • We use the same LLM-as-judge prompt and model (GPT-4o) as WebVoyager, hence the --eval_oai_config argument
  • Set --browserbase for browser session management (requires exporting the API key and project ID environment variables)
  • Avoid overloading a single vLLM deployment with more than ~10 concurrent processes due to known issues
  • Debugging output is written to fara/webeval/scripts/stdout.txt

Analysis of evaluation results

evaluation output structure

The evaluation results are stored under --out_url in folders organized by:

  • model name
  • dataset
  • username
  • run id

Example path:

/runs/WebSurfer-fara-100-max_n_images-3/fara-7b//WebVoyager_WebVoyager_data_08312025.jsonl/

Each evaluation folder contains:

  • gpt_eval/ – LLM-as-judge evaluation results
  • traj/ – per-task trajectory subdirectories including:
    • final_answer.json (e.g., Amazon--1_final_answer.json); a missing final answer indicates an abort or step-budget exceedance
    • scores/gpt_eval.json – LLM judge score
    • web_surfer.log – history of actions and errors
    • screenshot_X.png – screenshots captured before each action
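Given that layout, aborted or over-budget runs can be spotted by the absence of a final answer file; a sketch (directory names follow the structure above, the helper name is hypothetical):

```python
from pathlib import Path

def find_aborted(traj_root: str) -> list:
    """Return task dirs under traj/ that contain no *final_answer.json."""
    aborted = []
    for task_dir in sorted(Path(traj_root).iterdir()):
        # A task directory with no final answer file never terminated cleanly.
        if task_dir.is_dir() and not list(task_dir.glob("*final_answer.json")):
            aborted.append(task_dir.name)
    return aborted
```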

Use the analysis notebook to compute metrics:

cd webeval/scripts/analyze_eval_results/
jupyter notebook analyze.ipynb

The notebook:

  • Identifies trajectories that were aborted mid-run and the reasons for the aborts
  • Computes the average score over non-aborted trajectories
  • Distinguishes between aborted trajectories (errors during sampling) and completed trajectories (with terminate() calls or step budget exceeded)

To re-run failed tasks, execute the evaluation script again with the same run_id and username – already completed (non-aborted) tasks will be skipped.
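The core metric reduction can be sketched as follows (record fields are illustrative, not the notebook's actual schema):

```python
def summarize(records: list) -> dict:
    # Aborted runs (sampling errors) are excluded from the success rate;
    # completed runs count whether they terminated or hit the step budget.
    done = [r for r in records if not r["aborted"]]
    rate = sum(r["score"] for r in done) / len(done) if done else 0.0
    return {"n_aborted": len(records) - len(done), "success_rate": rate}

print(summarize([
    {"score": 1.0, "aborted": False},
    {"score": 0.0, "aborted": False},
    {"score": 0.0, "aborted": True},
]))
# {'n_aborted': 1, 'success_rate': 0.5}
```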

Example WebVoyager GPT eval result:
{
  "score": 1.0,
  "gpt_response_text": "To evaluate the task, we need to verify if the criteria have been met:\n\n1. **Recipe Requirement**: A vegetarian lasagna recipe with zucchini and at least a four-star rating.\n\n2. **Search and Results**:\n   - The screenshots show that the search term used was \"vegetarian lasagna zucchini.\"\n   - Among the search results, \"Debbie's Vegetable Lasagna\" is prominently featured.\n   \n3. **Evaluation of the Recipe**:\n   - Rating: \"Debbie's Vegetable Lasagna\" has a rating of 4.7, which satisfies the requirement of being at least four stars.\n   - The presence of zucchini in the recipe is implied through the search conducted, though the screenshots do not explicitly show the ingredients list. However, the result response confirms the match to the criteria.\n\nGiven the information provided, the task seems to have fulfilled the requirement of finding a vegetarian lasagna recipe with zucchini and a four-star rating or higher. \n\n**Verdict: SUCCESS**"
}

If you use Fara in your research, please cite our work:



