How to Use GPT-5.4 Computer Use: A Step-by-Step Guide to Building a Live News Dashboard
By Braincuber Team
Published on April 23, 2026
GPT-5.4 introduces native computer-use capabilities that allow AI models to interact directly with software interfaces instead of relying on application-specific APIs. By analyzing screenshots and emitting actions such as clicking, typing, and navigating, the model can operate browsers and applications much like a human user. This beginner-friendly guide walks you through setting up GPT-5.4 computer use and using it to build a live news dashboard.
What You'll Learn:
- What GPT-5.4 computer use is and how the observe-decide-act loop works
- How to clone and set up the OpenAI CUA sample app locally
- How to explore the built-in scenarios: Kanban automation, Paint canvas, and booking workflows
- How to build a live news dashboard using Codex inside the computer-use environment
- The real-world applications and limitations of computer-use agents
What Is GPT-5.4 Computer Use?
GPT-5.4 introduces native computer-use capabilities, allowing models to interact with software interfaces much like a human operator. Instead of relying on application-specific APIs, the model works directly from the visual state of the interface, using screenshots and UI feedback to reason about what actions to take next. This enables agents to interact with real environments such as browsers, dashboards, and productivity tools.
Using computer use, the model can perform actions such as:
Navigate Webpages
Browse the web by following links, submitting forms, and moving through multi-page workflows automatically.
Click UI Elements
Identify and interact with buttons, menus, dropdowns, and other interactive elements on the screen.
Type Text into Fields
Enter text into input fields, search boxes, and forms as part of multi-step workflows.
Scroll and Navigate
Scroll through documents, pages, and dashboards to access content beyond the visible viewport.
How the Computer Use Agent Loop Works
Under the hood, the system operates through a simple agent loop that repeatedly observes the interface, decides on an action, and verifies the result. Here is how the workflow runs:
Send a Request
The developer starts by providing a goal prompt, the computer-use tool, and an initial screenshot of the interface to the model.
Model Reasoning and Action Proposal
GPT-5.4 analyzes the screenshot and proposes UI actions such as navigate, click, type, or scroll based on the visual state of the interface.
Execution
The client or runner executes these actions in the environment using browser automation tools like Playwright for pointer events and navigation.
Return Updated State
After the action completes, a new screenshot and the current page state are returned to the model for the next observation cycle.
Repeat the Loop
The model observes the updated interface and decides the next action until the task is completed successfully.
observe -> decide -> act -> observe
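The loop above can be sketched in a few lines of Python. Everything here is a stand-in: `FakeEnv` and `fake_model` are hypothetical stubs replacing the real GPT-5.4 computer-use tool and a Playwright-driven browser, so the skeleton can run offline.

```python
from dataclasses import dataclass

# Hypothetical action type; the real Responses API returns richer objects.
@dataclass
class Action:
    kind: str      # "click", "type", "scroll", or "done"
    payload: dict

def run_agent_loop(model, env, goal, max_steps=10):
    """Minimal observe -> decide -> act -> observe loop.

    `model` maps (goal, screenshot) -> Action; `env` executes actions
    and returns a fresh screenshot of the interface.
    """
    screenshot = env.screenshot()          # observe
    for _ in range(max_steps):
        action = model(goal, screenshot)   # decide
        if action.kind == "done":
            return True
        env.execute(action)                # act
        screenshot = env.screenshot()      # observe the updated state
    return False

# Tiny fake environment and model so the loop can be exercised offline.
class FakeEnv:
    def __init__(self):
        self.clicked = False
    def screenshot(self):
        return {"button_clicked": self.clicked}
    def execute(self, action):
        if action.kind == "click":
            self.clicked = True

def fake_model(goal, screenshot):
    # Decide: click the button if it has not been clicked yet, else finish.
    if screenshot["button_clicked"]:
        return Action("done", {})
    return Action("click", {"selector": "#submit"})
```

In the real system, `env.execute` drives Playwright and `model` is a Responses API call with the computer-use tool; the control flow, however, is exactly this repeat-until-done cycle.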
Step 1: Clone and Set Up the CUA Sample App
To get started, you will use OpenAI's CUA sample app and set up the repository locally on your device. Follow these steps to clone and configure the environment:
Clone the Repository
Clone the OpenAI CUA sample app repository from GitHub to your local machine using the command below.
Install Dependencies
Navigate into the project directory, enable corepack, and install all required dependencies using pnpm.
Configure Environment Variables
Copy the example environment file and add your OpenAI API key from the OpenAI dashboard.
Install Playwright and Start Dev Servers
Install the Playwright browser runtime and start the development servers to access the CUA operator console.
```shell
# Clone the repository and enter the project directory
git clone https://github.com/openai/openai-cua-sample-app.git
cd openai-cua-sample-app

# Enable corepack and install dependencies with pnpm
corepack enable
pnpm install

# Copy the example environment file, then add your OpenAI API key to .env
cp .env.example .env

# Install the Playwright browser runtime and start the dev servers
pnpm playwright:install
pnpm dev
```
Once the development servers are running, open the CUA operator console at http://127.0.0.1:3000. This console allows you to launch agent runs and inspect logs and screenshots captured during the computer-use loop.
Note on Warnings
If `pnpm install` prints warnings about optional packages such as `sharp` or `esbuild`, they can safely be ignored for local development. On Linux, you may also need to install OS-level dependencies with `pnpm playwright:install:with-deps`.
Step 2: Exploring Built-in Computer Use Scenarios
The sample app includes three sandbox environments designed to demonstrate computer-use behavior. These environments help illustrate how GPT-5.4 interacts with different types of interfaces, from structured layouts to visual drawing applications and multi-step forms.
Kanban Board Automation
The Kanban board scenario demonstrates how GPT-5.4 computer use can reason about and manipulate structured UI layouts through visual interaction. In this example, the agent is given a goal such as reorganizing tasks on a Kanban board.
Instead of calling any application API, the agent interacts with the interface the same way a human would, by observing the board, identifying task cards, and performing drag-and-drop operations. Here is how the computer-use loop executes in this scenario:
Receive Screenshot and URL
The agent receives a screenshot of the Kanban board along with the current URL as the initial observation.
Analyze Visual Layout
GPT-5.4 analyzes the visual layout and determines where task cards and columns are located on the board.
Propose UI Actions
The model proposes actions such as moving the cursor to a card, clicking and holding, and dragging the card to another column.
Execute Actions via Playwright
The runner executes these actions through Playwright pointer events that simulate human mouse interactions.
Capture and Verify Updated State
A new screenshot is captured and sent back to the model so it can verify the updated board state and continue if needed.
Key Advantage
The model does not rely on any internal knowledge of the Kanban application. It reasons entirely from the visual state of the interface, determining where to click, drag, and drop elements based solely on the screenshot. This demonstrates that developers can automate workflows without building custom integrations or APIs for every tool.
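The drag-and-drop step above can be illustrated with a small helper. This is a sketch under assumptions: `mouse` is any object exposing Playwright's `page.mouse` interface (`move`, `down`, `up`), the coordinates are hypothetical values the model would read off a screenshot, and `MouseRecorder` is a stand-in so the helper can run without a browser.

```python
def drag(mouse, start, end, steps=12):
    """Simulate a human-like drag-and-drop with low-level pointer events,
    the way a runner might drive Playwright's page.mouse. Interpolating
    intermediate moves lets drag-aware UIs register the gesture."""
    sx, sy = start
    ex, ey = end
    mouse.move(sx, sy)
    mouse.down()
    for i in range(1, steps + 1):
        t = i / steps
        mouse.move(sx + (ex - sx) * t, sy + (ey - sy) * t)
    mouse.up()

# Recorder stand-in for Playwright's page.mouse, for offline testing.
class MouseRecorder:
    def __init__(self):
        self.events = []
    def move(self, x, y):
        self.events.append(("move", round(x), round(y)))
    def down(self):
        self.events.append(("down",))
    def up(self):
        self.events.append(("up",))

mouse = MouseRecorder()
drag(mouse, start=(120, 300), end=(480, 300))  # card -> target column
```

With a real Playwright page, passing `page.mouse` in place of the recorder would perform the same gesture in the browser.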
Paint Canvas Interaction
The Paint scenario handles tasks that depend on visual layout, spatial reasoning, and precise cursor control rather than simple form-filling. In this setup, the agent is given a drawing instruction and must complete it directly inside the browser-based sketch application.
Unlike the Kanban example, where the core challenge was moving structured cards between columns, this scenario depends much more on interpreting the visual state of the app and making a series of low-level interaction decisions:
| Step | Action | Description |
|---|---|---|
| 1 | Layout Interpretation | GPT-5.4 interprets the layout of the sketch interface, including the color palette and the blank canvas |
| 2 | Tool Selection | The model identifies the available palette options and clicks the appropriate color before drawing |
| 3 | Canvas Interaction | It interacts entirely through UI actions, moving the pointer to specific cells and filling them |
| 4 | State Verification | A fresh screenshot is sent back so the model can verify the expected pattern is appearing on the canvas |
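The canvas-interaction step hinges on translating a target grid cell into a pixel to click. A minimal sketch, assuming a uniform grid whose origin and cell size the model has inferred from a screenshot (both values here are made up for illustration):

```python
def cell_center(row, col, origin=(40, 120), cell=24):
    """Map a (row, col) cell on the sketch canvas to the pixel the agent
    should click. `origin` is the canvas's top-left corner and `cell`
    the cell size in pixels; both are assumed, not taken from the app."""
    ox, oy = origin
    x = ox + col * cell + cell // 2   # horizontal center of the cell
    y = oy + row * cell + cell // 2   # vertical center of the cell
    return (x, y)

# Pixels the agent would click to fill a short diagonal line.
targets = [cell_center(i, i) for i in range(3)]
```

Each resulting coordinate would then be fed to a pointer click, and the follow-up screenshot confirms whether the cell actually changed color.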
Booking Workflow
In this environment, the agent interacts with a simulated booking website and is asked to complete a reservation flow. The agent must move through several UI states in sequence rather than solving a single isolated action.
Interface Understanding
GPT-5.4 begins by interpreting the current screen layout, identifying buttons, form fields, calendars, dropdowns, and confirmation controls.
Step-by-Step Navigation
The agent decides which part of the workflow to complete first, such as choosing an option, moving to the next screen, or opening a form element.
Form Filling
It enters the required values into text boxes and interacts with controls like dropdowns or date selectors as needed.
Confirmation and Completion
Once the required inputs are filled, the agent proceeds to the final confirmation step and checks that the reservation was successfully completed.
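The sequencing logic behind this flow can be sketched as a simple ordered checklist. The stage names here are assumptions chosen for illustration; a real agent infers progress from the visual state of each screenshot rather than from an explicit list.

```python
# Stages the agent must complete, in order, on the simulated booking site.
STAGES = ["select_option", "open_form", "fill_fields", "confirm"]

def next_stage(completed):
    """Given the set of stages already completed (as judged from the
    latest screenshot), return the next stage to work on, or None
    once the reservation is finished."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None
```

This mirrors how the agent decides "which part of the workflow to complete first": it always advances the earliest unfinished step, re-checking after every action.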
Step 3: Creating a Live News Dashboard with GPT-5.4
In this step, you will apply the same computer-use capabilities to build a live news dashboard. The goal is to create a small dashboard where a user can select a topic of interest, such as AI, politics, climate, technology, or science, and the system will then:
Gather Recent News
Collect recent news stories from trusted sources based on the user-selected topic in real time.
Extract Key Information
Extract the headline, source, and key information from each article automatically.
Generate Summaries
GPT-5.4 summarizes the findings and produces three concise news summaries per topic.
Render Structured Dashboard
The results are rendered in a dashboard-style layout with cards, intro, and export block.
Instead of writing the application manually, you will use Codex inside the GPT-5.4 computer use environment and pass it a high-level prompt to generate the feature directly inside the existing CUA repository.
```text
Build a Live News Dashboard in this repo.

Goal:
Create a dashboard where a user can enter a topic of interest, fetch the latest
important news in real time from trusted sources, and render exactly 3
structured results that are meaningful and topic-relevant.

Requirements:
- The dashboard must allow the user to type a topic such as AI, politics,
  climate, health, science, or tech.
- Fetch live results at request time. Do not hardcode stories.
- Use trusted sources appropriate to the topic.
- Return exactly 3 items with HEADLINE, SOURCE, SUMMARY.
- Summaries must be concise and clearly related to the article.
- Keep the UI minimal and consistent with the repo's existing design.
- Reuse the existing framework/tooling.

Implementation plan:
1. Inspect the repo and place the dashboard in the existing app structure.
2. Add a topic input UI with a search action and loading/error state.
3. Add a server-side news fetch path with trusted source mapping.
4. Render the dashboard with page title, topic, date, intro, and 3 cards.
5. Keep the export block in the specified format.

Deliverables:
- A working live dashboard route in the app
- Real-time topic search
- Exactly 3 relevant results per search
- Structured export block visible in the UI
```
The prompt acts as a high-level specification rather than detailed implementation code: it tells Codex what to build inside the existing repository without prescribing how. Codex first inspects the project structure to determine where the dashboard UI and backend logic should be added. It then creates a topic input field, retrieves recent articles from trusted sources in real time, extracts key metadata, and renders exactly three news items in a clean layout.
GPT-5.4 computer use enables this workflow by allowing the model to observe and interact with the development environment while generating the feature. Instead of acting purely as a code generator, Codex analyzes the repository, determines where new components should live, and incrementally implements the dashboard while verifying the results.
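To make the "exactly 3 items with HEADLINE, SOURCE, SUMMARY" requirement concrete, here is one possible shape for the export block. The field layout is an assumption: the prompt asks for a structured export block but leaves the exact format to Codex.

```python
from dataclasses import dataclass

@dataclass
class NewsItem:
    headline: str
    source: str
    summary: str

def export_block(topic, items):
    """Render exactly three news items as a plain-text export block.
    Enforcing len(items) == 3 mirrors the prompt's hard requirement."""
    if len(items) != 3:
        raise ValueError("dashboard requires exactly 3 items")
    lines = [f"TOPIC: {topic}"]
    for i, item in enumerate(items, 1):
        lines += [
            f"{i}. HEADLINE: {item.headline}",
            f"   SOURCE: {item.source}",
            f"   SUMMARY: {item.summary}",
        ]
    return "\n".join(lines)
```

A server route generated by Codex could call a formatter like this after fetching and summarizing the articles, so the UI and the export block stay consistent.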
Important Note
The final dashboard may not be generated from a single prompt. It may require a few iterations and prompt refinements to get the desired behavior and output format. When running similar experiments, expect some trial-and-error while adjusting the prompt and constraints. Also, ensure that your browser or system does not block automated browser interactions, as such restrictions can interfere with computer-use workflows.
Real-World Applications of Computer Use
From here, you can extend the computer-use concept further by building agents that automate internal dashboards, generate research pipelines, track industry trends in real time, or prototype new product features directly inside existing repositories. As computer-use models continue to improve, they will become more capable of acting as general-purpose development and automation agents.
Automate Internal Dashboards
Build agents that automatically update reporting tools, gather data from multiple sources, and refresh dashboard views without manual intervention.
Generate Research Pipelines
Create agents that browse the web for information, extract relevant data, generate reports, and update dashboards automatically.
Track Industry Trends
Monitor news, publications, and updates in specific industries in real time and summarize findings for stakeholders.
Prototype New Features
Use Codex inside the computer-use environment to rapidly prototype and implement new product features directly inside existing repositories.
Frequently Asked Questions
What is GPT-5.4 Computer Use?
GPT-5.4 Computer Use is a capability that allows AI models to interact with software interfaces through screenshots and actions like clicking, typing, and navigation, instead of relying on traditional APIs.
What powers the CUA sample app?
The CUA sample app uses Playwright for browser automation, the OpenAI Responses API for model interaction, and a Next.js operator console for managing agent runs and viewing logs.
Can GPT-5.4 automate real websites?
Yes, but developers need to respect site policies and avoid bypassing CAPTCHAs or security mechanisms when automating interactions with real websites.
What kinds of applications can be built with computer use?
Examples include research assistants, data dashboards, automation agents, productivity tools, and internal reporting systems that interact with web interfaces directly.
Do I need custom APIs to use GPT-5.4 Computer Use?
No. The key advantage of computer use is that it does not require custom integrations or APIs for every tool. The model reasons from the visual state of the interface using screenshots.
Need Help with AI Implementation?
Our experts can help you implement AI agents, computer-use workflows, and custom automation solutions for your business applications.
