How to Use GPT-5.4 Computer Use: A Step-by-Step Guide to Building a Live News Dashboard
By Braincuber Team
Published on April 23, 2026
GPT-5.4 introduces native computer-use capabilities that allow AI models to interact directly with software interfaces instead of relying on application-specific APIs. By analyzing screenshots and emitting actions such as clicking, typing, and navigating, the model can operate browsers and applications much like a human user. This beginner-friendly guide walks you through setting up GPT-5.4 computer use and using it to build a live news dashboard.
What You'll Learn:
- What GPT-5.4 computer use is and how the observe-decide-act loop works
- How to clone and set up the OpenAI CUA sample app locally
- How to explore the built-in scenarios: Kanban automation, Paint canvas, and booking workflows
- How to build a live news dashboard using Codex inside the computer-use environment
- The real-world applications and limitations of computer-use agents
What Is GPT-5.4 Computer Use?
GPT-5.4 introduces native computer-use capabilities, allowing models to interact with software interfaces much like a human operator. Instead of relying on application-specific APIs, the model works directly from the visual state of the interface, using screenshots and UI feedback to reason about what actions to take next. This enables agents to interact with real environments such as browsers, dashboards, and productivity tools.
Using computer use, the model can perform actions such as:
Navigate Webpages
Browse the web by following links, submitting forms, and moving through multi-page workflows automatically.
Click UI Elements
Identify and interact with buttons, menus, dropdowns, and other interactive elements on the screen.
Type Text into Fields
Enter text into input fields, search boxes, and forms as part of multi-step workflows.
Scroll and Navigate
Scroll through documents, pages, and dashboards to access content beyond the visible viewport.
How the Computer Use Agent Loop Works
Under the hood, the system operates through a simple agent loop that repeatedly observes the interface, decides on an action, and verifies the result. Here is how the workflow runs:
Send a Request
The developer starts by providing a goal prompt, the computer-use tool, and an initial screenshot of the interface to the model.
Model Reasoning and Action Proposal
GPT-5.4 analyzes the screenshot and proposes UI actions such as navigate, click, type, or scroll based on the visual state of the interface.
Execution
The client or runner executes these actions in the environment using browser automation tools like Playwright for pointer events and navigation.
Return Updated State
After the action completes, a new screenshot and the current page state are returned to the model for the next observation cycle.
Repeat the Loop
The model observes the updated interface and decides the next action until the task is completed successfully.
observe -> decide -> act -> observe
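The loop above can be sketched in a few lines of Python. Everything here is a stand-in: `FakeEnv` and `fake_model` are hypothetical stubs replacing the real GPT-5.4 computer-use tool and a Playwright-driven browser, so the skeleton can run offline.

```python
from dataclasses import dataclass

# Hypothetical action type; the real Responses API returns richer objects.
@dataclass
class Action:
    kind: str      # "click", "type", "scroll", or "done"
    payload: dict

def run_agent_loop(model, env, goal, max_steps=10):
    """Minimal observe -> decide -> act -> observe loop.

    `model` maps (goal, screenshot) -> Action; `env` executes actions
    and returns a fresh screenshot of the interface.
    """
    screenshot = env.screenshot()          # observe
    for _ in range(max_steps):
        action = model(goal, screenshot)   # decide
        if action.kind == "done":
            return True
        env.execute(action)                # act
        screenshot = env.screenshot()      # observe the updated state
    return False

# Tiny fake environment and model so the loop can be exercised offline.
class FakeEnv:
    def __init__(self):
        self.clicked = False
    def screenshot(self):
        return {"button_clicked": self.clicked}
    def execute(self, action):
        if action.kind == "click":
            self.clicked = True

def fake_model(goal, screenshot):
    # Decide: click the button if it has not been clicked yet, else finish.
    if screenshot["button_clicked"]:
        return Action("done", {})
    return Action("click", {"selector": "#submit"})
```

In the real system, `env.execute` drives Playwright and `model` is a Responses API call with the computer-use tool; the control flow, however, is exactly this repeat-until-done cycle.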
Step 1: Clone and Set Up the CUA Sample App
To get started, you will use OpenAI's CUA sample app and set up the repository locally on your device. Follow these steps to clone and configure the environment:
Clone the Repository
Clone the OpenAI CUA sample app repository from GitHub to your local machine using the command below.
Install Dependencies
Navigate into the project directory, enable corepack, and install all required dependencies using pnpm.
Configure Environment Variables
Copy the example environment file and add your OpenAI API key from the OpenAI dashboard.
Install Playwright and Start Dev Servers
Install the Playwright browser runtime and start the development servers to access the CUA operator console.
```shell
# Clone the repository and enter the project directory
git clone https://github.com/openai/openai-cua-sample-app.git
cd openai-cua-sample-app

# Enable corepack and install dependencies with pnpm
corepack enable
pnpm install

# Copy the example environment file, then add your OpenAI API key to .env
cp .env.example .env

# Install the Playwright browser runtime and start the dev servers
pnpm playwright:install
pnpm dev
```
Once the development servers are running, open the CUA operator console at http://127.0.0.1:3000. This console allows you to launch agent runs and inspect logs and screenshots captured during the computer-use loop.
Note on Warnings
If `pnpm install` prints warnings about optional packages such as `sharp` or `esbuild`, they can safely be ignored for local development. On Linux, you may also need to install OS-level dependencies with `pnpm playwright:install:with-deps`.
Step 2: Exploring Built-in Computer Use Scenarios
The sample app includes three sandbox environments designed to demonstrate computer-use behavior. These environments help illustrate how GPT-5.4 interacts with different types of interfaces, from structured layouts to visual drawing applications and multi-step forms.
Kanban Board Automation
The Kanban board scenario demonstrates how GPT-5.4 computer use can reason about and manipulate structured UI layouts through visual interaction. In this example, the agent is given a goal such as reorganizing tasks on a Kanban board.
Instead of calling any application API, the agent interacts with the interface the same way a human would, by observing the board, identifying task cards, and performing drag-and-drop operations. Here is how the computer-use loop executes in this scenario:
Receive Screenshot and URL
The agent receives a screenshot of the Kanban board along with the current URL as the initial observation.
Analyze Visual Layout
GPT-5.4 analyzes the visual layout and determines where task cards and columns are located on the board.
Propose UI Actions
The model proposes actions such as moving the cursor to a card, clicking and holding, and dragging the card to another column.
Execute Actions via Playwright
The runner executes these actions through Playwright pointer events that simulate human mouse interactions.
Capture and Verify Updated State
A new screenshot is captured and sent back to the model so it can verify the updated board state and continue if needed.
Key Advantage
The model does not rely on any internal knowledge of the Kanban application. It reasons entirely from the visual state of the interface, determining where to click, drag, and drop elements based solely on the screenshot. This demonstrates that developers can automate workflows without building custom integrations or APIs for every tool.
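The drag-and-drop step above can be illustrated with a small helper. This is a sketch under assumptions: `mouse` is any object exposing Playwright's `page.mouse` interface (`move`, `down`, `up`), the coordinates are hypothetical values the model would read off a screenshot, and `MouseRecorder` is a stand-in so the helper can run without a browser.

```python
def drag(mouse, start, end, steps=12):
    """Simulate a human-like drag-and-drop with low-level pointer events,
    the way a runner might drive Playwright's page.mouse. Interpolating
    intermediate moves lets drag-aware UIs register the gesture."""
    sx, sy = start
    ex, ey = end
    mouse.move(sx, sy)
    mouse.down()
    for i in range(1, steps + 1):
        t = i / steps
        mouse.move(sx + (ex - sx) * t, sy + (ey - sy) * t)
    mouse.up()

# Recorder stand-in for Playwright's page.mouse, for offline testing.
class MouseRecorder:
    def __init__(self):
        self.events = []
    def move(self, x, y):
        self.events.append(("move", round(x), round(y)))
    def down(self):
        self.events.append(("down",))
    def up(self):
        self.events.append(("up",))

mouse = MouseRecorder()
drag(mouse, start=(120, 300), end=(480, 300))  # card -> target column
```

With a real Playwright page, passing `page.mouse` in place of the recorder would perform the same gesture in the browser.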
Paint Canvas Interaction
The Paint scenario handles tasks that depend on visual layout, spatial reasoning, and precise cursor control rather than simple form-filling. In this setup, the agent is given a drawing instruction and must complete it directly inside the browser-based sketch application.
Unlike the Kanban example, where the core challenge was moving structured cards between columns, this scenario depends much more on interpreting the visual state of the app and making a series of low-level interaction decisions:
| Step | Action | Description |
|---|---|---|
| 1 | Layout Interpretation | GPT-5.4 interprets the layout of the sketch interface, including the color palette and the blank canvas |
| 2 | Tool Selection | The model identifies the available palette options and clicks the appropriate color before drawing |
| 3 | Canvas Interaction | It interacts entirely through UI actions, moving the pointer to specific cells and filling them |
| 4 | State Verification | A fresh screenshot is sent back so the model can verify the expected pattern is appearing on the canvas |
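The canvas-interaction step hinges on translating a target grid cell into a pixel to click. A minimal sketch, assuming a uniform grid whose origin and cell size the model has inferred from a screenshot (both values here are made up for illustration):

```python
def cell_center(row, col, origin=(40, 120), cell=24):
    """Map a (row, col) cell on the sketch canvas to the pixel the agent
    should click. `origin` is the canvas's top-left corner and `cell`
    the cell size in pixels; both are assumed, not taken from the app."""
    ox, oy = origin
    x = ox + col * cell + cell // 2   # horizontal center of the cell
    y = oy + row * cell + cell // 2   # vertical center of the cell
    return (x, y)

# Pixels the agent would click to fill a short diagonal line.
targets = [cell_center(i, i) for i in range(3)]
```

Each resulting coordinate would then be fed to a pointer click, and the follow-up screenshot confirms whether the cell actually changed color.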
Booking Workflow
In this environment, the agent interacts with a simulated booking website and is asked to complete a reservation flow. The agent must move through several UI states in sequence rather than solving a single isolated action.
Interface Understanding
GPT-5.4 begins by interpreting the current screen layout, identifying buttons, form fields, calendars, dropdowns, and confirmation controls.
Step-by-Step Navigation
The agent decides which part of the workflow to complete first, such as choosing an option, moving to the next screen, or opening a form element.
Form Filling
It enters the required values into text boxes and interacts with controls like dropdowns or date selectors as needed.
Confirmation and Completion
Once the required inputs are filled, the agent proceeds to the final confirmation step and checks that the reservation was successfully completed.
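The sequencing logic behind this flow can be sketched as a simple ordered checklist. The stage names here are assumptions chosen for illustration; a real agent infers progress from the visual state of each screenshot rather than from an explicit list.

```python
# Stages the agent must complete, in order, on the simulated booking site.
STAGES = ["select_option", "open_form", "fill_fields", "confirm"]

def next_stage(completed):
    """Given the set of stages already completed (as judged from the
    latest screenshot), return the next stage to work on, or None
    once the reservation is finished."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None
```

This mirrors how the agent decides "which part of the workflow to complete first": it always advances the earliest unfinished step, re-checking after every action.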
Step 3: Creating a Live News Dashboard with GPT-5.4
In this step, you will apply the same computer-use capabilities to build a live news dashboard. The goal is to create a small dashboard where a user can select a topic of interest, such as AI, politics, climate, technology, or science, and the system will then:
Gather Recent News
Collect recent news stories from trusted sources based on the user-selected topic in real time.
Extract Key Information
Extract the headline, source, and key information from each article automatically.
Generate Summaries
GPT-5.4 summarizes the findings and produces three concise news summaries per topic.
Render Structured Dashboard
The results are rendered in a dashboard-style layout with cards, intro, and export block.
Instead of writing the application manually, you will use Codex inside the GPT-5.4 computer use environment and pass it a high-level prompt to generate the feature directly inside the existing CUA repository.
```text
Build a Live News Dashboard in this repo.

Goal:
Create a dashboard where a user can enter a topic of interest, fetch the latest
important news in real time from trusted sources, and render exactly 3
structured results that are meaningful and topic-relevant.

Requirements:
- The dashboard must allow the user to type a topic such as AI, politics,
  climate, health, science, or tech.
- Fetch live results at request time. Do not hardcode stories.
- Use trusted sources appropriate to the topic.
- Return exactly 3 items with HEADLINE, SOURCE, SUMMARY.
- Summaries must be concise and clearly related to the article.
- Keep the UI minimal and consistent with the repo's existing design.
- Reuse the existing framework/tooling.

Implementation plan:
1. Inspect the repo and place the dashboard in the existing app structure.
2. Add a topic input UI with a search action and loading/error state.
3. Add a server-side news fetch path with trusted source mapping.
4. Render the dashboard with page title, topic, date, intro, and 3 cards.
5. Keep the export block in the specified format.

Deliverables:
- A working live dashboard route in the app
- Real-time topic search
- Exactly 3 relevant results per search
- Structured export block visible in the UI
```
The prompt acts as a high-level specification rather than detailed implementation code: it tells Codex what to build inside the existing repository without prescribing how. Codex first inspects the project structure to determine where the dashboard UI and backend logic should be added. It then creates a topic input field, retrieves recent articles from trusted sources in real time, extracts key metadata, and renders exactly three news items in a clean layout.
GPT-5.4 computer use enables this workflow by allowing the model to observe and interact with the development environment while generating the feature. Instead of acting purely as a code generator, Codex analyzes the repository, determines where new components should live, and incrementally implements the dashboard while verifying the results.
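To make the "exactly 3 items with HEADLINE, SOURCE, SUMMARY" requirement concrete, here is one possible shape for the export block. The field layout is an assumption: the prompt asks for a structured export block but leaves the exact format to Codex.

```python
from dataclasses import dataclass

@dataclass
class NewsItem:
    headline: str
    source: str
    summary: str

def export_block(topic, items):
    """Render exactly three news items as a plain-text export block.
    Enforcing len(items) == 3 mirrors the prompt's hard requirement."""
    if len(items) != 3:
        raise ValueError("dashboard requires exactly 3 items")
    lines = [f"TOPIC: {topic}"]
    for i, item in enumerate(items, 1):
        lines += [
            f"{i}. HEADLINE: {item.headline}",
            f"   SOURCE: {item.source}",
            f"   SUMMARY: {item.summary}",
        ]
    return "\n".join(lines)
```

A server route generated by Codex could call a formatter like this after fetching and summarizing the articles, so the UI and the export block stay consistent.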
Important Note
The final dashboard may not be generated from a single prompt. It may require a few iterations and prompt refinements to get the desired behavior and output format. When running similar experiments, expect some trial-and-error while adjusting the prompt and constraints. Also, ensure that your browser or system does not block automated browser interactions, as such restrictions can interfere with computer-use workflows.
Real-World Applications of Computer Use
From here, you can extend the computer-use concept further by building agents that automate internal dashboards, generate research pipelines, track industry trends in real time, or prototype new product features directly inside existing repositories. As computer-use models continue to improve, they will become more capable of acting as general-purpose development and automation agents.
Automate Internal Dashboards
Build agents that automatically update reporting tools, gather data from multiple sources, and refresh dashboard views without manual intervention.
Generate Research Pipelines
Create agents that browse the web for information, extract relevant data, generate reports, and update dashboards automatically.
Track Industry Trends
Monitor news, publications, and updates in specific industries in real time and summarize findings for stakeholders.
Prototype New Features
Use Codex inside the computer-use environment to rapidly prototype and implement new product features directly inside existing repositories.
Frequently Asked Questions
What is GPT-5.4 Computer Use?
GPT-5.4 Computer Use is a capability that allows AI models to interact with software interfaces through screenshots and actions like clicking, typing, and navigation, instead of relying on traditional APIs.
What powers the CUA sample app?
The CUA sample app uses Playwright for browser automation, the OpenAI Responses API for model interaction, and a Next.js operator console for managing agent runs and viewing logs.
Can GPT-5.4 automate real websites?
Yes, but developers need to respect site policies and avoid bypassing CAPTCHAs or security mechanisms when automating interactions with real websites.
What kinds of applications can be built with computer use?
Examples include research assistants, data dashboards, automation agents, productivity tools, and internal reporting systems that interact with web interfaces directly.
Do I need custom APIs to use GPT-5.4 Computer Use?
No. The key advantage of computer use is that it does not require custom integrations or APIs for every tool. The model reasons from the visual state of the interface using screenshots.
Need Help with AI Implementation?
Our experts can help you implement AI agents, computer-use workflows, and custom automation solutions for your business applications.
