Apple Vision Pro 2: visionOS Development
Apple’s Vision Pro has already turned heads, and the rumored Vision Pro 2 is expected to bring even more horsepower, refined optics, and a richer developer ecosystem. If you’ve been curious about building immersive experiences for visionOS, now is the perfect time to dive in. In this guide we’ll walk through the core concepts, set up a development environment, and explore real‑world use cases that showcase the unique capabilities of the headset.
Getting Started with visionOS
visionOS is built on top of Apple’s existing platforms—UIKit, SwiftUI, RealityKit, and Metal—so you’ll feel right at home if you’ve written iOS or macOS apps. The biggest shift is thinking in three dimensions: every view lives in a spatial coordinate system, and interaction happens through eye tracking, hand gestures, and voice (plus, if the rumors hold, a dedicated 3‑DoF controller on Vision Pro 2).
Before you can write code, you need a few pieces in place:
- Xcode 16 (or later) – includes the visionOS SDK, simulators, and the visionOS App target template.
- Apple Developer account – required for signing, testing on device, and accessing beta documentation.
- Vision Pro 2 (or the Vision Pro simulator) – the simulator can emulate most spatial interactions, but testing on real hardware is essential for performance tuning.
Once Xcode is installed, create a new project, choose the visionOS platform, and pick the “App” template. The wizard sets up a ContentView.swift file that uses SwiftUI’s RealityView to render 3‑D content.
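The exact files depend on the template options you pick, but a freshly generated visionOS app is roughly shaped like this (a sketch; ImmersiveApp is a stand‑in for your project’s name, and the box is just placeholder content):

import SwiftUI
import RealityKit

@main
struct ImmersiveApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
        .windowStyle(.volumetric)   // opt into a 3-D volume instead of a flat window
    }
}

struct ContentView: View {
    var body: some View {
        RealityView { content in
            // Build the initial 3-D scene: a simple metallic box as a placeholder
            let box = ModelEntity(mesh: .generateBox(size: 0.2),
                                  materials: [SimpleMaterial(color: .gray, isMetallic: true)])
            content.add(box)
        }
    }
}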
Understanding the Spatial Coordinate System
In visionOS, it is easiest to reason about placement relative to the user: positive x moves right, positive y moves up, and, following RealityKit’s right‑handed convention, positive z points back toward the user, so content in front of you sits at negative z. (In an immersive space the world origin sits roughly at the floor beneath the user’s starting position.) All UI elements are placed relative to this frame, which lets you create floating panels, anchored objects, or full‑room experiences.
Here’s a quick snippet that renders a simple panel and pushes it back in depth. Note that SwiftUI lays views out in points rather than meters; the @PhysicalMetric property wrapper converts a physical length (here, half a meter) into points for the current context, and the visionOS‑only .offset(z:) modifier moves a view along the depth axis:

import SwiftUI

struct FloatingPanel: View {
    // Converts 0.5 m into points for the current display context
    @PhysicalMetric(from: .meters) private var depth = 0.5

    var body: some View {
        VStack {
            Text("Hello, Vision Pro 2!")
                .font(.title2)
                .padding()
        }
        .frame(width: 400, height: 200)   // points, not meters
        .background(.ultraThinMaterial)
        .cornerRadius(16)
        .offset(z: -depth)                // push the panel 0.5 m back in depth
    }
}

Notice the .offset(z:) modifier: it is the quickest way to move a SwiftUI view along the depth axis within its window or volume. For content anchored at a precise spot in the room, you’ll reach for RealityKit entities, as sketched next.
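Here is what that RealityKit route can look like: a minimal sketch for an immersive space, with an entity anchored half a meter in front of (and about head height above) the world origin. The struct name and the exact values are illustrative:

import SwiftUI
import RealityKit

struct AnchoredSphere: View {
    var body: some View {
        RealityView { content in
            // World anchor 1.5 m up (about head height) and 0.5 m forward;
            // remember that "forward" is the negative z direction
            let anchor = AnchorEntity(world: [0, 1.5, -0.5])
            let sphere = ModelEntity(mesh: .generateSphere(radius: 0.05),
                                     materials: [SimpleMaterial(color: .cyan, isMetallic: true)])
            anchor.addChild(sphere)
            content.add(anchor)
        }
    }
}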
Building a Spatial UI with SwiftUI
SwiftUI’s declarative syntax shines on visionOS. You can compose 2‑D views and then embed them in a RealityView to place them in 3‑D. The system handles gaze‑driven hover highlighting, hand‑gesture hit‑testing, and dynamic foveated rendering for you.
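Before the full example, here is how a 2‑D SwiftUI view typically lands inside a 3‑D RealityKit scene: RealityView attachments. A minimal sketch (the attachment id "label" and the struct name are just illustrative):

import SwiftUI
import RealityKit

struct LabeledSphere: View {
    var body: some View {
        RealityView { content, attachments in
            // A plain RealityKit sphere
            let sphere = ModelEntity(mesh: .generateSphere(radius: 0.1),
                                     materials: [SimpleMaterial(color: .orange, isMetallic: false)])
            content.add(sphere)

            // Pull the SwiftUI attachment into the scene and pin it above the sphere
            if let label = attachments.entity(for: "label") {
                label.position = [0, 0.15, 0]   // meters, relative to the sphere
                sphere.addChild(label)
            }
        } attachments: {
            Attachment(id: "label") {
                Text("A SwiftUI view in 3-D space")
                    .padding()
                    .glassBackgroundEffect()
            }
        }
    }
}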
Let’s build a simple “Task Board” that floats like a sticky note. This example demonstrates:
- Using Button with hand‑gesture (pinch) activation.
- Applying .hoverEffect() for eye‑gaze highlighting.
- Persisting state with @AppStorage so the board remembers its items.
import SwiftUI

struct TaskBoard: View {
    // @AppStorage can't store [String] directly, so persist the list as a JSON string
    @AppStorage("tasks") private var tasksJSON = "[]"
    @State private var newTask = ""

    private var tasks: [String] {
        (try? JSONDecoder().decode([String].self, from: Data(tasksJSON.utf8))) ?? []
    }

    private func addTask(_ task: String) {
        let updated = tasks + [task]
        if let data = try? JSONEncoder().encode(updated) {
            tasksJSON = String(decoding: data, as: UTF8.self)
        }
    }

    var body: some View {
        VStack(alignment: .leading, spacing: 12) {
            Text("🗒️ Vision Pro Tasks")
                .font(.headline)
                .foregroundStyle(.secondary)
            ForEach(tasks, id: \.self) { task in
                Text("• \(task)")
                    .font(.subheadline)
            }
            HStack {
                TextField("New task", text: $newTask)
                    .textFieldStyle(.roundedBorder)
                Button("Add") {
                    guard !newTask.isEmpty else { return }
                    addTask(newTask)
                    newTask = ""
                }
                .buttonStyle(.borderedProminent)
            }
        }
        .padding(24)                  // points
        .background(.ultraThinMaterial)
        .cornerRadius(12)
        .hoverEffect()                // gaze-based hover highlight
        .offset(y: 80)                // nudge the board slightly below eye level
        .offset(z: -200)              // and push it back in depth
    }
}
Run the app in the simulator and you’ll see the board appear as a floating pane. Look at it to focus, then pinch your thumb and index finger to “press” the Add button. The UI automatically scales based on the user’s distance, keeping text crisp.
Pro tip: Use .ultraThinMaterial for backgrounds that blend with the real world. It respects ambient lighting and reduces eye strain on long sessions.
Integrating Metal for High‑Performance Graphics
While SwiftUI covers most UI needs, you’ll often want custom 3‑D rendering—think games, data visualizations, or scientific simulations. Metal gives you low‑level access to the GPU; on visionOS, fully immersive stereoscopic rendering goes through Compositor Services, which hands your render loop per‑eye drawables, while simpler content can render into an MTKView hosted in a regular window.
The following example takes the simpler route: a rotating cube (only the front face is filled in here) drawn with Metal’s Swift API and wrapped in a UIViewRepresentable so it drops into any SwiftUI view hierarchy:
import SwiftUI
import MetalKit
import simd

struct RotatingCube: UIViewRepresentable {
    class Coordinator: NSObject, MTKViewDelegate {
        var device: MTLDevice!
        var commandQueue: MTLCommandQueue!
        var pipelineState: MTLRenderPipelineState!
        var vertexBuffer: MTLBuffer!
        let startTime: CFTimeInterval = CACurrentMediaTime()

        func configure(_ mtkView: MTKView) {
            device = MTLCreateSystemDefaultDevice()
            mtkView.device = device
            commandQueue = device.makeCommandQueue()
            mtkView.delegate = self

            // Simple vertex data for the front face of a cube, in triangle-strip order (x, y, z)
            let vertices: [Float] = [
                -0.5, -0.5, 0.5,
                 0.5, -0.5, 0.5,
                -0.5,  0.5, 0.5,
                 0.5,  0.5, 0.5,
                // ... other faces omitted for brevity
            ]
            vertexBuffer = device.makeBuffer(bytes: vertices,
                                             length: vertices.count * MemoryLayout<Float>.stride,
                                             options: [])

            // Load the default library (assumes vertex_main/fragment_main live in a .metal file)
            let library = device.makeDefaultLibrary()
            let pipelineDesc = MTLRenderPipelineDescriptor()
            pipelineDesc.vertexFunction = library?.makeFunction(name: "vertex_main")
            pipelineDesc.fragmentFunction = library?.makeFunction(name: "fragment_main")
            pipelineDesc.colorAttachments[0].pixelFormat = mtkView.colorPixelFormat
            pipelineState = try! device.makeRenderPipelineState(descriptor: pipelineDesc)
        }

        func draw(in view: MTKView) {
            guard let descriptor = view.currentRenderPassDescriptor,
                  let drawable = view.currentDrawable,
                  let commandBuffer = commandQueue.makeCommandBuffer(),
                  let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else { return }

            // Compute rotation based on elapsed time (45° per second around the y-axis)
            let elapsed = CACurrentMediaTime() - startTime
            let angle = Float(elapsed) * .pi / 4
            var modelMatrix = simd_float4x4(simd_quatf(angle: angle, axis: SIMD3<Float>(0, 1, 0)))

            encoder.setRenderPipelineState(pipelineState)
            encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
            encoder.setVertexBytes(&modelMatrix, length: MemoryLayout<simd_float4x4>.stride, index: 1)
            encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
            encoder.endEncoding()
            commandBuffer.present(drawable)
            commandBuffer.commit()
        }

        func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}
    }

    func makeCoordinator() -> Coordinator { Coordinator() }

    func makeUIView(context: Context) -> MTKView {
        let mtkView = MTKView()
        mtkView.isPaused = false
        mtkView.enableSetNeedsDisplay = false
        mtkView.preferredFramesPerSecond = 90              // match the headset's refresh rate
        mtkView.clearColor = MTLClearColorMake(0, 0, 0, 0) // transparent background
        context.coordinator.configure(mtkView)
        return mtkView
    }

    func updateUIView(_ uiView: MTKView, context: Context) {}
}
Embed RotatingCube() in a window (or hand it to a RealityView attachment) and position it wherever you like. The geometry spins smoothly at the headset’s native 90 Hz refresh rate, demonstrating how to blend native UI with hand‑rolled Metal rendering.
Pro tip: When chasing every millisecond on Vision Pro 2, keep static vertex and texture data in .storageModePrivate buffers so the GPU owns the memory outright; you populate them once with a blit from a shared staging buffer. Cutting this CPU‑GPU synchronization overhead is critical for maintaining low latency.
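Here is a minimal sketch of that upload pattern: copy the data once through a temporary shared staging buffer, then let the GPU own it (error handling kept short):

import Metal

/// Copies `vertices` into a GPU-private buffer via a one-off shared staging buffer.
func makePrivateVertexBuffer(device: MTLDevice,
                             queue: MTLCommandQueue,
                             vertices: [Float]) -> MTLBuffer? {
    let length = vertices.count * MemoryLayout<Float>.stride

    guard let staging = device.makeBuffer(bytes: vertices, length: length,
                                          options: .storageModeShared),        // CPU-visible source
          let privateBuffer = device.makeBuffer(length: length,
                                                options: .storageModePrivate),  // GPU-only destination
          let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return nil }

    // One-time copy; afterwards the GPU reads the data with no CPU synchronization
    blit.copy(from: staging, sourceOffset: 0,
              to: privateBuffer, destinationOffset: 0, size: length)
    blit.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    return privateBuffer
}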
Real‑World Use Cases
Now that we’ve covered the basics, let’s explore three practical scenarios where visionOS shines. Each example highlights a different aspect of the platform—spatial UI, sensor integration, and collaborative networking.
1. Remote Collaboration Whiteboard
Imagine a design team spread across continents, each wearing a Vision Pro 2. With a shared RealityKit scene, participants can draw, place 3‑D models, and see each other's hand gestures in real time. The core loop uses MultipeerConnectivity to broadcast Entity transformations.
import MultipeerConnectivity
import RealityKit
import UIKit

/// Wire format for a pose update. simd_quatf isn't Codable, so we send its vector form.
struct PoseMessage: Codable {
    var translation: SIMD3<Float>
    var rotation: SIMD4<Float>   // quaternion (x, y, z, w)
}

class WhiteboardSession: NSObject, MCSessionDelegate, MCNearbyServiceAdvertiserDelegate {
    var session: MCSession!
    var advertiser: MCNearbyServiceAdvertiser!
    var anchorEntity = AnchorEntity(world: .zero)

    override init() {
        super.init()
        let peerID = MCPeerID(displayName: UIDevice.current.name)
        session = MCSession(peer: peerID, securityIdentity: nil, encryptionPreference: .required)
        session.delegate = self
        advertiser = MCNearbyServiceAdvertiser(peer: peerID,
                                               discoveryInfo: nil,
                                               serviceType: "vp-whiteboard") // service types max out at 15 characters
        advertiser.delegate = self
        advertiser.startAdvertisingPeer()
    }

    // Send local entity updates
    func broadcast(_ entity: ModelEntity) {
        let message = PoseMessage(translation: entity.position,
                                  rotation: entity.orientation.vector)
        guard let data = try? JSONEncoder().encode(message) else { return }
        try? session.send(data, toPeers: session.connectedPeers, with: .reliable)
    }

    // Receive remote updates
    func session(_ session: MCSession, didReceive data: Data, fromPeer peerID: MCPeerID) {
        guard let message = try? JSONDecoder().decode(PoseMessage.self, from: data) else { return }
        DispatchQueue.main.async {
            // Assume a shared entity named "sharedPen", known to both peers
            if let remoteEntity = self.anchorEntity.findEntity(named: "sharedPen") as? ModelEntity {
                remoteEntity.position = message.translation
                remoteEntity.orientation = simd_quatf(vector: message.rotation)
            }
        }
    }

    // Remaining MCSessionDelegate / MCNearbyServiceAdvertiserDelegate methods omitted for brevity
}
Integrate the WhiteboardSession into a RealityView, attach a ModelEntity representing a pen, and call broadcast(_:) on every gesture update. The result is a low‑latency, shared canvas that feels like a physical tabletop.
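A rough sketch of that wiring, assuming the pen entity carries collision and input‑target components so gestures can land on it (WhiteboardView and the mesh dimensions are illustrative):

import SwiftUI
import RealityKit

struct WhiteboardView: View {
    @State private var sharedSession = WhiteboardSession()

    var body: some View {
        RealityView { content in
            // The pen both peers manipulate; the name must match what the session looks up
            let pen = ModelEntity(mesh: .generateBox(size: [0.01, 0.01, 0.12]),
                                  materials: [SimpleMaterial(color: .blue, isMetallic: false)])
            pen.name = "sharedPen"
            pen.generateCollisionShapes(recursive: false)
            pen.components.set(InputTargetComponent())   // lets gestures target this entity
            sharedSession.anchorEntity.addChild(pen)
            content.add(sharedSession.anchorEntity)
        }
        .gesture(
            DragGesture()
                .targetedToAnyEntity()
                .onChanged { value in
                    guard let pen = value.entity as? ModelEntity else { return }
                    // Follow the hand, then broadcast the new pose to the other peers
                    pen.position = value.convert(value.location3D, from: .local, to: pen.parent!)
                    sharedSession.broadcast(pen)
                }
        )
    }
}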
2. Data‑Driven 3‑D Dashboard
Enterprise users love real‑time metrics. By leveraging Swift Charts inside a floating panel, you can display live KPI graphs that users can walk around and inspect from any angle. Combine this with the system’s gaze‑driven hover effects to highlight the element the user is looking at.
import SwiftUI
import Charts

struct KPIChart: View {
    @State private var data: [Double] = (0..<30).map { _ in Double.random(in: 0...100) }

    var body: some View {
        Chart {
            ForEach(Array(data.enumerated()), id: \.offset) { index, value in
                LineMark(
                    x: .value("Time", index),
                    y: .value("Metric", value)
                )
                .foregroundStyle(.blue)
                .interpolationMethod(.catmullRom)
            }
        }
        .chartYScale(domain: 0...120)
        .frame(width: 600, height: 400)   // points
        .background(.ultraThinMaterial)
        .cornerRadius(12)
        .offset(y: 150)                   // nudge below eye level
        .offset(z: -300)                  // and push it back in depth
        .onAppear {
            // Simulate live updates every two seconds
            Timer.scheduledTimer(withTimeInterval: 2.0, repeats: true) { _ in
                data.append(Double.random(in: 0...100))
                if data.count > 30 { data.removeFirst() }
            }
        }
    }
}
Place KPIChart() in your app’s root view. Users can glance at the chart from across the room (visionOS dynamically scales window content so it stays legible at a distance) and walk around it to inspect the data from any angle, making exploration feel intuitive.
3. Spatial Gaming – “Orbital Defender”
For a more playful example, consider a simple space shooter where enemies spawn around the user. The player defends by looking at an enemy and performing a pinch gesture to fire a laser. This demo showcases:
- Hit detection driven by gaze‑and‑pinch, via a spatial tap gesture targeted to RealityKit entities.
- Audio feedback with AVAudioEngine.
- Dynamic difficulty scaling based on player performance (sketched at the end of this section).
import SwiftUI
import RealityKit
import AVFoundation

struct OrbitalDefender: View {
    @State private var score = 0
    @State private var audioEngine = AVAudioEngine()

    var body: some View {
        RealityView { content in
            // A root entity we can safely add enemies to from the timer callback
            let root = Entity()
            content.add(root)

            // Spawn an enemy every 1.5 seconds on a ring around the player
            Timer.scheduledTimer(withTimeInterval: 1.5, repeats: true) { _ in
                let enemy = ModelEntity(mesh: .generateSphere(radius: 0.05),
                                        materials: [SimpleMaterial(color: .red, isMetallic: false)])
                // Random position on a sphere of radius 1.2 m around the user,
                // lifted to roughly head height (the immersive-space origin is at the floor)
                let theta = Float.random(in: 0..<(2 * Float.pi))
                let phi = Float.random(in: -Float.pi / 6...Float.pi / 6)
                let r: Float = 1.2
                enemy.position = SIMD3(r * cos(phi) * sin(theta),
                                       1.5 + r * sin(phi),
                                       -r * cos(phi) * cos(theta)) // negative z is in front
                // Make the enemy targetable by the gaze-and-pinch gesture
                enemy.generateCollisionShapes(recursive: false)
                enemy.components.set(InputTargetComponent())
                root.addChild(enemy)
            }
        }
        // Look at an enemy and pinch: the system delivers a spatial tap on that entity
        .gesture(
            SpatialTapGesture()
                .targetedToAnyEntity()
                .onEnded { value in
                    playHitSound()
                    value.entity.removeFromParent()
                    score += 10
                }
        )
        .overlay(alignment: .topTrailing) {
            Text("Score: \(score)")
                .font(.title2)
                .padding(12)
                .background(.ultraThinMaterial)
                .cornerRadius(8)
                .padding()
        }
    }

    private func playHitSound() {
        let format = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 1)!
        guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: 4_410) else { return }
        buffer.frameLength = buffer.frameCapacity
        // Fill the buffer with a short beep (omitted for brevity); reuse the node in a real game
        let player = AVAudioPlayerNode()
        audioEngine.attach(player)
        audioEngine.connect(player, to: audioEngine.mainMixerNode, format: format)
        if !audioEngine.isRunning { try? audioEngine.start() }
        player.scheduleBuffer(buffer, completionHandler: nil)
        player.play()
    }
}
The game runs entirely in the user’s room, turning the physical space into a battlefield. Add RealityKit physics components to the enemies and feed in ARKit scene reconstruction, and they can even bounce off real‑world surfaces picked up by the headset’s depth sensors.
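To cover the dynamic‑difficulty bullet from earlier, one simple approach is to shorten the spawn interval as the score climbs. A sketch (the constants are arbitrary, and spawnEnemy()/scheduleNextSpawn() are hypothetical helpers you would wire into the view above):

import Foundation

/// Maps the current score to a spawn interval, clamped so the game never becomes unwinnable.
func spawnInterval(forScore score: Int) -> TimeInterval {
    let base: TimeInterval = 1.5                 // the fixed interval used above
    let reduction = Double(score) / 500.0 * 0.1  // shave ~0.1 s off per 500 points
    return max(0.5, base - reduction)
}

// Usage: instead of one repeating Timer, reschedule after each spawn:
// Timer.scheduledTimer(withTimeInterval: spawnInterval(forScore: score), repeats: false) { _ in
//     spawnEnemy()
//     scheduleNextSpawn()
// }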
Pro tip: To keep frame times under 11 ms (the per‑frame budget at 90 Hz), profile with the RealityKit Trace template in Instruments and cap how many enemies are alive at once; reusing a small pool of entities is far cheaper than allocating a fresh ModelEntity for every spawn.