Bring an LLM provider to the Foundation Models framework

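Extend the Foundation Models framework by implementing a LanguageModelExecutor for a new model. Explore how to work with a LanguageModelSession's transcript, manage session state effectively, and optimize KV cache utilization. Learn how to support custom segment types and take advantage of advanced generative AI capabilities.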

Chapters
- 0:00 - Introduction
- 3:37 - Packaging
- 4:48 - Protocol
- 14:50 - Authentication
- 15:51 - Customization
- 19:47 - Next steps

WWDC26
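Hello! I'm Christopher Webb, an engineer on the Machine Learning Research team. I'm excited to introduce new ways to use the Foundation Models framework. We previously introduced the Foundation Models framework, which gave you access to Apple's on-device language model. Now we're opening the framework to work with almost any LLM, whether local or server-based. That makes it easy for anyone, from large companies to individual developers, to build their own model integrations on top of the framework.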
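The on-device System Language Model has been completely redesigned. It's smarter, it's better at following instructions, and you can include images directly in prompts. Beyond the system model, there are three new options. Private Cloud Compute provides the foundation model behind many Apple Intelligence features. It adds reasoning, a 32K-token context window, and the privacy protections you expect. Core AI lets you run local models efficiently and take advantage of the ANE. And MLX gives you access to thousands of models through the MLX-Community on Hugging Face.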
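These are all built on a new public protocol, so developers can bring state-of-the-art AI models into their apps using the same framework. Anthropic and Google will soon extend the Foundation Models framework with their own Swift packages, making the latest Claude and Gemini models available to every Swift developer. Whichever model you use, whether Apple's, your own, or the community's, you call it the same way, because every model conforms to the Language Model protocol. For app developers, I'll show how to call all of these models the same way through a familiar API. For model providers, I'll walk through how to create your own Language Model package.
First, a preview of how easy it is to use. This is the on-device Foundation Model. Create it, pass it to a session, and call the respond function. There are more model options. If you need more power, try Private Cloud Compute: just swap the model. If you want to ship your own model, point Core AI at your resources. And if you want to try the latest open-source models, just pass a model ID and the framework handles the rest.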
Models built on the Language Model protocol get all the great features of Foundation Models, including features like Dynamic Profiles. For an overview of everything that's been added, check out "What's new in the Foundation Models framework".
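Swapping models is this easy because every LanguageModel follows the same protocol: the System Language Model, PCC, Core AI, MLX, and the ones the community builds. If you're a model provider, join in! I'll show you how. There are four steps to bringing a model to the framework. We'll start with packaging: a well-built Swift package makes it easy for developers to get started. Next, you implement the protocol: you define a type that describes your model and an executor that runs it. Then we'll look at how to implement authentication for server-based models, along with best practices. Finally, customization. If you need to adapt parts of the protocol to your needs, you can, from attaching response metadata to defining entirely new modalities.
First, packaging. We recommend using Swift Package Manager, because developers can simply add your package as an app dependency. I'll cover how to set up Package.swift and how to publish a release. An important consideration is deciding which platforms to support. Foundation Models supports iOS, macOS, visionOS, and watchOS so developers can build a wide range of experiences, and we encourage you to support the same. Because the Foundation Models framework is being released as open source, your package can also be useful to developers deploying Swift on the server, so consider supporting Linux too. The third consideration is dependencies. Every dependency turns into bytes that developers ship to their users, so think carefully about what your package links. Publishing the package is as easy as creating a git tag.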
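Swift Package Manager is decentralized, so your repository URL is your distribution channel. Developers can copy the URL, paste it into Xcode, and start integrating your model into their app. For more, see "Creating Swift Packages".
With the package ready, we move on to the protocol. The protocol is the bridge between your model and the Foundation Models framework. It has two core pieces. The first is LanguageModel, which describes your model to the framework. It declares what the model can do through capabilities, and through executorConfiguration it provides the configuration the framework needs to set up the model's executor.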
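The second is LanguageModelExecutor, where the actual work happens. It has an initializer that takes a Configuration, a prewarm function that prepares resources before the first request, and a respond function that streams generation results to the session.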
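The Configuration links the two types: the model provides it, and the framework uses it to configure the executor. Now that we've seen the protocols in code, let's build some intuition for how a model's configuration connects to an executor. Each session has an executor store. When Model1 arrives, the framework checks the store using the model's configuration, but there's no matching executor, so the LanguageModelSession creates a new executor and stores it. When Model2 produces the same configuration, the framework recognizes the match, because Configuration is Hashable, and uses the same executor. The lookup key is the configuration, not the model. When Model3 produces a different configuration, it gets its own executor. Each unique configuration maps to exactly one executor in the store.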
What does this look like in code? Here's an example LanguageModel implementation. It declares its capabilities and returns the configuration the framework uses to find the executor.
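The executor is where the actual work happens: loading weights, managing resources, and streaming tokens to the session. The framework configures the executor with the configuration the model provides and passes the model on every request. This separation keeps the model simple to construct. When the session is deallocated, the store goes with it. Every stored executor is released, deinit runs, weights are freed, and connections close, all automatically. You don't need to write teardown code yourself.
Within that lifecycle, the executor has one more function: prewarm. Before a request arrives, the developer can ask the framework to prewarm. It's a chance to do expensive setup ahead of time, handling work that would slow down the first response, like loading weights or opening connections. Let's look at how to use it. One approach is to put that setup in a private helper that loads the weights once and caches them. prewarm calls the helper right away, so the weights are ready before the first request. But prewarm isn't guaranteed to run.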
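Because respond calls the same helper, the weights are loaded exactly once either way. If the executor has no expensive setup, as with a server-based model, prewarm can simply be a no-op. When the respond function is called, the executor gets to work. It converts the conversation's transcript into the format the model expects, applies the options the developer set, and streams generation events to the session.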
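From the developer's point of view, the session is the entire interface. They initialize a model, create a session, call respond, and wait. The executor and the rest of your package are hidden behind the session. The developer can't see those inner workings, but here's what actually happens. The framework passes in transcript entries, but the inference engine can only handle its own native types. So the executor sits in the middle, converting entries into messages the inference engine understands and handing them off for inference. When the inference engine responds, the same conversion runs in reverse: messages become transcript entries again and stream to the session.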
Now let's focus on the transcript going into and out of the executor.
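The transcript is the conversation so far, represented as a sequence of entries. Each entry has a role: the instructions the developer set, the user's prompts, the tool calls the model made, the outputs they returned, and the responses the model generated.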
Zooming back out, the executor's job is to convert each transcript entry into a message the inference engine can read. What's inside a transcript? Foundation Models defines six entry types.
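The model defines its own roles. The executor's job is to map between the two, whatever shape the model takes. In this example, instructions, prompt, and response map to system, user, and assistant. Here, tool calls, tool output, and reasoning all map to assistant, because they're part of the work the model did on its turn. This model has no dedicated roles for them, so we map them to assistant. If your model defines something like a dedicated tool role, you can route them there instead. Either way, the executor is in control.
The executor reads the conversation. But every request carries more than the history. It also carries the developer's intent for how the model should respond, expressed through two additional properties.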
Every request object can include ContextOptions and GenerationOptions. ContextOptions controls what goes into the prompt, such as the reasoning level the model should use or the response schema. GenerationOptions controls the decoder loop, such as the sampling strategy, temperature, and maximum response length.
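Let's see what that looks like inside respond. Both kinds of options arrive on the request, and the executor pulls them out and passes them along when it calls the model. We've now seen everything coming in: the transcript and the options, all parsed. Now for the part the developer will see: the response. On the response side, you need to send a few things: the text the inference engine generates, any tool calls or reasoning, and the metadata that goes with them. All of it is sent as events on the channel. Each chunk the inference engine emits, whether a token or a piece of a tool call, becomes an event, such as textDelta or toolCallDelta. The framework records these in the transcript. Foundation Models offers both one-shot and streaming responses, but your implementation always streams; the one-shot API collects the deltas internally.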
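To make the last point concrete, here is a minimal, self-contained sketch of how a one-shot result could be assembled from streamed deltas. It does not use the framework: `DemoEvent` and `collectText(from:)` are hypothetical names, and the real channel and event types may look quite different.

```swift
// Hypothetical stand-in for channel events; not the framework's event type.
enum DemoEvent: Sendable {
    case textDelta(String)
    case toolCallDelta(String)
}

// Collects only the text deltas, in arrival order, into a single string.
func collectText(from events: AsyncStream<DemoEvent>) async -> String {
    var text = ""
    for await event in events {
        if case .textDelta(let delta) = event {
            text += delta
        }
    }
    return text
}

// A stream that yields three text deltas and then finishes.
let events = AsyncStream<DemoEvent> { continuation in
    for delta in ["It's ", "65°F ", "and sunny"] {
        continuation.yield(.textDelta(delta))
    }
    continuation.finish()
}
let fullText = await collectText(from: events)
```

The executor only ever produces the stream; whether the caller consumes it incrementally or waits for the collected string is decided on the session side.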
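So far we've looked at this from the model's side, with events going out as the model generates them. Put yourself in the developer's shoes for a moment. You've called respond and you're waiting. What do you need first?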
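Here's how the executor side handshakes with the developer. The order is deliberate. First, a metadata update: the model and request IDs the developer can use for logging and debugging. Next, a usage update: the prompt token count, for billing. Sending these first lets the developer know what each request costs without waiting for the whole stream. Finally, for each token the model generates, send a text delta as soon as it arrives. The framework streams those deltas to the session as they arrive, so the user sees the response word by word instead of all at once.
Earlier we saw how the framework caches executors by configuration. If your integration is stateful, keeping a KV cache or a persistent session between calls, that caching helps you minimize network load and avoid repeating work. Now let's look at how to design for it and how the executor preserves work between calls. The executor receives the full transcript on every respond call. It has what it processed last time: the instructions, the prompt, and the response it generated. When the next call comes in, it compares the new transcript with the one it saved. In most cases, entries have only been added: a new prompt appended after the last response. In that case, you can keep the existing state and process only what's new. But sometimes the comparison finds that entries were removed or modified, for example when the developer drops older entries to save context. In that case, you need to invalidate back to the point where the transcripts diverged. The framework gives you the full transcript on every call; the executor decides what matches and how to handle changes.
Sometimes the model can't do exactly what the developer asked. In that case the executor has two choices: approximate, or throw an error. Be flexible where you can and honor the developer's intent. But sometimes there's no honest approximation. If the developer sets a token limit and also specifies a schema with required fields, there may be no way to satisfy both. When that happens, throw an error. Foundation Models provides LanguageModelError for exactly these cases, covering context window overflow, rate limiting, refusals, and more. If you throw one of these, any developer who has used the framework already knows how to handle it.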
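When the built-in LanguageModelError cases don't cover the situation, define your own error type. Some failures only make sense in the context of your service, such as subscription tiers, features, or account status. Purpose-built case names convey intent, so the developer handling them knows exactly what happened. Custom errors are powerful and sometimes necessary, but each one is a new case developers have to learn and handle. Prefer the built-in LanguageModelError where it fits, and reserve custom errors for failures only your service can produce.
That completes the protocol requirements. Next, let's discuss how to handle authentication. As the package author, your job is to make the right thing easy for developers to do. If your initializer takes an API key as a string, developers will be tempted to take the path of least resistance. Instead, offer a token provider or a sign-in flow that steers developers toward the right approach. If your package fetches access tokens on the developer's behalf, store them securely using Keychain. If credential handling is one half, device attestation is the other. If you're shipping a cloud-based LanguageModel package, this is worth digging into. A related session covers verifying devices, detecting tampered builds, signing payloads, and using Apple's fraud signals to keep bad traffic off your service. Check out "Secure your apps with App Attest".
You've packaged your model, implemented the protocol, and handled authentication. You now have a solid package for your LanguageModel, with all the fundamentals in place. Now it's time to differentiate. The protocol gives you room to design the LanguageModelSession around capabilities only your model offers. Response metadata is the lightweight option: it attaches extra information to a response and gives developers a clear way to access it.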
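You can attach your own custom metadata to a response. Here, after streaming completes, the executor sends tokensPerSecond and timeToFirstToken through the channel. We recommend providing utilities or documentation that make the metadata easy for developers to use: accessors with clearly typed keys, or whatever fits. Under the hood, metadata is simply a dictionary. It can contain strings, numbers, and other built-in types. But in some cases you may need something more flexible.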
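Custom segments are the answer. Define a new segment type, receive it in your executor, and stream results through the same channel, and developers can use it without ever leaving LanguageModelSession. Custom segment types let you extend the protocol. As new modalities emerge, such as audio, video, or whatever comes next, developers get a typed, structured way to send that data to your model. Let's look at how it works. First, define a type that conforms to the custom segment protocol. Custom segments must be PromptRepresentable, so developers can pass them directly in a prompt, just like text. In your executor, you receive it as a customSegment in the transcript, alongside the text entries you already handle. When the model responds, you emit results back through the channel as custom segment updates. The segment ID controls whether you add a new segment or update one that has already started streaming. You have full control over how results stream to the app.
With custom segments in place, there's one more thing to mention: a recommendation for server-side tools. Server-side tools are capabilities the model runs itself, such as web search, code execution, and image generation. The model requests execution, the server runs it, and the executor sees the results stream in. Let's look at three levels of detail, each exposing more of the tool's work, using web search as the example. A server-side tool is a named, typed value on the model. The developer configures the model with the tools they want, and the executor receives them through the model on every request, the same way it receives every other capability the model declares. The first and simplest pattern is to run the tool privately and stream only the answer. The tool grounds the model's response, but the work stays inside the executor.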
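Each text delta you append is streamed into the transcript by the framework, with no trace that a tool produced it. Beyond grounding the answer in tool output, you can also attach additional metadata to the response. When a text delta carries metadata, such as citations, pass it all to the channel and the framework attaches the metadata to the text segment in the transcript.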
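Finally, you can choose to expose the tool's work itself. Pass the tool's structured output to the channel as a custom segment, and alongside the text and metadata, the app can see everything the model produced. Through the events you send over a single channel, the metadata you attach, and the custom segments you design, your server-side tools determine what apps using your package can show their users.
One more thing to keep in mind. Whether you're choosing a package or shipping one, make sure everyone in the chain understands the privacy implications of the model behind it. On-device and cloud-based models have very different privacy characteristics, and users have a right to know which one they're using.
We've looked at how to bring a model to the framework. These sessions show what developers will build with it. Check out "Integrate On-Device AI Models into Your App Using Core AI" to learn how to bundle a local model directly into your app. "Build with the new Apple Foundation Model on Private Cloud Compute" goes deep on server-scale inference with Apple's privacy guarantees. And "Build agentic app experiences with the Foundation Models framework" shows how developers use dynamic profiles to build multi-step, tool-using workflows on top of models like yours. I'm excited about what's ahead. We want to see a thriving ecosystem of LanguageModel packages, so Swift developers can choose freely and pick the model that fits their app. I can't wait to see what you build.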

import FoundationModels
import MLXFoundationModels

// On-device Apple Foundation Model
let model = SystemLanguageModel()

// Private Cloud Compute model
// let model = PrivateCloudComputeLanguageModel()

// Custom Core AI model
// let model = try await CoreAILanguageModel(resourcesAt: modelURL)

// Open-source MLX model from HuggingFace
// let model = MLXLanguageModel(modelID: "mlx-community/my-model")

let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "...")
print(response.content)

3:46 - Configure Package.swift for your model package

// Package.swift
import PackageDescription

let package = Package(
    name: "MyModel",
    platforms: [
        .macOS(.v27), .iOS(.v27), .visionOS(.v27), .watchOS(.v27)
    ],
    products: [
        .library(name: "MyModel", targets: ["MyModel"])
    ],
    dependencies: [
        .package(url: "...", .upToNextMinor(from: "1.0.0"))
    ],
    targets: [
        .target(name: "MyModelRuntime"),
        // public: LanguageModel conformance
        .target(name: "MyModel", dependencies: ["MyModelRuntime"]),
        .testTarget(name: "MyModelTests", dependencies: ["MyModel"])
    ]
)

4:56 - LanguageModel and LanguageModelExecutor protocols

// LanguageModel protocol

public protocol LanguageModel: Sendable {
    var capabilities: LanguageModelCapabilities { get }
    var executorConfiguration: Executor.Configuration { get }
}

// LanguageModelExecutor protocol

public protocol LanguageModelExecutor: Sendable {
    init(configuration: Configuration) throws
    func prewarm(model: Model, transcript: Transcript)
    func respond(
        to request: LanguageModelExecutorGenerationRequest,
        model: Model,
        streamingInto channel: LanguageModelExecutorGenerationChannel
    ) async throws
}

6:25 - Implement LanguageModel and Executor conformances

// LanguageModel conformance
public struct MyLanguageModel: LanguageModel {
    public typealias Executor = MyLanguageModelExecutor

    public var capabilities: LanguageModelCapabilities {
        LanguageModelCapabilities(capabilities: [
            .toolCalling, .guidedGeneration, .reasoning
        ])
    }

    public var executorConfiguration: Executor.Configuration {
        Executor.Configuration(/* ... */)
    }
}

// Executor conformance
public struct MyLanguageModelExecutor: LanguageModelExecutor {
    public typealias Model = MyLanguageModel

    public struct Configuration: Hashable, Sendable { /* ... */ }

    public init(configuration: Configuration) throws { /* ... */ }

    public func respond(
        to request: LanguageModelExecutorGenerationRequest,
        model: MyLanguageModel,
        streamingInto channel: LanguageModelExecutorGenerationChannel
    ) async throws { /* ... */ }
}
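
The executor store described in the session (one executor per distinct configuration, looked up by the configuration and not by the model) can be pictured with a plain dictionary. This is only a sketch of the idea, not the framework's implementation: `DemoConfiguration`, `DemoExecutor`, and `DemoExecutorStore` are hypothetical stand-ins, and the `quantizationBits` field is invented for illustration.

```swift
// Hypothetical stand-ins; these are not the framework's types.
struct DemoConfiguration: Hashable {
    var modelID: String
    var quantizationBits: Int
}

final class DemoExecutor {
    let configuration: DemoConfiguration
    init(configuration: DemoConfiguration) {
        self.configuration = configuration
    }
}

// One store per session: each distinct configuration maps to exactly one executor.
final class DemoExecutorStore {
    private var executors: [DemoConfiguration: DemoExecutor] = [:]

    var count: Int { executors.count }

    func executor(for configuration: DemoConfiguration) -> DemoExecutor {
        if let existing = executors[configuration] {
            return existing // Equal configuration, so the executor is reused.
        }
        let created = DemoExecutor(configuration: configuration)
        executors[configuration] = created
        return created
    }
}

let store = DemoExecutorStore()
let first = store.executor(for: DemoConfiguration(modelID: "my-model", quantizationBits: 4))
let second = store.executor(for: DemoConfiguration(modelID: "my-model", quantizationBits: 4))
let third = store.executor(for: DemoConfiguration(modelID: "my-model", quantizationBits: 8))
```

Here `first` and `second` are the same object because their configurations compare equal, while `third` gets its own executor. The practical consequence for a package author is that anything that should force a separate executor belongs in the configuration's hashed fields.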

7:28 - Manage model resources with prewarm and respond

// One approach to managing resources

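struct MyLanguageModelExecutor: LanguageModelExecutor {

    // Loads the weights on first use and returns the cached copy afterwards.
    // `loadedModel` is assumed to live in reference-type storage (declaration
    // elided), because prewarm and respond are non-mutating.
    private func loadModelIfNeeded() throws -> LoadedWeights {
        let weights = try loadedModel ?? loadWeights()
        loadedModel = weights
        return weights
    }

    // Best effort: if loading fails here, respond tries again and surfaces the error.
    func prewarm(model: MyLanguageModel, transcript: Transcript) {
        _ = try? loadModelIfNeeded()
    }

    func respond( ... ) async throws {
        // prewarm may never have run, so load here as well.
        let weights = try loadModelIfNeeded()
        // ...generate with 'weights'...
    }
}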
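
One way to get the "load exactly once, whether or not prewarm ran" behavior without mutating the executor itself is to keep the cached weights in an actor. The sketch below is self-contained and does not use the framework: `DemoWeights`, `DemoWeightCache`, and `WarmableExecutor` are hypothetical names, and the "load" is a placeholder for real, expensive work.

```swift
// Hypothetical stand-in for loaded model weights.
struct DemoWeights: Sendable {
    let parameterCount: Int
}

// An actor serializes access, so the weights load at most once even if
// prewarm and respond are called concurrently.
actor DemoWeightCache {
    private var loaded: DemoWeights?
    private(set) var loadCount = 0

    func weights() -> DemoWeights {
        if let loaded { return loaded }
        loadCount += 1
        let fresh = DemoWeights(parameterCount: 1_000) // Placeholder for an expensive load.
        loaded = fresh
        return fresh
    }
}

struct WarmableExecutor: Sendable {
    let cache = DemoWeightCache()

    // Best effort: warms the cache, but nothing depends on it having run.
    func prewarm() async {
        _ = await cache.weights()
    }

    func respond(to prompt: String) async -> String {
        let weights = await cache.weights()
        return "\(prompt) (generated with \(weights.parameterCount) parameters)"
    }
}

let executor = WarmableExecutor()
await executor.prewarm()
let reply = await executor.respond(to: "Hi")
let loads = await executor.cache.loadCount
```

Because the load inside the actor is synchronous, there is no suspension point between the check and the assignment, so two callers cannot both trigger a load. If the real load is asynchronous, the cache would need to store an in-flight task instead.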

9:00 - Map Transcript entries to model messages

// Transcript entries

let transcript = Transcript(entries: [
    .instructions( ... ),  // "You are a helpful assistant"

    .prompt( ... ),        // "What's the weather in Pittsburgh?"
    .toolCalls( ... ),     // getWeather(location: "Pittsburgh")
    .toolOutput( ... ),    // 65°F, sunny
    .response( ... ),      // "It's 65°F and sunny in Pittsburgh"

    .prompt( ... ),        // "What's the address of Apple Park?"
    .response( ... ),      // "One Apple Park Way, Cupertino, CA 95014"
])
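
The mapping step itself, from transcript entries to the roles an inference engine understands, might look like the sketch below. It follows the example from the session, where tool calls, tool output, and reasoning are folded into assistant because the engine has no dedicated role for them. `DemoEntry`, `DemoRole`, and `DemoMessage` are hypothetical stand-ins with plain string payloads; the framework's entry types carry richer content.

```swift
// Hypothetical stand-ins for transcript entry kinds and a chat-style message
// format; neither is the framework's nor any real engine's type.
enum DemoEntry {
    case instructions(String)
    case prompt(String)
    case toolCalls(String)
    case toolOutput(String)
    case reasoning(String)
    case response(String)
}

enum DemoRole: String {
    case system, user, assistant
}

struct DemoMessage: Equatable {
    var role: DemoRole
    var text: String
}

func message(for entry: DemoEntry) -> DemoMessage {
    switch entry {
    case .instructions(let text):
        return DemoMessage(role: .system, text: text)
    case .prompt(let text):
        return DemoMessage(role: .user, text: text)
    // This engine has no dedicated tool or reasoning role, so everything the
    // model did on its own turn is folded into assistant.
    case .toolCalls(let text), .toolOutput(let text), .reasoning(let text), .response(let text):
        return DemoMessage(role: .assistant, text: text)
    }
}

let messages = [
    DemoEntry.instructions("You are a helpful assistant"),
    .prompt("What's the weather in Pittsburgh?"),
    .toolCalls("getWeather(location: \"Pittsburgh\")"),
    .toolOutput("65°F, sunny"),
    .response("It's 65°F and sunny in Pittsburgh"),
].map(message(for:))
```

An engine with a dedicated tool role would only need an extra `DemoRole` case and a different branch in the switch; the rest of the executor stays the same.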

10:42 - Read generation and context options from the request

// Parse generation and context options

func respond(
    to request: LanguageModelExecutorGenerationRequest,
    model: MyLanguageModel,
    streamingInto channel: LanguageModelExecutorGenerationChannel
) async throws {
    let reasoningLevel = request.contextOptions.reasoningLevel
    let temperature = request.generationOptions.temperature
    let maxTokens = request.generationOptions.maximumResponseTokens
}

11:47 - Stream tokens and metadata through the channel

// Streaming text tokens

func respond( ... ) async throws {
    // 1. Report metadata
    await channel.send(.response(action: .updateMetadata([
        "modelID": "my-model-2026-06-08",
        "requestID": request.id.uuidString
    ])))
    // 2. Report prompt token usage before generating
    await channel.send(.response(action: .updateUsage(
        input: .init(totalTokenCount: promptTokens, cachedTokenCount: cachedTokens),
        output: .init(totalTokenCount: 0, reasoningTokenCount: 0)
    )))
    // 3. Stream text deltas as the model generates
    for try await token in tokens {
        await channel.send(.response(action: .appendText(token)))
    }
}
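
For stateful executors, the transcript comparison described in the session (keep cached state when entries were only appended, invalidate from the point of divergence otherwise) comes down to finding the longest common prefix of the saved and incoming transcripts. A minimal sketch, assuming entries can be compared for equality; `CachedEntry`, `CachePlan`, and `planCacheReuse` are hypothetical names, not framework API.

```swift
// Hypothetical stand-in for a transcript entry; only equality matters here.
struct CachedEntry: Equatable {
    var role: String
    var text: String
}

enum CachePlan: Equatable {
    // Everything cached is still valid; process only entries from this index on.
    case extend(from: Int)
    // The transcripts diverged; drop cached state from this index on, then reprocess.
    case invalidate(from: Int)
}

func planCacheReuse(cached: [CachedEntry], incoming: [CachedEntry]) -> CachePlan {
    // Length of the longest common prefix of the two transcripts.
    var shared = 0
    while shared < cached.count, shared < incoming.count, cached[shared] == incoming[shared] {
        shared += 1
    }
    return shared == cached.count ? .extend(from: shared) : .invalidate(from: shared)
}

let instructions = CachedEntry(role: "instructions", text: "Be brief")
let firstPrompt = CachedEntry(role: "prompt", text: "Hello")
let firstResponse = CachedEntry(role: "response", text: "Hi")
let secondPrompt = CachedEntry(role: "prompt", text: "What's new?")
let saved = [instructions, firstPrompt, firstResponse]

// A new prompt was appended: keep the cache and process index 3 onward.
let appended = planCacheReuse(cached: saved, incoming: saved + [secondPrompt])
// Older entries were dropped: the transcripts diverge at index 1.
let trimmed = planCacheReuse(cached: saved, incoming: [instructions, secondPrompt])
```

How an index translates into KV cache positions depends on the engine's tokenization, which this sketch leaves out.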

13:33 - Honor the developer's intent or throw

// Honor the developer's intention where possible

// The developer set sampling: .greedy, but our service only takes temperature
if request.generationOptions.sampling?.kind == .greedy {
    serviceRequest.temperature = 0
}

// Otherwise, throw an error

// The token budget is too small to satisfy the schema
if let schema = request.schema,
   let budget = request.generationOptions.maximumResponseTokens,
   budget < minimumTokens(for: schema) {
    throw LanguageModelError.unsupportedCapability(
        .init(
            capability: .guidedGeneration,
            debugDescription: "Token budget too small to satisfy this schema."
        )
    )
}

13:57 - Built-in errors that any model can throw

// Built-in errors that any model can throw

public enum LanguageModelError: LocalizedError, CustomDebugStringConvertible {
    // Transcript grew past the model's context window. Trim entries and retry.
    case contextSizeExceeded(     )
    // Too many requests in a short window. Space them out or reduce load.
    case rateLimited(     )
    // Model declined to answer. Fall back to a message of your choosing.
    case refusal(     )
    // Safety guardrails tripped on the prompt or the response.
    case guardrailViolation(     )
    // Model lacks a feature you used, such as guided generation or tools.
    case unsupportedCapability(     )
    // Prompt contains content the model can't process (bad files, unknown formats).
    case unsupportedTranscriptContent(     )
    // A generation guide (e.g., a regex pattern) isn't supported by this model.
    case unsupportedGenerationGuide(     )
    // Prompt asked for output in a language or locale the model doesn't support.
    case unsupportedLanguageOrLocale(     )
    // Request timed out before the model produced a response.
    case timeout(     )
}

14:14 - Handle errors from your model executor

// Custom errors

public enum MyModelError: Error, LocalizedError {
    // User hit monthly token limit. Prompt upgrade or wait for reset.
    case exceededSubscriptionTierLimit
    // Model variant isn't enabled on this account.
    case modelNotProvisioned
    // Billing or policy review locked this account.
    case accountSuspended

    public var errorDescription: String? {
        switch self {
        case .exceededSubscriptionTierLimit:
            String(localized: "Your plan limit has been reached.")
        // ...
        }
    }
}
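
The authentication guidance from the session (offer a token provider instead of an initializer that takes a raw API key string) could take a shape like the following. This is one possible design, not an API from the framework: `AccessTokenProvider`, `ClosureTokenProvider`, and `DemoCloudModel` are hypothetical names, and the token string is a placeholder.

```swift
// Hypothetical API surface for a model package; none of these names come from the framework.
protocol AccessTokenProvider: Sendable {
    func accessToken() async throws -> String
}

// Wraps any async closure, for example one that asks the app's own backend for a token.
struct ClosureTokenProvider: AccessTokenProvider {
    private let fetch: @Sendable () async throws -> String

    init(_ fetch: @escaping @Sendable () async throws -> String) {
        self.fetch = fetch
    }

    func accessToken() async throws -> String {
        try await fetch()
    }
}

struct DemoCloudModel: Sendable {
    let tokenProvider: any AccessTokenProvider

    // No initializer takes a raw API key string, so the easy path is also the safer one.
    init(tokenProvider: any AccessTokenProvider) {
        self.tokenProvider = tokenProvider
    }

    // Fetches a token on demand, so short-lived tokens are refreshed per request.
    func authorizationHeader() async throws -> String {
        let token = try await tokenProvider.accessToken()
        return "Bearer \(token)"
    }
}

let model = DemoCloudModel(tokenProvider: ClosureTokenProvider {
    // Placeholder: a real app would request a short-lived token here.
    "short-lived-token"
})
let header = try await model.authorizationHeader()
```

Keychain storage and App Attest are deliberately left out of this sketch; a provider that persists tokens would sit behind the same protocol.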

16:08 - Attach custom metadata to responses

// Attach service-specific performance metadata

let elapsed = Date().timeIntervalSince(startTime)
let tokensPerSecond = Double(tokenCount) / elapsed
let timeToFirstToken = firstTokenTime?.timeIntervalSince(startTime) ?? 0

await channel.send(.metadataUpdate([
    "tokensPerSecond": tokensPerSecond,
    "timeToFirstToken": timeToFirstToken
]))
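
On the consuming side, the session recommends accessors with clearly typed keys so that app code never spells raw dictionary keys. A minimal sketch of that idea follows, assuming the metadata arrives as a string-keyed dictionary; `DemoMetadataValue` and `DemoResponseMetadata` are hypothetical, and the framework's actual metadata value type may differ.

```swift
// Hypothetical metadata value type; the framework's dictionary may use a different one.
enum DemoMetadataValue: Equatable {
    case string(String)
    case number(Double)
}

struct DemoResponseMetadata {
    var storage: [String: DemoMetadataValue] = [:]
}

// Typed accessors shipped by the package, so callers get a Double back
// instead of digging through the dictionary themselves.
extension DemoResponseMetadata {
    var tokensPerSecond: Double? {
        if case .number(let value)? = storage["tokensPerSecond"] { return value }
        return nil
    }

    var timeToFirstToken: Double? {
        if case .number(let value)? = storage["timeToFirstToken"] { return value }
        return nil
    }
}

let metadata = DemoResponseMetadata(storage: [
    "tokensPerSecond": .number(42.5),
    "timeToFirstToken": .number(0.18),
    "modelID": .string("my-model"),
])
```

A missing key or a value of the wrong kind yields `nil` instead of a crash, which keeps older app builds working if the package later changes what it reports.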

17:05 - Define and use custom Transcript segments

// Define a custom segment
public struct AudioSegment: Transcript.CustomSegment {
    public var id: String
    public var content: URL
}

// Pass it in a prompt
let recording = AudioSegment(id: UUID().uuidString, content: URL(filePath: "/path/to/recording.m4a"))
let response = try await session.respond {
    "Where was Frank Lloyd Wright's original architecture school located?"
    recording
}

// Emit a custom segment from the executor
for try await event in stream {
    switch event {
    case .audioFileGenerated(let file):
        await channel.send(.response(action: .updateCustomSegment(
            AudioSegment(id: file.id, content: file.url)
        )))
    }
}
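
The session notes that the segment ID controls whether an update adds a new segment or updates one that has already started streaming. The sketch below illustrates that rule with a plain array. It assumes an update replaces the earlier segment with the same ID wholesale; the framework may merge content differently. `DemoSegment` and `apply(_:to:)` are hypothetical names.

```swift
// Hypothetical stand-in for a custom segment; only the ID and payload matter here.
struct DemoSegment: Equatable {
    var id: String
    var content: String
}

// Applies one update: a known ID replaces that segment in place, a new ID appends.
func apply(_ update: DemoSegment, to segments: inout [DemoSegment]) {
    if let index = segments.firstIndex(where: { $0.id == update.id }) {
        segments[index] = update
    } else {
        segments.append(update)
    }
}

var segments: [DemoSegment] = []
apply(DemoSegment(id: "audio-1", content: "partial"), to: &segments)
apply(DemoSegment(id: "audio-1", content: "complete"), to: &segments)
apply(DemoSegment(id: "audio-2", content: "partial"), to: &segments)
```

For an executor, the takeaway is to keep IDs stable for the lifetime of a streamed segment and to mint a fresh ID only when a genuinely new segment begins.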

18:09 - Implement server-side tools in your model

// Configure server-side tools
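public struct MyLanguageModel: LanguageModel {
    public struct ServerTool: Sendable {
        public static let webSearch: ServerTool = ...
    }
    // Stored so the executor can read the configured tools from the model on every request.
    public let serverTools: [ServerTool]
    public init(serverTools: [ServerTool] = []) {
        self.serverTools = serverTools
    }
}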

// Surface tool results through the channel
let client = MyServerClient(serverTools: model.serverTools)
let response = try await client.send(prompt: .init(request))
for try await chunk in response {
    switch chunk {
    case .webSearch(let webSearch):
        await channel.send(.response(action: .updateCustomSegment(
            WebSearchSegment(url: webSearch.url, content: webSearch.html)
        )))
    case .textDelta(let textDelta):
        await channel.send(.response(action: .appendText(
            textDelta.text, tokenCount: textDelta.tokenCount
        )))
    }
}

- 0:00 - Introduction
- Overview of the Foundation Models framework opening to nearly any LLM. Covers improvements to the on-device System Language Model, three new model options (Private Cloud Compute, Core AI, and MLX), upcoming Anthropic and Google partner integrations, and a code preview showing how any model can be swapped into a LanguageModelSession using the same Swift API.
- 3:37 - Packaging
- How to package your LLM provider as a Swift package — configuring Package.swift with the right platform targets (iOS, macOS, visionOS, watchOS, and Linux), being deliberate about dependencies to minimize shipped bytes, and publishing a release via a git tag that developers can paste directly into Xcode.
- 4:48 - Protocol
- The two core protocol types bridging your model to the framework: LanguageModel (declares capabilities and provides a Configuration) and LanguageModelExecutor (handles prewarm, translates Transcript entries to your inference engine's native format, applies ContextOptions and GenerationOptions, and streams responses with metadata-first ordering). Covers executor caching by configuration and KV cache state reuse across calls, plus how to approximate unsupported options or throw LanguageModelError when needed.
- 14:50 - Authentication
- Best practices for credential handling — designing initializers that guide developers toward secure usage rather than plain API key strings, persisting tokens securely via Keychain, and using App Attest for device attestation to verify devices, catch tampered builds, and protect cloud-based language model services.
- 15:51 - Customization
- How to differentiate your model package beyond the protocol fundamentals — attaching custom response metadata (e.g., tokensPerSecond, timeToFirstToken), defining custom segment types for new input and output modalities (audio, video, and beyond), and implementing server-side tools (web search, code execution, image generation) at three levels of visibility: privately grounded, metadata-enriched, or fully surfaced through custom segments.
- 19:47 - Next steps
- Privacy considerations when choosing or shipping a model package — on-device versus cloud-based models have very different characteristics and users deserve to know which they're getting. Pointers to companion sessions on Core AI model integration, Private Cloud Compute, and building agentic app experiences on top of the new model ecosystem.
