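Make the most of Apple Foundation Models with Private Cloud Compute

With Private Cloud Compute, you can access powerful frontier models while protecting user privacy. Learn how Private Cloud Compute works and how to access it through the Foundation Models framework. Explore best practices for checking availability in your app and handling fallbacks gracefully.

Chapters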

0:00 - Introduction
1:23 - What is Private Cloud Compute
2:43 - Integrating PCC with Foundation Models
4:00 - Deciding between on-device and PCC
4:32 - Reasoning levels and context size
6:15 - Evaluating and combining models
7:10 - Handling usage limits
10:15 - Next steps


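Hi, I'm Louis. In this video, I'll show you how to access a powerful new server LLM in your app using Private Cloud Compute. Last year, we gave you access to a powerful on-device LLM through the new Foundation Models framework. This year, we've made the on-device LLM even better: it now supports image input, follows instructions more closely, and is better at calling custom tools. But we know some more complex use cases call for a more powerful model. So this year, we're also providing access to a new server model running on Private Cloud Compute. With this model, you can build sophisticated AI features in your app, such as an assistant that can handle large amounts of user input, or a feature that relies on many tool calls with large outputs.

You can even call Private Cloud Compute from watchOS.

In this video, we'll cover what Private Cloud Compute is, how to access it in your app using the Foundation Models framework, and how to handle usage limits.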

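Private Cloud Compute powers our system features by sending complex tasks to Apple's servers. Your app can now use it too, which means you can access a powerful server LLM without compromising privacy. Private Cloud Compute was designed from the start around end-to-end privacy: user data is never stored, data is used only to fulfill the request, and all of this has been independently verified by researchers. And it gets better. Private Cloud Compute is integrated into the system and works with iCloud, so you don't have to worry about the authentication or API keys that server models usually require. Your users only need a device that supports Apple Intelligence. No account setup, no authentication, and no API keys. This really is the easiest server LLM you've ever used. Better still, there's no token cost for developers. Each user has a daily limit, and users can upgrade to iCloud+ for a higher limit.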

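This model is available to apps with fewer than 2 million downloads. You can apply on the developer website today. Let's look at how to integrate it in your app using the Foundation Models framework. If your app already uses Foundation Models, you know it takes just three lines of code to prompt the on-device LLM: you create a session, then ask it to respond to your prompt. Now, changing just one line of code switches to the new server model on PCC. With that single line, you're talking to a larger model with a larger context and more sophisticated reasoning. The Foundation Models framework provides a unified Swift API regardless of which model you're talking to. Getting structured output with Generable or calling tools works exactly the same with the PCC model as with the on-device model.

This makes it easy to switch between models without rewriting your code.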

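Keep in mind that, just like the on-device model, PCC is only available on devices that support Apple Intelligence. It's important to check the availability API and gracefully handle the case where Apple Intelligence isn't available on the user's device.

When you build a feature with Foundation Models, choosing which model to use is an important decision. Let's compare the on-device system model with the PCC model. Both provide privacy protection. The on-device model works offline, while PCC requires a network connection. The on-device model has no request limits, while PCC has a daily limit per user. For some features, context size is another important factor: the on-device model offers 4K, while PCC offers 32K. The PCC model also supports reasoning. So what is reasoning?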

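When an LLM responds to your prompt, it normally just reads the prompt and generates a response. With reasoning, the model thinks before generating its response. This is done by having the model generate additional text in a separate segment of the transcript. The PCC model offers three reasoning levels. Light lets the model gather some additional context. Moderate lets the model reason more deeply. With Deep, the text in the reasoning segment may be longer than the actual response.

You set the reasoning level when you call respond on the session.

Your session's transcript contains the reasoning segments, so you can observe the transcript to show progress. This is especially useful with the Deep reasoning level, which can take some time. But keep in mind that reasoning is additional text generated by the model, so it uses tokens, and those tokens count against your context size limit.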

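Speaking of context size, we've also added a convenience API that lets you read a model's context size programmatically. Just access the contextSize property on SystemLanguageModel or PrivateCloudComputeLanguageModel.

When choosing between the on-device model and the PCC model, or deciding which reasoning level to use, it's best to base the decision on data rather than on vibes. Evaluations help you understand the quality of a specific feature. You may be surprised by how well the on-device model performs on some tasks, especially with this year's updated model. But the only way to know is to evaluate.

That's why we created the new Evaluations framework, a new Swift framework that helps you evaluate your Foundation Models features. It's integrated directly into Xcode and easy to get started with. Check out "Meet the Evaluations framework" to learn more.

You can even use the on-device and server models together. Check out "Build agentic app experiences with Foundation Models" to learn more.

When you use the PCC model in your app, it's important to handle usage limits properly. Requests count against the user's iCloud account. You can optimize your app for the case where the user hits their limit. Let's see how.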

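Here I have an app that uses the PCC model to summarize articles. I can pick a Markdown file; we extract the text and images, pass them to a LanguageModelSession, and generate a summary. This works thanks to the large context size PCC provides. But when the user hits their limit, the request throws an error. Simply showing that error in the UI isn't a good user experience, because it isn't actionable. To handle this better, you can check isLimitReached on the model's quotaUsage and handle it with custom UI in your app. Here I use a Label below the button.

When the user's limit is exceeded, you can show a button that lets them manage their limit. For example, the user can upgrade their account to get a higher limit so they can make more requests.

You should integrate this with your existing UI. Avoid showing an alert for usage limits, because this UI should stay visible rather than be dismissed. Instead, update the state of your interface, for example by disabling the button that makes the request. Below that button, I show a subtle Label along with a button that lets the user get a higher limit. If needed, you can also detect when the user is approaching their limit. This helps signal that they're close to their daily limit, so they can make an informed decision about which requests to make.

In Xcode, there's a handy debug option for simulating usage limit states. In your scheme, select Debug, then Options.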

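Here there's a Simulate Apple Foundation Models Availability option. We can choose Quota Usage Limit Reached to simulate the case we just handled in the UI. We can also choose Nearing Usage Limit to simulate the user approaching their daily limit.

We already handled the isLimitReached case in the earlier code. Now we can test the belowLimit case too. Just as with isLimitReached, we can show a simple Label.

In the app, this now shows a Label below the request button. Again, it includes an actionable button, so users can manage their limit even before they reach the maximum. All of this takes just a few lines of code.

That's a quick overview of integrating Private Cloud Compute into your app. If you want to use this new server model in your app, you can apply on the developer website today. We also have plenty of other content on what's new in Foundation Models and related frameworks. Start with "What's new in the Foundation Models framework" for a great overview. To better understand how the model behaves at runtime, check out "Debug and profile agentic app experiences with Instruments". Thanks for watching!

Where's that book? I need to take it to the library.

No, really, where is that book?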

2:49 - Prompt the on-device model

import FoundationModels

// Create a session with the default on-device model and send a prompt.
let session = LanguageModelSession()
let response = try await session.respond(to: "Summarize this article: \(article)")

3:02 - Switch to the PCC server model (one-line change)

import FoundationModels

// The only change from the on-device version: pass the PCC model to the session.
let session = LanguageModelSession(
    model: PrivateCloudComputeLanguageModel()
)
let response = try await session.respond(to: "Summarize this article: \(article)")

3:25 - Structured output and tools work the same

import FoundationModels

// Structured output type that the model fills in.
@Generable
struct ArticleSummary {
    let oneLineSummary: String
    let keyPoints: [String]
}

struct FindRelatedArticlesTool: Tool {
    // Tool implementation omitted in the original snippet.
}

// Tools are passed to the session as instances.
let session = LanguageModelSession(
    model: PrivateCloudComputeLanguageModel(),
    tools: [FindRelatedArticlesTool()]
)

let response = try await session.respond(
    to: "Summarize this article: \(article)",
    generating: ArticleSummary.self
)

3:51 - Check availability

import FoundationModels
import SwiftUI

struct ArticleSummarizationView: View {
    private var model = PrivateCloudComputeLanguageModel()

    var body: some View {
        if model.isAvailable {
            // Show UI for making request
        } else {
            // Fall back
        }
    }
}
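
The on-device versus PCC trade-offs described in the session (offline support, daily limits, 4K versus 32K context, reasoning) can be written down as a small decision helper. The following is only a sketch in plain Swift: `FeatureRequirements`, `ModelChoice`, and `chooseModel` are hypothetical names that are not part of the Foundation Models framework, the token estimate is something you would supply yourself, and the context sizes are the figures quoted in the session. In a real app you would read `contextSize` and check availability on the actual model objects instead.

```swift
// Hypothetical description of what a feature needs; not a framework type.
struct FeatureRequirements {
    var estimatedTokens: Int   // rough estimate of prompt plus expected output
    var needsReasoning: Bool
    var mustWorkOffline: Bool
}

// Hypothetical result type for the decision.
enum ModelChoice: Equatable {
    case onDevice
    case privateCloudCompute
    case unsupported(reason: String)
}

// Context sizes as quoted in the session (assumed constants for this sketch).
let onDeviceContext = 4_096
let pccContext = 32_768

func chooseModel(for r: FeatureRequirements, pccAvailable: Bool) -> ModelChoice {
    // Prefer on-device when it suffices: it works offline and has no daily limit.
    if r.estimatedTokens <= onDeviceContext && !r.needsReasoning {
        return .onDevice
    }
    // Everything below needs PCC, which requires a network connection.
    if r.mustWorkOffline {
        return .unsupported(reason: "needs PCC but must work offline")
    }
    if !pccAvailable {
        return .unsupported(reason: "PCC unavailable")
    }
    if r.estimatedTokens > pccContext {
        return .unsupported(reason: "exceeds PCC context")
    }
    return .privateCloudCompute
}

let shortNote = FeatureRequirements(estimatedTokens: 2_000, needsReasoning: false, mustWorkOffline: true)
let longArticle = FeatureRequirements(estimatedTokens: 20_000, needsReasoning: true, mustWorkOffline: false)
print(chooseModel(for: shortNote, pccAvailable: false))   // onDevice
print(chooseModel(for: longArticle, pccAvailable: true))  // privateCloudCompute
```

The `unsupported` case is where the "Fall back" branch of the availability check would come into play, for example by hiding the feature or offering a reduced version of it.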

5:26 - Set a reasoning level

let response = try await session.respond(
    to: prompt,
    contextOptions: ContextOptions(reasoningLevel: .light)
)
// Reasoning levels: .light, .moderate, .deep

5:58 - Read the context size

SystemLanguageModel().contextSize
// 4096 on 26.0
// 8192 on 27.0 (newer devices)

PrivateCloudComputeLanguageModel().contextSize
// 32768
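
Because reasoning text counts against the context size, it can help to budget for it explicitly. The sketch below is plain Swift arithmetic, not framework code: `ContextBudget` is a hypothetical type, and the token counts are made-up illustrative numbers rather than measurements. Only the 32768 context size comes from the snippet above.

```swift
// Hypothetical helper for reasoning about token usage; not a framework type.
struct ContextBudget {
    let contextSize: Int      // e.g. the value read from contextSize
    var promptTokens: Int     // instructions plus user input
    var reasoningTokens: Int  // extra text generated while reasoning
    var responseTokens: Int   // the final answer

    // Everything in the transcript shares the same context window.
    var used: Int { promptTokens + reasoningTokens + responseTokens }
    var remaining: Int { max(0, contextSize - used) }
    var fits: Bool { used <= contextSize }
}

// Same prompt and response, with illustrative reasoning costs for two levels.
let lightRun = ContextBudget(contextSize: 32_768, promptTokens: 20_000,
                             reasoningTokens: 1_000, responseTokens: 2_000)
let deepRun = ContextBudget(contextSize: 32_768, promptTokens: 20_000,
                            reasoningTokens: 12_000, responseTokens: 2_000)

print(lightRun.fits, lightRun.remaining) // true 9768
print(deepRun.fits, deepRun.remaining)   // false 0
```

With numbers like these, a long prompt that fits comfortably at a light reasoning level can overflow at a deeper one, which is one reason to pick the reasoning level based on measurement.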

9:41 - Handle usage limits

import FoundationModels
import SwiftUI

struct ArticleSummarizationView: View {
    private var model = PrivateCloudComputeLanguageModel()

    var body: some View {
        // Below the limit, but close to it: show a subtle warning.
        if case .belowLimit(let info) = model.quotaUsage.status {
            if info.isApproachingLimit {
                Text("Nearing usage limit.")
                    .foregroundStyle(Color.orange)
            }
        }
        // Limit reached: show a persistent message instead of an alert.
        if model.quotaUsage.isLimitReached {
            Text("Usage limit exceeded.")
                .foregroundStyle(Color.red)
        }
        // Offer an actionable way to get a higher limit when one is available.
        if let suggestion = model.quotaUsage.limitIncreaseSuggestion {
            Button("Show options") {
                suggestion.show()
            }
        }
    }
}
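
The guidance in the session (keep the message persistent, disable the request button at the limit, offer an upgrade button) can be kept testable by separating the quota-to-UI mapping from the view. This is a minimal sketch in plain Swift under stated assumptions: `QuotaState`, `RequestUIState`, and `uiState(for:canSuggestIncrease:)` are hypothetical stand-ins, not the framework's quota types, and the choice to show the manage button only when a suggestion is available mirrors the optional `limitIncreaseSuggestion` in the snippet above.

```swift
// Hypothetical stand-in for the framework's quota status; not the real type.
enum QuotaState {
    case belowLimit(approaching: Bool)
    case limitReached
}

// What the view needs in order to render the request controls.
struct RequestUIState: Equatable {
    var requestButtonEnabled: Bool
    var message: String?
    var showsManageLimitButton: Bool
}

func uiState(for quota: QuotaState, canSuggestIncrease: Bool) -> RequestUIState {
    switch quota {
    case .belowLimit(let approaching):
        // Requests still work; warn only when the user is close to the limit.
        return RequestUIState(
            requestButtonEnabled: true,
            message: approaching ? "Nearing usage limit." : nil,
            showsManageLimitButton: approaching && canSuggestIncrease
        )
    case .limitReached:
        // Disable the request button and keep the message visible.
        return RequestUIState(
            requestButtonEnabled: false,
            message: "Usage limit exceeded.",
            showsManageLimitButton: canSuggestIncrease
        )
    }
}

let reached = uiState(for: .limitReached, canSuggestIncrease: true)
print(reached.requestButtonEnabled) // false
```

A view could compute this state from the real quota values and render it directly, which keeps the three cases easy to exercise alongside Xcode's simulated quota options.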

0:00 - Introduction
Access to a new server LLM via Private Cloud Compute. The on-device model also improves this year (image input, better instruction following and tool calling), but PCC enables more complex features: reasoning over large input, many tool calls with large outputs, even from watchOS.
1:23 - What is Private Cloud Compute
PCC delivers a powerful server model without compromising privacy: data is never stored, used only for the request, and independently verified. It's integrated with the OS and iCloud, so there's no authentication or API keys, no token cost to developers, a daily per-user limit (higher with iCloud+), and eligibility for apps under 2M downloads.
2:43 - Integrating PCC with Foundation Models
Prompting the on-device model takes three lines; switching to the PCC server model changes just one. The unified Swift API means Generable structured output and tool calling work identically, so you can switch models without rewriting code, and should check the availability API for non-Apple Intelligence devices.
4:00 - Deciding between on-device and PCC
Both offer privacy, but the on-device model works offline with no request limits and a 4K context, while PCC needs a connection, has a daily limit, offers a 32K context, and supports reasoning.
4:32 - Reasoning levels and context size
Reasoning lets the model think before responding by generating extra transcript text, at three levels (light, moderate, deep). Set it on respond, observe the transcript to show progress, and remember reasoning consumes tokens against the context limit, now readable via the contextSize property.
6:15 - Evaluating and combining models
Choose models and reasoning levels based on data, not vibes; the updated on-device model may surprise you. Use the new Evaluations framework (see "Meet the Evaluations framework") and combine on-device and server models together (see "Build agentic app experiences with Foundation Models").
7:10 - Handling usage limits
Handle the per-user iCloud quota gracefully: check isLimitReached on the model's quotaUsage and show persistent, actionable UI (such as a disabled button with an upgrade option) rather than an alert. Detect the approaching-limit case too, and use Xcode's Simulate Apple Foundation Models Availability debug option to test both states.
10:15 - Next steps
Apply for the server model on the developer website, and explore related content: "What's new in the Foundation Models framework" for an overview and "Debug and profile agentic app experiences with Instruments" for runtime behavior.
