Post-LLM Systems 后大模型系统 class: yl/pl/000

Post-LLM Systems

后大模型系统

This program investigates verification-first architectures for inference systems that operate beyond token prediction. Work centers on memory architectures with defined retention policies, constrained orchestration layers, and governance toolchains that enforce audit requirements at every inference boundary.

本项目研究以验证为先的架构,用于超越标记预测的推断系统。工作集中在具有定义保留策略的内存架构、约束编排层以及在每个推断边界执行审计要求的治理工具链。

Every system produced under this program ships with its evaluation harness. No inference path is deployed without a corresponding constraint surface definition and verification budget.

本项目下产出的每个系统均附带其评估框架。任何推断路径在没有对应约束面定义和验证预算的情况下均不得部署。

Program Metadata

项目元数据

  • Domain: Post-LLM
  • Status: ACTIVE
  • Programs Active: 5
  • Division: Research Systems
  • Division ID: yl-div-001
  • Classification: yl/pl/000

Scope

研究范围

  • Inference verification: formal and empirical methods for validating inference outputs against defined correctness criteria, including budget-constrained verification schedules. 推断验证:用于根据定义的正确性标准验证推断输出的形式化和经验方法,包括预算约束的验证调度。
  • Memory architecture: structured retention and recall systems with defined decay policies, capacity bounds, and provenance tracking for all stored inference artifacts. 内存架构:具有定义衰减策略、容量边界和所有存储推断产物溯源跟踪的结构化保留与检索系统。
  • Constrained orchestration: multi-agent coordination under explicit resource budgets, latency bounds, and permission surfaces that prevent unconstrained task delegation. 约束编排:在明确的资源预算、延迟边界和权限面下的多智能体协调,防止无约束的任务委派。
  • Governance toolchains: audit trail generation, compliance verification, and policy enforcement tooling that operates as a mandatory component of the inference pipeline. 治理工具链:作为推断管道强制组件运行的审计追踪生成、合规验证和策略执行工具。
  • Post-prediction evaluation: methods for assessing system behavior in domains where token-level prediction accuracy is insufficient as a sole evaluation metric. 后预测评估:在标记级预测精度不足以作为唯一评估指标的领域中评估系统行为的方法。
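
As an illustration of the memory architecture item above, the following is a minimal sketch of a store with a decay policy, a capacity bound, and a provenance field on every artifact. The exponential half-life decay, the evict-weakest rule, and all names (`Artifact`, `BoundedMemory`, `half_life_s`, `min_strength`) are assumptions made for this example; the program description does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    key: str
    value: str
    provenance: str   # hypothetical: ID of the inference path that produced the value
    stored_at: float  # timestamp in seconds


class BoundedMemory:
    """Store with a capacity bound and an assumed exponential decay policy."""

    def __init__(self, capacity: int, half_life_s: float, min_strength: float):
        if capacity < 1 or half_life_s <= 0:
            raise ValueError("capacity must be >= 1 and half_life_s must be > 0")
        self.capacity = capacity
        self.half_life_s = half_life_s
        self.min_strength = min_strength
        self._items = {}  # key -> Artifact

    def strength(self, artifact: Artifact, now: float) -> float:
        # Retention strength halves every half_life_s seconds (assumed policy).
        age = max(0.0, now - artifact.stored_at)
        return 0.5 ** (age / self.half_life_s)

    def _expire(self, now: float) -> None:
        # Drop every artifact whose strength has fallen below the threshold.
        expired = [k for k, a in self._items.items()
                   if self.strength(a, now) < self.min_strength]
        for key in expired:
            del self._items[key]

    def put(self, artifact: Artifact, now: float) -> None:
        self._expire(now)
        if artifact.key not in self._items and len(self._items) >= self.capacity:
            # Capacity bound reached: evict the weakest (most decayed) artifact.
            weakest = min(self._items.values(), key=lambda a: self.strength(a, now))
            del self._items[weakest.key]
        self._items[artifact.key] = artifact

    def recall(self, key: str, now: float) -> Optional[Artifact]:
        self._expire(now)
        return self._items.get(key)
```

With `half_life_s=10.0` and `min_strength=0.25`, an artifact stops being recallable once it is more than 20 seconds old, because two half-lives bring its strength down to the threshold.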

Evaluation Harness

评估框架

  • Constraint surface compliance: every inference output is checked against the declared constraint surface before it exits the system boundary. Violations are logged and surfaced. 约束面合规:每个推断输出在离开系统边界前均根据声明的约束面进行检查。违规行为被记录并呈现。
  • Verification budget adherence: each program operates within a defined computational budget for verification. Budget overruns trigger automatic scope reduction. 验证预算遵守:每个项目在定义的验证计算预算内运行。预算超支会触发自动范围缩减。
  • Memory integrity audit: periodic validation that stored inference artifacts remain consistent with their provenance records and have not degraded beyond defined thresholds. 内存完整性审计:定期验证存储的推断产物与其溯源记录保持一致,且未超出定义的退化阈值。
  • Governance trace completeness: automated checks confirming that every decision path through the system produces a complete, recoverable audit trail. 治理追踪完整性:自动检查确认系统中的每条决策路径产生完整、可恢复的审计追踪。
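
The constraint surface compliance check above could look roughly like the sketch below. It models a constraint surface as a mapping from constraint names to predicates, which is an assumption for illustration; the names `ComplianceReport` and `check_at_boundary` are hypothetical. The source says violations are logged and surfaced but does not say whether a violating output is also blocked, so this sketch only reports.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

logger = logging.getLogger("harness.constraints")

# Assumed representation: constraint name -> predicate returning True on compliance.
ConstraintSurface = Dict[str, Callable[[str], bool]]


@dataclass(frozen=True)
class ComplianceReport:
    output: str
    violations: Tuple[str, ...]

    @property
    def compliant(self) -> bool:
        return not self.violations


def check_at_boundary(output: str, surface: ConstraintSurface) -> ComplianceReport:
    """Check an output against every declared constraint before it leaves the system."""
    violated = tuple(name for name, predicate in surface.items()
                     if not predicate(output))
    for name in violated:
        # "Logged": each violation goes to the harness log.
        logger.warning("constraint violated: %s", name)
    # "Surfaced": the caller receives the full list of violations.
    return ComplianceReport(output=output, violations=violated)
```

Returning a report rather than raising keeps the policy decision (block, redact, or escalate) with the caller, which seems the safer default when the source leaves that decision open.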

Open Questions

开放问题

  • What is the minimum verification budget required to maintain acceptable constraint surface compliance as system complexity increases linearly? 随着系统复杂度线性增长,维持可接受的约束面合规所需的最低验证预算是多少?
  • Can memory retention policies be derived automatically from the declared constraint surface, or do they require independent specification? 内存保留策略能否从声明的约束面自动派生,还是需要独立指定?
  • Under what conditions does constrained orchestration produce measurably different outcomes from unconstrained delegation in bounded task environments? 在有界任务环境中,约束编排在何种条件下会产生与无约束委派可测量的不同结果?

Lab Notes

实验笔记

yl-pl-041

Verification budgets in the current harness implementation are expressed as a fraction of total inference compute. Initial findings suggest that a 12-15% allocation provides sufficient coverage for the constraint surfaces defined in yl-pl-021 through yl-pl-025. Below 8%, violation detection rates degrade in a predictable pattern.

当前框架实现中的验证预算以总推断计算的比例表示。初步发现表明,12-15%的分配为 yl-pl-021 至 yl-pl-025 中定义的约束面提供了充分覆盖。低于8%时,违规检测率以可预测模式退化。
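
To make the fractions in this note concrete, here is a small sketch that converts a fraction into a compute budget and labels an allocation against the thresholds reported above. The function names and band labels are hypothetical. The 8% to 12% band is labeled "uncharacterized" because the note reports nothing for that range.

```python
def verification_budget(total_inference_compute: float, fraction: float) -> float:
    """Compute units reserved for verification, as a fraction of total inference compute."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return total_inference_compute * fraction


def classify_allocation(fraction: float) -> str:
    """Label an allocation using the thresholds reported in yl-pl-041."""
    if fraction < 0.08:
        return "degraded"              # below 8%: detection rates degrade
    if fraction < 0.12:
        return "uncharacterized"       # 8% to 12%: not covered by the note
    if fraction <= 0.15:
        return "reported-sufficient"   # 12% to 15%: sufficient coverage reported
    return "above-reported-range"      # over 15%: beyond the reported allocation
```

For example, a system with 1,000 compute units of total inference and a 12% allocation reserves about 120 units for verification.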

yl-pl-042

The governance trace format has been stable since the Q3 evaluation. Each trace entry now carries a cryptographic commitment to the full decision context, which allows post-hoc verification without access to the original inference state. Storage overhead is approximately 3.2% of inference output volume.

治理追踪格式自第三季度评估起已稳定。每条追踪条目现包含对完整决策上下文的加密承诺,允许在不需要访问原始推断状态的情况下进行事后验证。存储开销约为推断输出体积的3.2%。
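
The note does not name the commitment scheme. As one possible reading, the sketch below uses a salted SHA-256 hash commitment over a canonical JSON encoding of the decision context. The scheme, the field names in the example context, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import json
import secrets
from typing import Tuple


def canonical_bytes(context: dict) -> bytes:
    # Sorted keys and fixed separators give a stable encoding of the same context.
    return json.dumps(context, sort_keys=True, separators=(",", ":")).encode("utf-8")


def commit(context: dict) -> Tuple[str, bytes]:
    """Return (commitment, nonce). Only the commitment is stored in the trace entry."""
    nonce = secrets.token_bytes(16)  # random salt keeps the commitment hiding
    digest = hashlib.sha256(nonce + canonical_bytes(context)).hexdigest()
    return digest, nonce


def verify(commitment: str, nonce: bytes, disclosed_context: dict) -> bool:
    """Post-hoc check: does the disclosed context match what was committed to?"""
    expected = hashlib.sha256(nonce + canonical_bytes(disclosed_context)).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(expected, commitment)
```

Under this reading, an auditor needs only the trace entry, the nonce, and the disclosed context to run `verify`; the live inference state is not involved.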

yl-pl-043

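Inference as product: the group is formalizing a framework that treats the inference path itself, rather than the inference output, as the deliverable artifact. This requires a comprehensive audit of path verification, constraint surface adherence, and resource consumption. The preliminary conclusion is that the governance overhead of this model is approximately 1.8 times that of conventional output auditing, in exchange for stronger traceability guarantees. Path-level auditing allows anomaly detection to complete before any output is generated.

推断即产品:本组正在形式化一个框架,将推断路径本身视为可交付产物,而非推断输出。这要求对路径验证、约束面遵循和资源消耗进行全面审计。初步结论是,这种模式的治理开销约为传统输出审计的1.8倍,但提供了更强的可追溯性保证。路径级审计使得异常检测可以在输出生成前完成。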
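
A minimal sketch of path-level auditing as described in yl-pl-043: every step of the inference path is checked for constraint surface adherence and cumulative resource consumption, and the output is generated only if the whole path passes. The step model (`PathStep`), the single compute limit, and the function names are hypothetical simplifications, not the group's framework.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PathStep:
    name: str
    compute_units: float
    permitted: bool  # assumed flag: step stays inside the declared constraint surface


class PathAuditError(Exception):
    """Raised at the first anomaly found on an inference path."""


def audit_path(steps: List[PathStep], compute_limit: float) -> float:
    """Audit each step in order; return total compute consumed if the path is clean."""
    consumed = 0.0
    for index, step in enumerate(steps):
        if not step.permitted:
            raise PathAuditError(
                f"step {index} ({step.name}) violates the constraint surface")
        consumed += step.compute_units
        if consumed > compute_limit:
            raise PathAuditError(
                f"step {index} ({step.name}) exceeds the compute limit")
    return consumed


def generate_with_path_audit(steps: List[PathStep], compute_limit: float,
                             emit: Callable[[], str]) -> str:
    # The audit runs first, so any anomaly is raised before emit() is ever called.
    audit_path(steps, compute_limit)
    return emit()
```

Because `audit_path` raises before `emit` runs, an anomalous path never produces output. That ordering is the property the note attributes to path-level auditing.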

Citations

参考文献

  • yl-pl-ref-001
    "Verification Budgets for Constrained Inference Systems." Internal Memo, Yueqian Labs Research Systems Division. 2025.
  • yl-pl-ref-002
    "Memory Retention Under Defined Decay Policies: A Formal Treatment." Working Paper, Post-LLM Systems Group. 2024.
  • yl-pl-ref-003
    "Governance Trace Formats for Auditable Inference Pipelines." Technical Report yl-tr-019. Yueqian Labs. 2025.
  • yl-pl-ref-004
    "Constrained Orchestration in Multi-Agent Systems: Bounds and Trade-offs." Proceedings of the Workshop on Bounded Intelligence Systems. 2024.