| New file |
| | |
| | | # WCS Auto Tune Dispatch Design |
| | | |
| | | **Date:** 2026-04-27 |
| | | |
| | | **Status:** Approved for planning |
| | | |
| | | ## 1. Goal |
| | | |
| | | Build a backend automatic tuning capability for WCS dispatch-related parameters. |
| | | |
| | | The system should: |
| | | |
| | | - Automatically analyze the current WCS task state when there are active tasks. |
| | | - Trigger analysis on a configurable interval, defaulting to every 10 minutes. |
| | | - Allow an Agent to directly modify a controlled whitelist of parameters through MCP tools. |
| | | - Immediately apply approved changes to runtime behavior. |
| | | - Keep a full audit trail and support rollback of the latest tuning job. |
| | | |
| | | This design only covers dispatch parameter auto tuning. It does not allow the Agent to change topology, communication settings, row mappings, or other structural device configuration. |
| | | |
| | | ## 2. In Scope |
| | | |
| | | The Agent is allowed to modify only the following parameters: |
| | | |
| | | - `sys_config.crnOutBatchRunningLimit` |
| | | - `sys_config.conveyorStationTaskLimit` |
| | | - `sys_config.aiAutoTuneIntervalMinutes` |
| | | - `asr_bas_station.out_task_limit` |
| | | - `asr_bas_crnp.maxOutTask` |
| | | - `asr_bas_dual_crnp.maxOutTask` |
| | | - `asr_bas_crnp.maxInTask` |
| | | - `asr_bas_dual_crnp.maxInTask` |
| | | |
| | | ## 3. Out of Scope |
| | | |
| | | The Agent must not modify the following categories automatically: |
| | | |
| | | - Station/device topology data |
| | | - Device communication configuration such as IP, port, gateway |
| | | - CRN row mapping and lane mapping such as `rowMap`, `controlRows`, `deepRows` |
| | | - Station binding relationships such as `inStationList`, `outStationList`, `barcodeStationList` |
| | | - PLC task buffer write indices and low-level command write slots |
| | | - Device mode switches unrelated to dispatch capacity |
| | | |
| | | ## 4. Core Approach |
| | | |
| | | Use a controlled tuning engine with Agent decision making. |
| | | |
| | | - The scheduler decides whether analysis should run. |
| | | - The Agent uses MCP read tools to analyze live facts. |
| | | - The Agent uses MCP write tools to submit tuning changes. |
| | | - The backend service enforces whitelist, range, step, cooldown, audit, and rollback. |
| | | |
| | | The Agent does not write the database directly and does not call low-level CRUD tools. |
| | | |
| | | ## 5. Trigger Model |
| | | |
| | | ### 5.1 Scheduler |
| | | |
| | | Create a scheduler that runs every 1 minute. |
| | | |
| | | The scheduler does not always invoke the Agent. It first checks: |
| | | |
| | | - `aiAutoTuneEnabled == Y` |
| | | - The system currently has unfinished tasks |
| | | - The elapsed time since the last completed tuning job is greater than or equal to `aiAutoTuneIntervalMinutes` |
| | | - No other tuning job is currently running |
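| | | |
The four preconditions above can be sketched as a single pure function. This is a minimal illustration, not the actual scheduler: the function name, its parameters, and the rule that a missing previous job counts as "interval elapsed" are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_run_tuning(
    enabled_flag: str,
    unfinished_task_count: int,
    last_job_finished_at: Optional[datetime],
    interval_minutes: int,
    job_running: bool,
    now: datetime,
) -> bool:
    """Return True only when all four scheduler preconditions hold."""
    if enabled_flag != "Y":
        return False
    if unfinished_task_count <= 0:
        return False
    if job_running:
        return False
    if last_job_finished_at is None:
        # Assumption: with no completed job on record, the interval counts as elapsed.
        return True
    return now - last_job_finished_at >= timedelta(minutes=interval_minutes)
```

Keeping the check free of I/O means the one-minute scheduler tick can stay cheap and the eligibility rule can be unit tested without a database.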
| | | |
| | | ### 5.2 Active Task Definition |
| | | |
| | | The system is considered active when `asr_wrk_mast` contains non-final tasks. |
| | | |
| | | Final states are excluded from active-task counting: |
| | | |
| | | - `COMPLETE_INBOUND` |
| | | - `SETTLE_INBOUND` |
| | | - `COMPLETE_OUTBOUND` |
| | | - `SETTLE_OUTBOUND` |
| | | - `COMPLETE_LOC_MOVE` |
| | | - `COMPLETE_CRN_MOVE` |
| | | |
| | | Everything else counts as active work, including inbound, outbound, location move, and crane move tasks. |
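| | | |
As a small sketch of the active-task rule, the count can be expressed as "everything not in the final-state set". The state names come from the list above; representing them as strings is an assumption, since `asr_wrk_mast` may store numeric status codes instead, and the non-final names in the usage lines are hypothetical.

```python
# Final states listed in section 5.2; anything else is treated as active work.
FINAL_STATES = frozenset({
    "COMPLETE_INBOUND",
    "SETTLE_INBOUND",
    "COMPLETE_OUTBOUND",
    "SETTLE_OUTBOUND",
    "COMPLETE_LOC_MOVE",
    "COMPLETE_CRN_MOVE",
})


def count_active_tasks(statuses):
    """Count tasks whose status is not one of the final states."""
    return sum(1 for status in statuses if status not in FINAL_STATES)


def has_active_tasks(statuses):
    """True when at least one non-final task exists."""
    return count_active_tasks(statuses) > 0


# Hypothetical non-final status names, used only for illustration.
sample = ["COMPLETE_INBOUND", "NEW_OUTBOUND", "CRN_RUNNING", "SETTLE_OUTBOUND"]
print(count_active_tasks(sample))
```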
| | | |
| | | ### 5.3 Interval Control |
| | | |
| | | Add these control parameters in `sys_config`: |
| | | |
| | | - `aiAutoTuneEnabled` |
| | | - `aiAutoTuneIntervalMinutes` |
| | | - `aiAutoTunePromptLogLimit` |
| | | |
| | | Defaults: |
| | | |
| | | - `aiAutoTuneEnabled = N` |
| | | - `aiAutoTuneIntervalMinutes = 10` |
| | | - `aiAutoTunePromptLogLimit = 500` |
| | | |
| | | The Agent may modify `aiAutoTuneIntervalMinutes`, but only within a fixed range. |
| | | |
| | | Recommended range: |
| | | |
| | | - `5` to `60` minutes |
| | | |
| | | ## 6. Analysis Inputs |
| | | |
| | | The Agent must not infer dispatch facts from the frontend canvas. All facts must come from backend-generated MCP snapshots. |
| | | |
| | | ### 6.1 Task Snapshot |
| | | |
| | | Provide: |
| | | |
| | | - Current unfinished task count |
| | | - Aggregation by target station |
| | | - Aggregation by batch |
| | | - Aggregation by single CRN |
| | | - Aggregation by dual CRN |
| | | - Optional aggregation by IO type |
| | | |
| | | ### 6.2 Station Runtime Snapshot |
| | | |
| | | For each station, expose only the fields relevant to runtime dispatch judgment: |
| | | |
| | | - `stationId` |
| | | - `autoing` |
| | | - `loading` |
| | | - `taskNo` |
| | | - Optional `ioMode` |
| | | |
| | | The Agent must not use: |
| | | |
| | | - `taskWriteIdx` |
| | | - `taskBufferItems` |
| | | |
| | | These fields are reserved for backend command execution logic, not dispatch tuning analysis. |
| | | |
| | | ### 6.3 Cycle Load Snapshot |
| | | |
| | | Reuse the existing cycle capacity capability and expose: |
| | | |
| | | - Global `currentLoad` |
| | | - Total loop count |
| | | - For each loop: |
| | |   - `loopNo` |
| | |   - `stationCount` |
| | |   - `taskCount` |
| | |   - `occupiedStationCount` |
| | |   - `manualStationCount` |
| | |   - `currentLoad` |
| | | |
| | | This is part of the decision basis for whether global conveyor capacity can be raised or must be constrained. |
| | | |
| | | ### 6.4 Flow Topology Snapshot |
| | | |
| | | This is the most important new input. |
| | | |
| | | The Agent needs station direction and flow-path facts, because dispatch pressure is directional rather than isolated by station ID. |
| | | |
| | | For each target outbound station, provide: |
| | | |
| | | - `targetStationId` |
| | | - `direction` |
| | | - `upstreamStationIds` |
| | | - `downstreamStationIds` |
| | | - `flowStationIds` |
| | | - `bufferCapacity` |
| | | - `occupiedCount` |
| | | - `freeCount` |
| | | - `nonAutoingCount` |
| | | - `loadingCount` |
| | | - `taskHoldingCount` |
| | | |
| | | Definitions: |
| | | |
| | | - `occupiedCount`: count of stations on the directional flow segment where `loading = true` or `taskNo > 0` |
| | | - `freeCount = bufferCapacity - occupiedCount` |
| | | - `nonAutoingCount`: count of stations on the directional flow segment where `autoing = false` |
| | | - `loadingCount`: count of stations on the directional flow segment where `loading = true` |
| | | - `taskHoldingCount`: count of stations on the directional flow segment where `taskNo > 0` |
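| | | |
The definitions above translate directly into counting over the stations of one directional flow segment. The sketch below assumes a simplified `StationRuntime` record carrying only the section 6.2 fields; the class and function names are illustrative, not existing code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StationRuntime:
    """Hypothetical per-station runtime record (fields from section 6.2)."""
    station_id: int
    autoing: bool
    loading: bool
    task_no: int


def summarize_segment(stations: Iterable[StationRuntime], buffer_capacity: int) -> dict:
    """Compute the derived counters for one directional flow segment."""
    stations = list(stations)
    occupied = sum(1 for s in stations if s.loading or s.task_no > 0)
    return {
        "bufferCapacity": buffer_capacity,
        "occupiedCount": occupied,
        "freeCount": buffer_capacity - occupied,
        "nonAutoingCount": sum(1 for s in stations if not s.autoing),
        "loadingCount": sum(1 for s in stations if s.loading),
        "taskHoldingCount": sum(1 for s in stations if s.task_no > 0),
    }


segment = [
    StationRuntime(101, autoing=True, loading=True, task_no=5),
    StationRuntime(102, autoing=True, loading=False, task_no=7),
    StationRuntime(103, autoing=False, loading=False, task_no=0),
    StationRuntime(104, autoing=True, loading=True, task_no=0),
]
summary = summarize_segment(segment, buffer_capacity=4)
```

Note that `occupiedCount` is not the sum of `loadingCount` and `taskHoldingCount`: a station that is both loading and holding a task (station 101 here) is counted once.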
| | | |
| | | ## 7. Direction and Buffer Capacity Rules |
| | | |
| | | ### 7.1 Direction Source |
| | | |
| | | Direction is not stored on `StationObjModel`. |
| | | |
| | | Direction must be derived from backend map/path metadata such as bridge/path node direction information. |
| | | |
| | | The Agent must not infer direction from frontend rendering. |
| | | |
| | | ### 7.2 Buffer Capacity Source |
| | | |
| | | `taskWriteIdx` and `taskBufferItems` are not part of the auto tuning input model. |
| | | |
| | | The Agent must not infer path capacity from PLC task buffer slots. |
| | | |
| | | The backend must expose `bufferCapacity` as an explicit fact for each directional flow segment. |
| | | |
| | | Recommended priority: |
| | | |
| | | 1. Compute it from existing directional map topology if the topology can uniquely identify the effective flow buffer segment. |
| | | 2. If the current map model is not sufficient to derive capacity reliably, introduce explicit capacity configuration by station and direction. |
| | | |
| | | Recommended fallback table if explicit configuration is needed: |
| | | |
| | | - `asr_station_flow_capacity` |
| | |   - `id` |
| | |   - `station_id` |
| | |   - `direction_code` |
| | |   - `buffer_capacity` |
| | |   - `memo` |
| | | |
| | | This design assumes direction can be resolved from map/path data, but buffer capacity derivation is not yet fully verified in the current repository. |
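| | | |
The recommended priority in 7.2 can be sketched as a small resolver that prefers a topology-derived value and falls back to explicit configuration. The function name and the source labels `TOPOLOGY`, `CONFIG`, and `UNKNOWN` are hypothetical; how each input value is actually obtained is left open, as in the text.

```python
from typing import Optional, Tuple


def resolve_buffer_capacity(
    derived_from_topology: Optional[int],
    configured_capacity: Optional[int],
) -> Tuple[Optional[int], str]:
    """Pick the buffer capacity for one station and direction.

    Priority 1: a value derived from the directional map topology.
    Priority 2: an explicit row in the fallback capacity table.
    """
    if derived_from_topology is not None:
        return derived_from_topology, "TOPOLOGY"
    if configured_capacity is not None:
        return configured_capacity, "CONFIG"
    # Neither source is available: the segment has no usable capacity fact.
    return None, "UNKNOWN"
```

Returning the source alongside the value lets the snapshot make clear which segments rely on configured capacity, which may help while the derivation risk in section 15 remains open.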
| | | |
| | | ## 8. Agent Decision Logic |
| | | |
| | | The Agent should tune parameters conservatively and incrementally. |
| | | |
| | | Decision principles: |
| | | |
| | | - Prefer local tuning before global tuning. |
| | | - Prefer no-op over unnecessary change. |
| | | - Prefer small-step increase or decrease. |
| | | - Respect loop load and directional occupancy before widening throughput. |
| | | |
| | | Recommended decision order: |
| | | |
| | | 1. Check whether the target directional flow has enough `freeCount`. |
| | | 2. Check whether the directional flow is blocked by stations with `autoing = false`. |
| | | 3. Check whether the directional flow has excessive `loading` density. |
| | | 4. Check whether the directional flow has excessive `taskNo > 0` density. |
| | | 5. Check whether the containing loop has high `currentLoad`. |
| | | 6. Check whether CRN or Dual CRN is under-utilized or saturated. |
| | | 7. Check whether batch concurrency is the actual bottleneck. |
| | | |
| | | Expected tuning preferences: |
| | | |
| | | - Raise `asr_bas_station.out_task_limit` only when the station direction path still has buffer capacity. |
| | | - Raise `sys_config.conveyorStationTaskLimit` only when overall loop load is low and directional segments broadly have room. |
| | | - Raise `maxOutTask` or `maxInTask` only when the corresponding device is persistently underutilized and downstream paths are healthy. |
| | | - Raise `crnOutBatchRunningLimit` only when batch sequencing is the dominant bottleneck and downstream flow is safe. |
| | | |
| | | The Agent should lower parameters when: |
| | | |
| | | - Loop load is high |
| | | - Many directional stations are `loading = true` |
| | | - `autoing = false` appears on critical directional segments |
| | | - Downstream directional holding remains high after previous tuning |
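| | | |
To make the conservative bias concrete, here is a deliberately simplified rule for a single station's `out_task_limit`. It is only a sketch: in this design the decision belongs to the Agent, the thresholds `high_load` and `low_load` are invented for illustration, and `loop_load` is assumed to be a fraction between 0 and 1, which the document does not specify.

```python
def suggest_out_task_limit_step(
    free_count: int,
    non_autoing_count: int,
    loop_load: float,
    high_load: float = 0.8,  # hypothetical threshold
    low_load: float = 0.5,   # hypothetical threshold
) -> int:
    """Return -1 (lower), 0 (no-op), or +1 (raise) for one station limit."""
    # Lowering conditions take priority over raising conditions.
    if non_autoing_count > 0 or loop_load >= high_load:
        return -1
    # Raise only when the directional path has room and the loop is lightly loaded.
    if free_count > 0 and loop_load <= low_load:
        return 1
    # Prefer no-op over unnecessary change.
    return 0
```

The step size of one matches the per-job step limit recommended for `outTaskLimit` in section 10.3.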
| | | |
| | | ## 9. MCP Tool Design |
| | | |
| | | Do not require the Agent to emit final JSON text to the outer system. |
| | | |
| | | The Agent should directly call MCP tools with structured arguments. |
| | | |
| | | Recommended tools: |
| | | |
| | | - `dispatch_get_auto_tune_snapshot` |
| | | - `dispatch_get_recent_auto_tune_jobs` |
| | | - `dispatch_apply_auto_tune_changes` |
| | | - `dispatch_revert_last_auto_tune_job` |
| | | |
| | | ### 9.1 `dispatch_get_auto_tune_snapshot` |
| | | |
| | | Returns: |
| | | |
| | | - `taskSnapshot` |
| | | - `stationRuntimeSnapshot` |
| | | - `cycleLoadSnapshot` |
| | | - `flowTopologySnapshot` |
| | | - `currentParameterSnapshot` |
| | | |
| | | ### 9.2 `dispatch_get_recent_auto_tune_jobs` |
| | | |
| | | Returns the last several tuning jobs, including: |
| | | |
| | | - Summary |
| | | - Parameter changes |
| | | - Before and after values |
| | | - Result state |
| | | - Optional effect summary |
| | | |
| | | ### 9.3 `dispatch_apply_auto_tune_changes` |
| | | |
| | | This is the only write tool the Agent should use for tuning. |
| | | |
| | | Input shape should include: |
| | | |
| | | - `reason` |
| | | - `analysisIntervalMinutes` |
| | | - `triggerType` |
| | | - `dryRun` |
| | | - `changes` |
| | | |
| | | Each change should include: |
| | | |
| | | - `targetType` |
| | | - `targetId` |
| | | - `targetKey` |
| | | - `newValue` |
| | | |
| | | Allowed `targetType`: |
| | | |
| | | - `sys_config` |
| | | - `station` |
| | | - `crn` |
| | | - `dual_crn` |
| | | |
| | | Allowed `targetKey`: |
| | | |
| | | - `crnOutBatchRunningLimit` |
| | | - `conveyorStationTaskLimit` |
| | | - `aiAutoTuneIntervalMinutes` |
| | | - `outTaskLimit` (maps to `asr_bas_station.out_task_limit`) |
| | | - `maxOutTask` |
| | | - `maxInTask` |
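| | | |
A request to this tool might look like the structure below, together with a check that each `targetType` and `targetKey` pair is one of the whitelisted combinations. The pairing table is inferred from the parameter list in section 2; the `triggerType` value, the `targetId` formats, and the use of `None` for `sys_config` targets are assumptions.

```python
# Pairing of targetType to allowed targetKey values, inferred from section 2.
ALLOWED_KEYS_BY_TARGET_TYPE = {
    "sys_config": {
        "crnOutBatchRunningLimit",
        "conveyorStationTaskLimit",
        "aiAutoTuneIntervalMinutes",
    },
    "station": {"outTaskLimit"},
    "crn": {"maxOutTask", "maxInTask"},
    "dual_crn": {"maxOutTask", "maxInTask"},
}


def is_whitelisted(change: dict) -> bool:
    """True when the change targets an approved type/key combination."""
    allowed = ALLOWED_KEYS_BY_TARGET_TYPE.get(change.get("targetType"))
    return allowed is not None and change.get("targetKey") in allowed


example_request = {
    "reason": "Station 101 outbound path has free buffer; loop load is low.",
    "analysisIntervalMinutes": 10,
    "triggerType": "SCHEDULER",  # hypothetical value
    "dryRun": True,
    "changes": [
        {"targetType": "station", "targetId": "101",
         "targetKey": "outTaskLimit", "newValue": 3},
        {"targetType": "sys_config", "targetId": None,
         "targetKey": "conveyorStationTaskLimit", "newValue": 35},
    ],
}
```

The table contains eight type/key pairs in total, which lines up with the eight parameters approved in section 2.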
| | | |
| | | ### 9.4 `dispatch_revert_last_auto_tune_job` |
| | | |
| | | Roll back the latest successful tuning job using the saved before-value snapshot. |
| | | |
| | | ## 10. Service-Side Safety Controls |
| | | |
| | | Calls to the write tool are not trusted on their own. All enforcement must happen in the backend tuning service. |
| | | |
| | | ### 10.1 Whitelist |
| | | |
| | | Only the 8 parameters listed in section 2 may be changed. |
| | | |
| | | ### 10.2 Range |
| | | |
| | | Recommended initial range policy: |
| | | |
| | | - `aiAutoTuneIntervalMinutes`: `5 ~ 60` |
| | | - `conveyorStationTaskLimit`: configurable hard range, e.g. `5 ~ 200` |
| | | - `crnOutBatchRunningLimit`: configurable hard range, e.g. `1 ~ 20` |
| | | - `outTaskLimit`: `0 ~ bufferCapacity` |
| | | - `maxOutTask`: small bounded integer range by device type |
| | | - `maxInTask`: small bounded integer range by device type |
| | | |
| | | Actual numeric ranges should be centralized in one rule definition class. |
| | | |
| | | ### 10.3 Step Limit |
| | | |
| | | Recommended single-job max change step: |
| | | |
| | | - `conveyorStationTaskLimit`: `±5` |
| | | - `crnOutBatchRunningLimit`: `±2` |
| | | - `outTaskLimit`: `±1` |
| | | - `maxOutTask`: `±1` |
| | | - `maxInTask`: `±1` |
| | | - `aiAutoTuneIntervalMinutes`: `±5` |
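| | | |
Range and step enforcement can be combined into one validation routine. In this sketch the step table copies the recommendations above, while the caller supplies the range because several ranges are configurable or depend on `bufferCapacity`. The function name and the reject reason strings are hypothetical.

```python
from typing import Optional

# Maximum absolute change per tuning job, from section 10.3.
MAX_STEP = {
    "conveyorStationTaskLimit": 5,
    "crnOutBatchRunningLimit": 2,
    "outTaskLimit": 1,
    "maxOutTask": 1,
    "maxInTask": 1,
    "aiAutoTuneIntervalMinutes": 5,
}


def validate_change(
    target_key: str,
    current_value: int,
    new_value: int,
    min_value: int,
    max_value: int,
) -> Optional[str]:
    """Return None when the change is acceptable, else a reject reason."""
    if target_key not in MAX_STEP:
        return "NOT_WHITELISTED"
    if not (min_value <= new_value <= max_value):
        return "OUT_OF_RANGE"
    if abs(new_value - current_value) > MAX_STEP[target_key]:
        return "OVER_STEP"
    return None
```

`current_value` here is meant to be the value the backend re-reads from the database at apply time, not a value reported by the Agent.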
| | | |
| | | ### 10.4 Cooldown |
| | | |
| | | Recommended cooldown policy: |
| | | |
| | | - Global parameters: `20` minutes |
| | | - Station parameters: `10` minutes |
| | | - CRN / Dual CRN parameters: `10` minutes |
| | | - Analysis interval parameter: `30` minutes |
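| | | |
One way to apply the cooldown policy is to compute an expiry timestamp when a change is applied and compare against it on the next request. The grouping below treats `sys_config` parameters other than the analysis interval as "global", which is an interpretation of the list above; the function names are illustrative.

```python
from datetime import datetime, timedelta

# Cooldown minutes per group, from section 10.4.
COOLDOWN_MINUTES = {
    "global": 20,
    "station": 10,
    "crn": 10,
    "dual_crn": 10,
    "interval": 30,
}


def cooldown_group(target_type: str, target_key: str) -> str:
    """Map a change target to its cooldown group."""
    if target_key == "aiAutoTuneIntervalMinutes":
        return "interval"
    if target_type == "sys_config":
        # Assumption: remaining sys_config parameters are the global ones.
        return "global"
    return target_type


def cooldown_expire_time(target_type: str, target_key: str, applied_at: datetime) -> datetime:
    """Timestamp after which the same parameter may be changed again."""
    minutes = COOLDOWN_MINUTES[cooldown_group(target_type, target_key)]
    return applied_at + timedelta(minutes=minutes)


def in_cooldown(expire_time: datetime, now: datetime) -> bool:
    """True while the parameter is still cooling down."""
    return now < expire_time
```

The computed expiry corresponds to the `cooldown_expire_time` column suggested for `sys_ai_auto_tune_change`.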
| | | |
| | | ### 10.5 Re-read Before Apply |
| | | |
| | | The backend must re-read the current database value before applying a change. |
| | | |
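| | | Do not trust any prior value reported by the Agent, such as an `oldValue` field. The change input in section 9.3 carries only `newValue`, and the stored `old_value` must come from this re-read. |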
| | | |
| | | ### 10.6 Deferred Safety |
| | | |
| | | If a lower global limit would immediately fall below current in-flight work, the lower value may still be stored, but it only affects future release/dispatch decisions. |
| | | |
| | | This should be recorded as deferred-safe application in audit details. |
| | | |
| | | ## 11. Audit and Rollback |
| | | |
| | | Do not rely only on `sys_operate_log`. |
| | | |
| | | Create dedicated audit tables: |
| | | |
| | | ### 11.1 `sys_ai_auto_tune_job` |
| | | |
| | | Suggested fields: |
| | | |
| | | - `id` |
| | | - `trigger_type` |
| | | - `status` |
| | | - `start_time` |
| | | - `finish_time` |
| | | - `has_active_tasks` |
| | | - `prompt_scene_code` |
| | | - `summary` |
| | | - `reasoning_digest` |
| | | - `snapshot_digest` |
| | | - `interval_before` |
| | | - `interval_after` |
| | | - `success_count` |
| | | - `reject_count` |
| | | - `error_message` |
| | | - `llm_call_count` |
| | | - `prompt_tokens` |
| | | - `completion_tokens` |
| | | - `total_tokens` |
| | | |
| | | ### 11.2 `sys_ai_auto_tune_change` |
| | | |
| | | Suggested fields: |
| | | |
| | | - `id` |
| | | - `job_id` |
| | | - `target_type` |
| | | - `target_id` |
| | | - `target_key` |
| | | - `old_value` |
| | | - `requested_value` |
| | | - `applied_value` |
| | | - `result_status` |
| | | - `reject_reason` |
| | | - `cooldown_expire_time` |
| | | - `created_time` |
| | | |
| | | ### 11.3 Rollback Snapshot |
| | | |
| | | For each successful job, save the before-values required to restore all changed parameters. |
| | | |
| | | At minimum, support rollback of the most recent successful tuning job. |
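| | | |
Given the change rows of the latest successful job, a rollback plan can be derived from the stored `old_value` of each applied row. This sketch uses plain dictionaries keyed by the column names suggested in 11.2; the `APPLIED` status value and the `restore_value` key are hypothetical.

```python
from typing import Dict, Iterable, List

APPLIED = "APPLIED"  # hypothetical result_status value for an applied change


def build_rollback_plan(change_rows: Iterable[Dict]) -> List[Dict]:
    """List the restore operations for one job, newest change first."""
    plan = []
    for row in reversed(list(change_rows)):
        if row["result_status"] != APPLIED:
            # Rejected changes never touched the parameter, so nothing to restore.
            continue
        plan.append({
            "target_type": row["target_type"],
            "target_id": row["target_id"],
            "target_key": row["target_key"],
            "restore_value": row["old_value"],
        })
    return plan


rows = [
    {"target_type": "station", "target_id": "101", "target_key": "outTaskLimit",
     "old_value": 2, "applied_value": 3, "result_status": APPLIED},
    {"target_type": "crn", "target_id": "1", "target_key": "maxOutTask",
     "old_value": 2, "applied_value": None, "result_status": "REJECTED"},
]
plan = build_rollback_plan(rows)
```

Processing rows in reverse order means that if one job ever changed the same parameter twice, the earliest before-value is restored last.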
| | | |
| | | ## 12. Runtime Flow |
| | | |
| | | 1. `AutoTuneScheduler` runs every minute. |
| | | 2. It checks the enable flag, active tasks, the elapsed interval, and the distributed lock. |
| | | 3. If eligible, it creates a tuning job record with status `RUNNING`. |
| | | 4. `AutoTuneAgentService` starts an Agent session using the auto-tune prompt scene. |
| | | 5. The Agent calls `dispatch_get_auto_tune_snapshot`. |
| | | 6. The Agent may call `dispatch_get_recent_auto_tune_jobs`. |
| | | 7. The Agent calls `dispatch_apply_auto_tune_changes(dryRun=true)`. |
| | | 8. If the dry run is valid, the Agent calls `dispatch_apply_auto_tune_changes(dryRun=false)`. |
| | | 9. `AutoTuneApplyService` applies validated changes in a single transaction, refreshes config cache, and records audit rows. |
| | | 10. The job is marked `SUCCESS`, `PARTIAL_SUCCESS`, `NO_CHANGE`, or `FAILED`. |
| | | 11. A summary record is optionally written to `sys_operate_log`. |
| | | 12. The lock is released. |
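| | | |
Step 10 names four terminal job states but does not define how they are chosen. One possible mapping from the job counters is sketched below; the rule itself is an assumption, in particular the choice to report `NO_CHANGE` when every requested change was rejected.

```python
from typing import Optional


def final_job_status(
    success_count: int,
    reject_count: int,
    error_message: Optional[str] = None,
) -> str:
    """Derive the terminal job status from the audit counters."""
    if error_message is not None:
        return "FAILED"
    if success_count > 0 and reject_count > 0:
        return "PARTIAL_SUCCESS"
    if success_count > 0:
        return "SUCCESS"
    # Assumption: nothing applied (no request, or all rejected) is NO_CHANGE.
    return "NO_CHANGE"
```

The counters correspond to the `success_count`, `reject_count`, and `error_message` columns suggested for `sys_ai_auto_tune_job`.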
| | | |
| | | ## 13. Required Code Changes |
| | | |
| | | ### 13.1 AI Layer |
| | | |
| | | - Add new prompt scene `wcs_auto_tune_dispatch` |
| | | - Add scheduler for minute-level trigger checks |
| | | - Add Agent orchestration service for background tuning |
| | | - Add MCP tools for snapshot, recent jobs, apply, and rollback |
| | | |
| | | ### 13.2 Snapshot and Apply Services |
| | | |
| | | - Add snapshot aggregation service |
| | | - Add flow topology snapshot service |
| | | - Add controlled apply service |
| | | - Add tuning rule definition service or config class |
| | | |
| | | ### 13.3 SQL |
| | | |
| | | Add SQL scripts for: |
| | | |
| | | - `sys_ai_auto_tune_job` |
| | | - `sys_ai_auto_tune_change` |
| | | - `aiAutoTuneEnabled` |
| | | - `aiAutoTuneIntervalMinutes` |
| | | - `aiAutoTunePromptLogLimit` |
| | | - Optional `asr_station_flow_capacity` if buffer capacity cannot be derived reliably |
| | | |
| | | ## 14. Verification Strategy |
| | | |
| | | Minimum verification after implementation: |
| | | |
| | | - Compile check |
| | | - MCP read tool returns complete snapshot |
| | | - MCP dry-run rejects out-of-range and over-step changes |
| | | - MCP apply writes approved changes and refreshes runtime cache |
| | | - Scheduler skips when no tasks exist |
| | | - Scheduler triggers when tasks exist and interval is reached |
| | | - Interval update by Agent changes future trigger cadence within allowed range |
| | | - Audit rows are created for success, reject, and failure cases |
| | | - Rollback restores the latest successful tuning job values |
| | | |
| | | ## 15. Assumptions and Open Risk |
| | | |
| | | Confirmed assumptions: |
| | | |
| | | - Direction must come from backend map/path data, not frontend rendering. |
| | | - Runtime station analysis only uses `autoing`, `loading`, `taskNo`, plus derived topology/capacity facts. |
| | | - `taskWriteIdx` and `taskBufferItems` are excluded from Agent analysis. |
| | | |
| | | Open risk not yet fully verified: |
| | | |
| | | - Whether directional flow buffer capacity can be derived reliably from the current map/path model without adding an explicit capacity configuration source. |
| | | |
| | | If this cannot be derived deterministically, add explicit directional capacity configuration before implementing auto tuning decisions based on buffer headroom. |