◈ CSD FC Prediction Accuracy Analysis

ARIGATO COFFEE ROASTERY — aillio-roasting-profiles — 2026-03-20 — Target: 31 profiles

WATCH Trigger Rate

87%
27 / 31 profiles

WARN Trigger Rate

39%
12 / 31 profiles

MAE (Mean Absolute Error)

41.8s
vs. the ±15s shown in the display

WATCH Lead Time (Avg)

96.6s
Median 105s / Min 5s / Max 123s

[Charts: Error Distribution (all 320 ticks, pre-FC only); MAE per Profile; WATCH Lead Time Distribution; Bias Distribution (Over/Under Estimation)]

◈ Analysis Summary

✓ WATCH consistently triggers early (~97s before FC)

WATCH triggered in 27 of 31 profiles. The average lead time of 96.6s (median 105s) gives the roaster ample time to intervene. The 4 profiles that did not trigger all use beans with mild RoR fluctuations (Magarrissa-Tulise and Suica types), so the misses are likely attributable to the bean type and roasting profile.

✗ Main Issue 1: Systematic Underestimation (Bias = -25.8s)

On average, the predicted remaining time is 25.8s shorter than the actual remaining time. The cause is the CSD_FC_ESTIMATE_MAX_SEC = 60 setting: WATCH triggers about 97s before FC on average, but the estimate is capped at 60s, so the display shows a too-short "60s" immediately after triggering. Currently, only 16.9% of all ticks fall within ±15s of the actual value.
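As a rough illustration of how these figures relate, the sketch below computes MAE, bias, and the share of ticks inside the ±15s band from per-tick predictions. The function name and its inputs are hypothetical and not taken from live_monitor.py; bias is assumed to be predicted minus actual remaining time, which matches the sign convention used in this report (negative = underestimation).

```python
from statistics import mean


def accuracy_metrics(predicted_sec, actual_sec, tolerance_sec=15.0):
    """Summarize per-tick prediction errors (hypothetical helper).

    predicted_sec / actual_sec: equal-length sequences of remaining
    seconds to FC, one entry per pre-FC tick.
    """
    if not predicted_sec or len(predicted_sec) != len(actual_sec):
        raise ValueError("inputs must be non-empty and of equal length")

    # Signed error per tick: negative means the estimate was too short.
    errors = [p - a for p, a in zip(predicted_sec, actual_sec)]

    return {
        "mae": mean(abs(e) for e in errors),
        "bias": mean(errors),
        "within_tolerance": sum(abs(e) <= tolerance_sec for e in errors) / len(errors),
    }
```

With a capped estimate, early ticks contribute large negative errors, which pulls the bias below zero even when late ticks are accurate.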

✗ Main Issue 2: Low WARN Trigger Rate (39%)

Only 12 of 31 profiles reached the AR(1) ≥ 0.50 threshold. The threshold had already been lowered to 0.50 based on the maximum AR(1) of 0.45-0.48 observed in Jigesa_no17, yet many profiles still do not reach it. Even when WARN triggers, its lead time is on average only 20s shorter than that of WATCH, which blurs the distinction between the two states.

✗ Main Issue 3: Delayed Trigger Cases (Lead Time ≤ 10s)

In two profiles, no9 (5s) and no23 (10s), WATCH triggered only 5-10s before FC. Both also show a large positive bias (overestimation of +43 to +50s). This is the worst case: FC arrives almost immediately while the display still reads "60s remaining". Possible causes are a late rise in AR(1) or an effect of the warmup period.

◈ Calibration Recommendations

Recommendation 1: Increase CSD_FC_ESTIMATE_MAX_SEC to 100s

Since the WATCH average lead time in actual data is 96.6s, setting the upper limit to 100s will make the estimate at AR(1)=0.35 closer to reality. Sample calculation:

AR(1)=0.35 (WATCH):  frac = (0.85-0.35)/(0.85-0.35) = 1.00 → 100s (Current: 60s)
AR(1)=0.50 (WARN):   frac = (0.85-0.50)/(0.85-0.35) = 0.70 → 70s (Current: 42s)
AR(1)=0.70:          frac = (0.85-0.70)/(0.85-0.35) = 0.30 → 30s (Current: 18s)
AR(1)=0.85 (IMMIN):  0s (No change)
  

Change location: CSD_FC_ESTIMATE_MAX_SEC = 60 → 100 in live_monitor.py
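The sample calculation can be reproduced with a short sketch. It assumes the estimate is a linear interpolation between the WATCH (0.35) and IMMIN (0.85) thresholds, as the sample figures imply. The function name, the parameter names, and the behavior below WATCH (returning None) are assumptions, not the actual code in live_monitor.py.

```python
def estimate_seconds_to_fc(ar1, max_sec=100.0, ar1_watch=0.35, ar1_imminent=0.85):
    """Map an AR(1) value to an estimated number of seconds until FC.

    Hypothetical sketch: linear from max_sec at ar1_watch down to 0 at
    ar1_imminent. Returns None while AR(1) is still below WATCH.
    """
    if ar1 < ar1_watch:
        return None  # assumption: no estimate is shown before WATCH

    frac = (ar1_imminent - ar1) / (ar1_imminent - ar1_watch)
    frac = min(1.0, max(0.0, frac))  # clamp so values past IMMIN give 0
    return max_sec * frac


if __name__ == "__main__":
    # Compare the proposed 100s cap with the current 60s cap.
    for ar1 in (0.35, 0.50, 0.70, 0.85):
        proposed = estimate_seconds_to_fc(ar1, max_sec=100.0)
        current = estimate_seconds_to_fc(ar1, max_sec=60.0)
        print(f"AR(1)={ar1:.2f}: {proposed:.0f}s (current: {current:.0f}s)")
```

Keeping the cap as a parameter makes it easy to replay recorded profiles with both values before editing the constant.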

Recommendation 2: Lower WARN Threshold to 0.45 (Improve trigger rate)

The maximum AR(1) of many profiles plateaus around 0.45-0.55. With the current CSD_AR1_WARN = 0.50, more than half of the profiles never trigger WARN. Lowering the threshold to 0.45 is expected to raise the trigger rate to an estimated 60-70%. Because this narrows the gap to WATCH even further, consider revisiting how the two states are presented in the UI.

Change location: CSD_AR1_WARN = 0.50 → 0.45 in live_monitor.py
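One way to check the 60-70% estimate before changing the constant is to recount triggers from the per-profile AR(1) maxima. The helper below is a hypothetical sketch that treats a profile as triggering WARN if its maximum AR(1) reaches the threshold. The values in the usage example are placeholders, not data from the 31 profiles.

```python
def warn_trigger_rate(max_ar1_per_profile, threshold):
    """Fraction of profiles whose peak AR(1) reaches the threshold."""
    if not max_ar1_per_profile:
        raise ValueError("need at least one profile")
    triggered = sum(1 for peak in max_ar1_per_profile if peak >= threshold)
    return triggered / len(max_ar1_per_profile)


if __name__ == "__main__":
    # Placeholder maxima for illustration only; substitute the real values.
    placeholder_maxima = [0.42, 0.47, 0.52, 0.61]
    for threshold in (0.50, 0.45):
        rate = warn_trigger_rate(placeholder_maxima, threshold)
        print(f"threshold {threshold:.2f}: {rate:.0%} of profiles trigger WARN")
```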

Recommendation 3: Reflect "Lead Time Uncertainty" in UI Display

Prediction accuracy is low immediately after WATCH triggers (80-120s before FC, where the bias is large), and errors tend to converge within 30s of FC. Replacing the fixed ±15s with a band that starts at ±40s right after triggering and narrows to ±15s close to FC guards against user overconfidence:

AR(1) < 0.50 → "Est. XXs ±40s"
AR(1) 0.50-0.70 → "Est. XXs ±25s"
AR(1) > 0.70 → "Est. XXs ±15s"
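The three bands above could be implemented along the following lines. This is a minimal sketch: the function names are hypothetical, and the boundary handling (exactly 0.50 and exactly 0.70 both fall in the ±25s band) is read directly from the ranges listed above.

```python
def uncertainty_band_sec(ar1):
    """Return the ± band in seconds to display for a given AR(1)."""
    if ar1 < 0.50:
        return 40  # right after WATCH: estimates are least reliable
    if ar1 <= 0.70:
        return 25
    return 15      # close to FC: errors have mostly converged


def format_estimate(estimate_sec, ar1):
    """Build the display string, e.g. 'Est. 70s ±25s'."""
    return f"Est. {round(estimate_sec)}s ±{uncertainty_band_sec(ar1)}s"
```

Deriving the band from AR(1) rather than from elapsed time keeps the display tied to the same signal that drives the estimate itself.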
  

Recommendation 4: Accumulate Profile Data by Bean Type

The 4 non-triggering profiles are all Magarrissa-Tulise or Suica types. Beans that require a slow temperature rise may be less likely to build up autocorrelation in the Kalman innovations. Once more than 10 data points have been accumulated, consider adjusting the CSD_WATCH threshold by bean type (e.g., slow beans → 0.30).
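A per-bean threshold could be expressed as a small lookup with a fallback. The sketch below is an assumption about how this might be structured: the constant names, the "slow" class label, and the function are hypothetical, while the default of 0.35, the override of 0.30, and the more-than-10 rule come from this report.

```python
DEFAULT_WATCH_THRESHOLD = 0.35   # current WATCH level used in this report
MIN_DATA_POINTS = 10             # overrides apply only above this count

# Hypothetical bean classes mapped to adjusted WATCH thresholds.
WATCH_OVERRIDES = {"slow": 0.30}


def watch_threshold_for(bean_class, n_data_points):
    """Pick the WATCH threshold for a bean class.

    Falls back to the default until more than MIN_DATA_POINTS
    data points exist for that class.
    """
    override = WATCH_OVERRIDES.get(bean_class)
    if override is not None and n_data_points > MIN_DATA_POINTS:
        return override
    return DEFAULT_WATCH_THRESHOLD
```

Gating the override on the data count avoids tuning a threshold to a handful of roasts.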

◈ Details per Profile
File | FC Time | CSD State | WATCH Lead | WARN Lead | MAE | Bias | Active ticks