System architecture
This is the navigation map for the project. Read this first if you’re joining the project or returning after a break.
End-to-end pipeline (one diagram)
flowchart TB
classDef src fill:#1e3a5f,stroke:#3b82f6,color:#e0f2fe,stroke-width:1px;
classDef bronze fill:#7c4a02,stroke:#d97706,color:#fef3c7,stroke-width:1px;
classDef silver fill:#475569,stroke:#94a3b8,color:#f1f5f9,stroke-width:1px;
classDef gold fill:#78531f,stroke:#fbbf24,color:#fef3c7,stroke-width:1px;
classDef model fill:#0e8a8a,stroke:#1eb6b6,color:#f0fdfa,stroke-width:1px;
classDef report fill:#5b21b6,stroke:#a78bfa,color:#ede9fe,stroke-width:1px;
subgraph EXT["External sources"]
direction LR
NP["NASA POWER<br/>(9-pt JB grid)"]:::src
IA["Iowa State ASOS<br/>(WMKJ METAR)"]:::src
DOSM["api.data.gov.my<br/>(DOSM)"]:::src
EM["Ember GCS<br/>(monthly mix)"]:::src
ST["ST PDF<br/>(manual)"]:::src
end
ING["src/jb_vpp/ingest/*.py<br/>one module per source<br/>httpx GET → parquet"]:::src
NP --> ING
IA --> ING
DOSM --> ING
EM --> ING
ST --> ING
BRONZE["<b>BRONZE</b> — raw, partitioned by run_id<br/>bronze/weather/{nasa-power,senai-metar}/<br/>bronze/macro/{dosm,ember,st-pdf}/<br/>bronze/manifests/<source>/<dataset>/<run_id>.json"]:::bronze
ING --> BRONZE
TRANS["<b>TRANSFORM</b> — src/jb_vpp/transform/*.py (Polars)<br/>weather_hourly.py: 9-pt grid avg + METAR + MYT cal + ETOU peak flag<br/>load_synthesis.py: weather + DOSM + Johor share → gold load"]:::silver
BRONZE --> TRANS
SILVER["<b>SILVER + GOLD</b><br/>silver/weather_hourly/ (43,848 rows)<br/>gold/load_hourly_state/ (39,424 Johor rows)"]:::gold
TRANS --> SILVER
REF["reference/*.yaml<br/>tariffs/{tnb_etou,tnb_icpt,tnb_e_rates}<br/>electricity/{peninsular_annual,johor_share}"]:::silver
MODELS["<b>MODELS</b> — src/jb_vpp/models/*.py<br/>btm_economics + dispatch_lp (CBC LP)<br/>breakeven · aggregator · portfolio_timeline<br/>ppa_pricing · ppa_termsheet · enegem_export<br/>carbon · cfe_247 · trigger_curve · risk · tornado"]:::model
SILVER --> MODELS
REF --> MODELS
REPORTS["<b>REPORTS</b> — commercial deliverables<br/>btm_economics_dc100 · aggregator_portfolio · portfolio_timeline<br/>vpp_service_revenue_required · ppa_pricing · ppa_termsheet<br/>enegem_export_estimate · carbon_re100 · cfe_247 · bess_trigger_curve<br/>risk_adjusted_npv · sensitivity_tornado · EXECUTIVE_SUMMARY"]:::report
MODELS --> REPORTS
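The bronze layout in the diagram (raw data partitioned by run_id, with one manifest per run under bronze/manifests/<source>/<dataset>/<run_id>.json) can be sketched with the standard library alone. This is a minimal illustration, not the project's ingest code: the run_id format, the manifest fields, and the dataset label "hourly" are all assumptions made for the example.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def new_run_id() -> str:
    """Hypothetical run_id format: UTC timestamp plus a short random suffix."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{uuid.uuid4().hex[:8]}"


def manifest_path(root: Path, source: str, dataset: str, run_id: str) -> Path:
    """Mirror the bronze/manifests/<source>/<dataset>/<run_id>.json layout."""
    return root / "bronze" / "manifests" / source / dataset / f"{run_id}.json"


def write_manifest(root: Path, source: str, dataset: str, run_id: str, row_count: int) -> Path:
    """Write a small JSON manifest for one ingest run and return its path.

    The field names below are illustrative; the real manifest schema is not
    described in this document.
    """
    path = manifest_path(root, source, dataset, run_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "source": source,
        "dataset": dataset,
        "run_id": run_id,
        "row_count": row_count,
    }
    path.write_text(json.dumps(payload, indent=2))
    return path


# Usage: write one manifest into a throwaway directory and show where it landed.
with tempfile.TemporaryDirectory() as tmp:
    written = write_manifest(Path(tmp), "nasa-power", "hourly", new_run_id(), row_count=24)
    print(written.relative_to(tmp))
```

Keying every write by run_id means a re-run never overwrites an earlier pull, which is what lets the transform layer choose which run to read.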
CLI map (1:1 with the modules)
flowchart LR
classDef root fill:#0e8a8a,stroke:#1eb6b6,color:#f0fdfa,stroke-width:1px;
classDef cmd fill:#1e293b,stroke:#475569,color:#e2e8f0,stroke-width:1px;
classDef sub fill:#334155,stroke:#94a3b8,color:#f1f5f9,stroke-width:1px;
JB["jb-vpp<br/><i>Typer root</i>"]:::root
JB --> ING["ingest<br/><i>bronze writes</i>"]:::cmd
JB --> TRA["transform<br/><i>silver/gold writes</i>"]:::cmd
JB --> MOD["model<br/><i>economics + reports</i>"]:::cmd
ING --> NP["nasa-power"]:::sub
ING --> SM["senai-metar"]:::sub
ING --> DM["dosm --dataset all"]:::sub
ING --> EM["ember"]:::sub
TRA --> WH["weather-hourly"]:::sub
TRA --> LS["synthesize"]:::sub
MOD --> BTM["btm-economics --dispatch lp"]:::sub
MOD --> BE["breakeven"]:::sub
MOD --> AG["aggregator"]:::sub
MOD --> PT["timeline"]:::sub
MOD --> PP["ppa-pricing"]:::sub
MOD --> PTS["ppa-termsheet"]:::sub
MOD --> EN["enegem"]:::sub
MOD --> CB["carbon"]:::sub
MOD --> CFE["cfe-247"]:::sub
MOD --> TR["trigger"]:::sub
MOD --> RK["risk"]:::sub
MOD --> TD["tornado"]:::sub
MOD --> RA["run-all"]:::sub
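The two-level shape of the CLI (root, then group, then subcommand) can be illustrated in a few lines. The real CLI is built on Typer; the sketch below deliberately swaps in argparse from the standard library so it runs without extra dependencies. It reproduces only the command tree from the diagram, with no options and no handlers.

```python
import argparse

# Command tree transcribed from the CLI map diagram.
CLI_MAP = {
    "ingest": ["nasa-power", "senai-metar", "dosm", "ember"],
    "transform": ["weather-hourly", "synthesize"],
    "model": [
        "btm-economics", "breakeven", "aggregator", "timeline",
        "ppa-pricing", "ppa-termsheet", "enegem", "carbon",
        "cfe-247", "trigger", "risk", "tornado", "run-all",
    ],
}


def build_parser() -> argparse.ArgumentParser:
    """Build a root parser with one sub-parser per group and per subcommand."""
    parser = argparse.ArgumentParser(prog="jb-vpp")
    groups = parser.add_subparsers(dest="group", required=True)
    for group, subcommands in CLI_MAP.items():
        group_parser = groups.add_parser(group)
        commands = group_parser.add_subparsers(dest="command", required=True)
        for name in subcommands:
            commands.add_parser(name)
    return parser


# Usage: parse an invocation equivalent to `jb-vpp transform weather-hourly`.
args = build_parser().parse_args(["transform", "weather-hourly"])
print(args.group, args.command)
```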
Data lineage — what reads what
| Layer | Reads | Writes |
|---|---|---|
| ingest/nasa_power.py | NASA POWER REST API | bronze/weather/nasa-power/ |
| ingest/senai_metar.py | Iowa State ASOS REST API | bronze/weather/senai-metar/ |
| ingest/dosm.py | api.data.gov.my | bronze/macro/dosm/* |
| ingest/ember.py | storage.googleapis.com (Ember public bucket) | bronze/macro/ember/monthly-malaysia/ |
| transform/weather_hourly.py | bronze.weather.{nasa-power,senai-metar} | silver/weather_hourly/ |
| transform/load_synthesis.py | silver.weather_hourly + bronze.macro.dosm + reference/electricity/johor_share.yaml | gold/load_hourly_state/ |
| models/btm_economics.py | silver.weather_hourly + reference/tariffs/{tnb_e_rates,tnb_icpt}.yaml | reports/btm_economics_dc100.{md,json} |
| models/dispatch_lp.py | called from btm_economics with hourly arrays | (in-memory only) |
| models/breakeven.py | reports/btm_economics_dc100.json | reports/vpp_service_revenue_required.md, breakeven_revenue.json |
| models/aggregator.py | (constants from reports; no parquet reads) | reports/aggregator_portfolio.{md,json} |
| models/tornado.py | (constants calibrated to LP base) | reports/sensitivity_tornado.{md,json} |
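The lineage table can also be read as a dependency graph, which answers the practical question "if this input changes, what must be rebuilt?". The sketch below transcribes the file-backed rows of the table into a dictionary and walks it breadth-first. The artifact names are shortened from the table, and the helper is illustrative rather than part of the codebase. The aggregator and tornado models are left out because the table says they use constants rather than reading files.

```python
from collections import deque

# artifact -> inputs it reads, transcribed from the lineage table.
READS = {
    "silver/weather_hourly": [
        "bronze/weather/nasa-power",
        "bronze/weather/senai-metar",
    ],
    "gold/load_hourly_state": [
        "silver/weather_hourly",
        "bronze/macro/dosm",
        "reference/electricity/johor_share.yaml",
    ],
    "reports/btm_economics_dc100": [
        "silver/weather_hourly",
        "reference/tariffs/tnb_e_rates.yaml",
        "reference/tariffs/tnb_icpt.yaml",
    ],
    "reports/vpp_service_revenue_required": ["reports/btm_economics_dc100"],
}


def downstream(changed: str) -> list[str]:
    """Return every artifact that directly or indirectly reads `changed`."""
    # Invert the mapping: input -> artifacts that consume it.
    consumers: dict[str, list[str]] = {}
    for artifact, inputs in READS.items():
        for item in inputs:
            consumers.setdefault(item, []).append(artifact)

    stale: list[str] = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in consumers.get(node, []):
            if consumer not in stale:
                stale.append(consumer)
                queue.append(consumer)
    return stale


# Usage: an ICPT update invalidates the BTM report and the breakeven report.
print(downstream("reference/tariffs/tnb_icpt.yaml"))
```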
Where assumptions live
| Assumption | File | When to update |
|---|---|---|
| TNB ETOU peak/off-peak hours | reference/tariffs/tnb_etou.yaml | When TNB revises ETOU windows (rare) |
| Tariff E1/E2/E3 rate values | reference/tariffs/tnb_e_rates.yaml | Each TNB tariff revision (verify against PDF) |
| ICPT historical surcharge | reference/tariffs/tnb_icpt.yaml | Every 6 months (Suruhanjaya Tenaga publishes) |
| Peninsular electricity historical | reference/electricity/peninsular_annual.yaml | Annually with new ST MESH edition |
| Johor share modelling | reference/electricity/johor_share.yaml | When real TNB / ST data obtained, OR annually with GDP refresh |
| BESS / PV capex | hard-coded defaults in models/btm_economics.py (PVConfig, BESSConfig) | When EPC quotes refresh; sweep via tornado |
| Aggregator VPP tiers | models/aggregator.py (DEFAULT_TIERS) | When MY VPP service market structure clarifies |
| Tornado parameter ranges | models/tornado.py (default_variables) | When commercial team challenges a range |
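For readers unfamiliar with the tornado sweep mentioned in the table, the underlying technique is a one-at-a-time sensitivity analysis: hold every parameter at its base value, move one to its low and high bound, and rank parameters by the resulting swing in the output. The sketch below shows that technique on a toy model. The parameter names, values, and the model itself are placeholders chosen for round arithmetic; they are not the contents of default_variables or any project figure.

```python
from typing import Callable

Params = dict[str, float]


def tornado(
    model: Callable[[Params], float],
    base: Params,
    ranges: dict[str, tuple[float, float]],
) -> list[tuple[str, float, float, float]]:
    """One-at-a-time sensitivity: vary each parameter alone, rank by swing.

    Returns (name, output_at_low, output_at_high, swing) sorted by swing,
    largest first, which is the bar order of a tornado chart.
    """
    rows = []
    for name, (low, high) in ranges.items():
        out_low = model({**base, name: low})
        out_high = model({**base, name: high})
        rows.append((name, out_low, out_high, abs(out_high - out_low)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


def toy_value(p: Params) -> float:
    """Placeholder model: undiscounted savings minus upfront cost."""
    return p["annual_saving"] * p["years"] - p["capex"]


# Placeholder base case and ranges (unitless, illustrative only).
BASE = {"capex": 100.0, "annual_saving": 20.0, "years": 10.0}
RANGES = {
    "capex": (80.0, 120.0),
    "annual_saving": (15.0, 25.0),
    "years": (8.0, 12.0),
}

for name, out_low, out_high, swing in tornado(toy_value, BASE, RANGES):
    print(f"{name:14s} low={out_low:6.1f} high={out_high:6.1f} swing={swing:5.1f}")
```

Because each bar depends only on that parameter's own range, challenging one range changes one bar and leaves the rest of the chart untouched.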
How to run a clean end-to-end pass
# 1. Ensure env
nix develop
uv sync

# 2. Backfill 5y data (~5-10 minutes wall time)
uv run jb-vpp ingest nasa-power --start 2020-01-01 --end 2024-12-31 --sleep 0.5
uv run jb-vpp ingest senai-metar --start 2020-01-01 --end 2024-12-31 --sleep 1
uv run jb-vpp ingest dosm --dataset all
uv run jb-vpp ingest ember

# 3. Build silver + gold
uv run jb-vpp transform weather-hourly
uv run jb-vpp transform synthesize

# 4. Run all models (writes all reports)
uv run jb-vpp model btm-economics --dispatch lp
uv run jb-vpp model breakeven
uv run jb-vpp model aggregator
uv run jb-vpp model tornado

# 5. Tests
uv run pytest

After this, all 6 reports in reports/ are fresh.
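A quick way to sanity-check the backfill is to compare the silver row count against the number of hours in the requested window. The sketch below computes that expectation for the 2020-01-01 to 2024-12-31 range used in step 2. It assumes the end date is inclusive and that the silver table has exactly one row per hour with no gaps; the helper name is hypothetical. Malaysia observes no daylight saving time, so every local day has 24 hours.

```python
from datetime import date


def expected_hourly_rows(start: date, end: date) -> int:
    """Hours in an inclusive date range, assuming 24 rows per calendar day."""
    days = (end - start).days + 1  # +1 because the end date is included
    return days * 24


# 1,827 days (2020 and 2024 are leap years) times 24 hours.
rows = expected_hourly_rows(date(2020, 1, 1), date(2024, 12, 31))
print(rows)  # 43848
```

The result matches the 43,848 rows quoted for silver/weather_hourly/ in the pipeline diagram, so a different count after a clean run points to missing or duplicated hours.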
What’s NOT here yet (and why)
- ERA5 ingester: POWER and METAR cross-validate within 0.10 °C bias over 5 years, so ERA5 is redundant for v0.
- EMC USEP ingester: EMC requires a session and cookies, and the data.gov.sg API needs an auth token. Blocked on commercial access.
- County-level load synthesis: needs IRDA industrial park MW breakdowns.
- Real customer profile calibration: needs commercial outreach to a Johor industrial customer.
- Multi-year LP: the current monthly LP is sufficient for this analysis; a multi-year horizon mostly affects accuracy at year boundaries.
- camelot-py / programmatic ST PDF parser: the source updates once a year, so manual transcription is cheaper.
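The bias figure quoted in the ERA5 item is the kind of statistic a mean-difference check produces. As a minimal sketch, assuming bias means the average of (POWER minus METAR) over time-aligned temperature readings, it could be computed as below. The sign convention, the pairing of observations, and the sample values are assumptions for illustration; the document does not say how the 0.10 °C figure was derived.

```python
from statistics import fmean


def mean_bias(reference: list[float], candidate: list[float]) -> float:
    """Mean of (reference - candidate) over paired readings, in input units."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("inputs must be non-empty and the same length")
    return fmean(r - c for r, c in zip(reference, candidate))


# Made-up temperatures in °C for three aligned hours (not project data).
power_temps = [27.0, 28.0, 29.0]
metar_temps = [27.1, 27.9, 29.3]
print(f"bias = {mean_bias(power_temps, metar_temps):+.2f} °C")
```

A small mean bias only shows that the two sources agree on average; it does not rule out larger hour-to-hour disagreement that cancels out.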