Research Strategy – Approach (Phase I Workflow & Validation)
Build a validated prototype that prioritizes testable biological hypotheses using evidence-weighted inference over public data and literature.
Phase I focuses on engineering robustness, reproducibility, and measurable scientific utility for a hypothesis intelligence engine. Validation will rely exclusively on public sources (e.g., GEO/SRA, FlowRepository, and PubMed/PMC), minimizing IP or data-use conflicts while enabling transparent evaluation.
In this project, analysis pipelines are treated as evidence generators—not the end product. The primary Phase I deliverable is a system that produces ranked hypotheses with probabilistic confidence, explicit evidence links, and clear uncertainty labeling.
Workflow overview
- Inputs: Disease-, protein-, pathway-, or cell-type–centric questions, optionally linked to public accession IDs.
- Evidence pool: automated retrieval of relevant public datasets + associated metadata + literature signals.
- Evidence generation: standardized feature extraction (effect sizes, uncertainty, reproducibility, cohort/context descriptors), packaged as evidence objects (see the sketch after this list).
- Decision engine: statistical scoring + Bayesian updating to compute confidence distributions for candidate hypotheses.
- Interpretability layer: LLM produces explanations constrained by evidence objects (no evidence → no claim).
- Outputs: ranked hypotheses + confidence bands + “why” rationale + recommended follow-up experiments.
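The evidence object is the unit of exchange between the pipelines and the decision engine. The sketch below shows one minimal way such an object could be structured; all field names (e.g., `effect_size`, `reproduced_in`, `context`) are illustrative assumptions, not a finalized schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EvidenceObject:
    """One standardized unit of evidence extracted from a public dataset or paper.

    Field names are illustrative placeholders; the Phase I schema will be
    finalized against the repositories actually ingested (GEO/SRA,
    FlowRepository, PubMed/PMC).
    """
    hypothesis_id: str              # candidate hypothesis this evidence bears on
    source_accession: str           # e.g., a GEO series or PMC identifier
    effect_size: float              # standardized effect estimate (e.g., log2 fold change)
    std_error: float                # uncertainty of the effect estimate
    direction: int                  # +1 supporting, -1 conflicting, 0 inconclusive
    reproduced_in: int = 1          # independent datasets showing a concordant signal
    context: dict = field(default_factory=dict)   # cohort, tissue, cell-type descriptors
    citation: Optional[str] = None  # PubMed/PMC ID when literature-derived
```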
Hypothesis Prioritization Engine (traceable)
PromptGenix prioritizes testable hypotheses by combining quantitative evidence with structured biological priors. Large language models are used only to contextualize and explain rankings—not to determine them.
- Candidate hypothesis set: generated from structured entities (disease/protein/cell-type) and literature-derived relations.
- Evidence strength: effect size + uncertainty + directionality + reproducibility across datasets/contexts.
- Bayesian confidence: priors (known biology, pathways, cell specificity) updated by observed evidence to yield posteriors.
- Ranking outputs: posterior mean + credible interval + evidence coverage (supporting vs. conflicting); see the ranking sketch after this list.
- Actionability: each hypothesis includes suggested next-step experiments/analyses and required data.
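The following sketch shows one way ranked outputs could be assembled from posterior summaries and evidence coverage. The `RankedHypothesis` fields, the conflict penalty, and the function name are assumptions made for illustration; Phase I will evaluate alternative scoring rules against held-out benchmarks.

```python
from dataclasses import dataclass

@dataclass
class RankedHypothesis:
    hypothesis_id: str
    posterior_mean: float   # posterior mean effect/confidence
    ci_low: float           # lower bound of the credible interval
    ci_high: float          # upper bound of the credible interval
    n_supporting: int
    n_conflicting: int
    next_step: str          # suggested follow-up experiment or analysis

def rank_hypotheses(candidates: list[RankedHypothesis]) -> list[RankedHypothesis]:
    """Order candidates by posterior mean, down-weighting conflicted evidence.

    The penalty term is an illustrative choice, not the committed scoring rule.
    """
    def score(h: RankedHypothesis) -> float:
        coverage = h.n_supporting + h.n_conflicting
        conflict_frac = h.n_conflicting / coverage if coverage else 1.0
        return h.posterior_mean * (1.0 - conflict_frac)

    return sorted(candidates, key=score, reverse=True)
```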
Bayesian Evidence-to-Confidence Workflow (Figure X)
Probabilistic confidence is derived from evidence features and Bayesian updating.
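For concreteness, one conjugate formulation consistent with this workflow (an illustrative assumption, not the committed model) is a precision-weighted normal update of a biological prior by per-dataset effect estimates:

$$
\theta \mid \hat e_{1:n} \;\sim\; \mathcal{N}\!\left(
\frac{\mu_0/\tau_0^{2} + \sum_{i=1}^{n} \hat e_i/s_i^{2}}
     {1/\tau_0^{2} + \sum_{i=1}^{n} 1/s_i^{2}},\;
\Big(1/\tau_0^{2} + \sum_{i=1}^{n} 1/s_i^{2}\Big)^{-1}
\right)
$$

where $\mu_0$ and $\tau_0^{2}$ encode the prior from known biology and each $\hat e_i \pm s_i$ is an effect estimate with its standard error from one evidence object; the posterior mean and credible interval feed the ranking step.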
Public-data evaluation with measurable KPIs.
| Work package | What we will do | Success metrics (examples) |
|---|---|---|
| WP1 Evidence layer | Build a robust ingest + evidence feature extraction layer for selected public datasets. Generate standardized evidence objects (effect sizes, uncertainty, reproducibility, and context descriptors) suitable for deterministic reruns and downstream inference. | ≥90% reproducibility across reruns (same inputs/configs). Evidence objects produced for ≥80% of targeted datasets despite heterogeneity. |
| WP2 Decision engine | Implement statistical scoring and Bayesian updating to compute posterior confidence distributions for candidate hypotheses. Produce ranked outputs with uncertainty annotations and explicit supporting vs. conflicting evidence summaries. | Calibrated confidence outputs (posterior intervals behave as expected in held-out tests; see the coverage check below the table). Clear traceability: each ranked hypothesis links to evidence objects + citations. |
| WP3 Interpretability & reports | Generate reviewer-ready HTML/PDF artifacts that include ranked hypotheses, confidence bands, evidence coverage, citations, and recommended next steps. LLM narratives are constrained to evidence objects and must cite sources. | End-to-end run completes in <24 hours from accession-driven inputs (dataset-dependent). ≥80% “useful” rating by pilot users (guides at least one concrete experimental/analysis decision). |
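The WP2 calibration metric can be made operational with a simple empirical-coverage check on held-out cases, sketched below. The helper `empirical_coverage` and its input shapes are assumptions for illustration.

```python
def empirical_coverage(intervals, truths):
    """Fraction of held-out true effects that fall inside their credible intervals.

    For a well-calibrated engine this fraction should track the nominal level,
    e.g. roughly 0.95 for 95% credible intervals. `intervals` is a sequence of
    (low, high) pairs; `truths` is the matching sequence of held-out values.
    """
    hits = sum(low <= t <= high for (low, high), t in zip(intervals, truths))
    return hits / len(truths)
```

Coverage well below the nominal level would flag overconfident posteriors; coverage well above it would flag uninformative ones.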
Practical risks addressed with explicit fallback strategies.
Technical risks
- Dataset heterogeneity: inconsistent metadata, formats, and cohort contexts across repositories.
- Evidence sparsity: some hypotheses may lack sufficient supporting data or have conflicting signals.
- Compute variability: differences between local and cloud environments affecting deterministic reruns.
- LLM overreach: narrative may overstate conclusions without strict constraints.
Mitigation strategies
- Robust ingest: schema detection + validation + standardized evidence templates and QC gates.
- Evidence coverage labeling: explicit “supporting vs. conflicting vs. missing” flags per hypothesis.
- Deterministic runs: pinned versions, config snapshots, and checksums for reproducibility (see the fingerprint sketch after this list).
- Traceability guardrails: LLM can only reference approved evidence objects; no evidence → no claim.
- Fallback mode: if evidence is weak, output “candidate hypotheses” with low-confidence tags and data-needed guidance.
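A minimal sketch of the deterministic-rerun check referenced in the mitigation list, assuming a SHA-256 fingerprint over the pinned configuration and input files; the function name and manifest fields are illustrative.

```python
import hashlib
import json

def run_fingerprint(config: dict, input_paths: list[str]) -> str:
    """Hash the pinned configuration plus input file contents.

    Two runs with the same fingerprint are expected to produce byte-identical
    evidence objects; any divergence is logged as a reproducibility failure.
    """
    h = hashlib.sha256()
    h.update(json.dumps(config, sort_keys=True).encode())  # config snapshot
    for path in sorted(input_paths):                        # order-independent
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()
```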