PromptGenix – Research Strategy (Approach, Phase I)

Research Strategy – Approach (Phase I Workflow & Validation)

PromptGenix LLC · Contact: dohoon.kim1@icloud.com · promptgenix.org
Phase I goal

Build a validated prototype that prioritizes testable biological hypotheses using evidence-weighted inference over public data and literature.

Phase I focuses on engineering robustness, reproducibility, and measurable scientific utility for a hypothesis intelligence engine. Validation will rely exclusively on public sources (e.g., GEO/SRA, FlowRepository, and PubMed/PMC), minimizing IP or data-use conflicts while enabling transparent evaluation.

In this project, analysis pipelines are treated as evidence generators—not the end product. The primary Phase I deliverable is a system that produces ranked hypotheses with probabilistic confidence, explicit evidence links, and clear uncertainty labeling.

Workflow overview

  • Inputs: disease-, protein-, pathway-, or cell-type–centric questions, optionally linked to public accession IDs.
  • Evidence pool: automated retrieval of relevant public datasets + associated metadata + literature signals.
  • Evidence generation: standardized feature extraction (effect sizes, uncertainty, reproducibility, cohort/context descriptors); see the evidence-object sketch after this overview.
  • Decision engine: statistical scoring + Bayesian updating to compute confidence distributions for candidate hypotheses.
  • Interpretability layer: LLM produces explanations constrained by evidence objects (no evidence → no claim).
  • Outputs: ranked hypotheses + confidence bands + “why” rationale + recommended follow-up experiments.
Key properties: evidence objects · Bayesian updating · uncertainty labels · traceable outputs
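
As a purely illustrative sketch, an evidence object can be represented as a small immutable record; the field names, accessions, and values below are assumptions for exposition, not the Phase I schema.

```python
from dataclasses import dataclass

# Illustrative sketch of a standardized evidence object; field names and
# the accession/PMC IDs are assumptions, not the final Phase I schema.
@dataclass(frozen=True)
class EvidenceObject:
    source_id: str       # provenance, e.g., a GEO accession or PMC ID
    effect_size: float   # standardized effect estimate (e.g., log2 fold change)
    std_error: float     # uncertainty attached to the estimate
    supports: bool       # directionality relative to the hypothesis
    context: tuple = ()  # cohort / cell-type / platform descriptors

# Two toy evidence objects that a downstream decision engine could consume.
evidence = [
    EvidenceObject("GSE0000001", effect_size=1.4, std_error=0.3,
                   supports=True, context=("human", "PBMC")),
    EvidenceObject("PMC0000001", effect_size=-0.2, std_error=0.5,
                   supports=False, context=("mouse", "spleen")),
]
```

Freezing the record (and keeping descriptors in immutable tuples) makes each object hashable and checksummable, which supports the deterministic reruns discussed under validation.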

Hypothesis Prioritization Engine (traceable)

PromptGenix prioritizes testable hypotheses by combining quantitative evidence with structured biological priors. Large language models are used only to contextualize and explain rankings—not to determine them.

  • Candidate hypothesis set: generated from structured entities (disease/protein/cell-type) and literature-derived relations; a template sketch follows this list.
  • Evidence strength: effect size + uncertainty + directionality + reproducibility across datasets/contexts.
  • Bayesian confidence: priors (known biology, pathways, cell specificity) updated by observed evidence to yield posteriors.
  • Ranking outputs: posterior mean + credible interval + evidence coverage (supporting vs. conflicting).
  • Actionability: each hypothesis includes suggested next-step experiments/analyses and required data.
Phase I emphasis: transparent ranking with uncertainty, not “black-box” generation.
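
A minimal sketch of template-based candidate generation, assuming a simple slot-filling grammar over structured entities; the template wording and the example entities are illustrative, not the production vocabulary.

```python
from itertools import product

# Hypothetical slot-filling template for testable statements.
TEMPLATE = "{protein} is {direction}-regulated in {cell_type} in {disease}"

proteins   = ["TP53", "STAT3"]        # illustrative entities only
directions = ["up", "down"]
cell_types = ["CD8+ T cells"]
diseases   = ["melanoma"]

candidates = [
    TEMPLATE.format(protein=p, direction=d, cell_type=c, disease=x)
    for p, d, c, x in product(proteins, directions, cell_types, diseases)
]
print(candidates[0])  # TP53 is up-regulated in CD8+ T cells in melanoma
```

Because every candidate comes from this structured set, the downstream LLM explains rankings over these statements rather than inventing hypotheses of its own.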

Bayesian Evidence-to-Confidence Workflow (Figure X)

Figure X. Stepwise workflow of the PromptGenix decision engine, from query intake to evidence extraction, Bayesian updating, confidence scoring, and evidence-constrained interpretation.
Figure X panels, in order:
  • Input: user query (disease / protein / cell-type)
  • (1) Retrieval: public datasets + literature
  • (2) Harmonization: metadata normalization, QC, batch/study modeling
  • (3) Hypotheses: structured templates (testable statements)
  • (4) Evidence extraction: effect sizes, uncertainty, quality weights, directionality
  • (5) Bayesian update: prior P(H) × likelihood P(E|H) → posterior P(H|E)
  • (6) Confidence: posterior mean/median, 95% credible interval, robustness
  • (7) Interpretability: LLM rationale constrained to evidence, plus next steps
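
To make the figure concrete, here is a skeletal sketch that wires the seven steps into one toy pipeline; every function body, likelihood ratio, and record below is a stand-in assumption rather than the actual implementation.

```python
# Skeletal sketch of the Figure X flow; all values are illustrative stubs.

def retrieve_and_harmonize(query: str) -> list[dict]:
    """(1)-(2) Retrieval plus metadata normalization, stubbed to one record."""
    return [{"source": "GSE0000001", "effect": 1.2, "se": 0.4, "supports": True}]

def propose_hypotheses(query: str) -> list[str]:
    """(3) Structured, testable statements derived from the query."""
    return [f"{query}: candidate up-regulation"]

def posterior(evidence: list[dict], prior: float = 0.5) -> float:
    """(4)-(5) Toy Bayesian update on binary supports/conflicts signals."""
    odds = prior / (1 - prior)
    for e in evidence:
        odds *= 3.0 if e["supports"] else 1 / 3.0  # assumed likelihood ratio
    return odds / (1 + odds)

def report(hypothesis: str, p: float, evidence: list[dict]) -> str:
    """(6)-(7) Confidence plus an evidence-constrained rationale line."""
    refs = ", ".join(e["source"] for e in evidence) or "none"
    return f"{hypothesis} | P(H|E)={p:.2f} | evidence: {refs}"

ev = retrieve_and_harmonize("STAT3 in melanoma CD8+ T cells")
for h in propose_hypotheses("STAT3 in melanoma CD8+ T cells"):
    print(report(h, posterior(ev), ev))
```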
How confidence is computed

Probabilistic confidence is derived from evidence features and Bayesian updating.

Conceptual model: For each hypothesis H, we define a prior probability P(H) from structured biological knowledge (literature, pathways, cell specificity). We then compute evidence likelihood P(E|H) from standardized evidence features (effect size, uncertainty, reproducibility, context). The system produces a posterior confidence P(H|E), reported with a credible interval and explicit evidence coverage.
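
As a worked toy example of that update, suppose the prior P(H) is encoded as a Beta distribution and each independent dataset contributes a binary supports/conflicts signal; conjugacy then gives the posterior mean and 95% credible interval in closed form. Real Phase I likelihoods would also weight effect size and quality, but the mechanics follow P(H|E) ∝ P(E|H) · P(H). The prior parameters and counts below are illustrative.

```python
from scipy import stats

prior_a, prior_b = 2.0, 2.0        # weakly informative Beta prior on P(H)
supporting, conflicting = 7, 2     # evidence coverage observed for H

# Conjugate update: Beta(a, b) prior + binomial evidence -> Beta posterior.
post = stats.beta(prior_a + supporting, prior_b + conflicting)
mean = post.mean()                        # posterior mean = 9/13, about 0.69
lo, hi = post.ppf([0.025, 0.975])         # 95% credible interval bounds

print(f"P(H|E) = {mean:.2f}, 95% CrI [{lo:.2f}, {hi:.2f}]")
```

Reporting the interval alongside the mean is what lets downstream consumers distinguish a confident 0.7 from a data-starved one.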
Phase I validation plan

Public-data evaluation with measurable KPIs.

Work package: WP1 – Evidence layer
What we will do: Build a robust ingest and evidence-feature-extraction layer for selected public datasets; generate standardized evidence objects (effect sizes, uncertainty, reproducibility, and context descriptors) suitable for deterministic reruns and downstream inference.
Success metrics (examples): ≥90% reproducibility across reruns (same inputs/configs); evidence objects produced for ≥80% of targeted datasets despite heterogeneity.

Work package: WP2 – Decision engine
What we will do: Implement statistical scoring and Bayesian updating to compute posterior confidence distributions for candidate hypotheses; produce ranked outputs with uncertainty annotations and explicit supporting-vs.-conflicting evidence summaries.
Success metrics (examples): calibrated confidence outputs (posterior intervals behave as expected in held-out tests; see the calibration sketch below); clear traceability, with each ranked hypothesis linked to evidence objects and citations.

Work package: WP3 – Interpretability & reports
What we will do: Generate reviewer-ready HTML/PDF artifacts that include ranked hypotheses, confidence bands, evidence coverage, citations, and recommended next steps; LLM narratives are constrained to evidence objects and must cite sources.
Success metrics (examples): end-to-end run completes in <24 hours from accession-driven inputs (dataset-dependent); ≥80% "useful" rating by pilot users (guides at least one concrete experimental/analysis decision).
Data-use boundary: Phase I evaluation uses public datasets only (GEO/SRA/FlowRepository) and public literature sources, enabling clean SBIR review and minimizing IP concerns.
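
One way to operationalize the WP2 calibration metric is a held-out reliability check: bin the engine's reported confidences and compare each bin's mean confidence to the empirical confirmation rate, summarized as an expected calibration error. The sketch below runs on synthetic inputs; it illustrates the evaluation harness, not the final benchmark.

```python
import numpy as np

def expected_calibration_error(confidence, confirmed, n_bins=10):
    """Weighted gap between reported confidence and observed frequency."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidence >= lo) & (confidence < hi)
        if mask.any():
            gap = abs(confidence[mask].mean() - confirmed[mask].mean())
            ece += mask.mean() * gap
    return ece

rng = np.random.default_rng(0)
confidence = rng.uniform(size=500)        # engine-reported P(H|E) values
confirmed = rng.random(500) < confidence  # synthetic held-out labels
print(f"ECE = {expected_calibration_error(confidence, confirmed):.3f}")
```

A well-calibrated engine drives the ECE toward zero; systematic over- or under-confidence shows up as large gaps in specific bins.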
Risks & Mitigation

Practical risks addressed with explicit fallback strategies.

Technical risks

  • Dataset heterogeneity: inconsistent metadata, formats, and cohort contexts across repositories.
  • Evidence sparsity: some hypotheses may lack sufficient supporting data or have conflicting signals.
  • Compute variability: differences between local and cloud environments affecting deterministic reruns.
  • LLM overreach: narrative may overstate conclusions without strict constraints.

Mitigation strategies

  • Robust ingest: schema detection + validation + standardized evidence templates and QC gates.
  • Evidence coverage labeling: explicit “supporting vs. conflicting vs. missing” flags per hypothesis.
  • Deterministic runs: pinned versions, config snapshots, and checksums for reproducibility.
  • Traceability guardrails: the LLM may reference only approved evidence objects; no evidence → no claim (see the guardrail sketch after this list).
  • Fallback mode: if evidence is weak, output “candidate hypotheses” with low-confidence tags and data-needed guidance.
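
A minimal sketch of the "no evidence → no claim" guardrail together with the low-confidence fallback; the function signature, threshold, and IDs are illustrative assumptions, not the Phase I API.

```python
def render_claim(statement: str, evidence_ids: list[str],
                 confidence: float, low_conf: float = 0.5) -> str:
    """Emit a narrative line only when approved evidence objects exist."""
    if not evidence_ids:
        # No approved evidence objects: the narrative layer stays silent.
        return "[no claim: no supporting evidence objects]"
    tag = "CANDIDATE (low confidence)" if confidence < low_conf else "CLAIM"
    return (f"[{tag}] {statement} "
            f"(P={confidence:.2f}; evidence: {', '.join(evidence_ids)})")

print(render_claim("STAT3 is up-regulated in tumor-infiltrating T cells",
                   ["GSE0000001", "PMC0000001"], confidence=0.42))
# -> [CANDIDATE (low confidence)] ... (P=0.42; evidence: GSE0000001, PMC0000001)
```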
Reviewer-friendly principle: Every claim must be traceable to (1) evidence objects derived from public data or (2) cited literature, with uncertainty explicitly stated.
Phase I success criterion: Phase I does not require that every prioritized hypothesis be biologically correct; success is defined by demonstrable traceability, calibrated uncertainty, a transparent ranking rationale, and demonstrated usefulness in guiding study-design decisions.