🔬 Evidence Evaluation · Longevix Scientific Standards

How We Evaluate Evidence

Not all research is equal. This page explains exactly how Longevix reads, evaluates and grades scientific evidence — so you understand what "Strong Evidence" vs "Preliminary" means on every page.

The Evidence Hierarchy Pyramid

1 · Systematic Reviews & Meta-Analyses — Highest

2 · Large RCTs (n>1,000) — Strong

3 · Medium RCTs (n=100–1,000) — Moderate

4 · Small RCTs & Prospective Cohorts — Preliminary

5 · Observational / Case-Control — Weak

6 · Animal / Mechanistic — Hypothesis only
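Here is a minimal Python sketch that makes the pyramid's cut-offs concrete by mapping a study design and sample size to a tier. The design strings (`rct`, `prospective_cohort`, etc.) and the function `pyramid_tier` are hypothetical names for illustration, not part of any Longevix tool. The thresholds come straight from the pyramid: a trial of exactly 1,000 participants falls in the medium band, and prospective cohorts sit in tier 4.

```python
# Hypothetical design names; tiers and sample-size cut-offs follow the pyramid above.
META = {"meta_analysis", "systematic_review"}
OBSERVATIONAL = {"cross_sectional", "case_control", "observational"}
PRECLINICAL = {"animal", "in_vitro", "mechanistic"}

TIER_NAMES = {1: "Highest", 2: "Strong", 3: "Moderate",
              4: "Preliminary", 5: "Weak", 6: "Hypothesis only"}


def pyramid_tier(design: str, n: int = 0) -> int:
    """Return the pyramid tier (1 = highest) for a study design and sample size."""
    if design in META:
        return 1
    if design == "rct":
        if n > 1000:        # "Large RCTs (n>1,000)"
            return 2
        if n >= 100:        # "Medium RCTs (n=100 to 1,000)", 1,000 itself included
            return 3
        return 4            # small RCTs (n<100)
    if design == "prospective_cohort":
        return 4
    if design in OBSERVATIONAL:
        return 5
    if design in PRECLINICAL:
        return 6
    raise ValueError(f"unknown study design: {design!r}")


print(pyramid_tier("rct", 25871), TIER_NAMES[pyramid_tier("rct", 25871)])  # 2 Strong
print(pyramid_tier("rct", 1000))  # 3: exactly 1,000 is still "medium"
```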

Study Types Explained

Tier 1 · Strongest

Meta-Analysis / Systematic Review

Pools data from multiple independent studies

A systematic review identifies all published studies on a topic and evaluates their quality. A meta-analysis goes further — statistically combining results from multiple studies to calculate a pooled effect size. This reduces the impact of any one study's bias or chance findings.
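The pooling step can be illustrated with the fixed-effect inverse-variance method. This is one common way to combine study results; many meta-analyses instead use random-effects models when the studies differ from each other. What follows is a minimal sketch, not Longevix's own analysis. `pool_fixed_effect` is a hypothetical helper, and the three log hazard ratios are made-up numbers. The pooled standard error comes out smaller than any single study's, which is what makes a pooled estimate more precise than one trial.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of per-study effect sizes.

    Effects must be on an additive scale (e.g. log hazard ratios).
    Returns (pooled estimate, pooled standard error).
    """
    weights = [1.0 / se**2 for se in std_errors]      # more precise studies weigh more
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)                # shrinks as studies are added
    return pooled, pooled_se


# Hypothetical log hazard ratios and standard errors (illustrative, not real data).
log_hrs = [math.log(0.80), math.log(0.90), math.log(0.85)]
ses = [0.10, 0.08, 0.12]
est, se = pool_fixed_effect(log_hrs, ses)
lo, hi = est - 1.96 * se, est + 1.96 * se
print(f"pooled HR {math.exp(est):.2f} (95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```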

Why Longevix prefers these: One large RCT can be wrong due to chance. Ten RCTs pooled into a meta-analysis are far more reliable.

Limitations: Quality depends on the quality of included studies. "Garbage in, garbage out" applies.

Example: Kodama et al. (JAMA 2009), a meta-analysis of 33 observational cohort studies (102,980 participants in the all-cause mortality analysis) on cardiorespiratory fitness and mortality.

Tier 2 · Strong

Randomized Controlled Trial (RCT)

Gold standard for establishing causation

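Participants are randomly assigned to an intervention or control group. Randomization prevents selection bias in who receives the intervention and, in large trials, balances known and unknown confounders between the groups on average. Blinding (single or double) keeps placebo effects and observer bias from differing between the groups. RCTs are the closest science gets to proving causation, not just correlation.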
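Here is a minimal sketch of simple 1:1 randomization. It assumes a hypothetical `randomize` helper and synthetic ages generated in code. Allocation depends only on the shuffle, never on anything about the participant, so neither researcher nor participant can steer who gets the intervention. In a large sample, the arms end up with similar average ages. Real trials usually add allocation concealment and often use block or stratified randomization; this sketch shows only the core idea.

```python
import random


def randomize(participant_ids, seed=None):
    """Simple 1:1 randomization: shuffle the IDs, then split the list in half."""
    rng = random.Random(seed)         # seeded only so the example is reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"intervention": ids[:half], "control": ids[half:]}


# Synthetic cohort of 2,000 people with deterministic ages between 40 and 79.
ages = {pid: 40 + (pid * 37) % 40 for pid in range(2000)}
arms = randomize(ages, seed=1)        # iterating the dict yields participant IDs
for arm, members in arms.items():
    mean_age = sum(ages[p] for p in members) / len(members)
    print(arm, len(members), round(mean_age, 1))
```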

What makes an RCT strong: Large n (>1,000), pre-registration, double-blinding, intention-to-treat analysis, long follow-up, publication in a Tier-1 journal.

Limitations: Short follow-up, surrogate endpoints, industry funding bias.

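Example: VITAL Trial (Manson et al., NEJM 2019), an RCT of n=25,871 with a median follow-up of 5.3 years. Vitamin D3 and invasive cancer and major cardiovascular events (primary endpoints), with cancer mortality as a secondary endpoint.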

Tier 3–4 · Moderate

Prospective Cohort Study

Follows participants over time — observational

Researchers follow a large group (cohort) forward in time, recording exposures at the start and observing who later develops the outcomes of interest. No randomization: participants choose their own behaviors. Strong for studying long-term effects but cannot prove causation.

Why useful: Can study outcomes that would be unethical to test in RCTs (smoking, extreme diets). Large sample sizes. Long follow-up.

Limitations: Confounding — people who exercise may also eat better, sleep better, etc. Cannot determine causation.
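Confounding can be shown with a small made-up cohort. In the hypothetical counts below, exercise has no effect at all within either group: health-conscious people or less health-conscious people. However, health-conscious people both exercise more and get sick less, so the crude comparison makes exercise look protective. Stratifying by the confounder, here with a Mantel-Haenszel risk ratio, removes the spurious association. This only works for confounders that were actually measured, which is why cohort studies still cannot establish causation.

```python
# Each stratum: (exposed cases, exposed total, unexposed cases, unexposed total).
# Hypothetical counts, built so exercise has no effect within either stratum.
strata = {
    "health-conscious": (40, 800, 10, 200),
    "not health-conscious": (40, 200, 160, 800),
}


def risk_ratio(a, n1, c, n0):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / n1) / (c / n0)


def mantel_haenszel_rr(tables):
    """Mantel-Haenszel risk ratio pooled across strata of a confounder."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in tables)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in tables)
    return num / den


crude = [sum(col) for col in zip(*strata.values())]   # collapse the strata
print("crude RR:", round(risk_ratio(*crude), 2))
for name, table in strata.items():
    print(name, "RR:", round(risk_ratio(*table), 2))
print("adjusted (MH) RR:", round(mantel_haenszel_rr(strata.values()), 2))
```

The crude risk ratio comes out at about 0.47, while both stratum-specific ratios and the adjusted ratio are 1.0.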

Example: Kuopio studies (Finland) — 2,315 Finnish men, 20-year follow-up. Sauna and cardiovascular mortality.

Tier 5 · Weak

Observational / Cross-Sectional

Snapshot in time — cannot establish causation

Researchers measure exposure and outcome at the same time. Cannot determine which came first. Longevix uses these only to provide context, never as standalone evidence for a health claim.

Common misuse: "Coffee drinkers have lower BMI" — this could mean coffee causes weight loss, OR that leaner people drink more coffee, OR a third factor causes both.

Longevix labels these clearly: "Observational — association only, not causation established."

Tier 6 · Hypothesis

Animal Studies / In Vitro / Mechanistic

Used to explain mechanisms only

Mouse studies and cell culture experiments are essential for understanding biology but translate poorly to humans. Longevix cites these only to explain how something works mechanistically, never as evidence that it does work in humans.

The rule: "NMN extends lifespan in mice" does not mean "NMN extends lifespan in humans." Longevix always makes this distinction explicit.

How Longevix Assigns Evidence Labels

Strong: Meta-analysis or large RCT. Replicated. Tier-1 journal.

Moderate: Medium RCT or prospective cohort. Single study.

Preliminary: Small RCT, short follow-up, or animal data only.

Insufficient: Hypothesis only. Not cited as evidence for claims.
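The four labels can be read as a decision rule. The sketch below is one hypothetical encoding, not Longevix's internal tooling; the function name, the design strings and the order of checks are assumptions. Where the label list is silent, the code makes clearly marked guesses. A meta-analysis or large RCT that is not replicated or not in a Tier-1 journal falls back to Moderate. In vitro and mechanistic work alone is treated as hypothesis only. Cross-sectional data, which Longevix uses only for context, is not treated as standalone evidence.

```python
def assign_label(design: str, n: int = 0, replicated: bool = False,
                 tier1_journal: bool = False, short_follow_up: bool = False) -> str:
    """Hypothetical mapping of study features to Longevix evidence labels."""
    large_rct = design == "rct" and n > 1000
    small_rct = design == "rct" and n < 100
    # Assumption: in vitro / mechanistic work on its own is "hypothesis only".
    if design in {"in_vitro", "mechanistic"}:
        return "Insufficient"
    # Preliminary: small RCT, short follow-up, or animal data only.
    if design == "animal" or small_rct or short_follow_up:
        return "Preliminary"
    # Strong: meta-analysis or large RCT, replicated, in a Tier-1 journal.
    if (design == "meta_analysis" or large_rct) and replicated and tier1_journal:
        return "Strong"
    # Moderate: medium RCT or prospective cohort. Assumption: a meta-analysis or
    # large RCT lacking replication or a Tier-1 venue also lands here.
    if design in {"rct", "meta_analysis", "prospective_cohort"}:
        return "Moderate"
    # Assumption: cross-sectional and other designs are context only.
    return "Insufficient"


print(assign_label("rct", n=25871, replicated=True, tier1_journal=True))  # Strong
print(assign_label("prospective_cohort"))                                 # Moderate
```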

Why Statistical Significance Is Not Enough

A study can be "statistically significant" (p<0.05) yet clinically meaningless. Longevix evaluates:

Effect size: How large is the actual difference? In a big enough sample, a hazard ratio of 0.99 (a 1% relative reduction) can reach p<0.05 yet be clinically irrelevant.
Absolute vs relative risk: A fall from 2% to 1% and a fall from 40% to 20% are both a "50% risk reduction", but the absolute reductions are 1 and 20 percentage points.
Confidence intervals: Wide CIs suggest imprecision, even with p<0.05.
Surrogate endpoints: Improving a biomarker does not guarantee better outcomes. We prefer hard endpoints (mortality, MI).
Follow-up duration: A 4-week supplement study cannot tell us about 10-year health outcomes.
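The checks for effect size, absolute vs relative risk, and confidence intervals come down to simple arithmetic on trial counts. This minimal sketch uses hypothetical event counts (not from any cited study) and a hypothetical `compare_risks` helper. It computes relative and absolute risk reduction, the number needed to treat, a Wald 95% confidence interval for the risk difference, and a two-sided p-value from a normal approximation. The Wald interval is a rough approximation that behaves poorly with few events; it is used here only because it fits in a few lines.

```python
import math


def compare_risks(events_ctrl, n_ctrl, events_tx, n_tx):
    """Compare event risk in a control arm and a treated arm (hypothetical helper)."""
    p_c, p_t = events_ctrl / n_ctrl, events_tx / n_tx
    arr = p_c - p_t                               # absolute risk reduction
    rrr = arr / p_c                               # relative risk reduction
    nnt = math.inf if arr == 0 else 1 / arr       # number needed to treat
    # Wald standard error for a difference of two independent proportions.
    se = math.sqrt(p_c * (1 - p_c) / n_ctrl + p_t * (1 - p_t) / n_tx)
    ci95 = (arr - 1.96 * se, arr + 1.96 * se)
    z = arr / se
    p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided, normal approximation
    return {"RRR": rrr, "ARR": arr, "NNT": nnt, "CI95": ci95, "p": p_value}


# Hypothetical trials: (control events, control n, treated events, treated n).
scenarios = {
    "2% -> 1%": (20, 1000, 10, 1000),
    "40% -> 20%": (400, 1000, 200, 1000),
    "10.0% -> 9.9%, huge trial": (200_000, 2_000_000, 198_000, 2_000_000),
}
for name, counts in scenarios.items():
    r = compare_risks(*counts)
    lo, hi = r["CI95"]
    print(f"{name}: RRR {r['RRR']:.0%}, ARR {r['ARR']:.1%} "
          f"(95% CI {lo:.1%} to {hi:.1%}), NNT {r['NNT']:.0f}, p = {r['p']:.2g}")
```

The first two scenarios both show a 50% relative reduction, but the number needed to treat is 100 versus 5. With 1,000 participants per arm, the 2% to 1% comparison has a confidence interval that crosses zero. Meanwhile, the huge hypothetical trial turns a 0.1-percentage-point difference into p<0.05.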

Evidence Methodology last reviewed: June 2026. Questions: hello@longevix.co.il