
Scoring Methodology

How PkgWatch calculates health scores and predicts abandonment risk

Component Weights

The health score (0-100) is a weighted combination of five components:

User-Centric (30%) - adoption metrics, most predictive of package health
  • Weekly downloads (50%)
  • Dependent packages (30%)
  • GitHub stars (20%)

Maintainer Health (25%) - active maintenance and sustainability
  • Commit recency (50%)
  • True bus factor (30%)
  • PR merge velocity (20%)

Evolution (20%) - development momentum with maturity adjustment
  • Release recency (50%)
  • Commit activity (50%)
  • Maturity factor for stable packages

Security (15%) - security posture and vulnerability status
  • OpenSSF Scorecard
  • Known vulnerabilities
  • Security policy presence

Community (10%) - contributor diversity and engagement
  • Total contributors (60%)
  • Issue response time (40%)
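The combination above reduces to a plain weighted sum. A minimal sketch, assuming each component score is already normalized to 0-100 (the component names here are ours, not PkgWatch's API):

```python
# Illustrative weights from the component breakdown above.
WEIGHTS = {
    "user_centric": 0.30,
    "maintainer": 0.25,
    "evolution": 0.20,
    "security": 0.15,
    "community": 0.10,
}

def health_score(components: dict[str, float]) -> float:
    """Combine per-component scores (each 0-100) into a 0-100 health score."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

score = health_score({
    "user_centric": 90, "maintainer": 70, "evolution": 60,
    "security": 80, "community": 50,
})
# 0.30*90 + 0.25*70 + 0.20*60 + 0.15*80 + 0.10*50 = 73.5
```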

How We Score

Recency Signals

We use exponential decay functions so scores decrease smoothly over time, not in sudden steps.

  • Commit recency: 90-day half-life (50% score after 90 days inactive)
  • Release recency: 180-day half-life (more lenient for stable releases)
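A half-life decay like this fits in one line. The sketch below assumes a pure exponential curve; only the half-life convention comes from the text:

```python
def recency_score(days_inactive: float, half_life_days: float) -> float:
    """Exponential decay: 1.0 when fresh, exactly 0.5 after one half-life."""
    return 0.5 ** (days_inactive / half_life_days)

recency_score(90, 90)    # 0.5 -> commit recency after 90 idle days
recency_score(180, 180)  # 0.5 -> release recency, with its more lenient half-life
```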

True Bus Factor

We analyze commit distribution to find the minimum contributors needed for 50% of commits. A project with 10 contributors where 1 person does 95% of work has a bus factor of 1, not 10.

Interpretation:
  • Bus factor 1: High risk - single point of failure
  • Bus factor 2-3: Moderate risk - small team
  • Bus factor 4+: Lower risk - distributed contributions
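The 50%-coverage rule can be sketched as a hypothetical helper (not PkgWatch's actual code):

```python
def true_bus_factor(commit_counts: list[int]) -> int:
    """Fewest contributors whose combined commits cover at least 50% of the total."""
    total = sum(commit_counts)
    covered = 0
    # Greedily take the heaviest contributors first.
    for i, count in enumerate(sorted(commit_counts, reverse=True), start=1):
        covered += count
        if covered * 2 >= total:
            return i
    return len(commit_counts)

# Ten contributors, one of whom does ~95% of the work: bus factor 1, not 10.
true_bus_factor([95] + [1] * 9)  # -> 1
```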

Maturity Factor

High-adoption packages with low activity aren't penalized - they're likely stable, not abandoned. Packages like lodash benefit from this.

How it applies: smooth sigmoid transitions centered at roughly 1M weekly downloads or 5K dependents, combined with low activity (~10 commits per 90 days). Benefits scale gradually, with no hard cutoffs, and the factor is capped at 0.7: even the most mature package gets at most a 70% floor on its evolution sub-scores. Single-maintainer risk is surfaced independently through the maintainer health and abandonment risk scores.
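One way to realize these rules in code. The functional form, log scaling, and steepness below are our assumptions; only the sigmoid centers, the ~10-commit activity threshold, and the 0.7 cap come from the text:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def maturity_factor(weekly_downloads: float, dependents: float,
                    commits_90d: float) -> float:
    """High adoption combined with low activity earns a floor of at most 0.7."""
    # Adoption: sigmoids centered near 1M downloads / 5K dependents (log10 scale assumed).
    adoption = max(
        sigmoid(math.log10(max(weekly_downloads, 1.0)) - 6.0),
        sigmoid(math.log10(max(dependents, 1.0)) - math.log10(5000.0)),
    )
    # Low activity: centered at ~10 commits per 90 days (steepness assumed).
    low_activity = sigmoid((10.0 - commits_90d) / 5.0)
    return 0.7 * adoption * low_activity  # cap: at most a 70% evolution floor
```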

Security Assessment

We integrate OpenSSF Scorecard data and track known vulnerabilities by severity.

  • OpenSSF Scorecard (50%): Automated security practices assessment
  • Vulnerabilities (30%): CRITICAL=3x, HIGH=2x, MEDIUM=1x weight
  • Security Policy (20%): Has SECURITY.md with disclosure process
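Read as code, the blend might look like this. The severity multipliers and 50/30/20 weights are from the text; the penalty curve mapping weighted vulnerability counts to a 0-1 sub-score is our assumption:

```python
SEVERITY_WEIGHT = {"CRITICAL": 3, "HIGH": 2, "MEDIUM": 1}

def security_score(scorecard: float, vulns: dict[str, int],
                   has_security_policy: bool) -> float:
    """Blend (all inputs 0-1): 50% Scorecard, 30% vulnerabilities, 20% policy."""
    weighted = sum(SEVERITY_WEIGHT.get(sev, 0) * n for sev, n in vulns.items())
    vuln_score = 1.0 / (1.0 + weighted)  # assumed penalty shape
    policy = 1.0 if has_security_policy else 0.0
    return 0.5 * scorecard + 0.3 * vuln_score + 0.2 * policy
```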

PR Merge Velocity

Measures maintainer responsiveness by tracking the ratio of merged to opened pull requests over the last 90 days.

Interpretation:
  • >80% merge rate: High maintainer responsiveness
  • ~50% merge rate: Moderate maintainer engagement
  • <30% merge rate: Potential maintainer overload or abandonment
  • No PRs: Neutral score (could indicate stable package)

Uses a continuous sigmoid function — scores transition smoothly, not in discrete steps.
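A sigmoid over the merge ratio, with the neutral no-PR fallback, could be sketched like this. The center and steepness are assumptions; only the continuous transition and the neutral case come from the text:

```python
import math

def pr_velocity_score(merged: int, opened: int) -> float:
    """Score the 90-day merged/opened ratio on a smooth 0-1 curve."""
    if opened == 0:
        return 0.5  # no PRs: neutral (could indicate a stable package)
    ratio = merged / opened
    # Sigmoid centered at a 50% merge rate; steepness of 10 is assumed.
    return 1.0 / (1.0 + math.exp(-10.0 * (ratio - 0.5)))
```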

Issue Response Time

Fast issue response time is a strong indicator of maintainer engagement and project health.

Scoring:
  • <24 hours: Perfect score (1.0)
  • 24-72 hours: High score (0.7-1.0)
  • >72 hours: Exponential decay
  • No data: Neutral score (0.5)
Data Note: Response times are currently estimated using heuristics based on issue comment presence (24h for closed issues with comments, 48h for open issues with comments). Direct timeline data will be added in a future update.

Bot Commit Filtering

We filter out commits from known bot accounts (Dependabot, Renovate, etc.) to get accurate human activity metrics.

Why it matters: Bot commits can artificially inflate activity metrics. We analyze real human contributions for more accurate health assessment.
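A minimal version of that filter (the pattern list here is illustrative, not PkgWatch's actual denylist):

```python
BOT_PATTERNS = ("dependabot", "renovate", "[bot]")  # illustrative subset

def human_commits(commits: list[dict]) -> list[dict]:
    """Drop commits whose author name matches a known bot pattern."""
    return [
        c for c in commits
        if not any(p in c["author"].lower() for p in BOT_PATTERNS)
    ]

human_commits([
    {"author": "alice"},
    {"author": "dependabot[bot]"},
])
# -> only alice's commit remains
```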

Deprecated & Archived Overrides

When a package is explicitly deprecated or its repository is archived, normal scoring is overridden with hard caps to ensure these packages are flagged appropriately:

  • Deprecated: Health score capped at 35 (CRITICAL)
  • Archived: Health score capped at 40 (HIGH)
  • Both: Health score capped at 35 (CRITICAL)

Abandonment risk is also forced to 95% for deprecated or archived packages, regardless of other factors.

Abandonment Risk Prediction

Beta

PkgWatch estimates the probability that a package will be abandoned within the next 12 months using a Weibull survival analysis model. The model considers four risk factors:

  • Inactivity (35%): Time since last commit relative to historical patterns
  • Bus Factor (30%): Contribution concentration risk
  • Adoption (20%): Download and dependent count trends
  • Release (15%): Time since last release

High-adoption packages receive up to 30% dampening on inactivity and release risk, reflecting that widely-used packages are less likely to be truly abandoned.

Beta Notice: The Weibull model parameters (shape k=1.5, scale λ=52 months, sensitivity α=2.8) have been tuned against historical validation examples but not fitted to a large dataset. Abandonment probabilities should be interpreted as relative risk indicators, not precise predictions. See Validation Results below for details.
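One plausible reading of such a model is a conditional Weibull failure probability over the next 12 months, with the composite risk factor shrinking the effective scale. The coupling via α below is our assumption; only the parameters k, λ, α and the 12-month horizon come from the text:

```python
import math

K, LAMBDA, ALPHA = 1.5, 52.0, 2.8  # shape, scale (months), sensitivity

def weibull_survival(t_months: float, scale: float) -> float:
    """Weibull survival function S(t) = exp(-(t/scale)^k)."""
    return math.exp(-((t_months / scale) ** K))

def abandonment_risk(age_months: float, risk: float) -> float:
    """P(abandoned within 12 months | survived to age), risk in [0, 1].
    Higher composite risk shrinks the effective scale (assumed coupling)."""
    eff_scale = LAMBDA / (1.0 + ALPHA * risk)
    s_now = weibull_survival(age_months, eff_scale)
    s_later = weibull_survival(age_months + 12, eff_scale)
    return 1.0 - s_later / s_now
```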

Risk Levels

  • LOW (80-100): Healthy, well-maintained
  • MEDIUM (60-79): Monitor for changes
  • HIGH (40-59): Significant concerns
  • CRITICAL (0-39): Serious issues

Validation Results

We validated the scoring model against 11 real packages with known outcomes: 5 packages that were abandoned or deprecated, and 6 healthy controls. These are validation examples, not statistical proof — a small case study cannot establish precision/recall, but it can catch obviously wrong behavior.

Health Score Separation

Abandoned packages (avg score: ~57):
  • left-pad: 48
  • event-stream: 49
  • colors: 60
  • request: 63
  • moment: 67

Healthy controls (avg score: ~86):
  • lodash: 67
  • requests (py): 84
  • flask: 84
  • express: 90
  • numpy: 96
  • react: 97

29-point average separation between groups. No healthy package scored below the worst abandoned package.

Abandonment Risk (12-month horizon)

Abandoned packages (avg risk: ~49%):
  • event-stream: 74%
  • colors: 54%
  • left-pad: 47%
  • request: 42%
  • moment: 29%

Healthy controls (avg risk: ~16%):
  • numpy: 12%
  • react: 12%
  • express: 13%
  • requests (py): 15%
  • flask: 16%
  • lodash: 29%

33-point average separation. Lodash scores higher risk due to single maintainer and 1+ year since last release.

Known Limitations

  • High-adoption masking (partially mitigated): Adoption dampening reduces this effect, but massive download counts still inflate user-centric scores. Abandoned packages like colors (60, MEDIUM) and moment (67, MEDIUM) score higher than ideal due to their large user bases. Single-maintainer risk is surfaced independently through abandonment risk probability and maintainer health scores.
  • Maintenance mode detection: The model cannot distinguish between "stable and done" (lodash) and "about to be declared maintenance-only" (moment) based on metrics alone.
  • Small validation set: 11 examples across 2 ecosystems. More case studies will be added as real-world data accumulates.

Data Sources

deps.dev

Primary source for dependencies, dependents, advisories, OpenSSF scores

npm Registry

Downloads, maintainers, deprecation status, release dates

PyPI Registry

Python package metadata, downloads, maintainers, classifiers

GitHub API

Commits, contributors, stars, issue response times, PR merge velocity

Confidence Levels & Intervals

Every score includes a confidence level and interval indicating data reliability:

  • HIGH: Confidence score ≥80% (±5 point interval)
  • MEDIUM: Confidence score 50–79% (±10 point interval)
  • LOW: Confidence score <50% (±15 point interval)
  • INSUFFICIENT_DATA: Package <90 days old — scores may be unreliable

How it works: Confidence is a blended score of data completeness (50%), package age (30%), and data freshness (20%). The interval margin is computed independently from data completeness and collection errors — not directly from the confidence level. Each collection error reduces the confidence score by 10%. GitHub API failures with a known repository URL widen the margin to ±20 points.
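Putting the blend and the banding into code. The error penalty is modeled multiplicatively here, which is our reading of "reduces the confidence score by 10%"; all inputs are assumed to be normalized to 0-1:

```python
def confidence(completeness: float, age_score: float, freshness: float,
               collection_errors: int) -> float:
    """Blend: 50% completeness, 30% package age, 20% freshness,
    reduced by 10% per collection error (multiplicative, assumed)."""
    base = 0.5 * completeness + 0.3 * age_score + 0.2 * freshness
    return max(0.0, base * (1.0 - 0.10 * collection_errors))

def confidence_level(score: float) -> str:
    if score >= 0.80:
        return "HIGH"    # +/-5 point interval
    if score >= 0.50:
        return "MEDIUM"  # +/-10 point interval
    return "LOW"         # +/-15 point interval
```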

Limitations

PkgWatch does NOT measure:

  • Code quality or test coverage
  • API stability or breaking changes
  • Actual usage patterns in your codebase
  • Zero-day vulnerabilities not in public databases
  • Commercial support availability

Scores are advisory - always review packages manually before making critical decisions.

A Note on Our Approach

Scoring weights are informed by software engineering research and practitioner experience. We continue to refine weights as more real-world data becomes available, and a formal backtesting framework is on our roadmap.

As with any automated assessment, scores should be treated as advisory indicators to complement your own judgement, not replace it.

Changelog

v3.1 - February 2026 (current)
  • Removed bus factor gate from maturity factor to fix score skew (61.5% of packages were rated HIGH or CRITICAL)
  • Single-maintainer risk now surfaced solely through maintainer health and abandonment risk (no double penalty)
v3.0 - January 2026
  • Added issue response time signal to Community Health
  • Added PR merge velocity signal to Maintainer Health
  • Improved abandonment risk with Weibull survival analysis
  • Added confidence intervals based on data quality
  • Bot commit filtering for accurate activity metrics
v2.0 - January 2026
  • Added Security as 5th component (15% weight)
  • True bus factor from contribution distribution
  • Maturity factor for stable packages
  • Individual OpenSSF checks exposed in API
  • Rebalanced weights (Maintainer 25%, Evolution 20%, Community 10%)
v1.0 - January 2026
  • Initial release with 4-component scoring