Scoring Methodology
How PkgWatch calculates health scores and predicts abandonment risk
Component Weights
The health score (0-100) is a weighted combination of five components:
- Adoption metrics - most predictive of package health
- Active maintenance and sustainability
- Development momentum with maturity adjustment
- Security posture and vulnerability status
- Contributor diversity and engagement
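As a rough illustration, the weighted combination could be sketched as below. The component keys are hypothetical names. Four of the weights (Maintainer 25%, Evolution 20%, Security 15%, Community 10%) are taken from the changelog on this page; the 30% for the adoption component is an assumption, derived only as the remainder needed to reach 100%. Component sub-scores are assumed to lie in the range 0 to 1.

```python
# Hypothetical component keys; sub-scores are assumed to be in [0, 1].
WEIGHTS = {
    "user_centric": 0.30,  # ASSUMPTION: remainder after the four changelog weights
    "maintainer": 0.25,    # from the changelog
    "evolution": 0.20,     # from the changelog
    "security": 0.15,      # from the changelog
    "community": 0.10,     # from the changelog
}


def health_score(components: dict) -> float:
    """Weighted sum of the five component sub-scores, scaled to 0-100."""
    return 100.0 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


example = {
    "user_centric": 0.9,
    "maintainer": 0.6,
    "evolution": 0.5,
    "security": 0.8,
    "community": 0.4,
}
print(round(health_score(example), 1))
```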
How We Score
Recency Signals
We use exponential decay functions so scores decrease smoothly over time, not in sudden steps.
- Commit recency: 90-day half-life (50% score after 90 days inactive)
- Release recency: 180-day half-life (more lenient for stable releases)
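A minimal sketch of half-life decay, assuming the score starts at 1.0 on the day of the last commit or release. The function and constant names are illustrative, not PkgWatch's actual code.

```python
COMMIT_HALF_LIFE_DAYS = 90    # from the text above
RELEASE_HALF_LIFE_DAYS = 180  # from the text above


def recency_score(days_since: float, half_life_days: float) -> float:
    """Exponential decay: the score halves every `half_life_days`."""
    if days_since <= 0:
        return 1.0
    return 0.5 ** (days_since / half_life_days)


# After 90 inactive days the commit signal has halved, while the more
# lenient release signal has lost less than a third of its value.
print(recency_score(90, COMMIT_HALF_LIFE_DAYS))
print(recency_score(90, RELEASE_HALF_LIFE_DAYS))
```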
True Bus Factor
We analyze the commit distribution to find the minimum number of contributors who together account for 50% of commits. A project with 10 contributors where 1 person does 95% of the work has a bus factor of 1, not 10.
- Bus factor 1: High risk - single point of failure
- Bus factor 2-3: Moderate risk - small team
- Bus factor 4+: Lower risk - distributed contributions
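The calculation can be sketched as follows: sort contributors by commit count and count how many are needed to reach the threshold. Whether the threshold is "at least 50%" or "more than 50%" is an assumption here, as is the function name.

```python
def bus_factor(commits_by_author: dict, threshold: float = 0.5) -> int:
    """Smallest number of top contributors covering `threshold` of all commits."""
    total = sum(commits_by_author.values())
    if total == 0:
        return 0  # no commit history to analyze
    covered = 0
    ranked = sorted(commits_by_author.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        covered += count
        if covered >= threshold * total:  # ASSUMPTION: "at least" 50%
            return rank
    return len(ranked)


# Ten contributors, one of whom wrote 95% of the commits (171 of 180).
skewed = {"lead": 171, **{f"dev{i}": 1 for i in range(9)}}
print(bus_factor(skewed))
```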
Maturity Factor
High-adoption packages with low activity aren't penalized - they're likely stable, not abandoned.
Packages like lodash benefit from this.
How it applies: The factor uses smooth sigmoid transitions centered at roughly 1M weekly downloads or 5K dependents, combined with low activity (around 10 commits per 90 days). Benefits scale gradually, with no hard cutoffs. The factor is capped at 0.7, so even the most mature package gets at most a 70% floor for evolution sub-scores. Single-maintainer risk is surfaced independently through the maintainer health and abandonment risk scores.
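One way such a factor could be shaped is sketched below. Only the centers (1M weekly downloads, 5K dependents, 10 commits per 90 days) and the 0.7 cap come from the text. The log scale, the steepness values, taking the maximum for "or", and multiplying for "combined with" are all assumptions made for illustration.

```python
import math

MATURITY_CAP = 0.7          # from the text
ADOPTION_STEEPNESS = 3.0    # ASSUMPTION: illustrative value
ACTIVITY_WIDTH = 3.0        # ASSUMPTION: illustrative value, in commits


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def maturity_factor(weekly_downloads: float, dependents: float, commits_90d: float) -> float:
    """Floor (0 to 0.7) granted to high-adoption, low-activity packages."""
    downloads = sigmoid(ADOPTION_STEEPNESS * (math.log10(max(weekly_downloads, 1)) - 6.0))
    deps = sigmoid(ADOPTION_STEEPNESS * (math.log10(max(dependents, 1)) - math.log10(5000)))
    adoption = max(downloads, deps)  # ASSUMPTION: "or" modeled as max
    low_activity = sigmoid((10.0 - commits_90d) / ACTIVITY_WIDTH)
    return MATURITY_CAP * adoption * low_activity  # ASSUMPTION: combined by product


print(round(maturity_factor(50_000_000, 150_000, 0), 3))
```

Under this sketch, an evolution sub-score would be raised to the factor when it falls below it, for example `max(raw_score, factor)`.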
Security Assessment
We integrate OpenSSF Scorecard data and track known vulnerabilities by severity.
- OpenSSF Scorecard (50%): Automated security practices assessment
- Vulnerabilities (30%): CRITICAL=3x, HIGH=2x, MEDIUM=1x weight
- Security Policy (20%): Has SECURITY.md with disclosure process
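The 50/30/20 split can be sketched as below. The weights and severity multipliers come from the text, and the OpenSSF Scorecard aggregate is on a 0 to 10 scale. How the weighted vulnerability count is turned into a sub-score is not stated, so the `1 / (1 + weighted_count)` mapping is purely an assumption, as is ignoring severities outside the three listed.

```python
SEVERITY_WEIGHTS = {"CRITICAL": 3, "HIGH": 2, "MEDIUM": 1}  # from the text


def security_score(scorecard: float, vuln_severities: list, has_security_policy: bool) -> float:
    """Security component on a 0-100 scale (illustrative sketch)."""
    scorecard_part = min(max(scorecard, 0.0), 10.0) / 10.0
    # Severities not listed in the text contribute nothing (ASSUMPTION).
    weighted_count = sum(SEVERITY_WEIGHTS.get(s.upper(), 0) for s in vuln_severities)
    vuln_part = 1.0 / (1.0 + weighted_count)  # ASSUMPTION: illustrative mapping
    policy_part = 1.0 if has_security_policy else 0.0
    return 100.0 * (0.5 * scorecard_part + 0.3 * vuln_part + 0.2 * policy_part)


print(round(security_score(7.5, ["HIGH", "MEDIUM"], True), 1))
```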
PR Merge Velocity
Measures maintainer responsiveness by tracking the ratio of merged to opened pull requests over the last 90 days.
- >80% merge rate: High maintainer responsiveness
- ~50% merge rate: Moderate maintainer engagement
- <30% merge rate: Potential maintainer overload or abandonment
- No PRs: Neutral score (could indicate stable package)
Uses a continuous sigmoid function — scores transition smoothly, not in discrete steps.
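A sketch of a sigmoid over the merge rate is shown below. The midpoint of 0.5 (matching the "moderate" band), the steepness of 10, and the neutral value of 0.5 for packages with no PRs are assumptions. The text does not give the actual parameters.

```python
import math


def pr_merge_score(opened: int, merged: int, midpoint: float = 0.5, steepness: float = 10.0) -> float:
    """Map the 90-day merged/opened ratio to a smooth 0-1 score."""
    if opened == 0:
        return 0.5  # ASSUMPTION: "neutral" taken to mean 0.5
    rate = min(merged / opened, 1.0)
    return 1.0 / (1.0 + math.exp(-steepness * (rate - midpoint)))


for merged in (2, 5, 9):
    print(merged, round(pr_merge_score(10, merged), 3))
```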
Issue Response Time
Fast issue response time is a strong indicator of maintainer engagement and project health.
- <24 hours: Perfect score (1.0)
- 24-72 hours: High score (0.7-1.0)
- >72 hours: Exponential decay
- No data: Neutral score (0.5)
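The bands above could be implemented as a piecewise function. In this sketch, the linear interpolation between 24 and 72 hours and the one-week half-life for the decay beyond 72 hours are assumptions. Only the band boundaries and the 1.0, 0.7, and 0.5 values come from the text.

```python
from typing import Optional


def issue_response_score(hours: Optional[float], decay_half_life_hours: float = 168.0) -> float:
    """Score a typical first-response time to issues, in hours."""
    if hours is None:
        return 0.5  # no data: neutral
    if hours < 24:
        return 1.0
    if hours <= 72:
        # ASSUMPTION: linear drop from 1.0 at 24h to 0.7 at 72h
        return 1.0 - 0.3 * (hours - 24) / 48
    # ASSUMPTION: decay continues from 0.7 with a one-week half-life
    return 0.7 * 0.5 ** ((hours - 72) / decay_half_life_hours)


for h in (6, 48, 72, 240, None):
    print(h, round(issue_response_score(h), 3))
```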
Bot Commit Filtering
We filter out commits from known bot accounts (Dependabot, Renovate, etc.) to get accurate human activity metrics.
Why it matters: Bot commits can artificially inflate activity metrics. We analyze real human contributions for more accurate health assessment.
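A minimal filtering sketch is shown below. The list of logins is illustrative rather than PkgWatch's actual list, and the commit record shape (a dict with an `author` key) is hypothetical. GitHub App accounts carry a `[bot]` suffix in their login, which the sketch uses as a fallback.

```python
# Illustrative list only; the real list of known bots is not given in the text.
KNOWN_BOTS = {
    "dependabot[bot]",
    "dependabot-preview[bot]",
    "renovate[bot]",
    "renovate-bot",
    "greenkeeper[bot]",
}


def is_bot(login: str) -> bool:
    """True for known bot logins and any GitHub App account ("...[bot]")."""
    name = login.strip().lower()
    return name in KNOWN_BOTS or name.endswith("[bot]")


def human_commits(commits: list) -> list:
    """Keep only commits whose author is not a bot (hypothetical record shape)."""
    return [c for c in commits if not is_bot(c["author"])]


history = [
    {"sha": "a1", "author": "alice"},
    {"sha": "b2", "author": "dependabot[bot]"},
    {"sha": "c3", "author": "renovate-bot"},
]
print([c["sha"] for c in human_commits(history)])
```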
Deprecated & Archived Overrides
When a package is explicitly deprecated or its repository is archived, normal scoring is overridden with hard caps to ensure these packages are flagged appropriately:
- Deprecated: Health score capped at 35 (CRITICAL)
- Archived: Health score capped at 40 (HIGH)
- Both: Health score capped at 35 (CRITICAL)
Abandonment risk is also forced to 95% for deprecated or archived packages, regardless of other factors.
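These overrides amount to a simple post-processing step, sketched here with hypothetical function and parameter names. Risk is expressed as a probability between 0 and 1.

```python
DEPRECATED_CAP = 35   # CRITICAL
ARCHIVED_CAP = 40     # HIGH
FORCED_RISK = 0.95


def apply_overrides(health: float, risk: float, deprecated: bool, archived: bool):
    """Return (health, risk) after deprecated/archived hard caps."""
    if deprecated:
        cap = DEPRECATED_CAP  # also covers deprecated and archived together
    elif archived:
        cap = ARCHIVED_CAP
    else:
        return health, risk
    return min(health, cap), FORCED_RISK


print(apply_overrides(82, 0.10, deprecated=False, archived=True))
```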
Abandonment Risk Prediction (Beta)
PkgWatch estimates the probability that a package will be abandoned within the next 12 months using a Weibull survival analysis model. The model considers four risk factors:
- Inactivity (35%): Time since last commit relative to historical patterns
- Bus Factor (30%): Contribution concentration risk
- Adoption (20%): Download and dependent count trends
- Release (15%): Time since last release
High-adoption packages receive up to 30% dampening on inactivity and release risk, reflecting that widely-used packages are less likely to be truly abandoned.
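The factor weighting and adoption dampening can be sketched as below. This covers only the weighted combination of the four factors; the Weibull survival step is not reproduced, since its parameters are not given here. Each factor is assumed to be a risk value in the range 0 to 1, and `adoption_level` is a hypothetical 0 to 1 measure of how widely used the package is.

```python
RISK_WEIGHTS = {
    "inactivity": 0.35,
    "bus_factor": 0.30,
    "adoption": 0.20,
    "release": 0.15,
}
MAX_DAMPENING = 0.30  # "up to 30%" for high-adoption packages


def combined_risk(factors: dict, adoption_level: float) -> float:
    """Weighted risk with dampening applied to inactivity and release risk."""
    level = min(max(adoption_level, 0.0), 1.0)
    damp = 1.0 - MAX_DAMPENING * level  # ASSUMPTION: dampening scales linearly
    adjusted = dict(factors)
    adjusted["inactivity"] *= damp
    adjusted["release"] *= damp
    return sum(RISK_WEIGHTS[name] * adjusted[name] for name in RISK_WEIGHTS)


worst_case = {"inactivity": 1.0, "bus_factor": 1.0, "adoption": 1.0, "release": 1.0}
print(round(combined_risk(worst_case, adoption_level=1.0), 3))
```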
Risk Levels
Validation Results
We validated the scoring model against 11 real packages with known outcomes: 5 packages that were abandoned or deprecated, and 6 healthy controls. These are validation examples, not statistical proof — a small case study cannot establish precision/recall, but it can catch obviously wrong behavior.
Health Score Separation
29-point average separation between groups. No healthy package scored below the worst abandoned package.
Abandonment Risk (12-month horizon)
33-point average separation. Lodash scores higher risk due to single maintainer and 1+ year since last release.
Known Limitations
- High-adoption masking (partially mitigated): Adoption dampening reduces this effect, but massive download counts still inflate user-centric scores. Abandoned packages like colors (60, MEDIUM) and moment (67, MEDIUM) score higher than ideal due to their large user bases. Single-maintainer risk is surfaced independently through abandonment risk probability and maintainer health scores.
- Maintenance mode detection: The model cannot distinguish between "stable and done" (lodash) and "about to be declared maintenance-only" (moment) based on metrics alone.
- Small validation set: 11 examples across 2 ecosystems. More case studies will be added as real-world data accumulates.
Data Sources
- deps.dev: Primary source for dependencies, dependents, advisories, and OpenSSF scores
- npm Registry: Downloads, maintainers, deprecation status, release dates
- PyPI Registry: Python package metadata, downloads, maintainers, classifiers
- GitHub API: Commits, contributors, stars, issue response times, PR merge velocity
Confidence Levels & Intervals
Every score includes a confidence level and interval indicating data reliability:
- HIGH: Confidence score ≥80% (±5 point interval)
- MEDIUM: Confidence score 50–79% (±10 point interval)
- LOW: Confidence score <50% (±15 point interval)
- INSUFFICIENT_DATA: Package <90 days old — scores may be unreliable
How it works: Confidence is a blended score of data completeness (50%), package age (30%), and data freshness (20%). Each collection error reduces the confidence score by 10%. The interval margin is computed separately from the confidence level, using data completeness and collection errors as inputs. When the repository URL is known but the GitHub API call fails, the margin widens to ±20 points.
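The confidence blend and level thresholds could look like the sketch below. The three inputs are assumed to be normalized to the range 0 to 1, and the 10% reduction per collection error is interpreted as 10 percentage points, which is an assumption. The interval margin is left out because its formula is not described.

```python
def confidence(completeness: float, age: float, freshness: float,
               collection_errors: int, package_age_days: int):
    """Return (confidence score in percent, confidence level)."""
    score = 100.0 * (0.5 * completeness + 0.3 * age + 0.2 * freshness)
    # ASSUMPTION: each collection error subtracts 10 percentage points
    score = max(score - 10.0 * collection_errors, 0.0)
    if package_age_days < 90:
        return score, "INSUFFICIENT_DATA"
    if score >= 80:
        return score, "HIGH"
    if score >= 50:
        return score, "MEDIUM"
    return score, "LOW"


print(confidence(1.0, 1.0, 1.0, collection_errors=3, package_age_days=400))
```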
Limitations
PkgWatch does NOT measure:
- Code quality or test coverage
- API stability or breaking changes
- Actual usage patterns in your codebase
- Zero-day vulnerabilities not in public databases
- Commercial support availability
Scores are advisory - always review packages manually before making critical decisions.
A Note on Our Approach
Scoring weights are informed by software engineering research and practitioner experience. We continue to refine weights as more real-world data becomes available, and a formal backtesting framework is on our roadmap.
As with any automated assessment, scores should be treated as advisory indicators to complement your own judgement, not replace it.
Changelog
- Removed bus factor gate from maturity factor to fix score skew (61.5% of packages were rated HIGH or CRITICAL)
- Single-maintainer risk now surfaced solely through maintainer health and abandonment risk (no double penalty)
- Added issue response time signal to Community Health
- Added PR merge velocity signal to Maintainer Health
- Improved abandonment risk with Weibull survival analysis
- Added confidence intervals based on data quality
- Bot commit filtering for accurate activity metrics
- Added Security as 5th component (15% weight)
- True bus factor from contribution distribution
- Maturity factor for stable packages
- Individual OpenSSF checks exposed in API
- Rebalanced weights (Maintainer 25%, Evolution 20%, Community 10%)
- Initial release with 4-component scoring