Provider notes
What the crawler extracts per provider and how to add new provider implementations.
Providers
This build includes the following provider plugins, located in lib/server/providers/*:
- Socrata (includes Tyler Data & Insights sites)
- ArcGIS Hub/Open Data (dataset URLs; portal discovery is intentionally conservative)
- CKAN
- OpenDataSoft
- DCAT-US (data.json feeds)
- Filesystem / network share (file:// paths)
- Other (basic HTML title/description)
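As an illustration of per-provider extraction, here is a minimal sketch of reading a DCAT-US data.json catalog. The field names (dataset, identifier, title, description, distribution, downloadURL, accessURL) come from the DCAT-US schema; the DatasetSummary shape and the summarizeCatalog function are hypothetical and are not taken from this codebase, so the real plugin may extract more or different fields.

```typescript
// Minimal subset of the DCAT-US catalog schema (data.json).
interface DcatDistribution {
  downloadURL?: string;
  accessURL?: string;
}
interface DcatDataset {
  identifier?: string;
  title?: string;
  description?: string;
  distribution?: DcatDistribution[];
}
interface DcatCatalog {
  dataset?: DcatDataset[];
}

// Hypothetical output shape, for illustration only.
interface DatasetSummary {
  id: string;
  title: string;
  description: string;
  urls: string[];
}

// Turn a parsed data.json catalog into flat summaries,
// tolerating missing fields instead of throwing.
function summarizeCatalog(catalog: DcatCatalog): DatasetSummary[] {
  return (catalog.dataset ?? []).map((d, i) => ({
    id: d.identifier ?? `dataset-${i}`,
    title: d.title ?? "(untitled)",
    description: d.description ?? "",
    // Prefer a direct download link, fall back to the access page.
    urls: (d.distribution ?? [])
      .map((dist) => dist.downloadURL ?? dist.accessURL)
      .filter((u): u is string => typeof u === "string"),
  }));
}
```

Entries with missing fields are kept with placeholder values rather than dropped, which suits a best-effort crawler; a stricter implementation could skip them instead.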
Adding a provider
- Create a plugin in lib/server/providers/<name>.ts
- Export a ProviderPlugin
- Register it in lib/server/providers/registry.ts
A plugin implements:
- discoverFromSourceUrl()
- ingestDatasetFromUrl()
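The steps above might look roughly like the following sketch. Only the names ProviderPlugin, discoverFromSourceUrl() and ingestDatasetFromUrl() come from this document; the parameter and return types, the id field, the DatasetRecord shape, and the registerProvider helper are assumptions made for illustration, so check the actual type definition and registry.ts before copying anything.

```typescript
// Hypothetical record shape; the real one in this repo may differ.
interface DatasetRecord {
  url: string;
  title: string;
  description?: string;
}

// Assumed plugin contract: only the two method names are from the docs.
interface ProviderPlugin {
  id: string;
  // Given a source URL (portal, feed, folder), return candidate dataset URLs.
  discoverFromSourceUrl(sourceUrl: string): Promise<string[]>;
  // Given one dataset URL, fetch and normalize its metadata.
  ingestDatasetFromUrl(datasetUrl: string): Promise<DatasetRecord>;
}

// A toy plugin that performs no network I/O.
const examplePlugin: ProviderPlugin = {
  id: "example",
  async discoverFromSourceUrl(sourceUrl: string): Promise<string[]> {
    const base = sourceUrl.replace(/\/+$/, ""); // drop trailing slashes
    return [`${base}/datasets/1`, `${base}/datasets/2`];
  },
  async ingestDatasetFromUrl(datasetUrl: string): Promise<DatasetRecord> {
    return { url: datasetUrl, title: `Dataset at ${datasetUrl}` };
  },
};

// Hypothetical registry: a map keyed by plugin id.
const registry = new Map<string, ProviderPlugin>();

function registerProvider(plugin: ProviderPlugin): void {
  registry.set(plugin.id, plugin);
}

registerProvider(examplePlugin);
```

Splitting discovery from ingestion keeps each dataset fetch independent, so one failing dataset does not have to abort the rest of a crawl.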
Detection
Source URLs are auto-detected with heuristics in lib/server/providers/detector.ts.
Detection is best-effort. You can always override the project/provider strategy by creating separate projects.
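To make "heuristics" concrete, here is a minimal sketch of URL-based detection. It is not the contents of detector.ts: the ProviderId names, the detectProvider function, and the specific host and path rules are assumptions chosen for illustration, and the real detector may inspect responses or use different rules.

```typescript
// Hypothetical provider identifiers, mirroring the list of providers.
type ProviderId =
  | "socrata"
  | "arcgis"
  | "ckan"
  | "opendatasoft"
  | "dcat-us"
  | "filesystem"
  | "other";

// Guess a provider from the URL alone. Rules are checked in order and
// anything unrecognized falls through to "other".
function detectProvider(sourceUrl: string): ProviderId {
  let url: URL;
  try {
    url = new URL(sourceUrl);
  } catch {
    return "other"; // not a parseable absolute URL
  }

  if (url.protocol === "file:") return "filesystem";

  const host = url.hostname.toLowerCase();
  const path = url.pathname.toLowerCase();

  if (path.endsWith("/data.json")) return "dcat-us";
  if (path.includes("/api/3/action/")) return "ckan"; // CKAN Action API path
  if (host.endsWith(".opendatasoft.com")) return "opendatasoft";
  if (host.endsWith(".arcgis.com")) return "arcgis";
  if (host.endsWith(".socrata.com")) return "socrata";

  return "other";
}
```

Rules like these miss portals served from custom domains, which is one reason detection can only be best-effort.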