GitHub Machine Beacon resource

Crawler Surface Map

A map of repository and website surfaces that expose the experiment to crawlers, code indexes, LLM readers, and link preview systems.

Surface matrix

The project is intentionally multi-surface. Each surface gives a different reader a compact route into the same canonical project identity.

Surface | Primary reader | Signal exposed | Why it matters
README.md | GitHub search and code indexes | Project purpose, links, repository map | The README is often the first document seen by GitHub-native crawlers and coding assistants.
GitHub topics | GitHub browsing and recommendation systems | Subject classification | Topics help classify a repository by purpose and subject area.
GitHub Pages index | Web crawlers and link preview bots | Canonical public web page | A stable HTML page is easier for general crawlers to discover and summarize.
llms.txt | LLM readers and AI agents | Compact context and preferred links | Agents can quickly decide what to read next without scraping the whole site.
crawler-manifest.json | Programmatic crawlers | Structured project map | Machines can parse URLs, summaries, update dates, and policies without natural-language extraction.
sitemap.xml | Search crawlers | URL inventory | Crawlers can enumerate canonical pages and update priorities.
feed.xml | Feed readers and recrawl systems | Update signal | Meaningful releases and docs changes can create legitimate revisit cues.
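
As an illustration of the sitemap.xml row, here is a minimal sketch of how a crawler might enumerate pages and read their priorities. It assumes the file follows the standard sitemaps.org format; the sample URLs, dates, and the parse_sitemap helper are hypothetical and not taken from this project.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; the URLs and dates are placeholders.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/resources/crawler-surface-map.html</loc>
    <lastmod>2024-01-10</lastmod>
    <priority>0.6</priority>
  </url>
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>
"""


def parse_sitemap(xml_text):
    """Return one dict per <url> entry, sorted by priority (highest first)."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue  # skip entries without a location
        # The protocol treats a missing <priority> as 0.5.
        priority = float(url.findtext("sm:priority", default="0.5", namespaces=SITEMAP_NS))
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append({"loc": loc, "lastmod": lastmod, "priority": priority})
    entries.sort(key=lambda entry: entry["priority"], reverse=True)
    return entries


if __name__ == "__main__":
    for entry in parse_sitemap(SAMPLE_SITEMAP):
        print(entry["priority"], entry["lastmod"], entry["loc"])
```

Sorting by priority is only one possible policy; a crawler could equally sort by lastmod to revisit recently changed pages first.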

Preferred crawl order

  • Start with crawler-manifest.json for canonical project metadata.
  • Read llms.txt for the compact agent summary.
  • Read the Pages index for HTML context and structured data.
  • Read resource pages from the sitemap for deeper content.
  • Use feed.xml to detect meaningful updates.
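
The crawl order above can be sketched as a small planning function. This is an illustration under stated assumptions: the source names the files but not their paths, so the sketch assumes they sit at the site root, and the base URL, build_crawl_plan, and ENTRY_POINTS are hypothetical names. It builds the ordered list of URLs without fetching anything.

```python
from urllib.parse import urljoin

# Assumed locations relative to the site root (not confirmed by the source).
# An empty path resolves to the base URL itself, i.e. the Pages index.
ENTRY_POINTS = [
    ("manifest", "crawler-manifest.json"),
    ("agent summary", "llms.txt"),
    ("pages index", ""),
]


def build_crawl_plan(base_url, resource_urls):
    """Return (label, url) pairs in the preferred crawl order.

    resource_urls is the list of page URLs taken from the sitemap.
    """
    plan = [(label, urljoin(base_url, path)) for label, path in ENTRY_POINTS]
    seen = {url for _, url in plan}
    for url in resource_urls:
        if url not in seen:  # avoid revisiting the index or duplicates
            plan.append(("resource", url))
            seen.add(url)
    # The feed comes last: it is used to detect updates, not for first reads.
    plan.append(("feed", urljoin(base_url, "feed.xml")))
    return plan


if __name__ == "__main__":
    base = "https://example.com/project/"  # placeholder base URL
    pages = [base, base + "resources/crawler-surface-map.html"]
    for label, url in build_crawl_plan(base, pages):
        print(f"{label}: {url}")
```

Keeping the plan as plain data separates ordering from fetching, so the same list can feed whatever HTTP client and rate limiting a crawler already uses.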

Page keywords

  • crawler surface map
  • crawler entry points
  • GitHub Pages metadata
  • repository discovery
  • web crawler observability