Transparent crawler discovery experiment

GitHub Machine Beacon

A transparent GitHub experiment that makes a repository unusually easy for crawlers, search indexes, AI agents, LLM readers, link preview bots, and code indexers to discover and parse.

17 indexed resources • 7 reusable guides • 71 mapped terms
Figure: Diagram of repository, metadata, feeds, crawlers, and measurement loops.

Live edge traffic

Requests served through Cloudflare for the live homepage and the machine-readable endpoints.

Machine visits ...
Human visits ...
Unknown ...
GitHub views 0

Live split updated . Source: Cloudflare Worker. GitHub official snapshot: 0 views, 0 unique visitors, 0 clones. Raw edge data: cloudflare-traffic.json.
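The machine / human / unknown split above can be illustrated with a minimal sketch. This is not the Cloudflare Worker's actual logic, which the page does not show; it is written in Python for readability, and the patterns and the function names classify_request and split_counts are assumptions made for illustration. Only the counter names come from the measurement fields listed on this page.

```python
import re

# Assumption: substrings that often appear in automated User-Agent strings.
# The real Worker may use different or additional signals.
BOT_PATTERN = re.compile(r"bot|crawl|spider|curl|wget|python-requests", re.IGNORECASE)
# Assumption: interactive browsers announce themselves with a "Mozilla/" token.
BROWSER_PATTERN = re.compile(r"mozilla/", re.IGNORECASE)


def classify_request(user_agent):
    """Return 'machine', 'human', or 'unknown' for one User-Agent header."""
    if not user_agent or not user_agent.strip():
        return "unknown"
    # Check the bot pattern first: many crawlers also send a "Mozilla/" token.
    if BOT_PATTERN.search(user_agent):
        return "machine"
    if BROWSER_PATTERN.search(user_agent):
        return "human"
    return "unknown"


def split_counts(user_agents):
    """Tally a batch of User-Agent headers into the page's request counters."""
    counts = {
        "edge_requests": 0,
        "machine_requests": 0,
        "human_requests": 0,
        "unknown_requests": 0,
    }
    for user_agent in user_agents:
        counts["edge_requests"] += 1
        counts[classify_request(user_agent) + "_requests"] += 1
    return counts
```

User-Agent matching is only a heuristic: headers can be spoofed or omitted, which is one reason to keep an explicit unknown bucket rather than forcing every request into machine or human.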

Resource Library

These pages are designed to be useful enough for humans to cite and structured enough for machines to parse.

Machine Surfaces

Principles

  • Be transparent about the experiment.
  • Use honest metadata and relevant keywords only.
  • Publish stable machine-readable entry points.
  • Respect robots.txt and platform rules.
  • Measure discovery without generating fake traffic.
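As one way to check the robots.txt principle above, the following sketch uses Python's standard urllib.robotparser to confirm that a machine-readable entry point stays crawlable under a given rule set. The robots.txt body, the example.org URLs, the ExampleBot agent name, and the is_allowed helper are all hypothetical; they are not taken from this repository.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/llms.txt", "/sitemap.xml", "/private/data.json"):
        url = "https://example.org" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A check like this could run before each deployment so that a stray Disallow rule does not silently hide an entry point that is meant to be public.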

Measurement Fields

  • repository_views
  • unique_visitors
  • referrers
  • popular_content
  • clones
  • unique_cloners
  • edge_requests
  • machine_requests
  • human_requests
  • unknown_requests
  • stars
  • forks
  • issues_or_discussions
  • external_citations
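The fields above could be held in a single record, sketched here as a Python dataclass. The field names come from the list; the types, the zero defaults, the class name MeasurementSnapshot, and the machine_share helper are assumptions, since the page does not define a schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class MeasurementSnapshot:
    """One observation of the measurement fields (types are assumed)."""

    repository_views: int = 0
    unique_visitors: int = 0
    referrers: list = field(default_factory=list)        # assumed: list of referrer names
    popular_content: list = field(default_factory=list)  # assumed: list of paths
    clones: int = 0
    unique_cloners: int = 0
    edge_requests: int = 0
    machine_requests: int = 0
    human_requests: int = 0
    unknown_requests: int = 0
    stars: int = 0
    forks: int = 0
    issues_or_discussions: int = 0
    external_citations: int = 0

    def machine_share(self):
        """Fraction of edge requests classified as machine, or None with no traffic."""
        if self.edge_requests == 0:
            return None
        return self.machine_requests / self.edge_requests


if __name__ == "__main__":
    snapshot = MeasurementSnapshot(edge_requests=10, machine_requests=7,
                                   human_requests=2, unknown_requests=1)
    print(asdict(snapshot))
    print(snapshot.machine_share())
```

Returning None for an empty snapshot avoids reporting a misleading 0% machine share before any traffic has been observed.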

Keyword Map

Terms are grouped by intent so crawlers and human auditors can distinguish meaningful topic coverage from unrelated keyword stuffing.

machine-readable web discovery

Signals for crawlers and search indexes that prefer structured, canonical resources.

  • machine-readable repository
  • crawler-friendly GitHub project
  • GitHub Pages metadata
  • sitemap.xml
  • robots.txt
  • structured data
  • JSON-LD
  • Open Graph metadata
  • canonical URL
  • Atom feed
  • RSS feed
  • web crawler observability

AI and LLM discovery

Signals for retrieval systems, AI coding tools, and agent browsers.

  • llms.txt
  • LLM crawler
  • AI agent browser
  • AI search indexing
  • retrieval augmented generation
  • RAG source
  • agent-readable documentation
  • machine context file
  • AI code search
  • LLM metadata
  • crawler manifest
  • semantic README

GitHub repository discovery

Signals that help repository search, code search, and topic-based browsing.

  • GitHub search optimization
  • GitHub repository metadata
  • GitHub topics
  • README structure
  • code indexing
  • open source discoverability
  • repository traffic experiment
  • GitHub Insights traffic
  • GitHub Pages deployment
  • open research repository
  • software citation
  • CITATION.cff

measurement and ethics

Signals that the project is an observable, non-deceptive experiment.

  • crawler experiment
  • traffic measurement
  • ethical SEO
  • transparent metadata
  • no fake traffic
  • no cloaking
  • privacy-preserving analytics
  • search experiment
  • bot traffic research
  • machine traffic benchmark
  • crawlability audit
  • public web observability
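A grouped keyword map like the one above lends itself to a simple audit. The sketch below stores a few of the listed terms under their group names and reports any term that appears in more than one group, which is one possible signal of padding. The dictionary holds only a subset of the terms for brevity, and audit_keyword_map is a hypothetical helper, not part of the project.

```python
from collections import defaultdict

# A subset of the groups and terms listed above, for illustration only.
KEYWORD_MAP = {
    "machine-readable web discovery": ["sitemap.xml", "robots.txt", "JSON-LD"],
    "AI and LLM discovery": ["llms.txt", "RAG source", "crawler manifest"],
    "GitHub repository discovery": ["GitHub topics", "CITATION.cff"],
    "measurement and ethics": ["no cloaking", "no fake traffic"],
}


def audit_keyword_map(keyword_map):
    """Count groups and distinct terms, and list terms shared between groups."""
    groups_by_term = defaultdict(set)
    for group, terms in keyword_map.items():
        for term in terms:
            # Normalize case and whitespace so "JSON-LD" and "json-ld " match.
            groups_by_term[term.strip().lower()].add(group)
    duplicates = {
        term: sorted(groups)
        for term, groups in groups_by_term.items()
        if len(groups) > 1
    }
    return {
        "groups": len(keyword_map),
        "terms": len(groups_by_term),
        "duplicates": duplicates,
    }


if __name__ == "__main__":
    print(audit_keyword_map(KEYWORD_MAP))
```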

Update Contract

This page is generated from data/beacon.json and data/content-pages.json. Declared project version: 0.4.0. Last declared update: .
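As a rough illustration of generating the line above from data, the sketch below renders it from a JSON document. The schema of data/beacon.json is not shown on this page, so the keys version and updated are assumptions, as is the function name render_update_contract; the fallback text is likewise a choice made for the example.

```python
import json


def render_update_contract(beacon_json_text):
    """Render the version and update line from JSON text (keys are assumed)."""
    beacon = json.loads(beacon_json_text)
    version = beacon.get("version", "unknown")
    # Fall back to explicit wording instead of printing an empty date.
    updated = beacon.get("updated") or "not declared"
    return f"Declared project version: {version}. Last declared update: {updated}."


if __name__ == "__main__":
    print(render_update_contract('{"version": "0.4.0"}'))
```

Handling the missing value explicitly keeps the rendered line well formed even when the source data carries no update date.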