Content Negotiation Edge Cases Report for TypedFetch Website

Executive Summary

The TypedFetch website (typedfetch.dev) currently uses a dual-route approach:

  • /docs - React SPA for human browsers
  • /docs.json - JSON API endpoint for programmatic access

While this separation is clean, it creates several edge cases with social media crawlers, search engines, and CDN caching that need to be addressed for optimal visibility and performance.

Current Architecture Analysis

Strengths

  1. Clear separation of concerns: Human-readable HTML vs machine-readable JSON
  2. Good meta tags: Comprehensive OpenGraph, Twitter Cards, and structured data
  3. AI-friendly setup: llms.txt and dedicated JSON endpoint
  4. SEO basics covered: robots.txt, canonical URLs, meta descriptions

Weaknesses

  1. No sitemap.xml: Critical for search engine discovery
  2. Client-side routing: May cause issues with social media crawlers
  3. Missing server-side rendering: Crawlers may not execute JavaScript
  4. No cache variation strategy: CDNs may serve wrong content type
  5. Limited content negotiation: Only JSON alternative, no markdown support

1. OpenGraph Meta Tags

Current State

  • Meta tags are properly set in index.html
  • OpenGraph image at /og-image.png
  • All required properties present

Technical Requirements

  1. Facebook Crawler (facebookexternalhit)

    • User-Agent: facebookexternalhit/1.1
    • Requires server-rendered HTML
    • Does NOT execute JavaScript
    • Caches aggressively (use Sharing Debugger to refresh)
  2. Required Meta Tags

    <meta property="og:url" content="https://typedfetch.dev/[FULL_URL]" />
    <meta property="og:type" content="article" /> <!-- for docs pages -->
    <meta property="og:title" content="[PAGE_TITLE]" />
    <meta property="og:description" content="[PAGE_DESCRIPTION]" />
    <meta property="og:image" content="https://typedfetch.dev/og-image.png" />
    

Issues with Current Setup

  1. Single Page Application Problem: Facebook crawler won't see content from React routes
  2. Generic meta tags: Same tags for all pages, reducing shareability
  3. No page-specific images: Could have better visual distinction

Proposed Solutions

Solution A: Server-Side Rendering for Crawlers

// vercel.json modification
{
  "functions": {
    "api/ssr-docs/[...path].js": {
      "maxDuration": 10
    }
  },
  "rewrites": [
    {
      "source": "/docs/:path*",
      "destination": "/api/ssr-docs/:path*",
      "has": [
        {
          "type": "header",
          "key": "user-agent",
          "value": ".*(facebookexternalhit|LinkedInBot|Twitterbot|Slackbot|WhatsApp|Discordbot).*"
        }
      ]
    }
  ]
}

Solution B: Pre-rendering Static Pages

// Generate static HTML for each docs page during build
// vite.config.ts addition
export default {
  plugins: [
    {
      name: 'generate-social-pages',
      writeBundle() {
        // Generate minimal HTML pages for crawlers
        generateSocialPages();
      }
    }
  ]
}
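
generateSocialPages is not defined above; a minimal sketch of what it could do, assuming a hand-maintained page list and Vite's dist/ output directory (slugs and descriptions are illustrative):

// scripts/generate-social-pages.js (hypothetical helper)
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

const pages = [
  { slug: 'getting-started', title: 'Getting Started', description: 'Install TypedFetch and make your first request.' },
  { slug: 'installation', title: 'Installation', description: 'Add TypedFetch to your project.' }
];

export function generateSocialPages(outDir = 'dist/social') {
  for (const page of pages) {
    const html = `<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>${page.title} - TypedFetch</title>
  <meta property="og:title" content="${page.title}" />
  <meta property="og:description" content="${page.description}" />
  <meta property="og:url" content="https://typedfetch.dev/docs/${page.slug}" />
  <meta property="og:image" content="https://typedfetch.dev/og-image.png" />
</head>
<body><h1>${page.title}</h1><p>${page.description}</p></body>
</html>`;
    // One static HTML file per docs page, e.g. dist/social/getting-started/index.html
    mkdirSync(join(outDir, page.slug), { recursive: true });
    writeFileSync(join(outDir, page.slug, 'index.html'), html);
  }
}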

Testing Strategy

# Test with Facebook Sharing Debugger
curl -A "facebookexternalhit/1.1" https://typedfetch.dev/docs/getting-started

# Validate with official tool
# https://developers.facebook.com/tools/debug/

2. Search Engine Indexing

Technical Requirements

  1. Googlebot Behavior

    • Modern Googlebot executes JavaScript (Chrome 90+)
    • Prefers server-rendered content for faster indexing
    • Respects Vary: Accept header for content negotiation
  2. Bing/Microsoft Edge

    • Limited JavaScript execution
    • Requires proper HTML structure
    • Values sitemap.xml highly

Current Issues

  1. Missing sitemap.xml: Essential for discovery
  2. No structured data for docs: Missing breadcrumbs, article schema
  3. Client-side content: Delays indexing, may miss content

Proposed Solutions

1. Dynamic Sitemap Generation

// api/sitemap.xml.js
export default function handler(req, res) {
  const baseUrl = 'https://typedfetch.dev';
  const pages = [
    { url: '/', priority: 1.0, changefreq: 'weekly' },
    { url: '/docs', priority: 0.9, changefreq: 'weekly' },
    { url: '/docs/getting-started', priority: 0.8, changefreq: 'monthly' },
    { url: '/docs/installation', priority: 0.8, changefreq: 'monthly' },
    // ... other pages
  ];

  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${pages.map(page => `  <url>
    <loc>${baseUrl}${page.url}</loc>
    <changefreq>${page.changefreq}</changefreq>
    <priority>${page.priority}</priority>
    <lastmod>${new Date().toISOString()}</lastmod>
  </url>`).join('\n')}
</urlset>`;

  res.setHeader('Content-Type', 'application/xml');
  res.setHeader('Cache-Control', 'public, max-age=3600');
  res.status(200).send(sitemap);
}
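
By default a Vercel function under api/ is served from /api/..., so search engines will only find the sitemap at the conventional path if requests are rewritten there. A sketch of the routing, assuming the file layout above:

// vercel.json addition
{
  "rewrites": [
    {
      "source": "/sitemap.xml",
      "destination": "/api/sitemap.xml"
    }
  ]
}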

2. Enhanced Structured Data

// For each documentation page
const structuredData = {
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": pageTitle,
  "description": pageDescription,
  "author": {
    "@type": "Organization",
    "name": "Catalyst Labs"
  },
  "datePublished": "2024-01-01",
  "dateModified": new Date().toISOString(),
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": `https://typedfetch.dev${path}`
  },
  "breadcrumb": {
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Docs",
        "item": "https://typedfetch.dev/docs"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": pageTitle,
        "item": `https://typedfetch.dev${path}`
      }
    ]
  }
};
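
Crawlers that do not execute JavaScript only see this object if it is serialized into the served HTML, typically as a JSON-LD script tag in the head. A minimal sketch, reusing the structuredData, pageTitle, and pageDescription values from above:

// Serialize the structured data into the pre-rendered page head as JSON-LD
const jsonLd = `<script type="application/ld+json">${JSON.stringify(structuredData)}</script>`;

const head = `
  <title>${pageTitle} - TypedFetch</title>
  <meta name="description" content="${pageDescription}" />
  ${jsonLd}
`;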

Testing Strategy

# Test rendering
curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  https://typedfetch.dev/docs/getting-started

# Validate structured data
# https://search.google.com/test/rich-results

# Check indexing status
# https://search.google.com/search-console

3. Social Media Preview Issues

Platform-Specific Requirements

Twitter/X

  • User-Agent: Twitterbot
  • Requires: twitter:card, twitter:site, twitter:creator
  • Supports JavaScript execution (limited)
  • Image requirements: 2:1 aspect ratio, min 300x157px

LinkedIn

  • User-Agent: LinkedInBot
  • NO JavaScript execution
  • Caches aggressively
  • Prefers og:image with 1200x627px

Discord

  • User-Agent: Discordbot
  • NO JavaScript execution
  • Embeds based on OpenGraph tags
  • Supports multiple images

WhatsApp

  • User-Agent: WhatsApp
  • NO JavaScript execution
  • Basic OpenGraph support
  • Thumbnail generation from og:image

Current Issues

  1. SPA content not visible: Crawlers can't see React-rendered content
  2. Generic previews: All pages show same preview
  3. No URL unfurling data: Missing rich previews for specific pages

Proposed Solutions

1. Crawler-Specific Responses

// api/social-preview/[...path].js
import fs from 'node:fs';
import nodePath from 'node:path';

export default function handler(req, res) {
  const userAgent = req.headers['user-agent'] || '';
  const crawlers = ['facebookexternalhit', 'LinkedInBot', 'Twitterbot', 'Discordbot', 'WhatsApp'];
  
  const isCrawler = crawlers.some(bot => userAgent.includes(bot));
  
  if (isCrawler) {
    const docPath = req.query.path?.join('/') || '';
    const pageData = getPageData(docPath); // lookup for page title/description (see sketch below)
    
    const html = `<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>${pageData.title} - TypedFetch</title>
  <meta property="og:title" content="${pageData.title}" />
  <meta property="og:description" content="${pageData.description}" />
  <meta property="og:url" content="https://typedfetch.dev/docs/${path}" />
  <meta property="og:image" content="https://typedfetch.dev/og-image.png" />
  <meta property="og:type" content="article" />
  
  <meta name="twitter:card" content="summary_large_image" />
  <meta name="twitter:title" content="${pageData.title}" />
  <meta name="twitter:description" content="${pageData.description}" />
  <meta name="twitter:image" content="https://typedfetch.dev/og-image.png" />
</head>
<body>
  <h1>${pageData.title}</h1>
  <p>${pageData.description}</p>
</body>
</html>`;
    
    res.setHeader('Content-Type', 'text/html');
    res.status(200).send(html);
  } else {
    // Regular users get the React app shell (path depends on deployment layout)
    const indexHtml = fs.readFileSync(nodePath.join(process.cwd(), 'index.html'), 'utf-8');
    res.setHeader('Content-Type', 'text/html');
    res.status(200).send(indexHtml);
  }
}
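
getPageData above is assumed to map a docs slug to its title and description; a minimal sketch, with illustrative entries:

// Hypothetical lookup used by the crawler handler above
const pages = {
  'getting-started': {
    title: 'Getting Started',
    description: 'Install TypedFetch and make your first type-safe request.'
  },
  'installation': {
    title: 'Installation',
    description: 'Add TypedFetch to your project with npm, pnpm, or yarn.'
  }
};

function getPageData(docPath) {
  // Fall back to generic site metadata for unknown paths
  return pages[docPath] || {
    title: 'TypedFetch Documentation',
    description: 'Zero-dependency type-safe HTTP client.'
  };
}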

2. Dynamic OpenGraph Images

// api/og-image/[...path].jsx  (JSX requires a .jsx/.tsx extension)
import { ImageResponse } from '@vercel/og';

// @vercel/og renders on the Edge runtime, where the handler receives a Request
export const config = { runtime: 'edge' };

export default function handler(req) {
  const { pathname } = new URL(req.url);
  const pageTitle = getPageTitle(pathname); // lookup helper for the docs page title
  
  return new ImageResponse(
    (
      <div style={{
        background: 'linear-gradient(to right, #8b5cf6, #3b82f6)',
        width: '100%',
        height: '100%',
        display: 'flex',
        alignItems: 'center',
        justifyContent: 'center',
      }}>
        <h1 style={{ color: 'white', fontSize: 60 }}>{pageTitle}</h1>
      </div>
    ),
    {
      width: 1200,
      height: 630,
    }
  );
}
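
Once such an endpoint exists, each page's social tags would point at it instead of the shared static image. A sketch, where pageSlug is illustrative and /api/og-image mirrors the file above:

// Point per-page social images at the dynamic endpoint
const ogImage = `https://typedfetch.dev/api/og-image/${pageSlug}`;
const imageTags = `
  <meta property="og:image" content="${ogImage}" />
  <meta name="twitter:image" content="${ogImage}" />
`;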

Testing Tools

# Twitter Card Validator
# https://cards-dev.twitter.com/validator

# LinkedIn Post Inspector
# https://www.linkedin.com/post-inspector/

# Facebook Sharing Debugger
# https://developers.facebook.com/tools/debug/

# Discord Embed Visualizer
# https://discohook.org/

4. CDN/Proxy Cache Pollution

Current Issues

  1. No Vary header: CDNs can't distinguish content types
  2. Same URL pattern: /docs serves different content based on client
  3. Cache key collision: JSON and HTML responses cached together

Technical Requirements

  1. Cloudflare: Respects Vary header, needs proper cache keys
  2. Vercel Edge: Built-in caching, needs configuration
  3. Browser caching: Must handle different content types

Proposed Solutions

1. Proper Cache Headers

// Set appropriate Vary headers
// (data and html below stand in for the JSON payload and the rendered page)
export default function handler(req, res) {
  const acceptHeader = req.headers.accept || '';
  
  // Indicate that response varies by Accept header
  res.setHeader('Vary', 'Accept, User-Agent');
  
  if (acceptHeader.includes('application/json')) {
    res.setHeader('Content-Type', 'application/json');
    res.setHeader('Cache-Control', 'public, max-age=3600, stale-while-revalidate=86400');
    return res.json(data);
  } else {
    res.setHeader('Content-Type', 'text/html');
    res.setHeader('Cache-Control', 'public, max-age=300, stale-while-revalidate=3600');
    return res.send(html);
  }
}

2. CDN Configuration

// vercel.json
{
  "headers": [
    {
      "source": "/docs/(.*)",
      "headers": [
        {
          "key": "Vary",
          "value": "Accept, User-Agent"
        },
        {
          "key": "Cache-Control",
          "value": "public, max-age=0, must-revalidate"
        }
      ]
    }
  ]
}

3. Separate Cache Keys

// Use different URLs for different content types
// This avoids cache pollution entirely
{
  "rewrites": [
    {
      "source": "/docs.json",
      "destination": "/api/docs?format=json"
    },
    {
      "source": "/docs.md", 
      "destination": "/api/docs?format=markdown"
    },
    {
      "source": "/docs.xml",
      "destination": "/api/docs?format=xml"
    }
  ]
}

Testing Strategy

# Test cache behavior
curl -H "Accept: application/json" https://typedfetch.dev/docs
curl -H "Accept: text/html" https://typedfetch.dev/docs

# Check cache headers
curl -I https://typedfetch.dev/docs

# Test CDN caching
# Use different locations/proxies to verify cache separation

5. API Documentation Discovery

Current State

  • JSON endpoint at /docs.json
  • No OpenAPI/Swagger spec
  • No machine-readable API description

Technical Requirements

  1. OpenAPI Discovery: .well-known/openapi.json
  2. Postman Collection: Exportable collection format
  3. Developer Portal: Interactive API documentation

Proposed Solutions

1. OpenAPI Specification

// api/openapi.json
export default function handler(req, res) {
  const spec = {
    "openapi": "3.0.0",
    "info": {
      "title": "TypedFetch Documentation API",
      "version": "1.0.0",
      "description": "API for accessing TypedFetch documentation"
    },
    "servers": [
      {
        "url": "https://typedfetch.dev"
      }
    ],
    "paths": {
      "/docs.json": {
        "get": {
          "summary": "Get documentation index",
          "responses": {
            "200": {
              "description": "Documentation sections",
              "content": {
                "application/json": {
                  "schema": {
                    "$ref": "#/components/schemas/Documentation"
                  }
                }
              }
            }
          }
        }
      }
    }
  };
  
  res.setHeader('Content-Type', 'application/json');
  res.json(spec);
}
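
As with the sitemap, the spec is only discoverable at the well-known paths if requests are routed to the function. A sketch, assuming the handler above is reachable at /api/openapi.json:

// vercel.json addition
{
  "rewrites": [
    { "source": "/openapi.json", "destination": "/api/openapi.json" },
    { "source": "/.well-known/openapi.json", "destination": "/api/openapi.json" }
  ]
}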

2. Well-Known Discovery

// public/.well-known/apis.json
{
  "name": "TypedFetch",
  "description": "Zero-dependency type-safe HTTP client",
  "url": "https://typedfetch.dev",
  "apis": [
    {
      "name": "Documentation API",
      "description": "Access TypedFetch documentation",
      "baseURL": "https://typedfetch.dev",
      "properties": [
        {
          "type": "OpenAPI",
          "url": "https://typedfetch.dev/openapi.json"
        },
        {
          "type": "Postman",
          "url": "https://typedfetch.dev/postman-collection.json"
        }
      ]
    }
  ]
}

3. Content Type Negotiation

// Enhanced API endpoint with multiple formats
// (docsData is the JSON payload already served at /docs.json)
export default function handler(req, res) {
  const accept = req.headers.accept || '';
  const format = req.query.format;
  
  // Format priority: query param > accept header > default
  if (format === 'openapi' || accept.includes('application/vnd.oai.openapi')) {
    return res.json(generateOpenAPISpec());
  } else if (format === 'postman' || accept.includes('application/vnd.postman')) {
    return res.json(generatePostmanCollection());
  } else if (format === 'markdown' || accept.includes('text/markdown')) {
    res.setHeader('Content-Type', 'text/markdown');
    return res.send(generateMarkdownDocs());
  } else {
    return res.json(docsData);
  }
}
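
generateMarkdownDocs and generatePostmanCollection are placeholders above. A minimal sketch of the markdown generator, assuming docsData contains sections of pages with title and description fields (the real shape of /docs.json may differ):

// Hypothetical markdown rendering of the same docsData object used above
function generateMarkdownDocs() {
  return docsData.sections
    .map(section => {
      const pages = section.pages
        .map(page => `## ${page.title}\n\n${page.description}\n`)
        .join('\n');
      return `# ${section.title}\n\n${pages}`;
    })
    .join('\n\n');
}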

Testing Strategy

# Test OpenAPI discovery
curl https://typedfetch.dev/.well-known/openapi.json

# Test content negotiation
curl -H "Accept: application/vnd.oai.openapi" https://typedfetch.dev/docs

# Import into tools
# - Postman: Import > Link > https://typedfetch.dev/postman-collection.json
# - Swagger UI: https://petstore.swagger.io/?url=https://typedfetch.dev/openapi.json

Implementation Priority

Phase 1: Critical Fixes (Week 1)

  1. Add sitemap.xml - Essential for SEO
  2. Implement crawler detection - Fix social sharing
  3. Add Vary headers - Prevent cache pollution
  4. Create static fallbacks - Ensure content visibility

Phase 2: Enhancements (Week 2)

  1. Dynamic OG images - Better social previews
  2. Enhanced structured data - Rich search results
  3. Multiple content formats - Markdown, XML support
  4. API discovery endpoints - Developer tools

Phase 3: Optimization (Week 3)

  1. Edge-side rendering - Optimal performance
  2. Smart caching strategies - Reduce server load
  3. Monitoring and analytics - Track improvements
  4. A/B testing - Optimize conversions

Monitoring and Validation

Key Metrics to Track

  1. Search Console: Indexing status, crawl errors
  2. Social shares: Engagement rates, preview quality
  3. Cache hit rates: CDN performance
  4. API usage: Developer adoption

Automated Testing Suite

// tests/seo-validation.test.js
const BASE_URL = process.env.BASE_URL || 'https://typedfetch.dev';

describe('SEO and Social Media', () => {
  test('Crawler receives HTML content', async () => {
    const response = await fetch(`${BASE_URL}/docs/getting-started`, {
      headers: { 'User-Agent': 'facebookexternalhit/1.1' }
    });
    const html = await response.text();
    expect(html).toContain('<meta property="og:title"');
  });
  
  test('API returns JSON with correct headers', async () => {
    const response = await fetch(`${BASE_URL}/docs.json`);
    expect(response.headers.get('content-type')).toContain('application/json');
    expect(response.headers.get('vary')).toContain('Accept');
  });
});

Conclusion

The current TypedFetch website has a solid foundation but needs enhancements to handle edge cases with social media crawlers, search engines, and content delivery networks. The proposed solutions maintain the clean separation between human and machine interfaces while ensuring all crawlers and tools can access the content they need.

Key improvements focus on:

  1. Server-side rendering for crawlers
  2. Proper content negotiation with caching
  3. Enhanced metadata and structured data
  4. Multiple content format support
  5. API discovery mechanisms

These changes will significantly improve the website's visibility, shareability, and developer experience while maintaining the current architecture's strengths.