Content Negotiation Edge Cases Report for TypedFetch Website

Executive Summary

The TypedFetch website (typedfetch.dev) currently uses a dual-route approach:

  • /docs - React SPA for human browsers
  • /docs.json - JSON API endpoint for programmatic access

While this separation is clean, it creates several edge cases with social media crawlers, search engines, and CDN caching that need to be addressed for optimal visibility and performance.

Current Architecture Analysis

Strengths

  1. Clear separation of concerns: Human-readable HTML vs machine-readable JSON
  2. Good meta tags: Comprehensive OpenGraph, Twitter Cards, and structured data
  3. AI-friendly setup: llms.txt and dedicated JSON endpoint
  4. SEO basics covered: robots.txt, canonical URLs, meta descriptions

Weaknesses

  1. No sitemap.xml: Critical for search engine discovery
  2. Client-side routing: May cause issues with social media crawlers
  3. Missing server-side rendering: Crawlers may not execute JavaScript
  4. No cache variation strategy: CDNs may serve wrong content type
  5. Limited content negotiation: Only JSON alternative, no markdown support

1. OpenGraph Meta Tags

Current State

  • Meta tags are properly set in index.html
  • OpenGraph image at /og-image.png
  • All required properties present

Technical Requirements

  1. Facebook Crawler (facebookexternalhit)

    • User-Agent: facebookexternalhit/1.1
    • Requires server-rendered HTML
    • Does NOT execute JavaScript
    • Caches aggressively (use Sharing Debugger to refresh)
  2. Required Meta Tags

    <meta property="og:url" content="https://typedfetch.dev/[FULL_URL]" />
    <meta property="og:type" content="article" /> <!-- for docs pages -->
    <meta property="og:title" content="[PAGE_TITLE]" />
    <meta property="og:description" content="[PAGE_DESCRIPTION]" />
    <meta property="og:image" content="https://typedfetch.dev/og-image.png" />
    

Issues with Current Setup

  1. Single Page Application Problem: Facebook crawler won't see content from React routes
  2. Generic meta tags: Same tags for all pages, reducing shareability
  3. No page-specific images: Could have better visual distinction

Proposed Solutions

Solution A: Server-Side Rendering for Crawlers

// vercel.json modification
{
  "functions": {
    "api/ssr-docs/[...path].js": {
      "maxDuration": 10
    }
  },
  "rewrites": [
    {
      "source": "/docs/:path*",
      "destination": "/api/ssr-docs/:path*",
      "has": [
        {
          "type": "header",
          "key": "user-agent",
          "value": ".*(facebookexternalhit|LinkedInBot|Twitterbot|Slackbot|WhatsApp|Discordbot).*"
        }
      ]
    }
  ]
}

Solution B: Pre-rendering Static Pages

// Generate static HTML for each docs page during build
// vite.config.ts addition
export default {
  plugins: [
    {
      name: 'generate-social-pages',
      writeBundle() {
        // Generate minimal HTML pages for crawlers
        generateSocialPages();
      }
    }
  ]
}
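
generateSocialPages is not defined above; a minimal sketch of what it could do, assuming a hand-maintained page list and Vite's dist/ output directory (slugs and descriptions are illustrative):

// scripts/generate-social-pages.js (hypothetical helper)
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

const pages = [
  { slug: 'getting-started', title: 'Getting Started', description: 'Install TypedFetch and make your first request.' },
  { slug: 'installation', title: 'Installation', description: 'Add TypedFetch to your project.' }
];

export function generateSocialPages(outDir = 'dist/social') {
  for (const page of pages) {
    const html = `<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>${page.title} - TypedFetch</title>
  <meta property="og:title" content="${page.title}" />
  <meta property="og:description" content="${page.description}" />
  <meta property="og:url" content="https://typedfetch.dev/docs/${page.slug}" />
  <meta property="og:image" content="https://typedfetch.dev/og-image.png" />
</head>
<body><h1>${page.title}</h1><p>${page.description}</p></body>
</html>`;
    // One static HTML file per docs page, e.g. dist/social/getting-started/index.html
    mkdirSync(join(outDir, page.slug), { recursive: true });
    writeFileSync(join(outDir, page.slug, 'index.html'), html);
  }
}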

Testing Strategy

# Test with Facebook Sharing Debugger
curl -A "facebookexternalhit/1.1" https://typedfetch.dev/docs/getting-started

# Validate with official tool
# https://developers.facebook.com/tools/debug/

2. Search Engine Indexing

Technical Requirements

  1. Googlebot Behavior

    • Modern Googlebot executes JavaScript (Chrome 90+)
    • Prefers server-rendered content for faster indexing
    • Respects Vary: Accept header for content negotiation
  2. Bing/Microsoft Edge

    • Limited JavaScript execution
    • Requires proper HTML structure
    • Values sitemap.xml highly

Current Issues

  1. Missing sitemap.xml: Essential for discovery
  2. No structured data for docs: Missing breadcrumbs, article schema
  3. Client-side content: Delays indexing, may miss content

Proposed Solutions

1. Dynamic Sitemap Generation

// api/sitemap.xml.js
export default function handler(req, res) {
  const baseUrl = 'https://typedfetch.dev';
  const pages = [
    { url: '/', priority: 1.0, changefreq: 'weekly' },
    { url: '/docs', priority: 0.9, changefreq: 'weekly' },
    { url: '/docs/getting-started', priority: 0.8, changefreq: 'monthly' },
    { url: '/docs/installation', priority: 0.8, changefreq: 'monthly' },
    // ... other pages
  ];

  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${pages.map(page => `  <url>
    <loc>${baseUrl}${page.url}</loc>
    <changefreq>${page.changefreq}</changefreq>
    <priority>${page.priority}</priority>
    <lastmod>${new Date().toISOString()}</lastmod>
  </url>`).join('\n')}
</urlset>`;

  res.setHeader('Content-Type', 'application/xml');
  res.setHeader('Cache-Control', 'public, max-age=3600');
  res.status(200).send(sitemap);
}
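
By default a Vercel function under api/ is served from /api/..., so search engines will only find the sitemap at the conventional path if requests are rewritten there. A sketch of the routing, assuming the file layout above:

// vercel.json addition
{
  "rewrites": [
    {
      "source": "/sitemap.xml",
      "destination": "/api/sitemap.xml"
    }
  ]
}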

2. Enhanced Structured Data

// For each documentation page
const structuredData = {
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": pageTitle,
  "description": pageDescription,
  "author": {
    "@type": "Organization",
    "name": "Catalyst Labs"
  },
  "datePublished": "2024-01-01",
  "dateModified": new Date().toISOString(),
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": `https://typedfetch.dev${path}`
  },
  "breadcrumb": {
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Docs",
        "item": "https://typedfetch.dev/docs"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": pageTitle,
        "item": `https://typedfetch.dev${path}`
      }
    ]
  }
};
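
Crawlers that do not execute JavaScript only see this object if it is serialized into the served HTML, typically as a JSON-LD script tag in the head. A minimal sketch, reusing the structuredData, pageTitle, and pageDescription values from above:

// Serialize the structured data into the pre-rendered page head as JSON-LD
const jsonLd = `<script type="application/ld+json">${JSON.stringify(structuredData)}</script>`;

const head = `
  <title>${pageTitle} - TypedFetch</title>
  <meta name="description" content="${pageDescription}" />
  ${jsonLd}
`;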

Testing Strategy

# Test rendering
curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  https://typedfetch.dev/docs/getting-started

# Validate structured data
# https://search.google.com/test/rich-results

# Check indexing status
# https://search.google.com/search-console

3. Social Media Preview Issues

Platform-Specific Requirements

Twitter/X

  • User-Agent: Twitterbot
  • Requires: twitter:card, twitter:site, twitter:creator
  • Supports JavaScript execution (limited)
  • Image requirements: 2:1 aspect ratio, min 300x157px

LinkedIn

  • User-Agent: LinkedInBot
  • NO JavaScript execution
  • Caches aggressively
  • Prefers og:image with 1200x627px

Discord

  • User-Agent: Discordbot
  • NO JavaScript execution
  • Embeds based on OpenGraph tags
  • Supports multiple images

WhatsApp

  • User-Agent: WhatsApp
  • NO JavaScript execution
  • Basic OpenGraph support
  • Thumbnail generation from og:image

Current Issues

  1. SPA content not visible: Crawlers can't see React-rendered content
  2. Generic previews: All pages show same preview
  3. No URL unfurling data: Missing rich previews for specific pages

Proposed Solutions

1. Crawler-Specific Responses

// api/social-preview/[...path].js
import fs from 'node:fs';
import nodePath from 'node:path';

export default function handler(req, res) {
  const userAgent = req.headers['user-agent'] || '';
  const crawlers = ['facebookexternalhit', 'LinkedInBot', 'Twitterbot', 'Discordbot', 'WhatsApp'];
  
  const isCrawler = crawlers.some(bot => userAgent.includes(bot));
  
  if (isCrawler) {
    const docPath = req.query.path?.join('/') || '';
    const pageData = getPageData(docPath); // lookup for page title/description (see sketch below)
    
    const html = `<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>${pageData.title} - TypedFetch</title>
  <meta property="og:title" content="${pageData.title}" />
  <meta property="og:description" content="${pageData.description}" />
  <meta property="og:url" content="https://typedfetch.dev/docs/${path}" />
  <meta property="og:image" content="https://typedfetch.dev/og-image.png" />
  <meta property="og:type" content="article" />
  
  <meta name="twitter:card" content="summary_large_image" />
  <meta name="twitter:title" content="${pageData.title}" />
  <meta name="twitter:description" content="${pageData.description}" />
  <meta name="twitter:image" content="https://typedfetch.dev/og-image.png" />
</head>
<body>
  <h1>${pageData.title}</h1>
  <p>${pageData.description}</p>
</body>
</html>`;
    
    res.setHeader('Content-Type', 'text/html');
    res.status(200).send(html);
  } else {
    // Regular users get the React app shell (path depends on deployment layout)
    const indexHtml = fs.readFileSync(nodePath.join(process.cwd(), 'index.html'), 'utf-8');
    res.setHeader('Content-Type', 'text/html');
    res.status(200).send(indexHtml);
  }
}
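
getPageData above is assumed to map a docs slug to its title and description; a minimal sketch, with illustrative entries:

// Hypothetical lookup used by the crawler handler above
const pages = {
  'getting-started': {
    title: 'Getting Started',
    description: 'Install TypedFetch and make your first type-safe request.'
  },
  'installation': {
    title: 'Installation',
    description: 'Add TypedFetch to your project with npm, pnpm, or yarn.'
  }
};

function getPageData(docPath) {
  // Fall back to generic site metadata for unknown paths
  return pages[docPath] || {
    title: 'TypedFetch Documentation',
    description: 'Zero-dependency type-safe HTTP client.'
  };
}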

2. Dynamic OpenGraph Images

// api/og-image/[...path].jsx  (JSX requires a .jsx/.tsx extension)
import { ImageResponse } from '@vercel/og';

// @vercel/og renders on the Edge runtime, where the handler receives a Request
export const config = { runtime: 'edge' };

export default function handler(req) {
  const { pathname } = new URL(req.url);
  const pageTitle = getPageTitle(pathname); // lookup helper for the docs page title
  
  return new ImageResponse(
    (
      <div style={{
        background: 'linear-gradient(to right, #8b5cf6, #3b82f6)',
        width: '100%',
        height: '100%',
        display: 'flex',
        alignItems: 'center',
        justifyContent: 'center',
      }}>
        <h1 style={{ color: 'white', fontSize: 60 }}>{pageTitle}</h1>
      </div>
    ),
    {
      width: 1200,
      height: 630,
    }
  );
}
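
Once such an endpoint exists, each page's social tags would point at it instead of the shared static image. A sketch, where pageSlug is illustrative and /api/og-image mirrors the file above:

// Point per-page social images at the dynamic endpoint
const ogImage = `https://typedfetch.dev/api/og-image/${pageSlug}`;
const imageTags = `
  <meta property="og:image" content="${ogImage}" />
  <meta name="twitter:image" content="${ogImage}" />
`;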

Testing Tools

# Twitter Card Validator
# https://cards-dev.twitter.com/validator

# LinkedIn Post Inspector
# https://www.linkedin.com/post-inspector/

# Facebook Sharing Debugger
# https://developers.facebook.com/tools/debug/

# Discord Embed Visualizer
# https://discohook.org/

4. CDN/Proxy Cache Pollution

Current Issues

  1. No Vary header: CDNs can't distinguish content types
  2. Same URL pattern: /docs serves different content based on client
  3. Cache key collision: JSON and HTML responses cached together

Technical Requirements

  1. Cloudflare: Respects Vary header, needs proper cache keys
  2. Vercel Edge: Built-in caching, needs configuration
  3. Browser caching: Must handle different content types

Proposed Solutions

1. Proper Cache Headers

// Set appropriate Vary headers
// (data and html below stand in for the JSON payload and the rendered page)
export default function handler(req, res) {
  const acceptHeader = req.headers.accept || '';
  
  // Indicate that response varies by Accept header
  res.setHeader('Vary', 'Accept, User-Agent');
  
  if (acceptHeader.includes('application/json')) {
    res.setHeader('Content-Type', 'application/json');
    res.setHeader('Cache-Control', 'public, max-age=3600, stale-while-revalidate=86400');
    return res.json(data);
  } else {
    res.setHeader('Content-Type', 'text/html');
    res.setHeader('Cache-Control', 'public, max-age=300, stale-while-revalidate=3600');
    return res.send(html);
  }
}

2. CDN Configuration

// vercel.json
{
  "headers": [
    {
      "source": "/docs/(.*)",
      "headers": [
        {
          "key": "Vary",
          "value": "Accept, User-Agent"
        },
        {
          "key": "Cache-Control",
          "value": "public, max-age=0, must-revalidate"
        }
      ]
    }
  ]
}

3. Separate Cache Keys

// Use different URLs for different content types
// This avoids cache pollution entirely
{
  "rewrites": [
    {
      "source": "/docs.json",
      "destination": "/api/docs?format=json"
    },
    {
      "source": "/docs.md", 
      "destination": "/api/docs?format=markdown"
    },
    {
      "source": "/docs.xml",
      "destination": "/api/docs?format=xml"
    }
  ]
}

Testing Strategy

# Test cache behavior
curl -H "Accept: application/json" https://typedfetch.dev/docs
curl -H "Accept: text/html" https://typedfetch.dev/docs

# Check cache headers
curl -I https://typedfetch.dev/docs

# Test CDN caching
# Use different locations/proxies to verify cache separation

5. API Documentation Discovery

Current State

  • JSON endpoint at /docs.json
  • No OpenAPI/Swagger spec
  • No machine-readable API description

Technical Requirements

  1. OpenAPI Discovery: .well-known/openapi.json
  2. Postman Collection: Exportable collection format
  3. Developer Portal: Interactive API documentation

Proposed Solutions

1. OpenAPI Specification

// api/openapi.json
export default function handler(req, res) {
  const spec = {
    "openapi": "3.0.0",
    "info": {
      "title": "TypedFetch Documentation API",
      "version": "1.0.0",
      "description": "API for accessing TypedFetch documentation"
    },
    "servers": [
      {
        "url": "https://typedfetch.dev"
      }
    ],
    "paths": {
      "/docs.json": {
        "get": {
          "summary": "Get documentation index",
          "responses": {
            "200": {
              "description": "Documentation sections",
              "content": {
                "application/json": {
                  "schema": {
                    "$ref": "#/components/schemas/Documentation"
                  }
                }
              }
            }
          }
        }
      }
    }
  };
  
  res.setHeader('Content-Type', 'application/json');
  res.json(spec);
}
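
As with the sitemap, the spec is only discoverable at the well-known paths if requests are routed to the function. A sketch, assuming the handler above is reachable at /api/openapi.json:

// vercel.json addition
{
  "rewrites": [
    { "source": "/openapi.json", "destination": "/api/openapi.json" },
    { "source": "/.well-known/openapi.json", "destination": "/api/openapi.json" }
  ]
}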

2. Well-Known Discovery

// public/.well-known/apis.json
{
  "name": "TypedFetch",
  "description": "Zero-dependency type-safe HTTP client",
  "url": "https://typedfetch.dev",
  "apis": [
    {
      "name": "Documentation API",
      "description": "Access TypedFetch documentation",
      "baseURL": "https://typedfetch.dev",
      "properties": [
        {
          "type": "OpenAPI",
          "url": "https://typedfetch.dev/openapi.json"
        },
        {
          "type": "Postman",
          "url": "https://typedfetch.dev/postman-collection.json"
        }
      ]
    }
  ]
}

3. Content Type Negotiation

// Enhanced API endpoint with multiple formats
// (docsData is the JSON payload already served at /docs.json)
export default function handler(req, res) {
  const accept = req.headers.accept || '';
  const format = req.query.format;
  
  // Format priority: query param > accept header > default
  if (format === 'openapi' || accept.includes('application/vnd.oai.openapi')) {
    return res.json(generateOpenAPISpec());
  } else if (format === 'postman' || accept.includes('application/vnd.postman')) {
    return res.json(generatePostmanCollection());
  } else if (format === 'markdown' || accept.includes('text/markdown')) {
    res.setHeader('Content-Type', 'text/markdown');
    return res.send(generateMarkdownDocs());
  } else {
    return res.json(docsData);
  }
}
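
generateMarkdownDocs and generatePostmanCollection are placeholders above. A minimal sketch of the markdown generator, assuming docsData contains sections of pages with title and description fields (the real shape of /docs.json may differ):

// Hypothetical markdown rendering of the same docsData object used above
function generateMarkdownDocs() {
  return docsData.sections
    .map(section => {
      const pages = section.pages
        .map(page => `## ${page.title}\n\n${page.description}\n`)
        .join('\n');
      return `# ${section.title}\n\n${pages}`;
    })
    .join('\n\n');
}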

Testing Strategy

# Test OpenAPI discovery
curl https://typedfetch.dev/.well-known/openapi.json

# Test content negotiation
curl -H "Accept: application/vnd.oai.openapi" https://typedfetch.dev/docs

# Import into tools
# - Postman: Import > Link > https://typedfetch.dev/postman-collection.json
# - Swagger UI: https://petstore.swagger.io/?url=https://typedfetch.dev/openapi.json

Implementation Priority

Phase 1: Critical Fixes (Week 1)

  1. Add sitemap.xml - Essential for SEO
  2. Implement crawler detection - Fix social sharing
  3. Add Vary headers - Prevent cache pollution
  4. Create static fallbacks - Ensure content visibility

Phase 2: Enhancements (Week 2)

  1. Dynamic OG images - Better social previews
  2. Enhanced structured data - Rich search results
  3. Multiple content formats - Markdown, XML support
  4. API discovery endpoints - Developer tools

Phase 3: Optimization (Week 3)

  1. Edge-side rendering - Optimal performance
  2. Smart caching strategies - Reduce server load
  3. Monitoring and analytics - Track improvements
  4. A/B testing - Optimize conversions

Monitoring and Validation

Key Metrics to Track

  1. Search Console: Indexing status, crawl errors
  2. Social shares: Engagement rates, preview quality
  3. Cache hit rates: CDN performance
  4. API usage: Developer adoption

Automated Testing Suite

// tests/seo-validation.test.js
const BASE_URL = process.env.BASE_URL || 'https://typedfetch.dev';

describe('SEO and Social Media', () => {
  test('Crawler receives HTML content', async () => {
    const response = await fetch(`${BASE_URL}/docs/getting-started`, {
      headers: { 'User-Agent': 'facebookexternalhit/1.1' }
    });
    const html = await response.text();
    expect(html).toContain('<meta property="og:title"');
  });
  
  test('API returns JSON with correct headers', async () => {
    const response = await fetch(`${BASE_URL}/docs.json`);
    expect(response.headers.get('content-type')).toContain('application/json');
    expect(response.headers.get('vary')).toContain('Accept');
  });
});

Conclusion

The current TypedFetch website has a solid foundation but needs enhancements to handle edge cases with social media crawlers, search engines, and content delivery networks. The proposed solutions maintain the clean separation between human and machine interfaces while ensuring all crawlers and tools can access the content they need.

Key improvements focus on:

  1. Server-side rendering for crawlers
  2. Proper content negotiation with caching
  3. Enhanced metadata and structured data
  4. Multiple content format support
  5. API discovery mechanisms

These changes will significantly improve the website's visibility, shareability, and developer experience while maintaining the current architecture's strengths.