# Content Negotiation Edge Cases Report for TypedFetch Website

## Executive Summary

The TypedFetch website (typedfetch.dev) currently uses a dual-route approach:
- `/docs` - React SPA for human browsers
- `/docs.json` - JSON API endpoint for programmatic access

While this separation is clean, it creates several edge cases with social media crawlers, search engines, and CDN caching that need to be addressed for optimal visibility and performance.
## Current Architecture Analysis

### Strengths
- Clear separation of concerns: Human-readable HTML vs machine-readable JSON
- Good meta tags: Comprehensive OpenGraph, Twitter Cards, and structured data
- AI-friendly setup: llms.txt and dedicated JSON endpoint
- SEO basics covered: robots.txt, canonical URLs, meta descriptions
### Weaknesses
- No sitemap.xml: Critical for search engine discovery
- Client-side routing: May cause issues with social media crawlers
- Missing server-side rendering: Crawlers may not execute JavaScript
- No cache variation strategy: CDNs may serve wrong content type
- Limited content negotiation: Only JSON alternative, no markdown support
## 1. OpenGraph Meta Tags

### Current State

- Meta tags are properly set in index.html
- OpenGraph image at `/og-image.png`
- All required properties present
### Technical Requirements

**Facebook Crawler (facebookexternalhit)**

- User-Agent: `facebookexternalhit/1.1`
- Requires server-rendered HTML
- Does NOT execute JavaScript
- Caches aggressively (use Sharing Debugger to refresh)

**Required Meta Tags**

```html
<meta property="og:url" content="https://typedfetch.dev/[FULL_URL]" />
<meta property="og:type" content="article" /> <!-- for docs pages -->
<meta property="og:title" content="[PAGE_TITLE]" />
<meta property="og:description" content="[PAGE_DESCRIPTION]" />
<meta property="og:image" content="https://typedfetch.dev/og-image.png" />
```
### Issues with Current Setup
- Single Page Application Problem: Facebook crawler won't see content from React routes
- Generic meta tags: Same tags for all pages, reducing shareability
- No page-specific images: Could have better visual distinction
### Proposed Solutions

#### Solution A: Server-Side Rendering (Recommended)

```jsonc
// vercel.json modification
{
  "functions": {
    "api/ssr-docs/[...path].js": {
      "maxDuration": 10
    }
  },
  "rewrites": [
    {
      "source": "/docs/:path*",
      "destination": "/api/ssr-docs/:path*",
      "has": [
        {
          "type": "header",
          "key": "user-agent",
          "value": ".*(facebookexternalhit|LinkedInBot|Twitterbot|Slackbot|WhatsApp|Discordbot).*"
        }
      ]
    }
  ]
}
```
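With this rule in place, only requests whose User-Agent matches one of the listed crawlers are rewritten to the serverless function; everyone else continues to receive the React app directly. The `api/ssr-docs/[...path].js` handler itself would look much like the crawler-specific response sketched in section 3 below: look up the page's title and description and return a small HTML document with the correct OpenGraph tags.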
#### Solution B: Pre-rendering Static Pages

```ts
// Generate static HTML for each docs page during build
// vite.config.ts addition
export default {
  plugins: [
    {
      name: 'generate-social-pages',
      writeBundle() {
        // Generate minimal HTML pages for crawlers
        generateSocialPages();
      }
    }
  ]
}
```
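`generateSocialPages()` is referenced but not defined above. A minimal sketch, assuming the list of docs routes with their titles and descriptions is available at build time (the entries below are illustrative), could write one static HTML file per route into the build output:

```js
// scripts/generate-social-pages.js - hypothetical helper used by the plugin above
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

// Illustrative page metadata; in practice this would come from the docs source.
const pages = [
  { path: 'docs/getting-started', title: 'Getting Started', description: 'Install TypedFetch and make your first request.' },
  { path: 'docs/installation', title: 'Installation', description: 'Add TypedFetch to your project.' },
];

export function generateSocialPages(outDir = 'dist') {
  for (const page of pages) {
    const html = `<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>${page.title} - TypedFetch</title>
  <meta property="og:title" content="${page.title}" />
  <meta property="og:description" content="${page.description}" />
  <meta property="og:url" content="https://typedfetch.dev/${page.path}" />
  <meta property="og:image" content="https://typedfetch.dev/og-image.png" />
</head>
<body><h1>${page.title}</h1><p>${page.description}</p></body>
</html>`;

    // Write e.g. dist/docs/getting-started/index.html
    const dir = join(outDir, page.path);
    mkdirSync(dir, { recursive: true });
    writeFileSync(join(dir, 'index.html'), html);
  }
}
```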
### Testing Strategy

```bash
# Test with Facebook Sharing Debugger
curl -A "facebookexternalhit/1.1" https://typedfetch.dev/docs/getting-started

# Validate with official tool
# https://developers.facebook.com/tools/debug/
```
## 2. Search Engine Indexing

### Technical Requirements

**Googlebot Behavior**

- Modern Googlebot executes JavaScript (Chrome 90+)
- Prefers server-rendered content for faster indexing
- Respects the `Vary: Accept` header for content negotiation

**Bing/Microsoft Edge**

- Limited JavaScript execution
- Requires proper HTML structure
- Values sitemap.xml highly
### Current Issues
- Missing sitemap.xml: Essential for discovery
- No structured data for docs: Missing breadcrumbs, article schema
- Client-side content: Delays indexing, may miss content
### Proposed Solutions

#### 1. Dynamic Sitemap Generation

```js
// api/sitemap.xml.js
export default function handler(req, res) {
  const baseUrl = 'https://typedfetch.dev';
  const pages = [
    { url: '/', priority: 1.0, changefreq: 'weekly' },
    { url: '/docs', priority: 0.9, changefreq: 'weekly' },
    { url: '/docs/getting-started', priority: 0.8, changefreq: 'monthly' },
    { url: '/docs/installation', priority: 0.8, changefreq: 'monthly' },
    // ... other pages
  ];

  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${pages.map(page => `  <url>
    <loc>${baseUrl}${page.url}</loc>
    <changefreq>${page.changefreq}</changefreq>
    <priority>${page.priority}</priority>
    <lastmod>${new Date().toISOString()}</lastmod>
  </url>`).join('\n')}
</urlset>`;

  res.setHeader('Content-Type', 'application/xml');
  res.setHeader('Cache-Control', 'public, max-age=3600');
  res.status(200).send(sitemap);
}
```
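For the function above to be reachable at the conventional path, a rewrite can map `/sitemap.xml` to it (a sketch, assuming the handler lives at `api/sitemap.xml.js` as shown):

```jsonc
// vercel.json - additional rewrite
{
  "rewrites": [
    { "source": "/sitemap.xml", "destination": "/api/sitemap.xml" }
  ]
}
```

The existing robots.txt would then gain a single line: `Sitemap: https://typedfetch.dev/sitemap.xml`.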
#### 2. Enhanced Structured Data

```js
// For each documentation page
const structuredData = {
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": pageTitle,
  "description": pageDescription,
  "author": {
    "@type": "Organization",
    "name": "Catalyst Labs"
  },
  "datePublished": "2024-01-01",
  "dateModified": new Date().toISOString(),
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": `https://typedfetch.dev${path}`
  },
  "breadcrumb": {
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Docs",
        "item": "https://typedfetch.dev/docs"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": pageTitle,
        "item": `https://typedfetch.dev${path}`
      }
    ]
  }
};
```
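The structured data only helps crawlers if it ends up in the HTML they receive. A minimal sketch, assuming the server-rendered or pre-rendered pages described above, is to serialize it into a JSON-LD script tag in the page head:

```js
// Serialize the structured data into a JSON-LD block for the page <head>
const jsonLd = `<script type="application/ld+json">${JSON.stringify(structuredData)}</script>`;

const head = `
  <title>${pageTitle} - TypedFetch</title>
  <meta name="description" content="${pageDescription}" />
  ${jsonLd}
`;
```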
### Testing Strategy

```bash
# Test rendering
curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  https://typedfetch.dev/docs/getting-started

# Validate structured data
# https://search.google.com/test/rich-results

# Check indexing status
# https://search.google.com/search-console
```
## 3. Social Media Preview Issues

### Platform-Specific Requirements

**Twitter/X**

- User-Agent: `Twitterbot`
- Requires: `twitter:card`, `twitter:site`, `twitter:creator`
- Supports JavaScript execution (limited)
- Image requirements: 2:1 aspect ratio, min 300x157px

**LinkedIn**

- User-Agent: `LinkedInBot`
- NO JavaScript execution
- Caches aggressively
- Prefers og:image at 1200x627px

**Discord**

- User-Agent: `Discordbot`
- NO JavaScript execution
- Embeds based on OpenGraph tags
- Supports multiple images

**WhatsApp**

- User-Agent: `WhatsApp`
- NO JavaScript execution
- Basic OpenGraph support
- Thumbnail generation from og:image
### Current Issues
- SPA content not visible: Crawlers can't see React-rendered content
- Generic previews: All pages show same preview
- No URL unfurling data: Missing rich previews for specific pages
### Proposed Solutions

#### 1. Crawler-Specific Responses

```js
// api/social-preview/[...path].js
import { readFileSync } from 'fs';
import { join } from 'path';

export default function handler(req, res) {
  const userAgent = req.headers['user-agent'] || '';
  const crawlers = ['facebookexternalhit', 'LinkedInBot', 'Twitterbot', 'Discordbot', 'WhatsApp'];
  const isCrawler = crawlers.some(bot => userAgent.includes(bot));

  if (isCrawler) {
    const path = req.query.path?.join('/') || '';
    const pageData = getPageData(path);

    const html = `<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>${pageData.title} - TypedFetch</title>
  <meta property="og:title" content="${pageData.title}" />
  <meta property="og:description" content="${pageData.description}" />
  <meta property="og:url" content="https://typedfetch.dev/docs/${path}" />
  <meta property="og:image" content="https://typedfetch.dev/og-image.png" />
  <meta property="og:type" content="article" />
  <meta name="twitter:card" content="summary_large_image" />
  <meta name="twitter:title" content="${pageData.title}" />
  <meta name="twitter:description" content="${pageData.description}" />
  <meta name="twitter:image" content="https://typedfetch.dev/og-image.png" />
</head>
<body>
  <h1>${pageData.title}</h1>
  <p>${pageData.description}</p>
</body>
</html>`;

    res.setHeader('Content-Type', 'text/html');
    res.status(200).send(html);
  } else {
    // Regular users get the React app; serverless responses have no sendFile,
    // so read the built index.html from disk (the exact path depends on the build output)
    const indexHtml = readFileSync(join(process.cwd(), 'index.html'), 'utf8');
    res.setHeader('Content-Type', 'text/html');
    res.status(200).send(indexHtml);
  }
}
```
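`getPageData(path)` is assumed in the handler above; a minimal sketch backed by a hard-coded lookup table (titles and descriptions here are illustrative; ideally this would share a data source with `/docs.json`):

```js
// Hypothetical page metadata lookup used by the handler above
const PAGES = {
  'getting-started': {
    title: 'Getting Started',
    description: 'Install TypedFetch and make your first type-safe request.',
  },
  'installation': {
    title: 'Installation',
    description: 'Add TypedFetch to your project.',
  },
};

function getPageData(path) {
  // Fall back to generic site metadata for unknown routes
  return PAGES[path] || {
    title: 'TypedFetch Documentation',
    description: 'Zero-dependency type-safe HTTP client.',
  };
}
```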
#### 2. Dynamic OpenGraph Images

```jsx
// api/og-image/[...path].jsx (JSX needs a .jsx/.tsx file or an equivalent build step)
import { ImageResponse } from '@vercel/og';

// @vercel/og is designed for the Edge runtime
export const config = { runtime: 'edge' };

export default function handler(req) {
  // Edge functions receive a web Request; derive the docs path from the URL
  const { pathname } = new URL(req.url);
  const path = pathname.replace(/^\/api\/og-image\//, '');
  const pageTitle = getPageTitle(path);

  return new ImageResponse(
    (
      <div style={{
        background: 'linear-gradient(to right, #8b5cf6, #3b82f6)',
        width: '100%',
        height: '100%',
        display: 'flex',
        alignItems: 'center',
        justifyContent: 'center',
      }}>
        <h1 style={{ color: 'white', fontSize: 60 }}>{pageTitle}</h1>
      </div>
    ),
    {
      width: 1200,
      height: 630,
    }
  );
}
```
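The generated image is only used if the page-specific meta tags point at this endpoint instead of the static file (the URLs below assume the handler is deployed as `/api/og-image/[...path]`):

```html
<meta property="og:image" content="https://typedfetch.dev/api/og-image/getting-started" />
<meta name="twitter:image" content="https://typedfetch.dev/api/og-image/getting-started" />
```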
### Testing Tools

```bash
# Twitter Card Validator
# https://cards-dev.twitter.com/validator

# LinkedIn Post Inspector
# https://www.linkedin.com/post-inspector/

# Facebook Sharing Debugger
# https://developers.facebook.com/tools/debug/

# Discord Embed Visualizer
# https://discohook.org/
```
## 4. CDN/Proxy Cache Pollution

### Current Issues

- No Vary header: CDNs can't distinguish content types
- Same URL pattern: `/docs` serves different content based on client
- Cache key collision: JSON and HTML responses cached together

### Technical Requirements

- Cloudflare: Respects the `Vary` header, needs proper cache keys
- Vercel Edge: Built-in caching, needs configuration
- Browser caching: Must handle different content types
### Proposed Solutions

#### 1. Proper Cache Headers

```js
// Set appropriate Vary headers
export default function handler(req, res) {
  const acceptHeader = req.headers.accept || '';

  // Indicate that response varies by Accept header
  res.setHeader('Vary', 'Accept, User-Agent');

  if (acceptHeader.includes('application/json')) {
    res.setHeader('Content-Type', 'application/json');
    res.setHeader('Cache-Control', 'public, max-age=3600, stale-while-revalidate=86400');
    return res.json(data);
  } else {
    res.setHeader('Content-Type', 'text/html');
    res.setHeader('Cache-Control', 'public, max-age=300, stale-while-revalidate=3600');
    return res.send(html);
  }
}
```
#### 2. CDN Configuration

```jsonc
// vercel.json
{
  "headers": [
    {
      "source": "/docs/(.*)",
      "headers": [
        {
          "key": "Vary",
          "value": "Accept, User-Agent"
        },
        {
          "key": "Cache-Control",
          "value": "public, max-age=0, must-revalidate"
        }
      ]
    }
  ]
}
```
#### 3. Separate Cache Keys

```jsonc
// Use different URLs for different content types
// This avoids cache pollution entirely
{
  "rewrites": [
    {
      "source": "/docs.json",
      "destination": "/api/docs?format=json"
    },
    {
      "source": "/docs.md",
      "destination": "/api/docs?format=markdown"
    },
    {
      "source": "/docs.xml",
      "destination": "/api/docs?format=xml"
    }
  ]
}
```
### Testing Strategy

```bash
# Test cache behavior
curl -H "Accept: application/json" https://typedfetch.dev/docs
curl -H "Accept: text/html" https://typedfetch.dev/docs

# Check cache headers
curl -I https://typedfetch.dev/docs

# Test CDN caching
# Use different locations/proxies to verify cache separation
```
## 5. API Documentation Discovery

### Current State

- JSON endpoint at `/docs.json`
- No OpenAPI/Swagger spec
- No machine-readable API description

### Technical Requirements

- OpenAPI Discovery: `.well-known/openapi.json`
- Postman Collection: Exportable collection format
- Developer Portal: Interactive API documentation
### Proposed Solutions

#### 1. OpenAPI Specification

```js
// api/openapi.json
export default function handler(req, res) {
  const spec = {
    "openapi": "3.0.0",
    "info": {
      "title": "TypedFetch Documentation API",
      "version": "1.0.0",
      "description": "API for accessing TypedFetch documentation"
    },
    "servers": [
      {
        "url": "https://typedfetch.dev"
      }
    ],
    "paths": {
      "/docs.json": {
        "get": {
          "summary": "Get documentation index",
          "responses": {
            "200": {
              "description": "Documentation sections",
              "content": {
                "application/json": {
                  "schema": {
                    "$ref": "#/components/schemas/Documentation"
                  }
                }
              }
            }
          }
        }
      }
    }
  };

  res.setHeader('Content-Type', 'application/json');
  res.json(spec);
}
```
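The discovery requirement above expects the spec at `/.well-known/openapi.json`, while the handler is deployed under `/api/`. Rewrites can expose it at both the well-known path and the `/openapi.json` URL referenced in the apis.json below (a sketch, following the same vercel.json routing used elsewhere in this report):

```jsonc
// vercel.json - expose the OpenAPI spec at discoverable paths
{
  "rewrites": [
    { "source": "/.well-known/openapi.json", "destination": "/api/openapi.json" },
    { "source": "/openapi.json", "destination": "/api/openapi.json" }
  ]
}
```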
#### 2. Well-Known Discovery

```jsonc
// public/.well-known/apis.json
{
  "name": "TypedFetch",
  "description": "Zero-dependency type-safe HTTP client",
  "url": "https://typedfetch.dev",
  "apis": [
    {
      "name": "Documentation API",
      "description": "Access TypedFetch documentation",
      "baseURL": "https://typedfetch.dev",
      "properties": [
        {
          "type": "OpenAPI",
          "url": "https://typedfetch.dev/openapi.json"
        },
        {
          "type": "Postman",
          "url": "https://typedfetch.dev/postman-collection.json"
        }
      ]
    }
  ]
}
```
#### 3. Content Type Negotiation

```js
// Enhanced API endpoint with multiple formats
export default function handler(req, res) {
  const accept = req.headers.accept || '';
  const format = req.query.format;

  // Format priority: query param > accept header > default
  if (format === 'openapi' || accept.includes('application/vnd.oai.openapi')) {
    return res.json(generateOpenAPISpec());
  } else if (format === 'postman' || accept.includes('application/vnd.postman')) {
    return res.json(generatePostmanCollection());
  } else if (format === 'markdown' || accept.includes('text/markdown')) {
    res.setHeader('Content-Type', 'text/markdown');
    return res.send(generateMarkdownDocs());
  } else {
    return res.json(docsData);
  }
}
```
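`generateMarkdownDocs()` and the other generators are assumed above; a minimal sketch of the markdown branch, assuming `docsData` holds sections with `title` and `content` fields (the shape is an assumption, not the actual /docs.json schema):

```js
// Hypothetical markdown generator; the docsData shape is assumed
function generateMarkdownDocs() {
  return [
    '# TypedFetch Documentation',
    ...docsData.sections.map(section => `## ${section.title}\n\n${section.content}`),
  ].join('\n\n');
}
```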
### Testing Strategy

```bash
# Test OpenAPI discovery
curl https://typedfetch.dev/.well-known/openapi.json

# Test content negotiation
curl -H "Accept: application/vnd.oai.openapi" https://typedfetch.dev/docs

# Import into tools
# - Postman: Import > Link > https://typedfetch.dev/postman-collection.json
# - Swagger UI: https://petstore.swagger.io/?url=https://typedfetch.dev/openapi.json
```
## Implementation Priority

### Phase 1: Critical Fixes (Week 1)
- Add sitemap.xml - Essential for SEO
- Implement crawler detection - Fix social sharing
- Add Vary headers - Prevent cache pollution
- Create static fallbacks - Ensure content visibility
### Phase 2: Enhancements (Week 2)
- Dynamic OG images - Better social previews
- Enhanced structured data - Rich search results
- Multiple content formats - Markdown, XML support
- API discovery endpoints - Developer tools
### Phase 3: Optimization (Week 3)
- Edge-side rendering - Optimal performance
- Smart caching strategies - Reduce server load
- Monitoring and analytics - Track improvements
- A/B testing - Optimize conversions
## Monitoring and Validation

### Key Metrics to Track
- Search Console: Indexing status, crawl errors
- Social shares: Engagement rates, preview quality
- Cache hit rates: CDN performance
- API usage: Developer adoption
### Automated Testing Suite

```js
// tests/seo-validation.test.js
const BASE = process.env.BASE_URL || 'https://typedfetch.dev';

describe('SEO and Social Media', () => {
  test('Crawler receives HTML content', async () => {
    const response = await fetch(`${BASE}/docs/getting-started`, {
      headers: { 'User-Agent': 'facebookexternalhit/1.1' }
    });
    const html = await response.text();
    expect(html).toContain('<meta property="og:title"');
  });

  test('API returns JSON with correct headers', async () => {
    const response = await fetch(`${BASE}/docs.json`);
    expect(response.headers.get('content-type')).toContain('application/json');
    expect(response.headers.get('vary')).toContain('Accept');
  });
});
```
## Conclusion
The current TypedFetch website has a solid foundation but needs enhancements to handle edge cases with social media crawlers, search engines, and content delivery networks. The proposed solutions maintain the clean separation between human and machine interfaces while ensuring all crawlers and tools can access the content they need.
Key improvements focus on:
- Server-side rendering for crawlers
- Proper content negotiation with caching
- Enhanced metadata and structured data
- Multiple content format support
- API discovery mechanisms
These changes will significantly improve the website's visibility, shareability, and developer experience while maintaining the current architecture's strengths.