# Content Negotiation Edge Cases Report for TypedFetch Website

## Executive Summary

The TypedFetch website (typedfetch.dev) currently uses a dual-route approach:

- `/docs` - React SPA for human browsers
- `/docs.json` - JSON API endpoint for programmatic access

While this separation is clean, it creates several edge cases with social media crawlers, search engines, and CDN caching that need to be addressed for optimal visibility and performance.

## Current Architecture Analysis

### Strengths
1. **Clear separation of concerns**: Human-readable HTML vs machine-readable JSON
2. **Good meta tags**: Comprehensive OpenGraph, Twitter Cards, and structured data
3. **AI-friendly setup**: llms.txt and a dedicated JSON endpoint
4. **SEO basics covered**: robots.txt, canonical URLs, meta descriptions

### Weaknesses
1. **No sitemap.xml**: Critical for search engine discovery
2. **Client-side routing**: May cause issues with social media crawlers
3. **Missing server-side rendering**: Crawlers may not execute JavaScript
4. **No cache variation strategy**: CDNs may serve the wrong content type
5. **Limited content negotiation**: Only a JSON alternative, no markdown support

---

## 1. OpenGraph Meta Tags

### Current State
- Meta tags are properly set in index.html
- OpenGraph image at `/og-image.png`
- All required properties present

### Technical Requirements
1. **Facebook Crawler (facebookexternalhit)**
   - User-Agent: `facebookexternalhit/1.1`
   - Requires server-rendered HTML
   - Does NOT execute JavaScript
   - Caches aggressively (use the Sharing Debugger to refresh)

2. **Required Meta Tags**
   ```html
   <meta property="og:title" content="TypedFetch - Zero-Dependency Type-Safe HTTP Client" />
   <meta property="og:description" content="Page-specific description" />
   <meta property="og:image" content="https://typedfetch.dev/og-image.png" />
   <meta property="og:url" content="https://typedfetch.dev/docs/getting-started" />
   <meta property="og:type" content="article" />
   ```

### Issues with Current Setup
1. **Single Page Application Problem**: Facebook's crawler won't see content from React routes
2.
**Generic meta tags**: Same tags for all pages, reducing shareability
3. **No page-specific images**: Could have better visual distinction

### Proposed Solutions

#### Solution A: Server-Side Rendering (Recommended)
```javascript
// vercel.json modification
{
  "functions": {
    "api/ssr-docs/[...path].js": {
      "maxDuration": 10
    }
  },
  "rewrites": [
    {
      "source": "/docs/:path*",
      "destination": "/api/ssr-docs/:path*",
      "has": [
        {
          "type": "header",
          "key": "user-agent",
          "value": ".*(facebookexternalhit|LinkedInBot|Twitterbot|Slackbot|WhatsApp|Discordbot).*"
        }
      ]
    }
  ]
}
```

#### Solution B: Pre-rendering Static Pages
```javascript
// Generate static HTML for each docs page during build
// vite.config.ts addition
export default {
  plugins: [
    {
      name: 'generate-social-pages',
      writeBundle() {
        // Generate minimal HTML pages for crawlers
        generateSocialPages();
      }
    }
  ]
}
```

### Testing Strategy
```bash
# Test what the Facebook crawler sees
curl -A "facebookexternalhit/1.1" https://typedfetch.dev/docs/getting-started

# Validate with the official tool
# https://developers.facebook.com/tools/debug/
```

---

## 2. Search Engine Indexing

### Technical Requirements

1. **Googlebot Behavior**
   - Modern Googlebot executes JavaScript (evergreen Chromium)
   - Prefers server-rendered content for faster indexing
   - Respects the `Vary: Accept` header for content negotiation

2. **Bing/Microsoft Edge**
   - Limited JavaScript execution
   - Requires proper HTML structure
   - Values sitemap.xml highly

### Current Issues
1. **Missing sitemap.xml**: Essential for discovery
2. **No structured data for docs**: Missing breadcrumbs, article schema
3. **Client-side content**: Delays indexing, may miss content

### Proposed Solutions

#### 1. Dynamic Sitemap Generation
```javascript
// api/sitemap.xml.js
export default function handler(req, res) {
  const baseUrl = 'https://typedfetch.dev';
  const pages = [
    { url: '/', priority: 1.0, changefreq: 'weekly' },
    { url: '/docs', priority: 0.9, changefreq: 'weekly' },
    { url: '/docs/getting-started', priority: 0.8, changefreq: 'monthly' },
    { url: '/docs/installation', priority: 0.8, changefreq: 'monthly' },
    // ... other pages
  ];

  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${pages.map(page => `  <url>
    <loc>${baseUrl}${page.url}</loc>
    <changefreq>${page.changefreq}</changefreq>
    <priority>${page.priority}</priority>
    <lastmod>${new Date().toISOString()}</lastmod>
  </url>`).join('\n')}
</urlset>`;

  res.setHeader('Content-Type', 'application/xml');
  res.setHeader('Cache-Control', 'public, max-age=3600');
  res.status(200).send(sitemap);
}
```

#### 2. Enhanced Structured Data
```javascript
// For each documentation page
const structuredData = {
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": pageTitle,
  "description": pageDescription,
  "author": {
    "@type": "Organization",
    "name": "Catalyst Labs"
  },
  "datePublished": "2024-01-01",
  "dateModified": new Date().toISOString(),
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": `https://typedfetch.dev${path}`
  },
  "breadcrumb": {
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Docs",
        "item": "https://typedfetch.dev/docs"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": pageTitle,
        "item": `https://typedfetch.dev${path}`
      }
    ]
  }
};
```

### Testing Strategy
```bash
# Test rendering as Googlebot
curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  https://typedfetch.dev/docs/getting-started

# Validate structured data
# https://search.google.com/test/rich-results

# Check indexing status
# https://search.google.com/search-console
```

---

## 3. Social Media Preview Issues

### Platform-Specific Requirements

#### Twitter/X
- User-Agent: `Twitterbot`
- Requires: `twitter:card`, `twitter:site`, `twitter:creator`
- Supports JavaScript execution (limited)
- Image requirements: 2:1 aspect ratio, min 300x157px

#### LinkedIn
- User-Agent: `LinkedInBot`
- NO JavaScript execution
- Caches aggressively
- Prefers og:image at 1200x627px

#### Discord
- User-Agent: `Discordbot`
- NO JavaScript execution
- Embeds based on OpenGraph tags
- Supports multiple images

#### WhatsApp
- User-Agent: `WhatsApp`
- NO JavaScript execution
- Basic OpenGraph support
- Thumbnail generation from og:image

### Current Issues
1. **SPA content not visible**: Crawlers can't see React-rendered content
2. **Generic previews**: All pages show the same preview
3. **No URL unfurling data**: Missing rich previews for specific pages

### Proposed Solutions

#### 1. Crawler-Specific Responses
```javascript
// api/social-preview/[...path].js
import path from 'path';

export default function handler(req, res) {
  const userAgent = req.headers['user-agent'] || '';
  const crawlers = ['facebookexternalhit', 'LinkedInBot', 'Twitterbot', 'Discordbot', 'WhatsApp'];

  const isCrawler = crawlers.some(bot => userAgent.includes(bot));

  if (isCrawler) {
    const pagePath = req.query.path?.join('/') || '';
    const pageData = getPageData(pagePath);

    // Minimal crawler-facing HTML with page-specific OpenGraph tags
    const html = `<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>${pageData.title} - TypedFetch</title>
  <meta name="description" content="${pageData.description}">
  <meta property="og:title" content="${pageData.title}">
  <meta property="og:description" content="${pageData.description}">
  <meta property="og:image" content="https://typedfetch.dev/og-image.png">
  <meta property="og:url" content="https://typedfetch.dev/docs/${pagePath}">
  <meta property="og:type" content="article">
  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:title" content="${pageData.title}">
  <meta name="twitter:description" content="${pageData.description}">
</head>
<body>
  <h1>${pageData.title}</h1>
  <p>${pageData.description}</p>
</body>
</html>`;

    res.setHeader('Content-Type', 'text/html');
    res.status(200).send(html);
  } else {
    // Regular users get the React app
    res.status(200).sendFile(path.join(__dirname, '../index.html'));
  }
}
```

#### 2. Dynamic OpenGraph Images
```javascript
// api/og-image/[...path].js
import { ImageResponse } from '@vercel/og';

export default function handler(req) {
  const { path } = req.query;
  const pageTitle = getPageTitle(path);

  return new ImageResponse(
    (
      <div
        style={{
          width: '100%',
          height: '100%',
          display: 'flex',
          alignItems: 'center',
          justifyContent: 'center',
          background: '#0f172a',
          color: '#ffffff',
          fontSize: 64,
        }}
      >
        {pageTitle}
      </div>
    ),
    {
      width: 1200,
      height: 630,
    }
  );
}
```

### Testing Tools
```bash
# Twitter Card Validator
# https://cards-dev.twitter.com/validator

# LinkedIn Post Inspector
# https://www.linkedin.com/post-inspector/

# Facebook Sharing Debugger
# https://developers.facebook.com/tools/debug/

# Discord Embed Visualizer
# https://discohook.org/
```

---

## 4. CDN/Proxy Cache Pollution

### Current Issues
1. **No Vary header**: CDNs can't distinguish content types
2. **Same URL pattern**: `/docs` serves different content based on client
3. **Cache key collision**: JSON and HTML responses cached together

### Technical Requirements
1. **Cloudflare**: Respects the `Vary` header, needs proper cache keys
2. **Vercel Edge**: Built-in caching, needs configuration
3. **Browser caching**: Must handle different content types

### Proposed Solutions

#### 1. Proper Cache Headers
```javascript
// Set appropriate Vary headers
export default function handler(req, res) {
  const acceptHeader = req.headers.accept || '';

  // Indicate that the response varies by Accept header
  res.setHeader('Vary', 'Accept, User-Agent');

  if (acceptHeader.includes('application/json')) {
    res.setHeader('Content-Type', 'application/json');
    res.setHeader('Cache-Control', 'public, max-age=3600, stale-while-revalidate=86400');
    return res.json(data);
  } else {
    res.setHeader('Content-Type', 'text/html');
    res.setHeader('Cache-Control', 'public, max-age=300, stale-while-revalidate=3600');
    return res.send(html);
  }
}
```

#### 2. CDN Configuration
```javascript
// vercel.json
{
  "headers": [
    {
      "source": "/docs/(.*)",
      "headers": [
        {
          "key": "Vary",
          "value": "Accept, User-Agent"
        },
        {
          "key": "Cache-Control",
          "value": "public, max-age=0, must-revalidate"
        }
      ]
    }
  ]
}
```

#### 3. Separate Cache Keys
```javascript
// Use different URLs for different content types
// This avoids cache pollution entirely
{
  "rewrites": [
    {
      "source": "/docs.json",
      "destination": "/api/docs?format=json"
    },
    {
      "source": "/docs.md",
      "destination": "/api/docs?format=markdown"
    },
    {
      "source": "/docs.xml",
      "destination": "/api/docs?format=xml"
    }
  ]
}
```

### Testing Strategy
```bash
# Test cache behavior
curl -H "Accept: application/json" https://typedfetch.dev/docs
curl -H "Accept: text/html" https://typedfetch.dev/docs

# Check cache headers
curl -I https://typedfetch.dev/docs

# Test CDN caching
# Use different locations/proxies to verify cache separation
```

---

## 5. API Documentation Discovery

### Current State
- JSON endpoint at `/docs.json`
- No OpenAPI/Swagger spec
- No machine-readable API description

### Technical Requirements
1. **OpenAPI Discovery**: `.well-known/openapi.json`
2. **Postman Collection**: Exportable collection format
3. **Developer Portal**: Interactive API documentation

### Proposed Solutions

#### 1. OpenAPI Specification
```javascript
// api/openapi.json
export default function handler(req, res) {
  const spec = {
    "openapi": "3.0.0",
    "info": {
      "title": "TypedFetch Documentation API",
      "version": "1.0.0",
      "description": "API for accessing TypedFetch documentation"
    },
    "servers": [
      {
        "url": "https://typedfetch.dev"
      }
    ],
    "paths": {
      "/docs.json": {
        "get": {
          "summary": "Get documentation index",
          "responses": {
            "200": {
              "description": "Documentation sections",
              "content": {
                "application/json": {
                  "schema": {
                    "$ref": "#/components/schemas/Documentation"
                  }
                }
              }
            }
          }
        }
      }
    }
  };

  res.setHeader('Content-Type', 'application/json');
  res.json(spec);
}
```

#### 2. Well-Known Discovery
```javascript
// public/.well-known/apis.json
{
  "name": "TypedFetch",
  "description": "Zero-dependency type-safe HTTP client",
  "url": "https://typedfetch.dev",
  "apis": [
    {
      "name": "Documentation API",
      "description": "Access TypedFetch documentation",
      "baseURL": "https://typedfetch.dev",
      "properties": [
        {
          "type": "OpenAPI",
          "url": "https://typedfetch.dev/openapi.json"
        },
        {
          "type": "Postman",
          "url": "https://typedfetch.dev/postman-collection.json"
        }
      ]
    }
  ]
}
```

#### 3. Content Type Negotiation
```javascript
// Enhanced API endpoint with multiple formats
export default function handler(req, res) {
  const accept = req.headers.accept || '';
  const format = req.query.format;

  // Format priority: query param > accept header > default
  if (format === 'openapi' || accept.includes('application/vnd.oai.openapi')) {
    return res.json(generateOpenAPISpec());
  } else if (format === 'postman' || accept.includes('application/vnd.postman')) {
    return res.json(generatePostmanCollection());
  } else if (format === 'markdown' || accept.includes('text/markdown')) {
    res.setHeader('Content-Type', 'text/markdown');
    return res.send(generateMarkdownDocs());
  } else {
    return res.json(docsData);
  }
}
```

### Testing Strategy
```bash
# Test API discovery metadata
curl https://typedfetch.dev/.well-known/apis.json

# Test content negotiation
curl -H "Accept: application/vnd.oai.openapi" https://typedfetch.dev/docs

# Import into tools
# - Postman: Import > Link > https://typedfetch.dev/postman-collection.json
# - Swagger UI: https://petstore.swagger.io/?url=https://typedfetch.dev/openapi.json
```

---

## Implementation Priority

### Phase 1: Critical Fixes (Week 1)
1. **Add sitemap.xml** - Essential for SEO
2. **Implement crawler detection** - Fix social sharing
3. **Add Vary headers** - Prevent cache pollution
4.
**Create static fallbacks** - Ensure content visibility

### Phase 2: Enhancements (Week 2)
1. **Dynamic OG images** - Better social previews
2. **Enhanced structured data** - Rich search results
3. **Multiple content formats** - Markdown, XML support
4. **API discovery endpoints** - Developer tools

### Phase 3: Optimization (Week 3)
1. **Edge-side rendering** - Optimal performance
2. **Smart caching strategies** - Reduce server load
3. **Monitoring and analytics** - Track improvements
4. **A/B testing** - Optimize conversions

---

## Monitoring and Validation

### Key Metrics to Track
1. **Search Console**: Indexing status, crawl errors
2. **Social shares**: Engagement rates, preview quality
3. **Cache hit rates**: CDN performance
4. **API usage**: Developer adoption

### Automated Testing Suite
```javascript
// tests/seo-validation.test.js
describe('SEO and Social Media', () => {
  test('Crawler receives HTML content', async () => {
    const response = await fetch('/docs/getting-started', {
      headers: { 'User-Agent': 'facebookexternalhit/1.1' }
    });
    const html = await response.text();
    expect(html).toContain('<meta property="og:title"');
  });

  test('JSON endpoint sets correct headers', async () => {
    const response = await fetch('/docs.json');
    expect(response.headers.get('content-type')).toBe('application/json');
    expect(response.headers.get('vary')).toContain('Accept');
  });
});
```

---

## Conclusion

The current TypedFetch website has a solid foundation but needs enhancements to handle edge cases with social media crawlers, search engines, and content delivery networks. The proposed solutions maintain the clean separation between human and machine interfaces while ensuring all crawlers and tools can access the content they need.

Key improvements focus on:
1. Server-side rendering for crawlers
2. Proper content negotiation with caching
3. Enhanced metadata and structured data
4. Multiple content format support
5.
API discovery mechanisms

These changes will significantly improve the website's visibility, shareability, and developer experience while maintaining the current architecture's strengths.