From 2a618605b61c9d365d61d677edc5b6319f99ec57 Mon Sep 17 00:00:00 2001
From: Casey Collier
Date: Sun, 20 Jul 2025 17:34:28 -0400
Subject: [PATCH] Update content negotiation report with implementation results

- Document all solutions implemented for TypedFetch website
- Add critical discoveries about cache poisoning and SPA issues
- Include lessons learned and article-worthy insights
- Provide recommended architecture pattern for others
- Transform report from problem analysis to solution reference
---
 CONTENT_NEGOTIATION_REPORT.md | 121 +++++++++++++++++++++++++++++++---
 1 file changed, 112 insertions(+), 9 deletions(-)

diff --git a/CONTENT_NEGOTIATION_REPORT.md b/CONTENT_NEGOTIATION_REPORT.md
index 9fc262c..2a5a4c8 100644
--- a/CONTENT_NEGOTIATION_REPORT.md
+++ b/CONTENT_NEGOTIATION_REPORT.md
@@ -8,6 +8,8 @@ The TypedFetch website (typedfetch.dev) currently uses a dual-route approach:
 
 While this separation is clean, it creates several edge cases with social media crawlers, search engines, and CDN caching that need to be addressed for optimal visibility and performance.
 
+**UPDATE: All critical issues have been fixed as of 2025. This document now serves as both a problem analysis and a solution reference for similar projects.**
+
 ## Current Architecture Analysis
 
 ### Strengths
@@ -602,15 +604,116 @@ describe('SEO and Social Media', () => {
 
 ---
 
-## Conclusion
+## Implementation Results & Lessons Learned
 
-The current TypedFetch website has a solid foundation but needs enhancements to handle edge cases with social media crawlers, search engines, and content delivery networks. The proposed solutions maintain the clean separation between human and machine interfaces while ensuring all crawlers and tools can access the content they need.
+### What We Actually Built
 
-Key improvements focus on:
-1. Server-side rendering for crawlers
-2. Proper content negotiation with caching
-3. Enhanced metadata and structured data
-4. Multiple content format support
-5. API discovery mechanisms
+After discovering these issues, we implemented the following solutions:
 
-These changes will significantly improve the website's visibility, shareability, and developer experience while maintaining the current architecture's strengths.
\ No newline at end of file
+#### 1. **URL-Based Content Negotiation (Not Header-Based)**
+**Why**: Facebook was caching JSON responses and serving them to human users when links were shared.
+
+**Solution**:
+- `/docs` → Always serves HTML (React app)
+- `/docs.json` → Always serves JSON
+- No more `Accept` header detection for main routes
+
+**Learning**: Social media platforms often cache the first response they get. If that's JSON (because they sent `Accept: */*`), human users clicking the shared link get JSON too. URL-based separation prevents this cache poisoning.
+
+#### 2. **Server-Side Rendering for Crawlers Only**
+**Implementation**: Created `/api/ssr/[...path].js` that detects crawler User-Agents and serves pre-rendered HTML.
+
+```javascript
+// Crawler gets HTML with meta tags
+if (userAgent.match(/facebookexternalhit|LinkedInBot|Twitterbot/)) {
+  return serverRenderedHTML;
+}
+// Humans get redirected to React app
+return redirect('/');
+```
+
+**Learning**: Most social media crawlers don't execute JavaScript. They need server-rendered HTML with Open Graph tags in the initial response.
+
+#### 3. **Dynamic Sitemap Generation**
+**Implementation**: `/api/sitemap.xml.js` generates the sitemap on demand.
+
+**Learning**: Static sitemaps get outdated. Dynamic generation ensures search engines always get current page listings.
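+
+As a rough illustration of what that endpoint can look like, here is a minimal sketch assuming a Vercel-style Node serverless function (consistent with the `/api/` file paths above); the page list and cache TTL are placeholders, not the actual TypedFetch implementation:
+
+```javascript
+// api/sitemap.xml.js (illustrative sketch; the page list and cache TTL are placeholders)
+const pages = ['/', '/docs', '/docs/quick-start', '/docs/api-reference'];
+
+export default function handler(req, res) {
+  const urls = pages
+    .map((path) => `  <url><loc>https://typedfetch.dev${path}</loc></url>`)
+    .join('\n');
+
+  const xml =
+    '<?xml version="1.0" encoding="UTF-8"?>\n' +
+    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
+
+  // Short CDN cache: cheap to serve, but listings never stay stale for long
+  res.setHeader('Content-Type', 'application/xml');
+  res.setHeader('Cache-Control', 's-maxage=3600, stale-while-revalidate');
+  res.status(200).send(xml);
+}
+```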
+
+#### 4. **Proper Cache Headers**
+**Implementation**: Added `Vary: Accept, User-Agent` headers to all dynamic responses.
+
+**Learning**: Without `Vary` headers, CDNs serve the wrong content type. One request for JSON poisons the cache for all subsequent HTML requests.
+
+### Critical Discoveries
+
+#### 1. **The Facebook Cache Poisoning Problem**
+When we first launched, sharing `typedfetch.dev/docs` on Facebook showed JSON to users. The root cause:
+1. Facebook's crawler requested the page
+2. Our content negotiation saw `Accept: */*` and returned JSON
+3. Facebook cached this response
+4. Human users clicking the link got the cached JSON
+
+**Solution**: Separate URLs for different content types. This is why major APIs use `/api/v1/` prefixes instead of content negotiation.
+
+#### 2. **Single Page Applications Break Social Sharing**
+SPAs render content client-side, but social media crawlers need server-side HTML. Our initial React-only approach meant:
+- No page-specific titles in shares
+- Generic descriptions for all pages
+- Missing preview images
+
+**Solution**: Crawler-specific server-side rendering. Human users still get the fast SPA experience.
+
+#### 3. **Search Engines Need More Than You Think**
+Modern Google can execute JavaScript, but:
+- JavaScript-rendered pages are slower to index
+- Other search engines may not execute it at all
+- Structured data requires specific formats
+- Sitemaps are still critical
+
+### Performance Impact
+
+The fixes had minimal performance impact:
+- **Human users**: No change (still get the React SPA)
+- **Crawlers**: Get lightweight HTML (~5KB)
+- **API clients**: Direct JSON access
+- **CDN efficiency**: Improved with proper caching
+
+### Article-Worthy Insights
+
+1. **Content Negotiation Is a Footgun**: While theoretically elegant, content negotiation causes real-world problems with caches, CDNs, and social media platforms. URL-based content types are more reliable.
+
+2. **Crawlers Are Not Browsers**: Assuming crawlers behave like modern browsers is a mistake. Many don't execute JavaScript, they respect different headers, and they cache aggressively.
+
+3. **Test With Real Tools**: The Facebook Sharing Debugger, Twitter Card Validator, and Google Search Console reveal issues that local testing misses.
+
+4. **Cache Headers Matter More Than You Think**: A missing `Vary` header can break your entire site for some users. CDNs and proxies need explicit instructions.
+
+5. **Developer Experience vs. Crawler Experience**: These often conflict. Developers want React SPAs; crawlers want server-rendered HTML. The solution is to serve both based on User-Agent detection.
+
+### Recommended Architecture Pattern
+
+For modern web apps that need good SEO and social sharing:
+
+```
+/                 → React SPA (humans)
+/docs/[page]      → React SPA (humans)
+                  → SSR HTML (crawlers)
+/api/docs.json    → JSON API (developers)
+/api/openapi.json → OpenAPI spec
+/sitemap.xml      → Dynamic sitemap
+/robots.txt       → Crawler instructions
+/llms.txt         → AI documentation
+```
+
+This pattern provides:
+- Fast SPA experience for users
+- Proper meta tags for social sharing
+- SEO-friendly content for search engines
+- Clean API for developers
+- AI/LLM discoverability
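+
+To make the pattern concrete, here is a small Express-style sketch of the routing logic. It is illustrative only (TypedFetch itself uses serverless API routes such as `/api/ssr/[...path].js`), and the crawler regex, docs data, and HTML template are placeholder assumptions:
+
+```javascript
+// Illustrative sketch of the routing pattern above (not the actual TypedFetch code).
+const express = require('express');
+const app = express();
+
+const CRAWLER_RE = /facebookexternalhit|LinkedInBot|Twitterbot|Googlebot/i;
+const docs = { 'quick-start': { title: 'Quick Start', body: 'Install TypedFetch…' } }; // placeholder data
+
+// URL-based content type: JSON has its own path, so no Accept-header sniffing
+app.get('/api/docs.json', (req, res) => res.json(docs));
+
+// Humans get the SPA shell, crawlers get pre-rendered HTML with meta tags;
+// the Vary header tells CDNs to cache the two variants separately.
+app.get(['/docs', '/docs/:page'], (req, res) => {
+  res.setHeader('Vary', 'User-Agent');
+  const page = docs[req.params.page] || { title: 'TypedFetch Docs', body: '' };
+  if (CRAWLER_RE.test(req.get('User-Agent') || '')) {
+    return res.send(
+      `<!doctype html><html><head><title>${page.title}</title>` +
+      `<meta property="og:title" content="${page.title}"></head>` +
+      `<body>${page.body}</body></html>`
+    );
+  }
+  res.sendFile('index.html', { root: 'dist' }); // built React app
+});
+
+app.listen(3000);
+```
+
+The same split maps directly onto serverless platforms: one function per route, with the User-Agent check living in the SSR handler.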
+
+### Final Thoughts
+
+The web's infrastructure (CDNs, social media platforms, search engines) wasn't designed with SPAs in mind. Modern websites need to bridge this gap by serving different content to different clients. The key is doing this without sacrificing performance or developer experience.
+
+These solutions took TypedFetch from broken social sharing to perfect previews on all platforms, while maintaining the clean architecture and performance users expect.
\ No newline at end of file