Sean Goedecke

Sean Goedecke's blog. Sean mostly writes about AI and large-company dynamics.

51 posts

The author argues that only three LLM product patterns reliably work today: chatbots (ubiquitous but commoditized and often outcompeted by major labs), completion/autocomplete products like GitHub Copilot (which integrate into workflows without changing user interfaces), and agentic systems (currently thriving in coding because actions can be verified via tests). The piece explains why generic chatbots struggle, why completions and coding agents succeed, and surveys emerging areas—research agents, AI-generated feeds, and AI-assisted games—while noting image generation currently feels more toy-like than a standalone product.

The author argues that writing with the goal of being included in AI training data is a reasonable way to amplify ideas to more humans: write more, publish where content can be scraped (avoid paywalls and JS-only rendering), but don't change your voice just to please AIs. He frames posts as “ritual objects” that seed conversations and says representation in LLM training sets can help spread your ideas, while cautioning that writing for money or purely for art are different motives.

Advice for software engineers on technical writing: keep prose as short as possible, frontload key points, and lower expectations about how much understanding writing can convey. Use one-sentence summaries for broad audiences, and reserve detailed ADRs for small, technical audiences. Good writing provides limited but high-leverage clarity in large organizations.

The author critiques the headline "95% of enterprise AI projects fail" from MIT NANDA, arguing the rate is comparable to historical enterprise IT failure rates, is inflated by strict success definitions and immature tooling, and may be overstated given the study's limited methodology. AI projects are young and complex, so failures are expected; long-term value will likely emerge via shadow IT and pre-built enterprise tools, even if an AI bubble bursts.

The post investigates why LLMs overuse em-dashes by evaluating several hypotheses—model tokenization/brevity biases, RLHF annotator dialect or preference, and training-data composition. After measuring punctuation frequencies and reviewing model/version changes, the author argues the most plausible cause is increased reliance on digitized late‑19th/early‑20th‑century print books (which use more em‑dashes) in high-quality training corpora, while noting RLHF and other factors may also contribute.
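
A minimal sketch of the kind of punctuation-frequency measurement the post describes; the file names and the particular set of marks counted are assumptions for illustration:

```python
from collections import Counter

# Marks of interest; the em-dash is U+2014, the en-dash U+2013.
MARKS = {"\u2014": "em-dash", "\u2013": "en-dash", ";": "semicolon", ",": "comma"}

def punctuation_rate(text: str) -> dict:
    """Occurrences of each mark per 1,000 characters of text."""
    counts = Counter(ch for ch in text if ch in MARKS)
    scale = 1000 / max(len(text), 1)
    return {name: counts[ch] * scale for ch, name in MARKS.items()}

# Hypothetical comparison: a human-written sample vs. model output.
human = open("human_sample.txt").read()
model = open("model_sample.txt").read()
print(punctuation_rate(human))
print(punctuation_rate(model))
```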

An opinionated guide to better code reviews: focus beyond the diff by considering the broader codebase and consistency, keep reviews concise (avoid dozens of trivial comments), review with a "will this work" filter instead of imposing your personal style, use blocking reviews when you genuinely want to prevent a merge, and generally bias toward approving changes to avoid excessive gatekeeping. The post also notes special considerations for AI-generated PRs and team incentive mismatches that cause blocking behavior.

The author discusses DeepSeek's OCR finding and the concept of 'optical compression'—using images of text as model input because image tokens can be more information-dense than text tokens. The post explains why image embeddings might encode multiple text tokens, explores potential efficiency and long-context benefits, and highlights training, evaluation, and practical caveats.
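
The arithmetic behind the claim, in rough form; every number below is an illustrative assumption, not a figure from DeepSeek's paper:

```python
# Back-of-envelope: text tokens vs. image tokens for the same page of text.
words_per_page = 500
text_tokens = int(words_per_page * 1.3)  # assume ~1.3 BPE tokens per word -> 650

vision_tokens = 100                      # assumed patch count after the encoder
ratio = text_tokens / vision_tokens
print(f"{text_tokens} text tokens vs {vision_tokens} image tokens: "
      f"each image token carries ~{ratio:.1f}x more text")
```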

A practical, experience-based set of lessons for building agentic LLM-powered coding tools: prefer a plan-then-act flow, use a small but composable toolset (user plugins), support nested per-chat rule files, allow mid-run steering and queued commands, expose slash commands, and favor string search over RAG for code navigation. The post emphasizes tuning tools to specific models and treating agentic tooling like standard software engineering.
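
A minimal sketch of the plan-then-act flow the post recommends; the `llm` helper, prompt wording, and tool names are hypothetical:

```python
def run_agent(task: str, llm, tools: dict):
    """Plan first, then act one tool call at a time (plan-then-act flow)."""
    plan = llm(f"Write a short numbered plan for: {task}")
    transcript = [f"Plan:\n{plan}"]
    while True:
        step = llm(
            "Given the plan and transcript so far, reply with the next tool "
            "call as 'name: argument', or 'done'.\n\n" + "\n".join(transcript)
        )
        if step.strip() == "done":
            return transcript
        name, _, arg = step.partition(":")
        result = tools[name.strip()](arg.strip())  # e.g. tools["grep"], tools["edit"]
        transcript.append(f"{step} -> {result}")
```

Note that the tool registry is just a dict of callables, which is what makes user plugins composable: registering a new tool is adding one entry.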

Opinion piece arguing that modern AI research resembles a "gentleman scientist" era: many influential advances are simple or older ideas re-applied to LLMs, and a lot of capability discovery comes from informal experimentation. The author uses examples (GRPO in RL, Anthropic's "skills", Recursive Language Models) to show how accessible experiments and modest code-level ideas can produce meaningful research results, and encourages more of this exploratory work.

A staff engineer explains that their most valuable role is providing "technical clarity" to non-technical leaders by translating complex system trade-offs into clear, actionable recommendations. The post covers why clarity is rare (large codebase complexity and hidden interactions), the staff-engineer-as-technical-advisor role, the tension between communicating uncertainty and giving decisive guidance, and the skills required: deep system knowledge, good judgment about which risks to surface, and the confidence to commit to recommendations.

An account of using GPT-5-Codex to automate quick, laptop-scale language-model research on the TinyStories dataset: the author ran many n-gram and tiny-transformer experiments, found that shallow-fusion optimizations reduced perplexity but harmed generation quality, and discovered that briefly distilling a transformer from an n-gram teacher produced the most coherent five-minute-trained stories. The post also describes the agentic Codex-driven workflow, practical compute/sandbox constraints, and trade-offs between metrics and perceived quality.
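
For context, shallow fusion here means interpolating the transformer's next-token log-probabilities with the n-gram model's at decode time. A sketch of one common form; the λ weight and the `.logprob` interfaces are assumptions:

```python
def fused_logprob(token, context, transformer, ngram, lam=0.3):
    """Shallow fusion: interpolate transformer and n-gram log-probabilities."""
    # transformer.logprob and ngram.logprob are assumed interfaces returning
    # log P(token | context) under each model.
    return (1 - lam) * transformer.logprob(token, context) \
        + lam * ngram.logprob(token, context)

def pick_next(vocab, context, transformer, ngram):
    """Greedy decoding step under the fused score."""
    return max(vocab, key=lambda t: fused_logprob(t, context, transformer, ngram))
```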

Advice for staff engineers on influencing company politics by aligning technical work with executive priorities: make high-profile projects successful, prepare multiple technical programs to match organizational "waves," time proposals to current priorities, and maintain a backlog of well-formed ideas so you can steer funded work toward better technical outcomes.

An essay distinguishing technical taste from technical skill: taste is the ability to prioritize the engineering values (readability, correctness, speed, resiliency, scalability, etc.) that best fit a specific project. The author explains how disagreements often reflect different value rankings, how bad taste is inflexibility or misfit with project needs, and recommends gaining varied experience and flexibility to develop better taste.

The author argues that AI coding agents frequently insert silent fallback code paths (simpler algorithms or placeholders) when asked to implement specific approaches. These fallbacks hide whether the intended algorithm is actually being used and undermine prototyping and testing; the author suspects an RL training artifact and recommends that agents avoid silent fallbacks.
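
The failure mode looks roughly like this; a hypothetical example where the agent was asked for BM25 ranking specifically:

```python
def bm25_rank(query, docs):
    raise NotImplementedError("real BM25 not wired up yet")  # stand-in

# The anti-pattern: a silent fallback masks whether BM25 is ever exercised.
def rank_results(query, docs):
    try:
        return bm25_rank(query, docs)
    except Exception:
        return sorted(docs)   # quietly degrades; the prototype "works", signal is lost

# The recommended behavior: no fallback, so the gap surfaces immediately.
def rank_results_strict(query, docs):
    return bm25_rank(query, docs)
```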

A technical write-up of EndlessWiki: an infinite, AI-generated encyclopedia that generates pages on first request using a large LLM (Kimi K2 via Groq). The system is a Go server with a MySQL pages table; if a requested page doesn't exist it's generated by the model and stored. The post discusses model choice, inference latency and cost, server-side link validation to prevent 'cheating', and simple crawler mitigation.
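
The core request path, sketched in Python for brevity (the real implementation is a Go server; `db` is an assumed thin wrapper over the MySQL table, and the prompt is hypothetical):

```python
def get_page(slug: str, db, llm) -> str:
    """Serve a wiki page, generating and storing it on first request."""
    body = db.fetch_one("SELECT body FROM pages WHERE slug = %s", slug)
    if body is not None:
        return body  # cache hit: every page is only ever generated once
    body = llm(f"Write an encyclopedia article titled '{slug}', with wiki links.")
    # In the real system, server-side link validation runs here before storing,
    # so the model can't 'cheat' with malformed or disallowed links.
    db.execute("INSERT INTO pages (slug, body) VALUES (%s, %s)", slug, body)
    return body
```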

A developer describes building AutoDeck, an LLM-driven spaced-repetition flashcard app. The post covers the idea of AI-generated adaptive flashcards, engineering trade-offs for generating an infinite feed (batching, parallelism, streaming XML chunks, background generation and client polling), experiences using coding agents (OpenAI Codex vs Claude Code), model selection and latency issues, and the economics of paying for inference.
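
One way to read the background-generation-plus-polling trade-off, as a rough sketch; the buffer size, poll cadence, and `llm` interface (assumed to yield parsed cards from a streamed XML response) are all assumptions:

```python
import queue
import threading
import time

cards: queue.Queue = queue.Queue()

def generator_loop(llm, buffer_size: int = 5):
    """Background thread: keep a buffer of pre-generated flashcards topped up."""
    while True:
        if cards.qsize() < buffer_size:
            for card in llm(f"Generate {buffer_size} flashcards as XML"):
                cards.put(card)
        time.sleep(0.5)

def poll_handler() -> list:
    """Client polling endpoint: return whatever is ready without blocking."""
    ready = []
    while not cards.empty():
        ready.append(cards.get_nowait())
    return ready

# threading.Thread(target=generator_loop, args=(llm,), daemon=True).start()
```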

The author argues that effective use of agentic AI coding tools is largely a matter of good code review: humans must provide structural, architecture-focused guidance to prevent LLMs from over-engineering solutions. Through examples (a PWA/data-access case, unnecessary background job design) the post advises privileging simple, existing systems and criticizes both nitpicky and rubber-stamp review styles. The result: being skilled at structural code review makes you better at using AI agents.

Argues that because LLMs are hosted in a few datacenters and suffer capacity/quality issues during US peak hours (including possible quantization), American tech firms have an incentive to hire engineers in Australia and Europe so AI-assisted work can continue off-peak, improving throughput, reliability, and overnight responsiveness.

The post argues OpenAI's stateful Responses API exists primarily so OpenAI can retain private chain-of-thought reasoning traces on their backend, enabling full model capability only when you use their stateful API; the author criticizes the marketing framing and calls for more transparency.
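
The statefulness in question: with the Responses API you chain turns by response ID instead of resending the conversation, which is what allows reasoning traces to stay on OpenAI's backend. A sketch using the openai Python SDK; the model name is an assumption:

```python
from openai import OpenAI

client = OpenAI()

# First turn: the response, including any private reasoning, is stored server-side.
first = client.responses.create(model="o3", input="Plan a zero-downtime DB migration.")

# Second turn: reference the stored response by ID. The hidden chain-of-thought
# from the first turn never leaves OpenAI's backend, but can still be used.
followup = client.responses.create(
    model="o3",
    input="Now estimate the riskiest step.",
    previous_response_id=first.id,
)
print(followup.output_text)
```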

The author argues that enforcing every domain-model invariant as an absolute constraint ("make invalid states unrepresentable") often causes operational and evolutionary pain. Using examples — state machines, foreign key constraints, and Protocol Buffers required fields — they show that hard constraints can make schema changes, upgrades, and exceptional one-off fixes difficult. The recommendation is to prefer softer, removable constraints where appropriate and let application logic tolerate some invalid or incomplete states to keep systems flexible and evolvable.
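
A small illustration of the distinction, in a hypothetical order-status domain: a hard constraint rejects any state the schema didn't anticipate, while a softer check tolerates and flags it:

```python
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"

# Hard constraint: a row with an unanticipated status fails to load at all.
def load_hard(raw: str) -> Status:
    return Status(raw)   # raises ValueError on e.g. "refunded" after a schema change

# Soft constraint: unknown statuses are preserved and flagged, not rejected,
# so rolling upgrades and one-off manual fixes don't break reads.
KNOWN = {s.value for s in Status}

def load_soft(raw: str) -> tuple:
    return raw, raw in KNOWN   # (value, is_valid); callers decide what to do
```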