Markdown is a good starting point because it is portable, readable, and easy to review in Git. It stops being enough when the publishing workflow needs rules that the file format cannot enforce on its own.
The first failures are usually small: two spellings of the same tag, missing descriptions, future dates published early, screenshots with no alt text, and editors asking whether a draft will appear in RSS.
Metadata breaks before prose
The article body can still be fine while the content system around it drifts. Public pages usually need title, description, date, author, draft state, tags, canonical URL, and sometimes image data.
Loose frontmatter becomes visible in many places:

| Surface | What goes wrong |
|---|---|
| Homepage card | Missing or repeated description |
| RSS | Wrong date or draft included |
| Sitemap | Future post included too early |
| Search | Identical summaries make results hard to scan |
| Social preview | Missing image or title truncation |

That is a content model problem, not a writing problem.
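
One way to stop drafts and future-dated posts from leaking into RSS or the sitemap is to make every surface read from the same filtered list. The following is a minimal sketch, not a prescribed implementation: the `PostMeta` shape and the `isPublic` and `publicPosts` helpers are hypothetical names chosen for illustration.

```typescript
// Hypothetical post shape; the field names are assumptions for this sketch.
interface PostMeta {
  title: string;
  pubDatetime: Date;
  draft: boolean;
}

// A post is public only when it is not a draft and its publish date has passed.
function isPublic(post: PostMeta, now: Date = new Date()): boolean {
  return !post.draft && post.pubDatetime.getTime() <= now.getTime();
}

// Homepage, RSS, sitemap, and search index can all start from this list,
// so a rule change happens in one place. Newest posts come first.
function publicPosts(posts: PostMeta[], now: Date = new Date()): PostMeta[] {
  return posts
    .filter((post) => isPublic(post, now))
    .sort((a, b) => b.pubDatetime.getTime() - a.pubDatetime.getTime());
}
```

Passing `now` as a parameter keeps the rule testable: a build can be checked against a fixed date instead of the wall clock.
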
Add a schema before adding a database
The middle ground is structured frontmatter plus validation. Keep the content in Git, but make the public fields explicit.
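```ts
import { z } from "zod";

// Public frontmatter fields for a post.
const postSchema = z.object({
  title: z.string().min(8),
  description: z.string().min(50).max(180),
  // Expects a parsed Date; z.coerce.date() would also accept date strings.
  pubDatetime: z.date(),
  modDatetime: z.date().optional().nullable(),
  author: z.string(),
  draft: z.boolean().default(false),
  tags: z.array(z.string()).min(1),
  // A cover is optional, but alt text is mandatory once one is set.
  cover: z
    .object({
      src: z.string(),
      alt: z.string().min(12),
      caption: z.string().optional(),
    })
    .optional(),
});
```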
This preserves the portability of Markdown, and editors no longer have to remember every YAML convention: a file that breaks one fails validation instead of reaching the site.
Draft and publish modes should differ
A draft should be easy to save while incomplete. A published article should not leak incomplete metadata. Treat those as different validation modes:
| Field | Draft | Published |
|---|---|---|
| Title | Can be temporary | Must be final and distinct from other posts |
| Description | Can be missing | Required for cards and metadata |
| Tags | Optional | Required for navigation |
| Cover image alt | Optional if no image | Required when image exists |
| Future date | Allowed | Hidden from public output until the date passes |

Loose validation everywhere feels friendly until a half-finished article appears in production.
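
The two modes in the table can be sketched as one function that takes the mode as an argument. This is an illustration under stated assumptions: `Frontmatter`, `Mode`, and `validateFrontmatter` are hypothetical names, and the future-date rule is left out because it is more naturally applied when building public listings than when validating a single file.

```typescript
type Mode = "draft" | "published";

// Hypothetical, deliberately loose input shape: everything may be missing.
interface Frontmatter {
  title?: string;
  description?: string;
  tags?: string[];
  cover?: { src: string; alt?: string };
}

// Returns a list of human-readable problems; an empty list means "valid".
function validateFrontmatter(fm: Frontmatter, mode: Mode): string[] {
  const errors: string[] = [];

  // Shared by both modes: an image, when present, needs alt text.
  if (fm.cover && !fm.cover.alt?.trim()) {
    errors.push("cover.alt is required when a cover image exists");
  }

  // Drafts stop here so incomplete work can still be saved.
  if (mode === "draft") return errors;

  // Published posts must carry everything cards, feeds, and navigation need.
  if (!fm.title?.trim()) errors.push("title is required");
  if (!fm.description?.trim()) errors.push("description is required");
  if (!fm.tags || fm.tags.length === 0) errors.push("at least one tag is required");
  return errors;
}
```

Returning messages instead of throwing lets an editor UI or a CI step show every problem at once rather than one per run.
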
Search reveals weak content modeling
Search is where bad metadata becomes painfully visible. If every post has a similar description, the search results cannot help readers choose. If tags are inconsistent, topic pages split one subject across near-duplicate tags. If internal working titles reach the public archive, the site becomes hard to scan.
Before adding a more advanced search engine, fix the fields search already depends on: unique titles, specific descriptions, consistent tags, and predictable dates.
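
A small audit script can surface two of these problems before they reach search: tags that differ only in spelling, and descriptions reused across posts. The sketch below is one possible approach. `IndexedPost`, `tagKey`, `findTagVariants`, and `findDuplicateDescriptions` are hypothetical names, and the normalization rule (lowercase, with spaces, underscores, and hyphens treated as equal) is an assumption that a real site might tune.

```typescript
// Hypothetical minimal shape of a post as the audit sees it.
interface IndexedPost {
  slug: string;
  description: string;
  tags: string[];
}

// Normalize a tag for comparison only; the stored tag is not changed.
function tagKey(tag: string): string {
  return tag.trim().toLowerCase().replace(/[\s_-]+/g, "-");
}

// Maps each normalized tag to its spellings, keeping only tags with more than one.
function findTagVariants(posts: IndexedPost[]): Map<string, string[]> {
  const spellings = new Map<string, string[]>();
  for (const post of posts) {
    for (const tag of post.tags) {
      const key = tagKey(tag);
      const known = spellings.get(key) ?? [];
      if (known.indexOf(tag) === -1) known.push(tag);
      spellings.set(key, known);
    }
  }
  const variants = new Map<string, string[]>();
  spellings.forEach((known, key) => {
    if (known.length > 1) variants.set(key, known.slice().sort());
  });
  return variants;
}

// Maps each reused description (case-insensitive) to the slugs that share it.
function findDuplicateDescriptions(posts: IndexedPost[]): Map<string, string[]> {
  const slugsByText = new Map<string, string[]>();
  for (const post of posts) {
    const key = post.description.trim().toLowerCase();
    slugsByText.set(key, (slugsByText.get(key) ?? []).concat(post.slug));
  }
  const duplicates = new Map<string, string[]>();
  slugsByText.forEach((slugs, text) => {
    if (slugs.length > 1) duplicates.set(text, slugs);
  });
  return duplicates;
}
```

Both functions only report. Deciding which spelling of a tag is canonical remains an editorial choice.
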
Media needs rules too
Images are usually the first asset type to become messy. Someone uploads screenshot-final-new.png, another editor replaces it, and months later nobody knows which file is still referenced.
A small media convention helps:

- Uploads live under `public/uploads/YYYY/MM/`
- Article cover images require alt text
- Images above 1.5 MB fail validation
- Orphaned uploads are reported before publishing
- Screenshots use descriptive filenames, not "final2"

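
Most of these conventions can be checked mechanically. The sketch below works on plain data so it stays independent of any file-system or CMS API. `Upload`, `checkUploads`, and the patterns are hypothetical, the size limit assumes "1.5 MB" means 1.5 × 1024 × 1024 bytes, and the list of vague filename words is only a starting guess. The alt text rule is not covered here because it belongs with frontmatter validation.

```typescript
// Hypothetical description of one uploaded file.
interface Upload {
  path: string; // e.g. "public/uploads/2024/05/schema-error-output.png"
  sizeBytes: number;
}

const MAX_BYTES = 1.5 * 1024 * 1024; // assumption: 1.5 MB read as 1.5 MiB
const UPLOAD_PATH = /^public\/uploads\/\d{4}\/(0[1-9]|1[0-2])\/[^/]+$/;
// Flags words such as "final", "final2", or "new" when they stand alone in a filename.
const VAGUE_NAME = /(^|[-_ ])(final|new|copy|untitled)\d*([-_ .]|$)/i;

// referencedPaths: every image path that appears in some post.
function checkUploads(uploads: Upload[], referencedPaths: string[]): string[] {
  const problems: string[] = [];
  for (const upload of uploads) {
    const fileName = upload.path.split("/").pop() ?? "";
    if (!UPLOAD_PATH.test(upload.path)) {
      problems.push(`${upload.path}: not under public/uploads/YYYY/MM/`);
    }
    if (upload.sizeBytes > MAX_BYTES) {
      problems.push(`${upload.path}: larger than 1.5 MB`);
    }
    if (VAGUE_NAME.test(fileName)) {
      problems.push(`${upload.path}: filename is not descriptive`);
    }
    if (referencedPaths.indexOf(upload.path) === -1) {
      problems.push(`${upload.path}: orphaned, no post references it`);
    }
  }
  return problems;
}
```

Run before publishing, a non-empty result can either block the build or just print a report, depending on how strict the team wants each rule to be.
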
Markdown can remain the authoring format. It just needs a content model around it once more than one person depends on the site.