The Build Ledger

When Markdown Files Stop Being Enough for a Content Site

Markdown is a good starting point, but workflow, preview, search, and governance eventually ask for a stronger content model.

Markdown is a good starting point because it is portable, readable, and easy to review in Git. It stops being enough when the publishing workflow needs rules the file does not enforce by itself.

The first failures are usually small: two spellings of the same tag, missing descriptions, future dates published early, screenshots with no alt text, and editors asking whether a draft will appear in RSS.

Metadata breaks before prose

The article body can still be fine while the content system around it drifts. Public pages usually need title, description, date, author, draft state, tags, canonical URL, and sometimes image data.

Loose frontmatter becomes visible in many places:

Homepage card: missing or repeated description
RSS: wrong date or draft included
Sitemap: future post included too early
Search: identical summaries make results hard to scan
Social preview: missing image or title truncation

That is a content model problem, not a writing problem.
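
One way to keep drafts and future-dated posts out of the feed, sitemap, and homepage is to route every listing through a single visibility check. The sketch below is only an illustration: `PostMeta`, `isPubliclyVisible`, and `listPublicPosts` are hypothetical names, and the fields mirror the `draft` and `pubDatetime` frontmatter fields used in this article.

```typescript
// Hypothetical minimal shape; field names follow the frontmatter in this article.
interface PostMeta {
  title: string;
  draft: boolean;
  pubDatetime: Date;
}

// A post is public only if it is not a draft and its publish time has passed.
function isPubliclyVisible(post: PostMeta, now: Date = new Date()): boolean {
  return !post.draft && post.pubDatetime.getTime() <= now.getTime();
}

// Homepage, RSS, sitemap, and search index would all call this one function,
// so the rule cannot drift between outputs. Newest posts come first.
function listPublicPosts(posts: PostMeta[], now: Date = new Date()): PostMeta[] {
  return posts
    .filter((post) => isPubliclyVisible(post, now))
    .sort((a, b) => b.pubDatetime.getTime() - a.pubDatetime.getTime());
}
```

Passing `now` as a parameter is a design choice for this sketch: it makes the publish window testable without waiting for the clock.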

Add a schema before adding a database

The middle ground is structured frontmatter plus validation. Keep the content in Git, but make the public fields explicit.

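import { z } from "zod";

// Public frontmatter fields for one post. Length limits keep cards and
// metadata from receiving empty or oversized text.
const postSchema = z.object({
  title: z.string().min(8),
  description: z.string().min(50).max(180),
  // Expects Date objects; z.coerce.date() is an option if the
  // frontmatter parser yields strings instead.
  pubDatetime: z.date(),
  modDatetime: z.date().optional().nullable(),
  author: z.string(),
  draft: z.boolean().default(false),
  tags: z.array(z.string()).min(1),
  // The cover is optional, but alt text is required once a cover exists.
  cover: z
    .object({
      src: z.string(),
      alt: z.string().min(12),
      caption: z.string().optional(),
    })
    .optional(),
});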

This preserves the portability of Markdown while removing the need for editors to remember every YAML convention.

Draft and publish modes should differ

A draft should be easy to save while incomplete. A published article should not leak incomplete metadata. Treat those as different validation modes:

| Field | Draft | Published |
| --- | --- | --- |
| Title | Can be temporary | Must be final and unique enough |
| Description | Can be missing | Required for cards and metadata |
| Tags | Optional | Required for navigation |
| Cover image alt | Optional if no image | Required when image exists |
| Future date | Allowed | Excluded until publish window |

Loose validation everywhere feels friendly until a half-finished article appears in production.
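
The two modes can be sketched as a single function that takes the mode as an argument. This is a plain TypeScript illustration with no library dependency; `Mode`, `PostFrontmatter`, and `validatePost` are hypothetical names. It assumes alt text is checked in both modes whenever a cover image exists, which is one reading of the table. Title uniqueness and the publish window are left out because they need the whole collection and the current time.

```typescript
type Mode = "draft" | "publish";

// Hypothetical loose shape: everything is optional so a draft can be saved early.
interface PostFrontmatter {
  title?: string;
  description?: string;
  tags?: string[];
  cover?: { src: string; alt?: string };
}

function isBlank(value: string | undefined): boolean {
  return value === undefined || value.trim().length === 0;
}

// Returns a list of human-readable problems; an empty list means the post passes.
function validatePost(post: PostFrontmatter, mode: Mode): string[] {
  const errors: string[] = [];

  // Assumed to apply in both modes: an image, when present, needs alt text.
  if (post.cover && isBlank(post.cover.alt)) {
    errors.push("cover.alt is required when a cover image exists");
  }

  // Drafts may be incomplete in every other respect.
  if (mode === "draft") return errors;

  if (isBlank(post.title)) errors.push("title is required to publish");
  if (isBlank(post.description)) errors.push("description is required to publish");
  if (!post.tags || post.tags.length === 0) {
    errors.push("at least one tag is required to publish");
  }
  return errors;
}
```

An editor save would call `validatePost(post, "draft")`, while the build or publish step would call it with `"publish"` and fail on any returned error.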

Search reveals weak content modeling

Search is where bad metadata becomes painfully visible. If every post has a similar description, the search results cannot help readers choose. If tags are inconsistent, topic pages split related posts across near-duplicate tags. If internal working titles reach the public archive, the site becomes hard to scan.

Before adding a more advanced search engine, fix the fields search already depends on: unique titles, specific descriptions, consistent tags, and predictable dates.
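
A small audit over the existing metadata can surface two of these problems before any search work starts. The sketch below is an assumption-laden example, not a fixed rule set: `IndexedPost`, `normalizeTag`, `findTagVariants`, and `findDuplicateDescriptions` are hypothetical names, and the normalization (lowercase, spaces and underscores to hyphens) is just one plausible convention.

```typescript
// Hypothetical shape of what a search index already knows about each post.
interface IndexedPost {
  slug: string;
  description: string;
  tags: string[];
}

// Collapse case and separators so "Static Sites" and "static-sites" compare equal.
function normalizeTag(tag: string): string {
  return tag.trim().toLowerCase().replace(/[\s_]+/g, "-");
}

// Returns groups of distinct spellings that normalize to the same tag.
function findTagVariants(posts: IndexedPost[]): string[][] {
  const spellings = new Map<string, Set<string>>();
  posts.forEach((post) => {
    post.tags.forEach((tag) => {
      const key = normalizeTag(tag);
      const seen = spellings.get(key) ?? new Set<string>();
      seen.add(tag);
      spellings.set(key, seen);
    });
  });
  return Array.from(spellings.values())
    .filter((set) => set.size > 1)
    .map((set) => Array.from(set).sort());
}

// Returns groups of slugs whose descriptions match after trimming and lowercasing.
function findDuplicateDescriptions(posts: IndexedPost[]): string[][] {
  const bySummary = new Map<string, string[]>();
  posts.forEach((post) => {
    const key = post.description.trim().toLowerCase();
    const slugs = bySummary.get(key) ?? [];
    slugs.push(post.slug);
    bySummary.set(key, slugs);
  });
  return Array.from(bySummary.values()).filter((slugs) => slugs.length > 1);
}
```

Both functions only report. Deciding which spelling wins, or which description to rewrite, stays an editorial call.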

Media needs rules too

Images are usually the first asset type to become messy. Someone uploads screenshot-final-new.png, another editor replaces it, and months later nobody knows which file is still referenced.

A small media convention helps:

Uploads live under public/uploads/YYYY/MM/
Article cover images require alt text
Images above 1.5 MB fail validation
Orphaned uploads are reported before publishing
Screenshots use descriptive filenames, not "final2"
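
Most of the convention above can be checked mechanically. The following is a rough sketch under stated assumptions: `Upload`, `checkUpload`, and `findOrphans` are hypothetical names, 1 MB is taken as 1024 * 1024 bytes, and the list of vague filename words is only an example. Alt text is not covered here because it belongs to the frontmatter check rather than the file itself.

```typescript
// Hypothetical description of one uploaded file.
interface Upload {
  path: string; // e.g. "public/uploads/2024/05/login-form-error.png"
  sizeBytes: number;
}

const MAX_IMAGE_BYTES = 1.5 * 1024 * 1024; // assumes 1 MB = 1024 * 1024 bytes
// Matches public/uploads/YYYY/MM/<file> and captures the filename.
const UPLOAD_PATH = /^public\/uploads\/\d{4}\/(0[1-9]|1[0-2])\/([^/]+)$/;
// Example word list only: flags names such as "final2.png" or "shot-final-new.png".
const VAGUE_NAME = /(^|[-_])(final|new|copy|untitled)\d*([-_.]|$)/i;

// Returns the problems found for one upload; an empty list means it passes.
function checkUpload(upload: Upload): string[] {
  const problems: string[] = [];
  const match = UPLOAD_PATH.exec(upload.path);
  if (!match) {
    problems.push("path must be public/uploads/YYYY/MM/<file>");
  } else if (VAGUE_NAME.test(match[2])) {
    problems.push("filename is not descriptive");
  }
  if (upload.sizeBytes > MAX_IMAGE_BYTES) {
    problems.push("image exceeds 1.5 MB");
  }
  return problems;
}

// Returns uploaded paths that no article references.
function findOrphans(uploads: Upload[], referencedPaths: string[]): string[] {
  const referenced = new Set(referencedPaths);
  return uploads.map((u) => u.path).filter((p) => !referenced.has(p));
}
```

Run as a pre-publish step, `checkUpload` would fail the build on any problem, while `findOrphans` would only produce a report, matching the softer wording of that rule.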

Markdown can remain the authoring format. It just needs a content model around it once more than one person depends on the site.