Content Inventory

Also known as: Content Catalog / Content Census

discovery, exploration, medium, Intermediate

TL;DR

Master document that lists and classifies all information assets of a digital product.

Strategic value

Wakes stakeholders up to the magnitude of existing content and the budget needed to manage it. There is no other way to fully understand the content problems a website faces.

Category: information-architecture
Estimated time: 8-16 hours for a small site (50-100 pages); 16-40 hours for a medium site (100-500 pages)

What is it

The Content Inventory is a quantitative exercise and master document that consolidates, lists, and classifies all existing information assets in a digital product. It answers the question 'What content do we currently have?'. Not to be confused with a Content Audit: while the inventory is a factual count (what it is and where it lives), the audit is the subsequent step that evaluates content quality.

What it is for

✓Know the real magnitude of existing content in a digital product
✓Prepare CMS migrations preventing page loss
✓Identify obsolete, redundant, or orphaned content
✓Wake up stakeholders to the magnitude of content and the budget needed to manage it

Research methods that feed it

Spidering tools (Screaming Frog, Sitebulb)
CMS export
Manual navigation review
Google Analytics (traffic data per page)

When to use it

✓At the start of planning a website or application redesign
✓To prepare migration to a new CMS, preventing page loss
✓When merging multiple corporate sites or splitting a very large one
✓For cleaning up junk files, orphaned documents, and obsolete material

When NOT to use it

✗If the site is massive (hundreds of thousands of pages) and budget is tight — do a partial inventory or representative sampling
✗If the digital product is very small (fewer than 20 pages) and its structure is obvious
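
One way to draw a representative sample for a very large site is to take a few pages from each top-level section, so no section is left out. The sketch below assumes the first URL path segment identifies the section; the function names and the example.com URLs are hypothetical.

```python
import random
from collections import defaultdict
from urllib.parse import urlparse


def top_section(url: str) -> str:
    """Return the first path segment, or '' for the home page."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else ""


def sample_by_section(urls, per_section=2, seed=0):
    """Pick up to `per_section` URLs from each top-level section."""
    groups = defaultdict(list)
    for url in urls:
        groups[top_section(url)].append(url)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for section in sorted(groups):
        pages = groups[section]
        sample.extend(rng.sample(pages, min(per_section, len(pages))))
    return sample
```

Sampling per section rather than across the whole list keeps small sections visible next to very large ones.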

Required components

✓Identifier (ID) or numbering system: unique code for each content piece reflecting its hierarchy (e.g., 1.0, 1.1, 1.1.2)
✓Page title / Content name: the main title of the document or page
✓URL or Location: the technical address of the content or its path in the hierarchy
✓Format / Content type: specify if it's an HTML page, PDF, video, article, product card, etc.
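
As a minimal sketch, the four required components can be modeled as one record per content piece and exported to CSV for use in a spreadsheet. The class name, field names, and sample row are illustrative assumptions, not a prescribed schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryRow:
    id: str            # hierarchical identifier, e.g. "1.1.2"
    title: str         # page title or content name
    url: str           # URL or location in the hierarchy
    content_type: str  # HTML page, PDF, video, ...


def to_csv(rows):
    """Serialize inventory rows to CSV text, one column per required component."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(InventoryRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Hypothetical sample row.
print(to_csv([InventoryRow("1.0", "Home", "https://example.com/", "HTML page")]))
```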

Optional components

○Owner/Author: person or department responsible for maintenance
○Dates: creation date, last update, or expiration date
○Status: whether it's current, obsolete, or awaiting review
○Metrics (Analytics): page views, ranking, last visit date
○Metadata and Tags: keywords, tags, or associated topics
○Template or CMS: the page template or content management system that renders the content
○Introductory tab: start sheet with scope, column legend, version, and author

How to create it step by step

1. Define scope and purpose: Determine whether it will be a total inventory or a partial/representative one (for very large sites).
2. Establish classification strategy: Define the spreadsheet columns before starting (ID, title, URL, format, etc.).
3. Automated collection (first sweep): Extract the base list from the CMS or with spidering tools.
4. Manual collection (deep dive): Go page by page, starting from the home page and following the main navigation.
5. Build in layers: Don't fill in all the data at once. One team member records URLs, another extracts analytics, another adds metadata.
6. Review findings: Hold short 15-minute meetings to clarify doubts about redundant or outdated content.
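
The "build in layers" step can be sketched as a merge keyed on URL: the base list comes from the first sweep, and each later layer (analytics, ownership, metadata) adds columns to the matching row. The function name, column names, and sample values below are assumptions for illustration.

```python
def merge_layers(base_rows, *layers):
    """Combine a base URL list with extra layers of columns.

    base_rows: list of dicts, each with a 'url' key.
    layers:    dicts mapping url -> dict of extra columns.
    Rows without a match in a layer are kept unchanged.
    """
    merged = []
    for row in base_rows:
        combined = dict(row)  # copy so the base list is not modified
        for layer in layers:
            combined.update(layer.get(row["url"], {}))
        merged.append(combined)
    return merged


# Hypothetical layers collected by different team members.
base = [{"url": "/a", "title": "A"}, {"url": "/b", "title": "B"}]
analytics = {"/a": {"page_views": 120}}
owners = {"/a": {"owner": "Marketing"}, "/b": {"owner": "Legal"}}
inventory = merge_layers(base, analytics, owners)
```

Keying every layer on the URL lets each person work in a separate sheet and combine the results later.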

Tips for small teams

Start with a partial inventory: only the first 3-4 navigation levels
Use free tools like Screaming Frog (free up to 500 URLs)
Divide the work: one person does URLs, another analytics, another metadata
Use colors in the spreadsheet to code status (current, obsolete, pending)
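
A partial inventory limited to the upper navigation levels can be approximated by filtering a crawl export on URL path depth. This is only a rough sketch: it assumes path depth mirrors navigation level, which does not hold on every site, and the function names are hypothetical.

```python
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count non-empty path segments; the home page has depth 0."""
    return len([s for s in urlparse(url).path.split("/") if s])


def within_depth(urls, max_depth=3):
    """Keep only URLs at or above the given path depth."""
    return [u for u in urls if url_depth(u) <= max_depth]
```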

Common mistakes

✗Delegating everything to automated tools: prevents human understanding of content fragments, context, and redundancy
✗Confusing inventory with strategy: debating 'what should be' during the inventory — the inventory only describes what exists
✗Hierarchical disorder: omitting numbering system and indentation — a flat list makes subsequent use difficult
✗The 'descent into madness' syndrome: one person collecting absolutely everything in one sitting — this work needs to be divided
✗Duplication from cross-links: listing the same page multiple times — each piece should be listed only once at its primary location
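
Duplicates caused by cross-links often show up as the same address written in slightly different ways. A minimal sketch of a cleanup pass follows; it assumes that query strings, fragments, and trailing slashes do not identify distinct content, which should be checked for the site at hand. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop query, fragment, and trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedupe(rows):
    """Keep the first row for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize(row["url"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```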

Quality criteria

✓Traceability: any content piece can be located quickly through its ID and URL
✓Clear relationships: reflects parent-child relationships through correct numbering and indentation
✓Visual information design: uses colors to code rows (obsolete vs popular, internal vs external)
✓Summary tab: includes purpose, scope, legend, and document author
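
The numbering and indentation behind "clear relationships" can be generated from a page tree. The sketch below assumes the hierarchy is already known and follows the 1.0, 1.1, 1.1.2 pattern; the function name and the page titles are hypothetical.

```python
def assign_ids(tree, prefix=()):
    """Walk a nested dict (title -> children) and return (id, depth, title) rows."""
    rows = []
    for index, (title, children) in enumerate(tree.items(), start=1):
        numbers = prefix + (index,)
        if len(numbers) == 1:
            label = f"{index}.0"  # top-level entries are written as N.0
        else:
            label = ".".join(str(n) for n in numbers)
        rows.append((label, len(numbers) - 1, title))
        rows.extend(assign_ids(children, numbers))
    return rows


# Hypothetical site structure.
site = {"Home": {"Products": {"Car insurance": {}, "Home insurance": {}}, "About": {}}}
for label, depth, title in assign_ids(site):
    print("  " * depth + f"{label} {title}")
```

Printing each row with two spaces per level gives the indented outline that makes parent-child relationships visible at a glance.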

Authority quotes

“A content inventory tells you what your content is. A content audit makes recommendations about what your content should be.”
— Universal Methods of Design

“The goal of a quantitative inventory is to learn what you have, where it lives, and a few basic statistics. No frills. Just objective facts.”
— Content Strategy for the Web

“Content inventories are hard and time-consuming, but usually worth the investment. There is no other way to fully understand the content problems a website faces.”
— Communicating Design

Contextualized example

Context: Corporate website migration for an insurance company with 350 pages.

Inventory finding: Of 350 pages, 87 (25%) hadn't been updated in over 2 years. 23 policy PDFs were outdated. 15 discontinued product pages were still indexed on Google.

Action: Of the 87 stale pages, 45 were removed and 42 were updated with current information. HTTP 301 redirects were set up for the deleted pages. The inventory saved months of work during the new CMS migration.
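
A finding such as "not updated in over 2 years" can be derived from the optional Dates column. The sketch below assumes each row carries a last_updated date and treats 2 years as 730 days; the function name, column name, and sample rows are hypothetical.

```python
from datetime import date, timedelta


def stale_rows(rows, today, max_age_days=730):
    """Return rows whose last update is older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in rows if r["last_updated"] < cutoff]


# Hypothetical sample rows.
pages = [
    {"url": "/policies/old.pdf", "last_updated": date(2021, 1, 15)},
    {"url": "/products/car", "last_updated": date(2024, 1, 10)},
]
outdated = stale_rows(pages, today=date(2024, 6, 1))
```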

Template available

Format: Google Sheets
Price: $12 USD

Phase

discovery, exploration

Audience

product-team, development

Complexity

medium

Experience level

Intermediate

Free tool by UXR — UX Research Consulting in Chile