Voice Search Optimization: Beyond Speakable Schema

KOIRA Team · 8 min read · 1,504 words
[Figure: Diagram showing the voice search pipeline from spoken query through featured snippet to speakable schema markup on a small business website]
◆ Key takeaways
  • Speakable schema (SpeakableSpecification) flags specific content blocks for text-to-speech; without it, voice assistants have to guess which passage to read, and they can guess wrong.
  • Featured snippets feed most smart-speaker answers, so snippet optimisation and speakable schema are the same pipeline, not two separate tasks.
  • Conversational, question-first headers (Who, What, Where, How, Why) dramatically increase the odds your content matches natural-language voice queries.
  • Local voice searches ('open now near me', 'best dentist in [city]') pull from your Google Business Profile first — structured data on your site is the backup signal.
  • Fast pages are a practical prerequisite: aim for a Time to First Byte under 2 seconds, because a voice assistant can't pause for a loading spinner and will move on to a faster candidate.
  • Optimising for voice is cheaper than paid search and compounds over time — one well-structured FAQ page can earn dozens of spoken citations.

The Problem With Treating Voice Search as a Checkbox

Most guides on voice search end at "add speakable schema and you're done." That's like buying running shoes and calling yourself a marathoner. The schema markup matters — we'll explain exactly how to implement it — but the bigger opportunity is in how you write, structure, and publish content that voice assistants want to read.

Voice queries are growing faster than text queries in local and informational categories. When someone asks their Google Nest, "What time does [business] close?" or "How do I remove rust from cast iron?" — they are not typing a keyword. They are talking to something they expect to understand them. If your site is built only for scannable text and keyword density, you are invisible to that interaction.

This guide covers the full stack: what speakable schema actually does (not just what it is), why featured snippets are the real pipeline for voice answers, how to write content that earns those snippets, and what local businesses specifically need in place before any of the above matters.


What Speakable Schema Actually Does

Speakable schema is a type of structured data (using the Schema.org SpeakableSpecification) that explicitly tells voice assistants: these specific sections of this page are suitable for text-to-speech playback. You add it to your page's JSON-LD block, referencing either CSS selectors or XPath expressions that point to the text you want spoken.

Without it, a voice assistant has to guess which paragraph is the right answer. It might grab a navigation label, a copyright notice, or a sentence fragment. With speakable markup, you're handing the assistant a highlighter and saying, "Read this part."

Google's documentation presents speakable as a beta feature aimed at news content, so support outside that category is not guaranteed. The markup itself is valid on any Schema.org WebPage or Article, though. For a small business, the practical aim is to make your FAQ answer, your service description, or your "about" paragraph a clean, clearly marked candidate for spoken playback.

Here's a minimal JSON-LD implementation:

{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "name": "How to Remove Rust From Cast Iron",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [".answer-block", "h2.answer-heading"]
  }
}

That's it. Point cssSelector at the elements that contain your clean, spoken answer. Keep those elements to 2–3 sentences. Longer passages get truncated or skipped entirely.
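
SpeakableSpecification also accepts an xpath property in place of cssSelector, which can help when the answer text sits in elements without stable class names. The block below is a minimal sketch of that variant. The two XPath expressions, which point at the page title and the meta description, are illustrative assumptions and would need to match your own markup.

```json
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "name": "How to Remove Rust From Cast Iron",
  "speakable": {
    "@type": "SpeakableSpecification",
    "xpath": [
      "/html/head/title",
      "/html/head/meta[@name='description']/@content"
    ]
  }
}
```

Either property can identify the speakable text. CSS selectors tend to be easier to maintain if your templates already use dedicated answer classes.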


Featured Snippets Are the Voice Answer Pipeline

This is the part most guides miss. The majority of smart-speaker answers are pulled directly from featured snippets — the boxed answer at position zero in Google Search. If you want voice traffic, you need to understand that these are not two separate optimisation tasks. They are one task.

The path looks like this:

  1. A user asks a natural-language question
  2. Google (or the underlying search engine powering the assistant) identifies the best answer
  3. If a featured snippet exists for that query, it is the answer — either displayed on screen or read aloud
  4. Speakable schema helps Google identify which part of your page to use when it's deciding between multiple snippet candidates

So your workflow is: earn the snippet, then mark it as speakable. Not the other way around.

To earn featured snippets, your content needs to:

  • Directly answer a specific question in the first sentence of the relevant paragraph — don't build to the answer, start with it
  • Keep the answer block concise: 40–60 words for paragraph snippets, 4–8 items for list snippets
  • Use the question as a header (exact or close variant), then answer immediately below it
  • Provide supporting depth in the rest of the section — this signals that your page is authoritative, not just terse

How to Write for Ears, Not Eyes

Written content optimised for scanning looks like this: short bullets, bold keywords, minimal full sentences. That is the opposite of what voice assistants need. Voice content needs to sound natural when read aloud by a robotic voice at moderate speed.

Practical rules for voice-friendly writing:

Use full sentences. Bullets don't speak well. "• Available Monday–Saturday" becomes "bullet available Monday dash Saturday" in some assistants. Write "We're available Monday through Saturday, 9 a.m. to 6 p.m."

Avoid jargon and acronyms on first use. An assistant won't pause to explain what "GBP" means. If you write "your Google Business Profile (GBP)", the assistant reads the parenthetical too. Spell it out or drop the acronym.

Front-load your answers. "Yes, we offer same-day delivery in [city]" beats "For customers in [city], we are pleased to offer same-day delivery." The first version works as a voice answer. The second does not.

Use conversational question formats as subheadings. "How long does installation take?" instead of "Installation Timeline." The question matches what a user actually says. The label does not.


The Local Voice Search Layer

Voice assistants handle local queries differently from informational ones. When someone says "best plumber near me" or "is [business name] open right now," the assistant is not primarily querying your website — it's querying your Google Business Profile (GBP) and structured local data.

This means your voice search strategy has two tracks:

Track 1: Your GBP (for local queries)

  • Hours must be current — including holiday hours and special hours
  • Categories must be specific (not just "Restaurant" but "Italian Restaurant" and "Pizza Delivery")
  • Q&A section should have pre-populated answers to common questions ("Do you take walk-ins?" "Is there parking?")
  • Your NAP (name, address, phone) must be identical across every directory — inconsistent NAP data directly suppresses local pack rankings

Track 2: Your website (for informational and comparison queries)

  • LocalBusiness schema with openingHoursSpecification, geo, areaServed
  • FAQ schema on your services and location pages
  • Speakable markup on your most concise, accurate answer blocks
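
As a rough sketch of the LocalBusiness markup in Track 2, the block below combines openingHoursSpecification, geo, and areaServed. Every value in it (business name, address, phone number, coordinates, hours) is a placeholder assumption, not data from a real listing. Swap in the same name, address, and phone number you use on your GBP.

```json
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Example Plumbing Co.",
  "telephone": "+1-555-0100",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Example Street",
    "addressLocality": "Exampleville",
    "addressRegion": "EX",
    "postalCode": "00000",
    "addressCountry": "US"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 40.0,
    "longitude": -75.0
  },
  "areaServed": "Exampleville",
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": [
        "Monday",
        "Tuesday",
        "Wednesday",
        "Thursday",
        "Friday",
        "Saturday"
      ],
      "opens": "09:00",
      "closes": "18:00"
    }
  ]
}
```

Times use 24-hour HH:MM format. A more specific type such as Plumber or Dentist can replace LocalBusiness where Schema.org defines one for your category.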

The two tracks feed each other. A strong GBP with lots of reviews and complete data signals authority, which helps your site rank higher in organic results, which increases the chance of earning featured snippets.


Page Speed: The Prerequisite Nobody Mentions

Voice assistants operate on a tighter latency budget than human users. A person will wait 4 seconds for a page to load. A voice assistant won't — if your page is slow, it moves to the next candidate.

Target under 2 seconds TTFB (Time to First Byte) for pages you want in voice results. Run your core content pages through Google PageSpeed Insights and focus on:

  • Eliminating render-blocking JavaScript from above-the-fold content
  • Serving images in WebP with explicit width and height attributes
  • Using a CDN if your audience is geographically distributed
  • Caching HTML responses for static or semi-static pages

Page speed is also a direct ranking signal for mobile, and most voice queries originate on mobile devices. Fixing speed helps your overall SEO, not just voice.


The Content Gap Most Small Businesses Have

Run this test right now: type your most commonly asked customer question into Google as a full sentence. ("How much does a [your service] cost in [your city]?" or "Does [your business name] do [specific service]?")

Does your website appear? Does it appear with a snippet? If not, you have a content gap that no amount of schema markup can fix.

The solution is a dedicated FAQ or question-and-answer page built specifically around the questions your customers actually ask. Not the questions you wish they asked. The ones that come in via phone, email, and chat.

Structure each answer as:

  1. Question as H2 or H3 heading
  2. Direct 1–2 sentence answer immediately below
  3. 100–200 words of supporting context
  4. Relevant internal link to the full service or product page

Apply FAQ schema markup to the entire page. Apply speakable schema to the direct answer sentences. This single page, done properly, can capture voice traffic for dozens of queries simultaneously.
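
A minimal sketch of how the two markups can sit together on one page: FAQPage lists each question with its direct answer, and speakable points at the elements that hold those answers. The questions, the answer text, and the .faq-answer class name are illustrative assumptions. The selector must match a class your page template actually uses, and the answer text should match what is visible on the page.

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Do you offer same-day delivery?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes, we offer same-day delivery in [city]."
      }
    },
    {
      "@type": "Question",
      "name": "When are you open?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "We're available Monday through Saturday, 9 a.m. to 6 p.m."
      }
    }
  ],
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [".faq-answer"]
  }
}
```

Add one Question entry per heading on the page, and validate the result with Google's Rich Results Test before publishing.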


Putting It Together: What to Prioritise

If you're starting from zero, here's the order that gets results fastest:

  1. Fix your GBP first. It's free, it's fast, and it directly answers most local voice queries without your website being involved at all.
  2. Run a page speed audit. Remove the technical blockers before adding more content.
  3. Build or improve your FAQ page. Write in full conversational sentences. Use question headers. Keep answers under 60 words.
  4. Add FAQ schema to that page using Google's Structured Data Markup Helper.
  5. Add speakable schema pointing to your direct answer paragraphs.
  6. Check for featured snippet opportunities in Google Search Console — look for queries where you rank in positions 2–10; those are your highest-probability snippet targets.
  7. Monitor voice-friendly queries by filtering GSC for question-format queries (who, what, where, when, how, why) and tracking their click-through rates over 60–90 days.

Voice search optimisation isn't a one-day project, but it's also not a complicated one. The businesses that win voice traffic are the ones who write clear answers to real questions, mark those answers up properly, and maintain the local data signals that assistants rely on. That's a discipline, not a trick.

Speakable Schema
A Schema.org structured data type (SpeakableSpecification) added to a webpage's JSON-LD that explicitly identifies which content sections are appropriate for text-to-speech playback by voice assistants.
Featured Snippet
A highlighted answer box displayed at position zero in Google Search results that voice assistants use as the primary source for spoken responses to question-format queries.
SpeakableSpecification
The Schema.org class used within speakable markup that uses CSS selectors or XPath expressions to point a voice assistant to specific, speakable content blocks on a page.
Time to First Byte (TTFB)
The time it takes a server to send the first byte of a response to a browser or crawler, used here as the key latency benchmark for whether a voice assistant is likely to use a given page.
LocalBusiness Schema
A Schema.org structured data type added to a business website that provides voice assistants and search engines with machine-readable details including hours, location, and service area.
Voice Search Optimisation: Unoptimised Site vs. Fully Optimised Site
Area | Unoptimised site | Voice-optimised site
Content structure | Keyword-dense paragraphs written for scanning, bullet-heavy formatting | Full-sentence, question-and-answer prose with conversational headers and direct opening answers
Schema markup | No structured data, or only basic title/description meta tags | FAQ schema, LocalBusiness schema, and SpeakableSpecification pointing to clean answer blocks
Local data signals | Inconsistent NAP across directories, incomplete or outdated GBP listing | Fully completed GBP with current hours, categories, Q&A populated, and NAP identical site-wide
Page speed | TTFB over 3 seconds, render-blocking JS, uncompressed images | TTFB under 2 seconds, WebP images with explicit dimensions, CDN-served static assets
Featured snippet ownership | Ranks on page one but holds no featured snippets; answers buried mid-paragraph | Holds featured snippets for top question-format queries; answers front-loaded in 40–60 word blocks
Visibility in voice results | Rarely or never read aloud by Google Assistant, Alexa, or Siri | Consistently surfaced for local and informational voice queries relevant to the business

How to Optimise Your Site for Voice Search

  1. Audit and complete your Google Business Profile. Open your GBP dashboard and verify that business hours (including holidays), primary and secondary categories, service areas, and the Q&A section are fully populated. Most local voice queries hit GBP directly before ever reaching your website.
  2. Run a page speed test on your key pages. Use Google PageSpeed Insights or WebPageTest to check TTFB and Largest Contentful Paint on your homepage, FAQ page, and top service pages. Address render-blocking scripts and uncompressed images before adding any schema markup.
  3. Build or rewrite your FAQ page with conversational answers. List the 10–20 questions customers most commonly ask, write each as an H2 or H3 heading, then answer each one in 1–2 full sentences immediately below the heading, with no preamble. Aim for 40–60 words per answer block.
  4. Add FAQ and LocalBusiness schema markup. Use Google's Structured Data Markup Helper or a plugin like Rank Math to add FAQ schema to your question page and LocalBusiness schema (with openingHoursSpecification and geo) to your homepage and contact page. Validate both with Google's Rich Results Test before publishing.
  5. Implement speakable schema on your cleanest answer blocks. Add a SpeakableSpecification JSON-LD block to your FAQ page, pointing cssSelector at the elements that contain your direct answer sentences. Keep each referenced element to 2–3 sentences.
  6. Identify featured snippet opportunities in Search Console. Filter Google Search Console queries for question-format keywords (how, what, where, why, when, who) and sort by impressions. Pages ranking in positions 2–10 for high-impression question queries are your best candidates for snippet optimisation; rewrite those answer sections to be more direct and concise.
  7. Monitor and iterate every 60 days. Re-check your featured snippet holdings and track question-format click-through rates in GSC on a 60-day cadence. Voice search improvements compound slowly, and schema changes can take 4–6 weeks to be recrawled, so give changes time before evaluating them.
FAQ
Does speakable schema work for small business websites, or is it only for news publishers?
Google's official documentation focuses on news and broadcast content and treats speakable as a beta feature, so results on other page types are not guaranteed. The SpeakableSpecification markup itself is valid on any web page, so small business FAQ pages, service descriptions, and location pages can all carry it. The key is that the marked content must be concise, factually accurate, and suitable to be read aloud without additional context.
How do I know if my content is already being used in voice search answers?
There's no direct voice-search report in Google Search Console, but you can approximate it by filtering for question-format queries (what, how, where, when, who, why) and monitoring your featured snippet position. If you hold a featured snippet for a conversational query, there's a strong probability that query is also being answered via voice. Tools like SEMrush and Ahrefs can show you which of your pages hold featured snippets.
What's the ideal length for a speakable content block?
Keep speakable sections to 2–3 sentences, or roughly 40–60 words. Google's guidance for speakable markup is around 20–30 seconds of audio per section. Anything much longer risks being truncated mid-sentence, which sounds broken to the user and reflects poorly on your brand. If the full answer needs more detail, provide it in a follow-up paragraph that is not marked as speakable.
Can voice search bring meaningful traffic to a local service business?
Yes, particularly for 'near me' queries, business hours checks, and 'how to find' style questions. These high-intent queries — often from users already in the decision stage — disproportionately happen on mobile and smart speakers. A local plumber who owns the featured snippet for 'emergency plumber in [city] open now' is capturing customers at the exact moment they're ready to call. That's high-value traffic with minimal ongoing cost.
Do I need a developer to implement speakable schema?
Not necessarily. If your site runs on WordPress, plugins like Yoast SEO Premium or Rank Math allow you to add schema markup through a visual interface without touching code. For custom or static sites, you can manually add a JSON-LD block in the page's head tag using Google's Structured Data Markup Helper. The markup itself is a short JSON object — most non-technical owners can copy, modify, and paste it in under 30 minutes with a basic walkthrough.
Does page speed really affect voice search, or is that a myth?
It's real, though hard to quantify. Voice assistants resolve queries in real time, so a slow-responding page is a weaker candidate than a faster one that returns the same answer. Third-party studies have reported that pages surfaced in voice results load faster on average than typical web pages. There's no official published threshold, but keeping your TTFB under 2 seconds and your Largest Contentful Paint under 2.5 seconds is a reasonable target.
Written with AI assistance and reviewed by the KOIRA team before publishing.