Create voice agents for your website using ElevenLabs and Airtable
Below is a detailed article on how to integrate Airtable with ElevenLabs to create voice agents for a website. This setup lets you store conversation prompts, user data, or agent configuration in Airtable, while leveraging ElevenLabs for text-to-speech (TTS) and, optionally, speech-to-text. We will walk through an example workflow, data modeling in Airtable, JavaScript integration, and a code snippet that handles redirection after a voice interaction.
1. Introduction
Voice-based interactions have gained traction as users often find them more intuitive and hands-free. ElevenLabs provides high-quality text-to-speech (TTS) technology, which can be embedded into a website or application to enable a conversational user interface.
However, creating a scalable or easily maintainable voice agent often requires a centralized repository for prompts, responses, or user data. That’s where Airtable comes in. Airtable combines the familiarity of a spreadsheet with the power of a database, allowing both technical and non-technical teams to manage data, logic triggers, and more.
By coupling Airtable’s flexible database with ElevenLabs’ powerful voice technology, you can build user-friendly voice agents that fetch data, respond in natural-sounding voices, and guide users through various tasks on your website.
2. Setting Up Your Airtable Base
1. Create a new Base: Go to Airtable, sign in, and create a new base. You might name it “Voice Agent” or “Website Voice Assistant.”
2. Design Tables: Determine what information you want your voice agent to use or provide. Common tables include:
   - Prompts: Store conversation prompts or instructions for your voice agent. For instance, you can store user queries and recommended responses or next steps.
   - User Profiles: If your agent personalizes interactions, store user IDs, session data, or preferences here.
   - Redirect URLs: If you want your agent to direct users to specific URLs (e.g., product pages, articles, or support guides), create a table mapping user requests to target URLs.
3. Generate API Credentials: In your Airtable account settings, create a personal access token (Airtable’s replacement for legacy API keys) scoped to this base. You will need it to query or update your Airtable data from your website’s code.
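For example, a record in the Redirect URLs table might pair a user request with its target. The field names here are illustrative, not required by Airtable:

```json
{
  "fields": {
    "UserRequest": "show me pricing",
    "Label": "Pricing page",
    "TargetUrl": "https://example.com/pricing"
  }
}
```

This `fields` wrapper is the JSON shape Airtable’s REST API returns for each record (alongside an `id` and `createdTime`), so the same structure appears when you fetch the table later.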
3. Connecting Airtable to Your Website
3.1 Using Airtable’s REST API
- Airtable Base Endpoint: After creating your base, open Airtable’s API documentation page for that base (usually `https://airtable.com/appXXXX/api/docs`). It shows the endpoint for each table and how to query it.
- Fetching Data: In your JavaScript code, you can fetch data from a table using the endpoint. For example:

```javascript
// Example: fetch prompts from the "Prompts" table
const baseId = "appXXXX";    // Your Airtable base ID
const tableName = "Prompts"; // The table where you store prompts
const apiKey = "keyXXXX";    // Your Airtable credential (keep it server-side in production)

async function getPrompts() {
  const response = await fetch(`https://api.airtable.com/v0/${baseId}/${tableName}`, {
    headers: { Authorization: `Bearer ${apiKey}` }
  });
  const data = await response.json();
  return data.records;
}
```

- Filtering: You can use Airtable’s filtering capabilities (e.g., `filterByFormula`) to retrieve only the relevant records for a specific scenario or user query.
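As a sketch of filtering, the snippet below builds a `filterByFormula` query for a hypothetical `Intent` field; the base, table, and field names are placeholders for your own schema:

```javascript
// Build an Airtable list-records URL that filters on a (hypothetical) "Intent" field.
function buildFilteredUrl(baseId, tableName, intent) {
  const formula = `{Intent} = "${intent}"`;
  const params = new URLSearchParams({ filterByFormula: formula });
  return `https://api.airtable.com/v0/${baseId}/${encodeURIComponent(tableName)}?${params}`;
}

// Usage: fetch only the records matching one user intent.
async function getPromptsFor(intent, apiKey) {
  const response = await fetch(buildFilteredUrl("appXXXX", "Prompts", intent), {
    headers: { Authorization: `Bearer ${apiKey}` }
  });
  const data = await response.json();
  return data.records;
}
```

Using `URLSearchParams` keeps the formula correctly URL-encoded, which is easy to get wrong by hand because the formula contains braces, quotes, and spaces.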
3.2 Handling the Data in Your Voice Agent
- Processing: Once data is fetched from Airtable, your voice agent or TTS code can format it and pass it along to ElevenLabs for speech output.
- Updating: If your agent needs to capture user input (e.g., user preferences or answered questions), you can push that back to Airtable via a POST or PATCH request to the same API endpoint.
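As a sketch of writing data back (the table name, record ID, and `LastAnswer` field are placeholders), a PATCH request can be built and sent like this:

```javascript
// Build the request for updating one record; returned as data so it can be inspected.
function buildRecordUpdate(baseId, tableName, recordId, apiKey, fields) {
  return {
    url: `https://api.airtable.com/v0/${baseId}/${encodeURIComponent(tableName)}/${recordId}`,
    options: {
      method: "PATCH", // PATCH changes only the fields you send; PUT replaces the record
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ fields })
    }
  };
}

// Usage: persist a captured user answer.
async function saveUserAnswer(recordId, answer) {
  const { url, options } = buildRecordUpdate("appXXXX", "User Profiles", recordId, "keyXXXX", {
    LastAnswer: answer
  });
  const response = await fetch(url, options);
  return response.json();
}
```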
4. Integrating with ElevenLabs
4.1 ElevenLabs Text-to-Speech
ElevenLabs provides an API and a custom web component (`<elevenlabs-convai>`) to handle text-to-speech interactions. Depending on your setup, you might:
- Use the ElevenLabs REST API: Convert your text data from Airtable into an audio stream.
- Use the web component: If you are using `<elevenlabs-convai>`, simply pass it text content to speak aloud.
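As a sketch of the REST route (check the current ElevenLabs API reference for exact parameters; the voice ID, key, and model name below are placeholders):

```javascript
// Sketch: convert text fetched from Airtable into speech via the ElevenLabs TTS API.
const ELEVEN_TTS_BASE = "https://api.elevenlabs.io/v1/text-to-speech";

function buildTtsRequest(voiceId, apiKey, text) {
  return {
    url: `${ELEVEN_TTS_BASE}/${voiceId}`,
    options: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey, // ElevenLabs authenticates via this header
        "Content-Type": "application/json"
      },
      body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" })
    }
  };
}

// Browser usage: play the returned audio stream.
async function speak(text) {
  const { url, options } = buildTtsRequest("voiceXXXX", "xiKeyXXXX", text);
  const response = await fetch(url, options); // response body is audio (e.g., MP3)
  const audioBlob = await response.blob();
  new Audio(URL.createObjectURL(audioBlob)).play();
}
```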
4.2 Setting Up the Voice Agent
A typical usage might look like this:
- Load Data from Airtable: Retrieve a greeting or relevant conversation prompt for the user.
- Send Prompt to ElevenLabs: Pass the text to ElevenLabs TTS to play it back on the user’s device.
- Listen for user response: If you have speech-to-text, interpret user’s speech and decide how to respond (e.g., fetch more data from Airtable or direct them to a page).
- Redirect or Provide Additional Info: If needed, you can use JavaScript to direct the user to another page or highlight relevant content on the same page.
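The four steps above can be sketched as a single agent turn. The three helpers (`fetchPrompt`, `speak`, `listen`) are hypothetical and injected as functions, so the flow can be wired to real Airtable/ElevenLabs calls later or stubbed for testing:

```javascript
// Sketch of one conversational turn; helpers are injected dependencies.
async function runAgentTurn({ fetchPrompt, speak, listen }, intent) {
  const prompt = await fetchPrompt(intent); // 1. load data from Airtable
  await speak(prompt.text);                 // 2. send the prompt to TTS
  const userReply = await listen();         // 3. capture the user's response
  if (prompt.redirectUrl && /yes|show me/i.test(userReply)) {
    return { action: "redirect", url: prompt.redirectUrl }; // 4. redirect if asked
  }
  return { action: "continue", reply: userReply };
}
```

Keeping the flow free of direct network calls makes it easy to verify the decision logic without touching either service.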
5. Handling Page Redirection After a Voice Interaction
Sometimes your voice agent needs to guide users to a specific article, product page, or other resource. Below is a sample code snippet that:
- Listens for an `elevenlabs-convai:call` event (a custom event emitted by the ElevenLabs web component).
- Defines a custom function `redirect_to_a_page` within `clientTools`.
- Waits 2 seconds before redirecting.
- Replaces the hostname of the incoming `url` parameter with the current website’s hostname (and port, if present).
```html
<elevenlabs-convai agent-id="YourAgentCode"></elevenlabs-convai>
<script>
  document.querySelector('elevenlabs-convai').addEventListener('elevenlabs-convai:call', (e) => {
    e.detail.config.clientTools = {
      redirect_to_a_page: async ({ url }) => {
        console.log("Original URL:", url);
        // Parse the incoming URL
        const parsedUrl = new URL(url);
        // Replace the hostname with the current browser tab's hostname
        parsedUrl.hostname = window.location.hostname;
        // If the current page is served on a specific port, use that in the new URL
        if (window.location.port) {
          parsedUrl.port = window.location.port;
        } else {
          // If no port is set on the current page, remove any port from the parsed URL
          parsedUrl.port = '';
        }
        // Keep the same protocol as the current page (http / https)
        parsedUrl.protocol = window.location.protocol;
        // Wait 2 seconds before redirecting
        setTimeout(() => {
          window.location.href = parsedUrl.href;
        }, 2000);
        return "Redirect scheduled after 2 seconds.";
      }
    };
  });
</script>
<script src="https://elevenlabs.io/convai-widget/index.js" async type="text/javascript"></script>
```
5.1 How It Works
- Registering the Listener: The code waits for an `elevenlabs-convai:call` event from the `<elevenlabs-convai>` element.
- Defining `clientTools`: The `clientTools` object is extended with custom behaviors. Here, we define `redirect_to_a_page`.
- Hostname & Port Handling: We parse the incoming URL with the `URL` constructor, then replace the hostname and port with the current site’s details. This ensures the user remains on your domain even if the original URL points to a different domain.
- Protocol Consistency: By setting `parsedUrl.protocol = window.location.protocol`, we ensure that the protocol remains `http` or `https`, as appropriate for your site.
- Delay Before Redirect: `setTimeout` introduces a 2-second delay, which can improve the user experience (for example, letting the voice agent finish speaking a confirmation message).
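The URL-rewriting part of the snippet can be factored into a pure function, which makes that behavior easy to unit-test outside the browser. This is a sketch mirroring the snippet’s logic, not part of the ElevenLabs API:

```javascript
// Rewrite a URL so it points at the current site: keep the path and query,
// but take hostname, port, and protocol from the given location object.
function rewriteToCurrentOrigin(rawUrl, location) {
  const parsed = new URL(rawUrl);
  parsed.hostname = location.hostname;
  parsed.port = location.port || ""; // drop the port if the current page has none
  parsed.protocol = location.protocol;
  return parsed.href;
}
```

In the browser you would call `rewriteToCurrentOrigin(url, window.location)`; in tests, any object with `hostname`, `port`, and `protocol` fields will do.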
6. Putting It All Together
- Configure Your HTML: On your website, include both the Airtable fetch script and the ElevenLabs `<elevenlabs-convai>` web component (or any other ElevenLabs integration you are using).
- Fetch Data from Airtable: Before or after the user initiates a conversation, fetch relevant data (e.g., greetings, FAQ answers) from Airtable.
- Trigger Voice Output: Use the data from Airtable to generate text or cues that are spoken by the ElevenLabs component.
- User Interaction: The user might ask for more information, which triggers a new fetch from Airtable for a more specific piece of content.
- Redirection: If the user asks to view a page, your custom redirect function (such as `redirect_to_a_page`) can be invoked by the voice agent, guiding them to the correct article.
7. Best Practices & Considerations
- Authentication: Never expose your Airtable credentials (API keys or personal access tokens) in public-facing code. Keep them in environment variables on a server and route requests through an intermediary endpoint.
- Caching: If your site has heavy traffic, consider caching responses from Airtable to reduce the number of requests and speed up performance.
- Error Handling: Always handle errors gracefully—if Airtable or ElevenLabs is unavailable or returns an error, inform the user via a fallback message or text.
- Accessibility: Voice interactions should complement, not replace, standard web navigation for users who may need screen readers or other accessibility tools.
- Performance: TTS can be resource-intensive on slower devices. Keep an eye on performance, especially for mobile users.
- Data Structure: Keep your Airtable schema as intuitive and minimal as possible. Overly complex structures can complicate your code and hamper debugging.
8. Conclusion
Combining Airtable’s flexible database capabilities with ElevenLabs’ advanced TTS engine opens up exciting possibilities for engaging voice-based interactions on your website. Whether you are guiding customers through a product catalog, helping users navigate an online course, or providing audio assistance for an FAQ, the workflow remains the same:
- Store structured content or prompts in Airtable.
- Fetch from Airtable and pass the content to ElevenLabs to generate speech.
- Interact with your users via voice.
- Redirect or update the user interface using JavaScript (like the sample code above).
This modular approach ensures that non-technical team members can update the voice agent’s content through Airtable, while developers fine-tune the website’s logic and user experience. Over time, you can scale and refine your voice agent by adding more data tables, customizing voice outputs, or integrating natural language understanding (NLU) for more sophisticated interactions.
By following these steps, you can create a seamless voice agent that delivers real-time, on-demand, and context-aware guidance to users—all powered by Airtable’s structured data and ElevenLabs’ high-quality voice technology.