
API Reference

MicVAD

The MicVAD API is for recording user audio in the browser and running callbacks on speech segments and related events.

Support

Package Supported
@ricky0123/vad-web Yes
@ricky0123/vad-node No
@ricky0123/vad-react No, use the useMicVAD hook

Example

import { MicVAD } from "@ricky0123/vad-web"
const myvad = await MicVAD.new({
  onSpeechEnd: (audio) => {
    // do something with `audio` (Float32Array of audio samples at sample rate 16000)...
  },
})
myvad.start()

Options

New instances of MicVAD are created by calling the async static method MicVAD.new(options). The options object can contain the following fields (all are optional).

Option Type Description
additionalAudioConstraints Additional constraints to pass to getUserMedia via the audio field
onFrameProcessed (probabilities: {isSpeech: number; notSpeech: number}) => any Callback to run after each frame is processed.
onVADMisfire () => any Callback to run if speech start was detected but onSpeechEnd will not run because the audio segment is shorter than minSpeechFrames
onSpeechStart () => any Callback to run when speech start is detected
onSpeechEnd (audio: Float32Array) => any Callback to run when speech end is detected. Receives a Float32Array of audio samples between -1 and 1 at a sample rate of 16000. Does not run if the audio segment is shorter than minSpeechFrames
positiveSpeechThreshold number see algorithm configuration
negativeSpeechThreshold number see algorithm configuration
redemptionFrames number see algorithm configuration
frameSamples number see algorithm configuration
preSpeechPadFrames number see algorithm configuration
minSpeechFrames number see algorithm configuration
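
As an illustration of working with the `audio` argument of onSpeechEnd, the sketch below computes a segment's duration and converts it to 16-bit PCM. It relies only on what the table states (samples between -1 and 1, sample rate 16000); the helper names `segmentDurationMs` and `floatTo16BitPCM` are hypothetical and not part of the library.

```typescript
// Sample rate of the audio passed to onSpeechEnd, per the options table.
const VAD_SAMPLE_RATE = 16000

// Hypothetical helper: length of a speech segment in milliseconds.
function segmentDurationMs(
  audio: Float32Array,
  sampleRate: number = VAD_SAMPLE_RATE
): number {
  return (audio.length / sampleRate) * 1000
}

// Hypothetical helper: convert float samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPCM(audio: Float32Array): Int16Array {
  return Int16Array.from(audio, (sample) => {
    // Clamp first so out-of-range values cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, sample))
    return s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff)
  })
}
```

Inside onSpeechEnd, one might call `floatTo16BitPCM(audio)` before handing the segment to an encoder or uploading it.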

Attributes

Attribute Type Description
listening boolean Is the VAD listening to mic input or is it paused?
pause () => void Stop listening to mic input
start () => void Start listening to mic input
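
The three attributes above are enough to build a mute/unmute toggle. The following is a minimal sketch; the `PausableVAD` interface and `toggleListening` helper are hypothetical names that only mirror the attributes listed in the table, and it assumes `listening` reflects the current state at the time of the call.

```typescript
// Minimal shape of the attributes listed above; the interface name is hypothetical.
interface PausableVAD {
  listening: boolean
  pause: () => void
  start: () => void
}

// Hypothetical helper: pause if currently listening, otherwise start.
// Returns true when listening was requested, false when a pause was requested.
function toggleListening(vad: PausableVAD): boolean {
  if (vad.listening) {
    vad.pause()
    return false
  }
  vad.start()
  return true
}
```

An instance returned by `MicVAD.new(...)` could be passed directly, for example from a button click handler.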

NonRealTimeVAD

The NonRealTimeVAD API is for identifying segments of user speech if you already have a Float32Array of audio samples.

Support

Package Supported
@ricky0123/vad-web Yes
@ricky0123/vad-node Yes
@ricky0123/vad-react No

Example

import * as vad from "@ricky0123/vad-node" // or @ricky0123/vad-web

const options: Partial<vad.NonRealTimeVADOptions> = { /* ... */ }
const myvad = await vad.NonRealTimeVAD.new(options)
const [audioFileData, nativeSampleRate] = ... // get audio and sample rate from file or something
for await (const { audio, start, end } of myvad.run(audioFileData, nativeSampleRate)) {
  // do stuff with
  //   audio (Float32Array of audio)
  //   start (milliseconds into audio where speech starts)
  //   end (milliseconds into audio where speech ends)
}

Options

New instances of NonRealTimeVAD are created by calling the async static method NonRealTimeVAD.new(options). The options object can contain the following fields (all are optional).

Option Type Description
positiveSpeechThreshold number see algorithm configuration
negativeSpeechThreshold number see algorithm configuration
redemptionFrames number see algorithm configuration
frameSamples number see algorithm configuration
preSpeechPadFrames number see algorithm configuration
minSpeechFrames number see algorithm configuration

Attributes

Attribute Type Description
run async function* (inputAudio: Float32Array, sampleRate: number): AsyncGenerator Run the VAD model on your audio. Yields an object with audio, start, and end for each speech segment found
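
Because `start` and `end` are reported in milliseconds, locating a segment in the original input requires converting them back to sample indices at the native sample rate. The sketch below shows one way to do that; the `SpeechSegment` interface and `toSampleRange` helper are hypothetical, and it assumes the offsets are measured from the beginning of the audio passed to `run`.

```typescript
// Shape of each value yielded by run(), per the example; the interface name is hypothetical.
interface SpeechSegment {
  audio: Float32Array
  start: number // milliseconds into the input where speech starts
  end: number // milliseconds into the input where speech ends
}

// Hypothetical helper: map a segment's millisecond bounds to a
// [firstSample, endSample) range in the original input, clamped to its length.
function toSampleRange(
  segment: Pick<SpeechSegment, "start" | "end">,
  sampleRate: number,
  totalSamples: number
): [number, number] {
  const clamp = (n: number) => Math.max(0, Math.min(totalSamples, n))
  const first = clamp(Math.floor((segment.start / 1000) * sampleRate))
  const last = clamp(Math.ceil((segment.end / 1000) * sampleRate))
  return [first, last]
}
```

The resulting pair can be passed to `Float32Array.prototype.subarray` to view the matching region of the original recording without copying it.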

useMicVAD

A React hook wrapper for MicVAD. Use this if you want to run the VAD model on mic input in a React application.

Support

Package Supported
@ricky0123/vad-web No, use MicVAD
@ricky0123/vad-node No
@ricky0123/vad-react Yes

Example

import { useMicVAD } from "@ricky0123/vad-react"

const MyComponent = () => {
  const vad = useMicVAD({
    startOnLoad: true,
    onSpeechEnd: (audio) => {
      console.log("User stopped talking")
    },
  })
  return <div>{vad.userSpeaking && "User is speaking"}</div>
}

Options

The useMicVAD hook takes an options object with the following fields (all optional).

Option Type Description
startOnLoad boolean Should the VAD start listening to mic input when it finishes loading?
additionalAudioConstraints Additional constraints to pass to getUserMedia via the audio field
onFrameProcessed (probabilities: {isSpeech: number; notSpeech: number}) => any Callback to run after each frame is processed.
onVADMisfire () => any Callback to run if speech start was detected but onSpeechEnd will not run because the audio segment is shorter than minSpeechFrames
onSpeechStart () => any Callback to run when speech start is detected
onSpeechEnd (audio: Float32Array) => any Callback to run when speech end is detected. Receives a Float32Array of audio samples between -1 and 1 at a sample rate of 16000. Does not run if the audio segment is shorter than minSpeechFrames
positiveSpeechThreshold number see algorithm configuration
negativeSpeechThreshold number see algorithm configuration
redemptionFrames number see algorithm configuration
frameSamples number see algorithm configuration
preSpeechPadFrames number see algorithm configuration
minSpeechFrames number see algorithm configuration
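
Several of these options are counted in frames, so it can help to translate them into time when tuning. The sketch below is simple arithmetic under the assumption that frames are processed at the same 16000 sample rate reported for onSpeechEnd; `framesToMs` is a hypothetical helper, and the numbers in the usage note are illustrative values rather than documented defaults.

```typescript
// Sample rate stated for the audio passed to onSpeechEnd.
const SAMPLE_RATE = 16000

// Hypothetical helper: duration in milliseconds covered by a number of frames,
// where each frame holds `frameSamples` samples.
function framesToMs(
  frames: number,
  frameSamples: number,
  sampleRate: number = SAMPLE_RATE
): number {
  return (frames * frameSamples * 1000) / sampleRate
}
```

For example, with an illustrative frameSamples of 1536 and minSpeechFrames of 3, `framesToMs(3, 1536)` gives 288, so segments shorter than roughly 288 ms would trigger onVADMisfire instead of onSpeechEnd under these assumptions.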

Returns

Attribute Type Description
listening boolean Is the VAD currently listening to mic input?
errored false | { message: string; } Did the VAD fail to load?
loading boolean Is the VAD still loading?
userSpeaking boolean Is the user speaking?
pause () => void Stop the VAD from running on mic input
start () => void Start running the VAD on mic input
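
The returned flags can be combined into a single status string for display. The following is a minimal sketch; `MicVADState` and `describeVADState` are hypothetical names that mirror only the fields in the table above, and the order of the checks (loading, then error, then paused) is a design choice rather than something the library prescribes.

```typescript
// Subset of the values returned by useMicVAD, per the table; the interface name is hypothetical.
interface MicVADState {
  listening: boolean
  errored: false | { message: string }
  loading: boolean
  userSpeaking: boolean
}

// Hypothetical helper: pick one human-readable status from the hook's flags.
function describeVADState(state: MicVADState): string {
  if (state.loading) return "Loading VAD..."
  if (state.errored) return `VAD failed to load: ${state.errored.message}`
  if (!state.listening) return "Paused"
  return state.userSpeaking ? "User is speaking" : "Listening"
}
```

In a component, the object returned by `useMicVAD` could be passed straight in, for example `<div>{describeVADState(vad)}</div>`.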