Quickstart: Speech to text with the Azure OpenAI Whisper model
In this quickstart, you use the Azure OpenAI Whisper model for speech to text.
The file size limit for the Azure OpenAI Whisper model is 25 MB. If you need to transcribe a file larger than 25 MB, you can use the Azure AI Speech batch transcription API.
Prerequisites
- An Azure subscription - Create one for free.
- Access granted to Azure OpenAI Service in the desired Azure subscription.
- An Azure OpenAI resource with a
whisper
model deployed in a supported region. Whisper model regional availability. For more information, see Create a resource and deploy a model with Azure OpenAI.
Note
Currently, you must submit an application to access Azure OpenAI Service. To apply for access, complete this form.
Set up
Retrieve key and endpoint
To successfully make a call against Azure OpenAI, you'll need an endpoint and a key.
Variable name | Value |
---|---|
AZURE_OPENAI_ENDPOINT |
This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. Alternatively, you can find the value in the Azure OpenAI Studio > Playground > Code View. An example endpoint is: https://aoai-docs.openai.azure.com/ . |
AZURE_OPENAI_API_KEY |
This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. You can use either KEY1 or KEY2 . |
Go to your resource in the Azure portal. The Endpoint and Keys can be found in the Resource Management section. Copy your endpoint and access key as you'll need both for authenticating your API calls. You can use either KEY1
or KEY2
. Always having two keys allows you to securely rotate and regenerate keys without causing a service disruption.
Create and assign persistent environment variables for your key and endpoint.
Environment variables
setx AZURE_OPENAI_API_KEY "REPLACE_WITH_YOUR_KEY_VALUE_HERE"
setx AZURE_OPENAI_ENDPOINT "REPLACE_WITH_YOUR_ENDPOINT_HERE"
REST API
In a bash shell, run the following command. You need to replace YourDeploymentName
with the deployment name you chose when you deployed the Whisper model. The deployment name isn't necessarily the same as the model name. Entering the model name results in an error unless you chose a deployment name that is identical to the underlying model name.
curl $AZURE_OPENAI_ENDPOINT/openai/deployments/YourDeploymentName/audio/transcriptions?api-version=2024-02-01 \
-H "api-key: $AZURE_OPENAI_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F file="@./wikipediaOcelot.wav"
The format of your first line of the command with an example endpoint would appear as follows curl https://aoai-docs.openai.azure.com/openai/deployments/{YourDeploymentName}/audio/transcriptions?api-version=2024-02-01 \
.
You can get sample audio files from the Azure AI Speech SDK repository at GitHub.
Important
For production, use a secure way of storing and accessing your credentials like Azure Key Vault. For more information about credential security, see the Azure AI services security article.
Output
{"text":"The ocelot, Lepardus paradalis, is a small wild cat native to the southwestern United States, Mexico, and Central and South America. This medium-sized cat is characterized by solid black spots and streaks on its coat, round ears, and white neck and undersides. It weighs between 8 and 15.5 kilograms, 18 and 34 pounds, and reaches 40 to 50 centimeters 16 to 20 inches at the shoulders. It was first described by Carl Linnaeus in 1758. Two subspecies are recognized, L. p. paradalis and L. p. mitis. Typically active during twilight and at night, the ocelot tends to be solitary and territorial. It is efficient at climbing, leaping, and swimming. It preys on small terrestrial mammals such as armadillo, opossum, and lagomorphs."}
PowerShell
Run the following command. You need to replace YourDeploymentName
with the deployment name you chose when you deployed the Whisper model. The deployment name isn't necessarily the same as the model name. Entering the model name results in an error unless you chose a deployment name that is identical to the underlying model name.
# Azure OpenAI metadata variables
$openai = @{
api_key = $Env:AZURE_OPENAI_API_KEY
api_base = $Env:AZURE_OPENAI_ENDPOINT # your endpoint should look like the following https://YOUR_RESOURCE_NAME.openai.azure.com/
api_version = '2024-02-01' # this may change in the future
name = 'YourDeploymentName' #This will correspond to the custom name you chose for your deployment when you deployed a model.
}
# Header for authentication
$headers = [ordered]@{
'api-key' = $openai.api_key
}
$form = @{ file = get-item -path './wikipediaOcelot.wav' }
# Send a completion call to generate an answer
$url = "$($openai.api_base)/openai/deployments/$($openai.name)/audio/transcriptions?api-version=$($openai.api_version)"
$response = Invoke-RestMethod -Uri $url -Headers $headers -Form $form -Method Post -ContentType 'multipart/form-data'
return $response.text
You can get sample audio files from the Azure AI Speech SDK repository at GitHub.
Important
For production, use a secure way of storing and accessing your credentials like The PowerShell Secret Management with Azure Key Vault. For more information about credential security, see the Azure AI services security article.
Output
The ocelot, Lepardus paradalis, is a small wild cat native to the southwestern United States, Mexico, and Central and South America. This medium-sized cat is characterized by solid black spots and streaks on its coat, round ears, and white neck and undersides. It weighs between 8 and 15.5 kilograms, 18 and 34 pounds, and reaches 40 to 50 centimeters 16 to 20 inches at the shoulders. It was first described by Carl Linnaeus in 1758. Two subspecies are recognized, L. p. paradalis and L. p. mitis. Typically active during twilight and at night, the ocelot tends to be solitary and territorial. It is efficient at climbing, leaping, and swimming. It preys on small terrestrial mammals such as armadillo, opossum, and lagomorphs.
Python
Prerequisites
- Python 3.8 or later version
- The following Python libraries: os
Set up
Install the OpenAI Python client library with:
pip install openai
Create a new Python file called quickstart.py. Then open it up in your preferred editor or IDE.
Replace the contents of quickstart.py with the following code. Modify the code to add your deployment name:
import os
from openai import AzureOpenAI
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2024-02-01",
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
)
deployment_id = "YOUR-DEPLOYMENT-NAME-HERE" #This will correspond to the custom name you chose for your deployment when you deployed a model."
audio_test_file = "./wikipediaOcelot.wav"
result = client.audio.transcriptions.create(
file=open(audio_test_file, "rb"),
model=deployment_id
)
print(result)
Run the application with the python command on your quickstart file:
You can get sample audio files from the Azure AI Speech SDK repository at GitHub.
Important
For production, use a secure way of storing and accessing your credentials like Azure Key Vault. For more information about credential security, see the Azure AI services security article.
Output
{"text":"The ocelot, Lepardus paradalis, is a small wild cat native to the southwestern United States, Mexico, and Central and South America. This medium-sized cat is characterized by solid black spots and streaks on its coat, round ears, and white neck and undersides. It weighs between 8 and 15.5 kilograms, 18 and 34 pounds, and reaches 40 to 50 centimeters 16 to 20 inches at the shoulders. It was first described by Carl Linnaeus in 1758. Two subspecies are recognized, L. p. paradalis and L. p. mitis. Typically active during twilight and at night, the ocelot tends to be solitary and territorial. It is efficient at climbing, leaping, and swimming. It preys on small terrestrial mammals such as armadillo, opossum, and lagomorphs."}
Clean up resources
If you want to clean up and remove an Azure OpenAI resource, you can delete the resource. Before deleting the resource, you must first delete any deployed models.
Next steps
- Learn more about how to work with Whisper models with the Azure AI Speech batch transcription API.
- For more examples, check out the Azure OpenAI Samples GitHub repository
Feedback
https://aka.ms/ContentUserFeedback.
Coming soon: Throughout 2024 we will be phasing out GitHub Issues as the feedback mechanism for content and replacing it with a new feedback system. For more information see:Submit and view feedback for