Language support for Azure AI Vision
Some capabilities of Azure AI Vision support multiple languages; any capabilities not mentioned here only support English.
Optical Character Recognition (OCR)
The Azure AI Vision Read API supports many languages. The Read
API can extract text from images and documents with mixed languages, including from the same text line, without requiring a language parameter. See How to specify the Read
model to use the new languages.
Note
Language code optional
Read
OCR's deep-learning-based universal models extract all multi-lingual text in your documents, including text lines with mixed languages, and do not require specifying a language code. Do not provide the language code as the parameter unless you are sure about the language and want to force the service to apply only the relevant model. Otherwise, the service may return incomplete and incorrect text.
Handwritten text
The following table lists the OCR supported languages for handwritten text by the most recent Read
GA model.
Language | Language code (optional) | Language | Language code (optional) |
---|---|---|---|
English | en |
Japanese | ja |
Chinese Simplified | zh-Hans |
Korean | ko |
French | fr |
Portuguese | pt |
German | de |
Spanish | es |
Italian | it |
Print text
The following table lists the OCR supported languages for print text by the most recent Read
GA model.
Language | Code (optional) | Language | Code (optional) |
---|---|---|---|
Afrikaans | af |
Khasi | kha |
Albanian | sq |
K'iche' | quc |
Angika (Devanagiri) | anp |
Korean | ko |
Arabic | ar |
Korku | kfq |
Asturian | ast |
Koryak | kpy |
Awadhi-Hindi (Devanagiri) | awa |
Kosraean | kos |
Azerbaijani (Latin) | az |
Kumyk (Cyrillic) | kum |
Bagheli | bfy |
Kurdish (Arabic) | ku-arab |
Basque | eu |
Kurdish (Latin) | ku-latn |
Belarusian (Cyrillic) | be , be-cyrl |
Kurukh (Devanagiri) | kru |
Belarusian (Latin) | be , be-latn |
Kyrgyz (Cyrillic) | ky |
Bhojpuri-Hindi (Devanagiri) | bho |
Lakota | lkt |
Bislama | bi |
Latin | la |
Bodo (Devanagiri) | brx |
Lithuanian | lt |
Bosnian (Latin) | bs |
Lower Sorbian | dsb |
Brajbha | bra |
Lule Sami | smj |
Breton | br |
Luxembourgish | lb |
Bulgarian | bg |
Mahasu Pahari (Devanagiri) | bfz |
Bundeli | bns |
Malay (Latin) | ms |
Buryat (Cyrillic) | bua |
Maltese | mt |
Catalan | ca |
Malto (Devanagiri) | kmj |
Cebuano | ceb |
Manx | gv |
Chamling | rab |
Maori | mi |
Chamorro | ch |
Marathi | mr |
Chhattisgarhi (Devanagiri) | hne |
Mongolian (Cyrillic) | mn |
Chinese Simplified | zh-Hans |
Montenegrin (Cyrillic) | cnr-cyrl |
Chinese Traditional | zh-Hant |
Montenegrin (Latin) | cnr-latn |
Cornish | kw |
Neapolitan | nap |
Corsican | co |
Nepali | ne |
Crimean Tatar (Latin) | crh |
Niuean | niu |
Croatian | hr |
Nogay | nog |
Czech | cs |
Northern Sami (Latin) | sme |
Danish | da |
Norwegian | no |
Dari | prs |
Occitan | oc |
Dhimal (Devanagiri) | dhi |
Ossetic | os |
Dogri (Devanagiri) | doi |
Pashto | ps |
Dutch | nl |
Persian | fa |
English | en |
Polish | pl |
Erzya (Cyrillic) | myv |
Portuguese | pt |
Estonian | et |
Punjabi (Arabic) | pa |
Faroese | fo |
Ripuarian | ksh |
Fijian | fj |
Romanian | ro |
Filipino | fil |
Romansh | rm |
Finnish | fi |
Russian | ru |
French | fr |
Sadri (Devanagiri) | sck |
Friulian | fur |
Samoan (Latin) | sm |
Gagauz (Latin) | gag |
Sanskrit (Devanagari) | sa |
Galician | gl |
Santali(Devanagiri) | sat |
German | de |
Scots | sco |
Gilbertese | gil |
Scottish Gaelic | gd |
Gondi (Devanagiri) | gon |
Serbian (Latin) | sr , sr-latn |
Greenlandic | kl |
Sherpa (Devanagiri) | xsr |
Gurung (Devanagiri) | gvr |
Sirmauri (Devanagiri) | srx |
Haitian Creole | ht |
Skolt Sami | sms |
Halbi (Devanagiri) | hlb |
Slovak | sk |
Hani | hni |
Slovenian | sl |
Haryanvi | bgc |
Somali (Arabic) | so |
Hawaiian | haw |
Southern Sami | sma |
Hindi | hi |
Spanish | es |
Hmong Daw (Latin) | mww |
Swahili (Latin) | sw |
Ho(Devanagiri) | hoc |
Swedish | sv |
Hungarian | hu |
Tajik (Cyrillic) | tg |
Icelandic | is |
Tatar (Latin) | tt |
Inari Sami | smn |
Tetum | tet |
Indonesian | id |
Thangmi | thf |
Interlingua | ia |
Tongan | to |
Inuktitut (Latin) | iu |
Turkish | tr |
Irish | ga |
Turkmen (Latin) | tk |
Italian | it |
Tuvan | tyv |
Japanese | ja |
Upper Sorbian | hsb |
Jaunsari (Devanagiri) | Jns |
Urdu | ur |
Javanese | jv |
Uyghur (Arabic) | ug |
Kabuverdianu | kea |
Uzbek (Arabic) | uz-arab |
Kachin (Latin) | kac |
Uzbek (Cyrillic) | uz-cyrl |
Kangri (Devanagiri) | xnr |
Uzbek (Latin) | uz |
Karachay-Balkar | krc |
Volapük | vo |
Kara-Kalpak (Cyrillic) | kaa-cyrl |
Walser | wae |
Kara-Kalpak (Latin) | kaa |
Welsh | cy |
Kashubian | csb |
Western Frisian | fy |
Kazakh (Cyrillic) | kk-cyrl |
Yucatec Maya | yua |
Kazakh (Latin) | kk-latn |
Zhuang | za |
Khaling | klr |
Zulu | zu |
Analyze image
Some features of the Analyze - Image API can return results in other languages, specified with the language
query parameter. Other actions return results in English regardless of what language is specified, and others throw an exception for unsupported languages. Actions are specified with the visualFeatures
and details
query parameters; see the Overview for a list of all the actions you can do with the Analyze API, or follow the How-to guide to try them out.
Language | Language code | Categories | Tags | Description | Adult, Brands, Color, Faces, ImageType, Objects | Celebrities, Landmarks | Captions, Dense captions |
---|---|---|---|---|---|---|---|
Arabic | ar |
✅ | |||||
Azerbaijani | az |
✅ | |||||
Bulgarian | bg |
✅ | |||||
Bosnian Latin | bs |
✅ | |||||
Catalan | ca |
✅ | |||||
Czech | cs |
✅ | |||||
Welsh | cy |
✅ | |||||
Danish | da |
✅ | |||||
German | de |
✅ | |||||
Greek | el |
✅ | |||||
English | en |
✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Spanish | es |
✅ | ✅ | ✅ | ✅ | ||
Estonian | et |
✅ | |||||
Basque | eu |
✅ | |||||
Finnish | fi |
✅ | |||||
French | fr |
✅ | |||||
Irish | ga |
✅ | |||||
Galician | gl |
✅ | |||||
Hebrew | he |
✅ | |||||
Hindi | hi |
✅ | |||||
Croatian | hr |
✅ | |||||
Hungarian | hu |
✅ | |||||
Indonesian | id |
✅ | |||||
Italian | it |
✅ | |||||
Japanese | ja |
✅ | ✅ | ✅ | ✅ | ||
Kazakh | kk |
✅ | |||||
Korean | ko |
✅ | |||||
Lithuanian | lt |
✅ | |||||
Latvian | lv |
✅ | |||||
Macedonian | mk |
✅ | |||||
Malay Malaysia | ms |
✅ | |||||
Norwegian (Bokmal) | nb |
✅ | |||||
Dutch | nl |
✅ | |||||
Polish | pl |
✅ | |||||
Dari | prs |
✅ | |||||
Portuguese-Brazil | pt-BR |
✅ | |||||
Portuguese-Portugal | pt |
✅ | ✅ | ✅ | ✅ | ||
Portuguese-Portugal | pt-PT |
✅ | |||||
Romanian | ro |
✅ | |||||
Russian | ru |
✅ | |||||
Slovak | sk |
✅ | |||||
Slovenian | sl |
✅ | |||||
Serbian - Cyrillic RS | sr-Cryl |
✅ | |||||
Serbian - Latin RS | sr-Latn |
✅ | |||||
Swedish | sv |
✅ | |||||
Thai | th |
✅ | |||||
Turkish | tr |
✅ | |||||
Ukrainian | uk |
✅ | |||||
Vietnamese | vi |
✅ | |||||
Chinese Simplified | zh |
✅ | ✅ | ✅ | ✅ | ||
Chinese Simplified | zh-Hans |
✅ | |||||
Chinese Traditional | zh-Hant |
✅ |
Multimodal embeddings
The latest Multimodal embeddings model supports vector search in many languages. The original model supports English only. Images that are vectorized in the English-only model are not compatible with text searches in the multi-lingual model.
Language | Language code | 2023-04-15 model |
2022-04-11 model |
---|---|---|---|
Akrikaans | af |
✅ | |
Amharic | am |
✅ | |
Arabic | ar |
✅ | |
Armenian | hy |
✅ | |
Assamese | as |
✅ | |
Asturian | ast |
✅ | |
Azerbaijani | az |
✅ | |
Belarusian | be |
✅ | |
Bengali | bn |
✅ | |
Bosnian | bs |
✅ | |
Bulgarian | bg |
✅ | |
Burmese | my |
✅ | |
Catalan | ca |
✅ | |
Cebuano | ceb |
✅ | |
Chinese Simpl | zho |
✅ | |
Chinese Trad | zho |
✅ | |
Croatian | hr |
✅ | |
Czech | cs |
✅ | |
Danish | da |
✅ | |
Dutch | nl |
✅ | |
English | en |
✅ | ✅ |
Estonian | et |
✅ | |
Filipino (Tagalog) | tl |
✅ | |
Finnish | fi |
✅ | |
French | fr |
✅ | |
Fulah | ff |
✅ | |
Galician | gl |
✅ | |
Ganda | lg |
✅ | |
Georgian | ka |
✅ | |
German | de |
✅ | |
Greek | el |
✅ | |
Gujarati | gu |
✅ | |
Hausa | ha |
✅ | |
Hebrew | he |
✅ | |
Hindi | hi |
✅ | |
Hungarian | hu |
✅ | |
Icelandic | is |
✅ | |
Igbo | ig |
✅ | |
Indonesian | id |
✅ | |
Irish | ga |
✅ | |
Italian | it |
✅ | |
Japanese | ja |
✅ | |
Javanese | jv |
✅ | |
Kabuverdianu | kea |
✅ | |
Kamba | kam |
✅ | |
Kannada | kn |
✅ | |
Kazakh | kk |
✅ | |
Khmer | km |
✅ | |
Korean | ko |
✅ | |
Kyrgyz | ky |
✅ | |
Lao | lo |
✅ | |
Latvian | lv |
✅ | |
Lingala | ln |
✅ | |
Lithuanian | lt |
✅ | |
Luo | luo |
✅ | |
Luxembourgish | lb |
✅ | |
Macedonian | mk |
✅ | |
Malay | ms |
✅ | |
Malayalam | ml |
✅ | |
Maltese | mt |
✅ | |
Maori | mi |
✅ | |
Marathi | mr |
✅ | |
Mongolian | mn |
✅ | |
Nepali | ne |
✅ | |
Northern Sotho | ns |
✅ | |
Norwegian | no |
✅ | |
Nyanja | ny |
✅ | |
Occitan | oc |
✅ | |
Oriya | or |
✅ | |
Oromo | om |
✅ | |
Pashto | ps |
✅ | |
Persian | fa |
✅ | |
Polish | pl |
✅ | |
Portuguese (Brazil) | pt |
✅ | |
Punjabi | pa |
✅ | |
Romanian | ro |
✅ | |
Russian | ru |
✅ | |
Serbian | sr |
✅ | |
Shona | sn |
✅ | |
Sindhi | sd |
✅ | |
Slovak | sk |
✅ | |
Slovenian | sl |
✅ | |
Somali | so |
✅ | |
Sorani Kurdish | ku |
✅ | |
Spanish (Latin American) | es |
✅ | |
Swahili | sw |
✅ | |
Swedish | sv |
✅ | |
Tajik | tg |
✅ | |
Tamil | ta |
✅ | |
Telugu | te |
✅ | |
Thai | th |
✅ | |
Turkish | tr |
✅ | |
Ukrainian | uk |
✅ | |
Umbundu | umb |
✅ | |
Urdu | ur |
✅ | |
Uzbek | uz |
✅ | |
Vietnamese | vi |
✅ | |
Welsh | cy |
✅ | |
Wolof | wo |
✅ | |
Xhosa | xh |
✅ | |
Yoruba | yo |
✅ | |
Zulu | zu |
✅ |
Feedback
https://aka.ms/ContentUserFeedback.
Coming soon: Throughout 2024 we will be phasing out GitHub Issues as the feedback mechanism for content and replacing it with a new feedback system. For more information see:Submit and view feedback for