
SM2: A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

An analysis of SM2, a streaming Transformer Transducer model for multilingual ASR and speech translation, featuring truly zero-shot capability and weak supervision.

1. Introduction & Overview

This document analyzes the research paper "A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability," which introduces SM2 (Streaming Multilingual Speech Model). SM2 is a single neural transducer model designed to perform Automatic Speech Recognition (ASR) and Speech Translation (ST) across 25 languages, targeting a single output language without requiring Language Identification (LID).

The model's key innovations are its streaming capability, built on a Transformer Transducer backbone; weak supervision (training ST tasks on ASR transcripts translated by a machine translation engine, which avoids costly human-labeled parallel data); and demonstrated truly zero-shot performance on unseen language pairs.

Training Data Scale

351K hours

Anonymized speech across 25 languages

Model Type

Transformer Transducer

Streaming, single model for ASR & ST

Key Claim

Truly Zero-Shot

ST for unseen {speech, text} pairs

2. Streaming Multilingual Speech Model (SM2)

SM2 is positioned as a practical, industry-oriented model, in contrast to large non-streaming models such as OpenAI's Whisper.

2.1 Model Architecture: Transformer Transducer

The backbone is a Transformer Transducer (T-T). Unlike the Attention-based Encoder-Decoder (AED) models common in offline ST (e.g., Whisper), the transducer architecture is inherently suited to low-latency streaming. It combines a streaming Transformer encoder with a prediction network and a joint network.

This choice directly addresses the trade-off between streaming and quality: it picks T-T over streaming AED variants such as Monotonic Attention, prioritizing deterministic latency and feasibility of industrial deployment.
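
One common way to make a Transformer encoder streamable is to restrict self-attention to the current chunk of frames plus a bounded amount of history. The sketch below illustrates that idea only; the summary does not state which masking scheme SM2 uses, so the function name and the `chunk_size` and `left_chunks` parameters are assumptions made for illustration.

```python
from typing import List


def streaming_attention_mask(
    num_frames: int, chunk_size: int, left_chunks: int
) -> List[List[bool]]:
    """Build a chunk-wise attention mask (hypothetical helper).

    mask[q][k] is True when query frame q may attend to key frame k.
    Each frame sees its own chunk plus `left_chunks` previous chunks,
    so look-ahead is bounded by chunk_size - 1 frames.
    """
    mask = []
    for q in range(num_frames):
        q_chunk = q // chunk_size
        first = max(0, q_chunk - left_chunks) * chunk_size  # oldest visible frame
        last = min(num_frames, (q_chunk + 1) * chunk_size)  # exclusive upper bound
        mask.append([first <= k < last for k in range(num_frames)])
    return mask


# Six frames, chunks of two frames, one chunk of history.
mask = streaming_attention_mask(num_frames=6, chunk_size=2, left_chunks=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because no frame attends beyond the end of its own chunk, the encoder can emit outputs as soon as each chunk arrives, which is what gives a transducer its bounded, predictable latency.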

2.2 Weakly Supervised Training Paradigm

The core contribution is the training method. Instead of parallel {source speech, target text} data, SM2 uses abundantly available multilingual ASR data. The transcripts are translated into the target language with a generic Machine Translation (MT) service to create pseudo ST training pairs.

Process: {Source Speech, Source Transcript (ASR corpus)} → MT Service → {Source Speech, Target Transcript (Pseudo-label)}. This sidesteps the data scarcity of ST and fits the broader trend of using noisy or synthetic labels for scale, reminiscent of techniques in semi-supervised computer vision such as CycleGAN for domain adaptation without paired data.
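
The pseudo-labeling process can be sketched as a small data transformation. This is a minimal illustration, not the paper's pipeline: the `AsrExample` and `StExample` types, the `translate` callable, and the toy lookup table standing in for an MT service are all hypothetical, as is the choice to keep the transcript unchanged when the source language already matches the target.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class AsrExample:
    audio_path: str   # source-language speech
    transcript: str   # source-language transcript from the ASR corpus
    source_lang: str


@dataclass(frozen=True)
class StExample:
    audio_path: str   # the same source speech, reused as-is
    target_text: str  # MT output used as a pseudo-label
    source_lang: str
    target_lang: str


def build_pseudo_st_pairs(
    asr_corpus: Iterable[AsrExample],
    translate: Callable[[str, str, str], str],
    target_lang: str,
) -> List[StExample]:
    """Map {speech, source transcript} to {speech, pseudo target transcript}."""
    pairs = []
    for ex in asr_corpus:
        if ex.source_lang == target_lang:
            # Assumption: same-language data is kept as plain ASR supervision.
            target_text = ex.transcript
        else:
            target_text = translate(ex.transcript, ex.source_lang, target_lang)
        if target_text.strip():  # drop examples the MT stand-in could not translate
            pairs.append(
                StExample(ex.audio_path, target_text, ex.source_lang, target_lang)
            )
    return pairs


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Hypothetical stand-in for an MT service: a one-entry lookup table."""
    table = {("de", "en", "guten morgen"): "good morning"}
    return table.get((src, tgt, text.lower()), "")


corpus = [
    AsrExample("a.wav", "Guten Morgen", "de"),
    AsrExample("b.wav", "hello there", "en"),
    AsrExample("c.wav", "unbekannt", "de"),
]
pseudo = build_pseudo_st_pairs(corpus, toy_translate, target_lang="en")
for pair in pseudo:
    print(pair)
```

No human ever writes a translation in this flow: the only human-produced label is the ASR transcript, and every MT error becomes part of the training target.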

2.3 Truly Zero-Shot Capability

The paper draws a distinction in terminology. It argues that "zero-shot" in models like Whisper reflects robustness to unseen accents or dialects, not to unseen language-mapping tasks. SM2 claims "truly zero-shot": the ability to perform ST for a language pair whose direct {speech, target text} mapping was never presented during training.

This capability is theoretically enabled by the model learning disentangled or compositional representations of speech content and language, allowing it to recombine learned source speech features with a new target-language embedding.

3. Technical Details & Mathematical Formulation

The Transformer Transducer defines the probability of an output sequence $Y=(y_1,...,y_U)$ given acoustic features $X=(x_1,...,x_T)$:

\[P(Y|X) = \prod_{u=1}^{U} P(y_u | \mathcal{E}(X), y_{<u})\]

where $\mathcal{E}(X)$ is the output of the streaming Transformer encoder. The model factorizes as:

\[P(y_u | \cdot) = \text{softmax}(\mathbf{W} \cdot (\text{Enc}(X_t) + \text{PredNet}(y_{<u})))\]
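
The additive joint step, a softmax over a linear projection of the summed encoder and prediction-network outputs, can be sketched in a few lines. This is an illustration under stated assumptions: the vectors, the weight matrix `W`, and the function names are made-up toy values rather than anything taken from the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_distribution(
    enc_t: List[float], pred_u: List[float], weight: List[List[float]]
) -> List[float]:
    """Compute softmax(W . (enc_t + pred_u)) for one (t, u) position."""
    fused = [e + p for e, p in zip(enc_t, pred_u)]  # additive joint
    logits = [sum(w * h for w, h in zip(row, fused)) for row in weight]
    return softmax(logits)


# Toy values: hidden size 3, vocabulary size 4 (all hypothetical).
enc_t = [0.5, -1.0, 0.25]   # encoder output for audio frame t
pred_u = [0.1, 0.3, -0.25]  # prediction-network output given previous tokens
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
probs = joint_distribution(enc_t, pred_u, W)
print(probs)
```

The encoder term depends only on audio and the prediction term only on previously emitted tokens, so each can be computed and cached independently during streaming decoding.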

The weak supervision objective minimizes the negative log-likelihood, using the MT-generated target transcript $\hat{Y}_{\text{MT}}$ as the label:

\[\mathcal{L}_{\text{WS}} = -\sum_{(X, \hat{Y}_{\text{MT}}) \in \mathcal{D}} \log P(\hat{Y}_{\text{MT}} | X; \theta)\]
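
As a worked example of this objective, the sketch below sums the negative log-probabilities that a model assigns to the pseudo-label tokens. It assumes the per-token factorization given earlier in this section; a real transducer is trained with a loss that marginalizes over alignments, which this simplification omits. The probabilities are made-up numbers.

```python
import math
from typing import List


def sequence_nll(token_probs: List[float]) -> float:
    """-log P(Y_MT | X) when P factorizes into per-token probabilities."""
    return -sum(math.log(p) for p in token_probs)


def weak_supervision_loss(batch_token_probs: List[List[float]]) -> float:
    """Sum the sequence-level negative log-likelihood over the dataset."""
    return sum(sequence_nll(probs) for probs in batch_token_probs)


# Hypothetical model probabilities for the tokens of two pseudo-labels.
batch = [
    [0.5, 0.25],  # utterance 1: two target tokens
    [1.0, 0.5],   # utterance 2: two target tokens
]
loss = weak_supervision_loss(batch)
print(loss)  # equals 4 * ln(2) for these values
```

The loss has the same form as ordinary supervised training; the only difference is that the label comes from MT rather than from a human translator.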

A key technical detail is the handling of the target language token. A language-specific token is prepended to the target sequence, instructing the model which language to generate. This is similar to the prompting mechanism in multilingual text models.
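
Prepending the language token is a one-line operation on the target sequence. In the sketch below, the `<2xx>` token format and the function name are illustrative assumptions; the summary does not specify how SM2 spells its language tokens.

```python
from typing import List


def add_language_token(target_tokens: List[str], target_lang: str) -> List[str]:
    """Prepend a target-language token (hypothetical '<2xx>' format)."""
    return [f"<2{target_lang}>"] + list(target_tokens)


# The same source utterance can be paired with different target languages.
english_target = add_language_token(["good", "morning"], "en")
german_target = add_language_token(["guten", "Morgen"], "de")
print(english_target)
print(german_target)
```

At inference time the first token is fixed by the caller instead of predicted, which steers the rest of the output toward the requested language.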

4. Experimental Results & Performance

The paper reports results on 25 languages with 351K hours of training data.

  • ASR Performance: SM2 achieves competitive Word Error Rate (WER) compared with dedicated monolingual ASR models, demonstrating its effectiveness as a unified recognizer.
  • ST Performance: On benchmark datasets such as CoVoST-2, SM2's BLEU scores are comparable to or better than recent large-scale non-streaming models (including Whisper in some comparisons), which is notable given its streaming constraint and weak supervision.
  • Zero-Shot ST: For language pairs absent from training (e.g., Tamil→English), SM2 produces reasonable translations with BLEU scores significantly above the baseline, supporting the "truly zero-shot" claim. The gain is attributed to the model's ability to leverage compositional learning from seen languages.
  • Streaming Latency: Although exact figures are not detailed, the use of a Transformer Transducer implies low and predictable latency, suitable for live captioning or real-time translation applications.
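
For readers unfamiliar with the ASR metric cited above, Word Error Rate is the word-level edit distance between hypothesis and reference, divided by the reference length. The sketch below is a minimal implementation that assumes whitespace tokenization and no text normalization; published evaluations usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the ref words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(wer)  # one deleted word out of six reference words
```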

Chart Description: A hypothetical bar chart would show SM2's BLEU scores for ST closely trailing or matching Whisper's across multiple languages, while a separate line graph would show its latency (ms) staying flat and low compared with Whisper's "offline" (infinite latency) designation.

5. Analysis Framework: Core Insight & Logical Flow

Core Insight: The real breakthrough here is not just another multilingual model; it is a pragmatic engineering blueprint for building deployable, scalable speech AI. SM2 trades the pursuit of maximum accuracy (through giant models and pristine data) for the best balance of accuracy, latency, cost, and data efficiency. Its "truly zero-shot" claim is less about magical generalization and more about a clever training scheme that forces the model to learn modular, reusable representations of speech and language.

Logical Flow: The research logic is thoroughly industrial: 1) Identify the constraint (streaming is non-negotiable for products). 2) Choose the right tool (Transformer Transducer over AED for deterministic latency). 3) Solve the data bottleneck (weak supervision via MT bridges the ST data gap). 4) Design for extensibility (language-token prompting allows new target languages to be added cheaply). 5) Validate the unique selling point (demonstrate zero-shot as a byproduct of the architecture and training). This is a masterclass in applied research driven directly by product requirements, unlike much of today's exploratory AI research.

6. Strengths, Flaws & Actionable Insights

Strengths:

  • Product-Ready Architecture: Streaming capability and smaller size ("Green AI") make it directly relevant for live translation, assistants, and telephony.
  • Ingenious Data Strategy: Weak supervision is a game-changer for low-resource languages, leveraging abundant ASR data and mature MT.
  • Clear Economic Advantage: Reduces dependence on costly human-annotated parallel speech data.
  • Scalable Design: The prompting mechanism allows new target languages to be added with minimal retraining, a crucial feature for global platforms.

Flaws & Critical Questions:

  • "Zero-Shot" or "Few-Shot"? The model is trained on 25 languages. Is zero-shot performance on a 26th language due to genuine generalization or to latent similarity with the training set? The paper lacks an ablation study on genuinely unseen, linguistically distant languages.
  • The MT Bottleneck: ST quality is inherently capped by the quality of the offline MT service used to generate labels. MT errors propagate and SM2 learns them.
  • Evaluation Depth: The comparison with Whisper needs more context. Whisper is a single model for multiple tasks (ASR, ST, LID). A fair comparison would require evaluating SM2's multi-task capability or comparing against a Whisper-sized T-T model.
  • Code-Switching Handling: Although the paper claims no need for LID, performance on dense, intra-sentential code-switching (e.g., Hindi-English) is not rigorously quantified.

Actionable Insights:

  • For Product Teams: This is a reference architecture for any real-time, multilingual speech application. Prioritize the T-T backbone and the weak supervision pipeline.
  • For Researchers: Probe the limits of weak supervision. Can a "self-improving" loop be created in which SM2's output improves the MT model? Explore the theoretical basis of its zero-shot capability: what is being disentangled?
  • For Investors: Back companies that take this pragmatic approach over those chasing pure scale. The efficiency gains here translate directly into lower compute costs and faster iteration.

7. Future Applications & Research Directions

Applications:

  • Real-Time Cross-Lingual Communication: Seamless integration into video conferencing (e.g., Teams, Zoom), live event captioning, and social media platforms for real-time subtitle generation.
  • Edge Device Intelligence: The model's small footprint makes it suitable for on-device translation in smartphones, IoT devices, and automotive systems, ensuring privacy and offline functionality.
  • Content Localization at Scale: Automating dubbing and subtitling of video content (YouTube, Netflix) for global audiences, substantially reducing cost and time.
  • Assistive Technology: Enhanced hearing aids or applications that provide real-time transcription and translation for deaf and hard-of-hearing users in multilingual environments.

Research Directions:

  • Robustness to Noisy Labels: Incorporating techniques from noisy-label learning (e.g., co-teaching, meta-learning) to mitigate errors from the upstream MT system.
  • Unified Speech Foundation Model: Extending the SM2 framework into a true multi-task model covering speech synthesis (TTS), voice conversion, and speaker diarization, all in a streaming fashion.
  • Explainability of Zero-Shot: Using visualization techniques (such as attention maps or feature clustering) to understand how the model composes unseen language pairs, contributing to the broader field of compositional generalization in AI.
  • Cross-Modal Zero-Shot: Can this paradigm be extended to truly cross-modal zero-shot tasks, such as generating an image caption in a new language from speech, inspired by the cross-modal alignment seen in models like OpenAI's CLIP?

8. References

  1. Graves, A. (2012). Sequence Transduction with Recurrent Neural Networks. arXiv preprint arXiv:1211.3711.
  2. Vaswani, A., et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.
  3. Radford, A., et al. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. arXiv preprint arXiv:2212.04356. (Whisper)
  4. Zhang, Q., et al. (2020). Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss. ICASSP 2020.
  5. Zhu, J.-Y., et al. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017. (CycleGAN)
  6. Ma, X., et al. (2020). Monotonic Multihead Attention. ICLR 2020.
  7. Microsoft Research. (n.d.). Neural Speech Recognition. Retrieved from the Microsoft Research website.
  8. Schwartz, R., et al. (2019). Green AI. arXiv preprint arXiv:1907.10597.
  9. CoVoST 2: A Large-Scale Multilingual Speech Translation Corpus. (2021). Proceedings of Interspeech 2021.