
Neural Quality Estimation and Automatic Post-Editing for Computer-Assisted Translation

An end-to-end deep learning framework that integrates quality estimation and automatic post-editing to improve machine translation output and reduce the workload of human translators.


1. Introduction

The advent of Neural Machine Translation (NMT) has shifted the workflow toward using machine-generated translations. However, the quality gap between NMT output and human standards still requires manual post-editing, which is time-consuming. This paper proposes an end-to-end deep learning framework that integrates Quality Estimation (QE) and Automatic Post-Editing (APE). The goal is to provide error-correction suggestions and reduce the burden on human translators through an interpretable hierarchical model that imitates the behavior of human post-editors.

2. Related Work

This work builds on several interconnected research threads: Neural Machine Translation (NMT), Quality Estimation (predicting translation quality without reference translations), and Automatic Post-Editing (automatically correcting MT output). It positions itself within the Computer-Assisted Translation (CAT) paradigm and aims to move beyond standalone MT or QE systems toward a single integrated pipeline.

3. Methodology

The core innovation is a hierarchical model built from three delegation modules (quality estimation, generative post-editing, and atomic-operation post-editing), all tightly integrated within Transformer-based neural networks.

3.1 Hierarchical Model Architecture

The model first assesses each MT candidate with a fine-grained QE module. Based on the predicted overall quality score, it conditionally routes the sentence to one of two post-editing paths.
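The conditional routing step can be sketched as follows. This is a minimal illustration, not the authors' implementation. The 0.5 threshold, the convention that a higher score means better quality, and the stand-in callables `qe_model`, `generative_editor`, and `atomic_editor` are all hypothetical placeholders for the trained Transformer modules.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical threshold; the value used in the paper is not given here.
QUALITY_THRESHOLD = 0.5

# An editor takes (source, mt) and returns a post-edited sentence.
Editor = Callable[[str, str], str]


@dataclass
class RoutingDecision:
    path: str     # "generative" or "atomic"
    score: float  # sentence-level QE score in [0, 1], higher = better (assumed)


def route(qe_score: float, threshold: float = QUALITY_THRESHOLD) -> RoutingDecision:
    """Low-quality sentences go to the generative editor, the rest to the atomic editor."""
    if not 0.0 <= qe_score <= 1.0:
        raise ValueError("qe_score must lie in [0, 1]")
    path = "generative" if qe_score < threshold else "atomic"
    return RoutingDecision(path=path, score=qe_score)


def post_edit(source: str, mt: str,
              qe_model: Callable[[str, str], float],
              generative_editor: Editor,
              atomic_editor: Editor,
              threshold: float = QUALITY_THRESHOLD) -> str:
    """Assess the MT output first, then dispatch it to exactly one editor."""
    decision = route(qe_model(source, mt), threshold)
    editor = generative_editor if decision.path == "generative" else atomic_editor
    return editor(source, mt)
```

Only one editor runs per sentence. This is the source of the efficiency argument made later in the analysis.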

3.2 Quality Estimation Module

This module predicts fine-grained, token-level errors (e.g., mistranslations, omissions), which are aggregated into an overall sentence-level score. It uses a Transformer encoder to analyze the source sentence together with the MT output.
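The sketch below shows one way token-level predictions might be turned into a sentence score. It assumes the QE encoder emits one error probability per MT token. The 0.5 tagging threshold and both aggregation rules are illustrative choices, since the exact aggregation used in the paper is not described here. Comparing a mean rule with a worst-token rule shows how averaging can dilute a single severe error.

```python
from typing import List, Sequence, Tuple


def tag_tokens(tokens: Sequence[str], error_probs: Sequence[float],
               threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Token-level 'heat map': label each MT token OK or BAD."""
    if len(tokens) != len(error_probs):
        raise ValueError("need one error probability per token")
    return [(tok, "BAD" if p >= threshold else "OK") for tok, p in zip(tokens, error_probs)]


def sentence_score_mean(error_probs: Sequence[float]) -> float:
    """Quality = 1 - mean token error probability (an empty sentence scores 1.0)."""
    if not error_probs:
        return 1.0
    return 1.0 - sum(error_probs) / len(error_probs)


def sentence_score_worst(error_probs: Sequence[float]) -> float:
    """Quality = 1 - error probability of the single worst token."""
    if not error_probs:
        return 1.0
    return 1.0 - max(error_probs)
```

Consider ten tokens where one has error probability 0.9 and the other nine have 0.05. The mean rule yields about 0.865, which looks like a good sentence. The worst-token rule yields about 0.1.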

3.3 Generative Post-Editing

For sentences the QE module judges to be of low quality, a generative sequence-to-sequence model (Transformer-based) restructures and rewrites the whole translation. This amounts to a full re-translation focused on the problematic segment.

3.4 Atomic-Operation Post-Editing

For higher-quality sentences with only minor errors, a more efficient module is used. It predicts a sequence of atomic edit operations (e.g., KEEP, DELETE, REPLACE_WITH_X) at the token level, keeping changes to the original MT output to a minimum. The probability of operation $o_t$ at position $t$ can be modeled as $P(o_t \mid \mathbf{s}, \mathbf{mt}_{1:t}) = \text{Softmax}(\mathbf{W} \cdot \mathbf{h}_t + \mathbf{b})$. Here $\mathbf{h}_t$ is the model's hidden state, $\mathbf{W}$ and $\mathbf{b}$ are learned output-layer parameters, $\mathbf{s}$ is the source sentence, and $\mathbf{mt}$ is the machine translation.
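The operation distribution in this formula can be computed as shown below. This plain-Python sketch uses toy numbers: the hidden state `h_t`, the weights `W` and `b`, and the three-label operation set are invented for illustration. Folding the replacement word into the label (`REPLACE_WITH_Vertragspartei`) is also an assumption about how REPLACE_WITH_X is parameterized.

```python
import math
from typing import Dict, Sequence


def op_distribution(h_t: Sequence[float],
                    W: Sequence[Sequence[float]],
                    b: Sequence[float],
                    op_labels: Sequence[str]) -> Dict[str, float]:
    """P(o_t | s, mt_{1:t}) = Softmax(W h_t + b), with one row of W per operation label."""
    if not (len(W) == len(b) == len(op_labels)):
        raise ValueError("W, b and op_labels must have one entry per operation")
    logits = [sum(w * h for w, h in zip(row, h_t)) + b_i for row, b_i in zip(W, b)]
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(op_labels, exps)}


# Toy example: a 2-dimensional hidden state and three candidate operations.
labels = ["KEEP", "DELETE", "REPLACE_WITH_Vertragspartei"]
W = [[2.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0, 0.0]
probs = op_distribution([1.0, 0.0], W, b, labels)
best_op = max(probs, key=probs.get)      # greedy choice of the most likely operation
```

With these toy values the logits are 2, 0, and 0, so KEEP receives roughly 0.79 of the probability mass.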

4. Experiments & Results

4.1 Data & Setup

Evaluation was performed on the English–German data from the WMT 2017 APE shared task. The standard metrics were BLEU (higher is better) and TER (Translation Edit Rate, lower is better).
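The sketch below makes the TER metric concrete with a simplified, shift-free variant. It computes the word-level edit distance (insertions, deletions, substitutions) divided by the reference length. Real TER also allows block shifts, which are applied only when they reduce the edit count, so this version can overestimate TER. Reported results should come from a standard implementation such as sacreBLEU. The sentence pair reuses the legal example given later in this document, with its post-edited output treated as a hypothetical reference.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over words (insert, delete, substitute each cost 1)."""
    prev = list(range(len(ref) + 1))
    for i in range(1, len(hyp) + 1):
        cur = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete hyp[i-1]
                         cur[j - 1] + 1,      # insert ref[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[len(ref)]


def ter_without_shifts(hypothesis: str, reference: str) -> float:
    """Edits needed to turn the hypothesis into the reference, per reference word."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


mt = "Die Partei wird die andere Partei für alle Verluste entschädigen."
ref = "Die Vertragspartei wird die andere Vertragspartei für alle Verluste entschädigen."
score = ter_without_shifts(mt, ref)  # 2 substitutions / 10 reference words = 0.2
```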

4.2 Quantitative Results (BLEU/TER)

The proposed hierarchical model achieved state-of-the-art performance on the WMT 2017 APE task, outperforming the leading approaches on both BLEU and TER. This demonstrates the effectiveness of the conditional routing strategy and of the dual post-editing design.

Key Performance Metrics

BLEU score: Higher than the previous state of the art.

TER score: Edit distance substantially reduced, indicating more faithful post-edits.

4.3 Human Evaluation

In a controlled human evaluation, certified translators post-edited MT output both with and without assistance from the proposed APE system. Post-editing time dropped significantly when APE suggestions were available. This confirms the system's usefulness in a real-world CAT workflow.

5. Technical Analysis & Framework

5.1 Core Insight & Logical Flow

Core Insight: The paper's main achievement is not simply another APE model. It is the strategic decomposition of a human post-editor's cognitive process into a decision tree that neural networks can execute. Instead of a single "fix it" model, the authors imitate an expert translator's first move: assess, then act accordingly. This mirrors the "assess then act" pipelines found in expert robotic systems and reinforcement learning, applied here to language editing. The choice between generative and atomic editing directly parallels a human deciding whether to rewrite a bad paragraph or merely fix a typo.

Logical Flow: The pipeline is well structured but conditional.

1) Diagnosis (QE): A fine-grained, token-level error detector acts as the diagnostic tool. This goes further than a sentence-level score alone, providing a "heat map" of problems.

2) Triage: The diagnosis is aggregated into a binary decision. Is this sentence "sick" (low quality), or "healthy" with minor ailments (higher quality)?

3) Treatment: Critical (low-quality) cases receive intensive care from the full generative model, which completely re-translates the problematic span. Stable (higher-quality) cases receive minimally invasive surgery through atomic operations.

This flow allocates computational resources efficiently, a principle borrowed from systems optimization.

5.2 Strengths & Weaknesses

Strengths:

  1. Human-Centered Design: The three-module architecture is the system's greatest strength. It does not treat APE as an opaque text-to-text problem. Instead, it decomposes APE into interpretable subtasks (QE, major rewriting, minor fixes), which makes the system's output more trustworthy and auditable for professional translators. This aligns with the push for explainable AI in high-stakes applications.
  2. Resource Efficiency: Conditional execution is a smart choice. There is no reason to run a computationally heavy generative model on a sentence that needs only a one-word substitution. This dynamic routing is reminiscent of mixture-of-experts models such as Google's Switch Transformer, and it offers a scalable path to deployment.
  3. Empirical Validation: The paper pairs strong results on the WMT benchmark with a realistic human evaluation showing time savings, which is the gold standard. Many papers stop at BLEU scores. Demonstrating effectiveness in a user study is compelling evidence of practical value.

Weaknesses & Limitations:

  1. Oversimplified Binary Triage: The high/low quality split is a critical bottleneck, because human post-editing operates on a continuum. A sentence may be 80% correct yet contain one critical error that breaks its meaning, giving a "high" score despite a fatal flaw. A binary gate may send such a sentence to atomic editing and miss the need for a local but deep regeneration. The QE module needs richer confidence scores or error-severity labels.
  2. Training Complexity & Pipeline Fragility: This is a multi-stage pipeline (QE model -> router -> one of two PE models), so errors compound. If the QE model is miscalibrated, the performance of the entire system degrades. Training such a system end to end is notoriously difficult. It often requires sophisticated techniques such as Gumbel-Softmax to make the routing differentiable, or reinforcement learning, and the paper may not address these issues fully.
  3. Domain & Language-Pair Lock-in: Like most deep learning MT/APE systems, this one depends heavily on the quality and quantity of parallel data for a specific language pair and domain (e.g., WMT En-De). The paper does not explore low-resource language pairs or rapid adaptation to new domains (e.g., from legal to medical), which is a major obstacle for enterprise CAT tools. Techniques such as meta-learning or adapter modules, explored in recent NLP research, could be important next steps.
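The Gumbel-Softmax relaxation mentioned in the second point can be illustrated for a two-way router. This plain-Python sketch shows only the sampling arithmetic: Gumbel noise is added to the router logits, then a temperature-scaled softmax is applied. It cannot backpropagate. A real training setup would use a framework routine such as PyTorch's `torch.nn.functional.gumbel_softmax` instead. The router logits are hypothetical, and nothing here implies that the paper used this technique.

```python
import math
import random
from typing import List, Optional, Sequence


def gumbel_softmax_sample(logits: Sequence[float], temperature: float = 1.0,
                          rng: Optional[random.Random] = None) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # Gumbel(0, 1) noise is -log(-log(U)); clamp U away from 0 to avoid log(0).
    noisy = [(z - math.log(-math.log(max(rng.random(), 1e-12)))) / temperature
             for z in logits]
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hard_choice(weights: Sequence[float]) -> List[float]:
    """Straight-through style forward pass: a one-hot vector at the argmax."""
    k = max(range(len(weights)), key=lambda i: weights[i])
    return [1.0 if i == k else 0.0 for i in range(len(weights))]


# Hypothetical router logits for [generative, atomic].
router_logits = [0.3, 1.2]
soft = gumbel_softmax_sample(router_logits, temperature=0.5, rng=random.Random(0))
hard = hard_choice(soft)
```

Lower temperatures push the soft weights toward a one-hot vector. During training, gradients would flow through `soft` while the forward pass uses `hard`.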

5.3 Actionable Insights

For Researchers:

  1. Explore Soft Routing: Move beyond the hard binary decision. Investigate a soft, weighted combination of the generative and atomic editors, where the QE module's output sets the weight of each. This could be more robust to QE errors.
  2. Integrate External Knowledge: The current model relies solely on the source and MT sentences. It could also use features from translation memories (TM) or terminology bases as additional context, since both are standard resources in professional CAT tools. This would bridge the gap between purely neural approaches and traditional localization engineering.
  3. Benchmark on Real CAT Logs: Go beyond WMT shared tasks. Partner with a translation agency to test on real, complex, multi-domain translation projects with translator-interaction logs. This would reveal the true failure modes.
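The soft-routing suggestion in the first point can be sketched as a mixture. Instead of a hard gate, the QE quality score `q` weights the atomic editor and `1 - q` weights the generative editor. This is a hypothetical design that was not evaluated in the paper. Candidate strings stand in for full output sequences, and the probabilities are invented.

```python
from typing import Dict


def soft_route(q: float, p_atomic: Dict[str, float],
               p_generative: Dict[str, float]) -> Dict[str, float]:
    """Mix two editors' candidate distributions: q * P_atomic + (1 - q) * P_generative."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    candidates = set(p_atomic) | set(p_generative)
    return {c: q * p_atomic.get(c, 0.0) + (1.0 - q) * p_generative.get(c, 0.0)
            for c in candidates}


mixed = soft_route(0.7,
                   p_atomic={"minimal edit": 0.6, "alt edit": 0.4},
                   p_generative={"alt edit": 0.9, "rewrite": 0.1})
best = max(mixed, key=mixed.get)
```

In this toy case the gate favors the atomic editor (q = 0.7), yet "alt edit" wins with 0.55 against 0.42 because both editors support it. This is the kind of hedging against a single QE decision that the first point argues for.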

For Product Developers (CAT Tool Vendors):

  1. Deploy as a Quality Gate: Use the QE module as a pre-filter in translation management systems. Automatically flag low-confidence segments for a senior reviewer, or pre-fill them with generative APE suggestions, to streamline the review workflow.
  2. Focus on the Atomic Editor for UI Integration: The atomic-operation output (KEEP/DELETE/REPLACE) is well suited to interactive interfaces. It could power smart, predictive editing in which the translator uses keyboard shortcuts to accept, reject, or modify atomic suggestions, substantially reducing the number of clicks.
  3. Prioritize Model Adaptation: Invest in efficient fine-tuning or domain-adaptation methods for the APE system. Enterprise clients need models tailored to their own conventions and style guides within days, not months.
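A quality gate of the kind described in the first point could look like the sketch below. The `Segment` record, the 0.6 threshold, and the two output keys are hypothetical. A real translation management system would expose its own data model and workflow hooks.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    seg_id: str
    mt: str
    qe_score: float  # sentence-level QE score, higher = better (assumed)


def quality_gate(segments: List[Segment], threshold: float = 0.6) -> Dict[str, List[str]]:
    """Split segments into those flagged for review or APE pre-fill and those passed through.

    Flagged segments are ordered worst first so reviewers see the riskiest ones early.
    """
    flagged = sorted((s for s in segments if s.qe_score < threshold),
                     key=lambda s: s.qe_score)
    passed = [s for s in segments if s.qe_score >= threshold]
    return {"flag_for_review": [s.seg_id for s in flagged],
            "pass_through": [s.seg_id for s in passed]}


batch = [Segment("s1", "mt text 1", 0.92),
         Segment("s2", "mt text 2", 0.35),
         Segment("s3", "mt text 3", 0.55)]
routing = quality_gate(batch)
```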

Example Analysis Framework

Scenario: Translating a legal document from English to German.
Source: "The party shall indemnify the other party for all losses."
Baseline MT output: "Die Partei wird die andere Partei für alle Verluste entschädigen." (Correct, but it uses "Partei", which may be informal or ambiguous in a strict contractual context. A better term would be "Vertragspartei".)
Processing by the Proposed Model:

  1. QE module: Analyzes the segment. Most tokens are correct, but it flags "Partei" as a likely terminology mismatch (not strictly an error, but a suboptimal word choice). The sentence receives a "higher quality" score.
  2. Routing: The sentence is sent to the Atomic-Operation Post-Editing module.
  3. Atomic editor: Given the source and context, it might propose the operation sequence [KEEP, REPLACE_WITH_'Vertragspartei', KEEP, KEEP, KEEP, REPLACE_WITH_'Vertragspartei', KEEP, KEEP, KEEP, KEEP], with one operation per whitespace-delimited token of the MT output.
  4. Result: "Die Vertragspartei wird die andere Vertragspartei für alle Verluste entschädigen." This is a precise, minimal edit that conforms to standard legal terminology.
This example shows how the model goes beyond simple error correction to stylistic and terminological refinement, a critical requirement in professional translation.
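The worked example can be replayed with a small function that applies one atomic operation per MT token. This sketch makes two assumptions. First, tokens are split on whitespace, so the final period stays attached to "entschädigen.". Second, REPLACE_WITH_X is encoded as a hypothetical `("REPLACE_WITH", word)` tuple. The operation set follows the KEEP/DELETE/REPLACE_WITH_X inventory named in the method description.

```python
from typing import List, Sequence, Tuple, Union

# An operation is "KEEP", "DELETE", or ("REPLACE_WITH", new_token).
Op = Union[str, Tuple[str, str]]


def apply_atomic_ops(mt_tokens: Sequence[str], ops: Sequence[Op]) -> List[str]:
    """Apply one atomic edit per MT token and return the post-edited tokens."""
    if len(ops) != len(mt_tokens):
        raise ValueError("expected exactly one operation per MT token")
    out: List[str] = []
    for token, op in zip(mt_tokens, ops):
        if op == "KEEP":
            out.append(token)
        elif op == "DELETE":
            continue
        elif isinstance(op, tuple) and len(op) == 2 and op[0] == "REPLACE_WITH":
            out.append(op[1])
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return out


mt = "Die Partei wird die andere Partei für alle Verluste entschädigen.".split()
replace = ("REPLACE_WITH", "Vertragspartei")
ops = ["KEEP", replace, "KEEP", "KEEP", "KEEP", replace, "KEEP", "KEEP", "KEEP", "KEEP"]
post_edited = " ".join(apply_atomic_ops(mt, ops))
```

This scheme emits exactly one operation per input token, so it cannot insert new words. Supporting insertions would require an additional operation type.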

6. Future Applications & Directions

The implications of this integrated QE-APE framework extend beyond traditional translation:

  1. Adaptive MT Systems: QE signals could be fed back to the NMT system in real time for online fine-tuning or reinforcement learning, creating a self-improving translation loop.
  2. Content Moderation & Localization: The atomic-operation module could be adapted for automatic localization or for moderating user-generated content by applying culturally appropriate substitutions or policy-based edits.
  3. Education and Training: The system could serve as an intelligent tutor for translation students, providing detailed error analysis (from the QE module) and correction suggestions.
  4. Multimodal Translation: Similar quality-estimation and post-editing principles could be applied to image-based translation (OCR translation) or speech-to-speech translation systems, where errors take different forms.
  5. Low-Resource & Unsupervised Settings: Future work must address how to apply these principles where large parallel corpora are unavailable. One possibility is unsupervised or semi-supervised techniques inspired by work such as CycleGAN for unpaired image translation, adapted to text.

7. References

  1. Wang, J., Wang, K., Ge, N., Shi, Y., Zhao, Y., & Fan, K. (2020). Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing. arXiv preprint arXiv:2009.09126.
  2. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
  3. Specia, L., Shah, K., de Souza, J. G., & Cohn, T. (2013). QuEst - A translation quality estimation framework. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations.
  4. Junczys-Dowmunt, M., & Grundkiewicz, R. (2016). Log-linear combinations of monolingual and bilingual neural machine translation models for automatic post-editing. In Proceedings of the First Conference on Machine Translation.
  5. Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1125-1134). (Cited for the conceptual analogy to specialized, task-specific transformation.)
  6. Läubli, S., Fishel, M., Massey, G., Ehrensberger-Dow, M., & Volk, M. (2013). Assessing post-editing efficiency in a realistic translation environment. Proceedings of MT Summit XIV.