
Domain Specialization: A Post-Training Adaptation Approach for Neural Machine Translation

An analysis of a novel post-training domain adaptation approach for Neural Machine Translation (NMT), examining incremental specialization, experimental results, and future applications.
translation-service.org | PDF Size: 0.1 MB

1. Introduction

Domain adaptation is a key component of Machine Translation (MT). It covers the adaptation of terminology, domain, and style, particularly in Computer-Assisted Translation (CAT) workflows that involve human post-editing. This paper introduces a new concept called "domain specialization" for Neural Machine Translation (NMT). The approach is a form of post-training adaptation: a generic, pre-trained NMT model is fine-tuned on new in-domain data. Compared with traditional full training from scratch, the method promises faster learning and more accurate adaptation.

The main contribution is the study of this specialization approach, which adapts a generic NMT model without a full retraining process. Instead, it adds a retraining phase that uses only the new in-domain data and starts from the model's already-learned parameters.

2. Method

The proposed approach follows an incremental adaptation scheme. A generic NMT model, first trained on a large generic-domain corpus, is subsequently "specialized" by continuing its training (running additional epochs) on a small, targeted in-domain dataset. This process is illustrated in Figure 1 (described later).

The key computational objective of this retraining phase is to re-estimate the conditional probability $p(y_1,\ldots,y_m \mid x_1,\ldots,x_n)$, where $(x_1,\ldots,x_n)$ is the source-language sequence and $(y_1,\ldots,y_m)$ is the target-language sequence. Crucially, this is done without resetting or discarding the previously learned state of the Recurrent Neural Network (RNN), so the model builds on its existing knowledge.
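In the standard encoder-decoder formulation (assumed here, following Sutskever et al., 2014), this conditional probability is factorized token by token with the chain rule, so re-estimating it during specialization amounts to updating the decoder's per-step distributions:

$p(y_1,\ldots,y_m \mid x_1,\ldots,x_n) = \prod_{t=1}^{m} p(y_t \mid y_1,\ldots,y_{t-1}, x_1,\ldots,x_n)$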

3. Experimental Setup

The study evaluates the specialization approach with standard MT evaluation metrics: BLEU (Papineni et al., 2002) and TER (Snover et al., 2006). The NMT architecture combines a sequence-to-sequence framework (Sutskever et al., 2014) with an attention mechanism (Luong et al., 2015).
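To make the BLEU metric concrete, here is a minimal sketch of corpus-level BLEU as defined by Papineni et al.: clipped n-gram precisions up to 4-grams, combined by a geometric mean and multiplied by a brevity penalty. It assumes pre-tokenized input, one reference per sentence, and no smoothing. Published scores are normally computed with a standard implementation such as sacreBLEU, whose tokenization and smoothing differ, so this is illustrative only; TER is not sketched here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU with one reference per sentence, uniform weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


hyp = "the cat sat on the".split()
ref = "the cat sat on the mat".split()
# Every n-gram precision is 1, so only BP = exp(1 - 6/5) lowers the score.
print(round(corpus_bleu([hyp], [ref]), 2))  # 81.87
```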

The experiments compare several configurations, mainly by varying the composition of the training corpora. The key comparison is between training from scratch on mixed generic/in-domain data and the proposed two-stage scheme: first train a generic model, then specialize it with in-domain data. This setup aims to simulate a real CAT scenario in which human post-edited translations become available incrementally.

3.1 Training Data

The paper describes building custom datasets for the experiments. The generic model is trained on a mixture of several parallel corpora from different domains. Specific in-domain data is then used for the specialization phase. The exact composition and size of these corpora are detailed in the referenced table (Table 1 in the PDF).

4. Core Insight & Analyst's Perspective

Core Insight

This paper is not just about adaptation; it is a practical hack for production-grade NMT. The authors correctly recognize that a "one-model-fits-all" approach is commercially impractical. Their "specialization" approach is effectively continual learning for NMT: it treats the generic model as a living foundation that evolves with new data, much as a human translator accumulates expertise. This challenges the prevailing retrain-from-scratch mindset and points toward dynamic, responsive MT systems.

Logical Flow

The logic is elegantly simple: 1) Acknowledge the high cost of full NMT retraining. 2) Observe that in-domain data (e.g., human post-edits) arrives incrementally in real-world CAT tools. 3) Propose reusing the existing model's parameters as the starting point for further training on the new data. 4) Show that this yields gains comparable to mixed-data training, but faster. The flow mirrors established transfer-learning practice in computer vision (e.g., initializing from ImageNet-pretrained models for specific tasks) but applies it to the sequential, conditional setting of translation.

Strengths & Weaknesses

Strengths: Speed is its killer feature for deployment. It enables near-real-time model updates, which matter for dynamic domains such as news or live customer support. The method is elegantly simple and requires no architectural changes. It fits naturally into the human-in-the-loop CAT workflow, creating a virtuous cycle between translator and machine.

Weaknesses: The elephant in the room is catastrophic forgetting. The paper stresses not discarding previously learned states, but the risk that the model "unlearns" its generic capabilities during specialization is high, an issue well documented in the continual-learning literature. The evaluation appears limited to BLEU/TER on the target domain; where is the test on the original generic domain to check for performance degradation? Moreover, the approach assumes that high-quality in-domain data is available, which can be a bottleneck.

Actionable Insights

For MT product managers: this is a blueprint for building adaptive MT engines. Prioritize implementing this pipeline in your CAT tools. For researchers: the next step is to integrate regularization techniques from continual learning (e.g., Elastic Weight Consolidation) to mitigate forgetting. Explore this for multilingual models: can we specialize an English-Chinese model for the medical domain without hurting its French-German capability? The future lies in modular, composable NMT models, and this work is a foundational step.
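As a rough illustration of the Elastic Weight Consolidation idea mentioned above, the sketch below adds the usual EWC quadratic penalty $\frac{\lambda}{2}\sum_i F_i(\theta_i - \theta_{G,i})^2$ to the in-domain loss, anchoring each parameter to its generic value $\theta_G$ in proportion to a diagonal Fisher estimate $F_i$. This is not part of the paper's method; the Fisher values and the weight `lam` are assumed inputs (in practice $F$ is typically estimated from squared gradients of the log-likelihood on generic data), and parameters are flattened into plain lists for clarity.

```python
def ewc_penalty(theta, theta_g, fisher, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta_G,i)^2."""
    return 0.5 * lam * sum(f * (t - g) ** 2 for t, g, f in zip(theta, theta_g, fisher))


def ewc_grad(theta, theta_g, fisher, lam):
    """Gradient of the penalty w.r.t. theta: lam * F_i * (theta_i - theta_G,i)."""
    return [lam * f * (t - g) for t, g, f in zip(theta, theta_g, fisher)]


def specialization_objective(in_domain_loss, theta, theta_g, fisher, lam):
    """In-domain loss plus the EWC anchor toward the generic parameters."""
    return in_domain_loss + ewc_penalty(theta, theta_g, fisher, lam)


# Moving a high-Fisher parameter costs more than moving a low-Fisher one by the same amount.
print(ewc_penalty([1.0, 0.0], [0.0, 0.0], [5.0, 0.1], lam=1.0))  # 2.5
print(ewc_penalty([0.0, 1.0], [0.0, 0.0], [5.0, 0.1], lam=1.0))  # 0.05
```

Larger `lam` keeps the specialized model closer to $\theta_G$, trading some in-domain gain for less forgetting.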

5. Technical Details

The specialization process relies on the standard NMT objective: maximizing the conditional log-likelihood of the target sequence given the source sequence. For a dataset $D$, the loss function $L(\theta)$ for model parameters $\theta$ is typically:

$L(\theta) = -\sum_{(x,y) \in D} \log p(y | x; \theta)$
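A minimal sketch of evaluating this loss, assuming the usual chain-rule factorization so that $\log p(y \mid x; \theta)$ is the sum of per-token log-probabilities under teacher forcing. The `model_prob` callback stands in for the decoder and, like the copy-biased `toy_prob` model and its 5-word vocabulary, is hypothetical.

```python
import math


def nll_loss(dataset, model_prob):
    """L(theta) = -sum over (x, y) in D of log p(y | x; theta).

    model_prob(x, y_prefix, token) is a hypothetical stand-in for the decoder:
    it returns p(token | y_prefix, x; theta). By the chain rule, log p(y | x)
    is the sum of the per-token log-probabilities.
    """
    total = 0.0
    for x, y in dataset:
        for t, token in enumerate(y):
            total -= math.log(model_prob(x, y[:t], token))
    return total


# Toy model: gives extra probability 0.5 to the token at the same position in x
# ("<eos>" past the end) and spreads the remaining 0.5 uniformly over the vocabulary.
VOCAB = ["a", "b", "c", "d", "<eos>"]


def toy_prob(x, prefix, token):
    t = len(prefix)
    copy = x[t] if t < len(x) else "<eos>"
    return 0.5 + 0.5 / len(VOCAB) if token == copy else 0.5 / len(VOCAB)


D = [(["a", "b"], ["a", "b", "<eos>"]), (["c"], ["d", "<eos>"])]
print(nll_loss(D, toy_prob))
```

Four target tokens match the copy guess (probability 0.6) and one does not (0.1), so the loss here is $-4\log 0.6 - \log 0.1$.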

In the proposed two-stage training:

  1. Generic training: Minimize $L_{generic}(\theta)$ on a large, diverse corpus $D_G$ to obtain initial parameters $\theta_G$.
  2. Specialization: Start from $\theta_G$ and minimize $L_{specialize}(\theta)$ on a small in-domain corpus $D_S$, yielding final parameters $\theta_S$. The key point is that optimization in stage 2 starts from $\theta_G$, not from a random initialization.
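The two stages can be sketched with a deliberately tiny model. This is not an RNN: it is a hypothetical word-level model with one logit per (source word, target word) pair, trained by plain SGD on $-\log p(\text{tgt} \mid \text{src})$. The words "charge", "frais", and "accusation", the learning rate, and the epoch counts are illustrative assumptions. The point is only that stage 2 continues from a copy of $\theta_G$ rather than from a fresh initialization.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a {word: logit} dict."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def train(theta, data, epochs, lr):
    """Plain SGD on -log p(tgt | src; theta), continuing from theta's current values.

    theta[src][tgt] is a logit and p(. | src) = softmax(theta[src]).
    """
    for _ in range(epochs):
        for src, tgt in data:
            probs = softmax(theta[src])
            for w, p in probs.items():
                # d(-log p_tgt) / d logit_w = p_w - 1[w == tgt]
                theta[src][w] -= lr * (p - (1.0 if w == tgt else 0.0))
    return theta


TARGETS = ["frais", "accusation"]

# Stage 1: generic training from an all-zero initialization -> theta_G.
D_G = [("charge", "frais")] * 8 + [("charge", "accusation")] * 2
theta_G = train({"charge": {t: 0.0 for t in TARGETS}}, D_G, epochs=5, lr=0.5)
p_generic = softmax(theta_G["charge"])["accusation"]

# Stage 2: specialization starts from a copy of theta_G, not a new initialization.
D_S = [("charge", "accusation")] * 5
theta_S = train({s: dict(v) for s, v in theta_G.items()}, D_S, epochs=3, lr=0.5)
p_special = softmax(theta_S["charge"])["accusation"]

print(f"p(accusation | charge): generic {p_generic:.2f} -> specialized {p_special:.2f}")
```

The same toy run also shows the forgetting risk raised in Section 4: after specialization, the generic reading "frais" loses most of its probability mass.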

The underlying model is an RNN-based encoder-decoder with attention. For each target word $y_i$, the attention mechanism computes a context vector $c_i$ as a weighted sum of the encoder hidden states $h_j$: $c_i = \sum_{j=1}^{n} \alpha_{ij} h_j$, where the weights $\alpha_{ij}$ are computed by an alignment model.
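A minimal sketch of the context-vector computation. The alignment weights are assumed to be a softmax over "dot" scores $s_i \cdot h_j$ between the current decoder state and each encoder state, which is one of the scoring options in Luong et al. (2015); the exact variant used in the paper is not stated here. The vectors are made-up 2-dimensional examples.

```python
import math


def attention_context(decoder_state, encoder_states):
    """Return (alphas, c_i) with alpha_ij = softmax_j(s_i . h_j) and c_i = sum_j alpha_ij * h_j."""
    scores = [sum(s * h for s, h in zip(decoder_state, h_j)) for h_j in encoder_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(encoder_states[0])
    context = [sum(a * h_j[k] for a, h_j in zip(alphas, encoder_states)) for k in range(dim)]
    return alphas, context


# Three encoder hidden states h_1..h_3 and one decoder state s_i.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s = [2.0, 0.0]
alphas, c = attention_context(s, H)
print([round(a, 3) for a in alphas], [round(v, 3) for v in c])
```

Here $h_1$ and $h_3$ score equally against $s_i$ and receive the same weight, while $h_2$ (orthogonal to $s_i$) receives little.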

6. Experimental Results & Chart Description

The paper presents results from two main experiments evaluating the specialization approach.

Experiment 1: Effect of Specialization Epochs. This experiment examines how translation quality (measured by BLEU) on the in-domain test set improves as the number of additional training epochs on in-domain data increases. The expected result is a rapid initial BLEU gain that eventually plateaus, indicating that substantial adaptation can be achieved with only a few additional epochs and demonstrating the method's efficiency.

Experiment 2: Effect of In-Domain Data Size. This experiment investigates how much in-domain data is needed for effective specialization. BLEU scores are plotted against the size of the in-domain corpus used for retraining. The curve likely shows diminishing returns, suggesting that even a small amount of high-quality in-domain data can yield significant improvements, which makes the approach viable for domains with limited parallel data.
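One common way to set up such a data-size experiment (an assumption, not necessarily the paper's protocol) is to draw nested subsets from a single shuffled in-domain corpus, so each smaller subset is contained in every larger one. Each subset would then be used to specialize the same $\theta_G$ and be scored on a fixed in-domain test set, so differences in BLEU reflect corpus size rather than which sentences happened to be sampled. The function name and sizes below are hypothetical.

```python
import random


def nested_subsets(pairs, sizes, seed=0):
    """Return {size: subset}; every smaller subset is a prefix of every larger one."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # one fixed shuffle shared by all sizes
    return {n: shuffled[:n] for n in sorted(sizes) if n <= len(shuffled)}


pairs = [(f"src {i}", f"tgt {i}") for i in range(100)]
subsets = nested_subsets(pairs, [100, 10, 50])
print({n: len(s) for n, s in subsets.items()})
```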

Chart Description (Figure 1 in the PDF): A conceptual diagram illustrates the two-stage training pipeline. It contains two main boxes: 1. Training Process: the input is "Generic Data" and the output is "Generic Model." 2. Retraining Process: the inputs are "Generic Model" and "In-domain Data," and the output is "In-domain Model" (the specialized model). Arrows show the flow from the generic data to the generic model, then from both the generic model and the in-domain data to the final specialized model.

7. Analysis Framework Example

Scenario: A company uses a generic English-to-French NMT model to translate a variety of internal communications. It has acquired a new client in the legal sector and needs to adapt its MT output for legal documents (contracts, briefs).

Applying the Specialization Framework:

  1. Baseline: The generic model translates a legal sentence. The output may lack specific legal terminology and formal style.
  2. Data Collection: The company collects a small corpus (e.g., 10,000 sentence pairs) of high-quality legal documents translated by professionals.
  3. Specialization Phase: The existing generic model is loaded, and training continues on the new legal corpus only. Training runs for a limited number of epochs (e.g., 5-10) with a small learning rate to avoid overwriting generic knowledge.
  4. Evaluation: The specialized model is tested on a held-out set of legal texts. BLEU/TER scores should show an improvement over the generic model. Crucially, its performance on general communications is also checked to confirm there is no severe degradation.
  5. Deployment: The specialized model is deployed as a separate endpoint that serves the legal client's translation requests in the CAT tool.
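Step 4 can be expressed as a simple deployment gate. This is a hypothetical helper: the test-set names, the use of BLEU alone (TER would work the same way with the comparison reversed, since lower TER is better), and the default tolerance of 1.0 BLEU on the general set are assumptions to be tuned per project.

```python
def accept_specialized_model(generic_scores, specialized_scores, max_generic_drop=1.0):
    """Accept the specialized model only if it helps in-domain and barely hurts general text.

    Each argument maps a test-set name ("legal", "general") to a BLEU score.
    """
    in_domain_gain = specialized_scores["legal"] - generic_scores["legal"]
    generic_drop = generic_scores["general"] - specialized_scores["general"]
    return in_domain_gain > 0 and generic_drop <= max_generic_drop


# Made-up scores purely to exercise the check; they are not results from the paper.
print(accept_specialized_model({"legal": 20.0, "general": 30.0},
                               {"legal": 27.0, "general": 29.5}))  # True
print(accept_specialized_model({"legal": 20.0, "general": 30.0},
                               {"legal": 27.0, "general": 24.0}))  # False
```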

This example shows a practical, resource-efficient path to domain-specific MT without maintaining many fully independent models.

8. Application Outlook & Future Directions

Immediate Applications:

Future Research Directions:

9. References