Optimizing Example Selection for Retrieval-Augmented Machine Translation with Translation Memories

Explores submodular-function techniques for selecting the best examples in retrieval-augmented neural machine translation, with a focus on optimizing coverage.

1. Introduction

Retrieval-augmented machine translation (MT) improves neural models by conditioning their predictions on similar examples retrieved from a translation memory (TM). This work focuses on optimizing the upstream retrieval step for a fixed downstream edit-based model, the multi-Levenshtein Transformer. The central challenge is to select the set of k examples that maximizes coverage of the source sentence, a problem approached through submodular function optimization.

2. Related Work

Integrating examples into MT has evolved from computer-assisted translation tools for professionals to modern neural approaches. The main approaches include: conditional translation with attention over examples (Gu et al., 2018), lightweight fine-tuning for domain adaptation (Farajian et al., 2017), inserting examples into the context of a multilingual large language model (LLM) (Moslem et al., 2023), and directly editing the best-matching example (Gu et al., 2019). This paper positions itself within the family of edit-based models that combine multiple examples.

3. Methodology & Technical Framework

3.1 Multi-Levenshtein Transformer

The downstream model is the multi-Levenshtein Transformer (Bouthors et al., 2023), an edit-based model that computes a translation by combining k (≥1) retrieved examples. Its performance is highly sensitive to the quality and composition of the retrieved example set.

3.2 Problem Formulation: Selecting the Optimal Example Set

Given a source sentence S and a fixed number k, the goal is to find a set R of k examples from the TM that maximizes a utility function F(R) tied to coverage of S. Exhaustive search is infeasible because the number of candidate sets grows combinatorially with the size of the TM, so efficient strategies are required.
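To give a sense of scale, the number of candidate sets of exactly k examples is the binomial coefficient C(|TM|, k). The sketch below only evaluates that count; the function name and the TM sizes are hypothetical and are not taken from the paper.

```python
import math


def num_candidate_sets(tm_size: int, k: int) -> int:
    """Number of distinct k-example subsets of a TM with tm_size entries."""
    return math.comb(tm_size, k)


if __name__ == "__main__":
    # Hypothetical TM sizes, chosen only for illustration.
    for tm_size in (1_000, 100_000, 1_000_000):
        print(tm_size, num_candidate_sets(tm_size, k=3))
```

Even for k = 3, a TM of 1,000 entries already yields 166,167,000 candidate sets, which is why enumerating every set per source sentence does not scale.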

3.3 Submodular Functions for Coverage Optimization

The paper draws on submodularity theory. A set function F: 2^V → ℝ is submodular if it exhibits the diminishing-returns property:

$F(A \cup \{e\}) - F(A) \geq F(B \cup \{e\}) - F(B)$ for all A ⊆ B ⊆ V and e ∈ V \ B.
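The diminishing-returns inequality can be checked by brute force on a tiny ground set. The following is a minimal sketch under stated assumptions: the three examples and the features they cover (`EXAMPLES`) are invented toy data, and `coverage` is a plain unweighted count of covered features rather than any specific function from the paper.

```python
from itertools import combinations

# Hypothetical toy data: each example is represented by the set of
# source features it covers.
EXAMPLES = {
    "e1": {"a", "b"},
    "e2": {"b", "c"},
    "e3": {"c", "d"},
}


def coverage(selected):
    """F(R): number of distinct features covered by the selected examples."""
    covered = set()
    for name in selected:
        covered |= EXAMPLES[name]
    return len(covered)


def is_submodular(ground):
    """Check F(A + e) - F(A) >= F(B + e) - F(B) for all A ⊆ B and e not in B."""
    ground = list(ground)
    subsets = [frozenset(c)
               for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            for e in ground:
                if e in b:
                    continue
                gain_a = coverage(a | {e}) - coverage(a)
                gain_b = coverage(b | {e}) - coverage(b)
                if gain_a < gain_b:
                    return False
    return True
```

On this toy data, adding `e2` to the empty set gains two features, while adding it to `{e1}` gains only one, because `b` is already covered. That shrinking gain is exactly what the inequality expresses.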

Coverage functions are a natural subclass of submodular functions. The authors explore several variants of F(R) for modeling coverage, such as token-based or n-gram-based overlap between the source sentence and the retrieved examples.
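One possible reading of token- or n-gram-based overlap is sketched below. It assumes whitespace tokenization, lowercasing, and exact n-gram matching, none of which is specified in this summary; the helper names are hypothetical.

```python
def ngrams(tokens, n):
    """Set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def source_features(sentence, max_n=2):
    """All 1..max_n-grams of a lowercased, whitespace-tokenized sentence."""
    tokens = sentence.lower().split()
    feats = set()
    for n in range(1, max_n + 1):
        feats |= ngrams(tokens, n)
    return feats


def coverage_ratio(source, examples, max_n=2):
    """Fraction of source features found in at least one example."""
    src = source_features(source, max_n)
    covered = set()
    for ex in examples:
        covered |= src & source_features(ex, max_n)
    return len(covered) / len(src) if src else 0.0
```

With `max_n=1` the function measures token overlap only; raising `max_n` rewards examples that match longer contiguous spans of the source.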

4. Experimental Results & Analysis

4.1 Experimental Setup & Datasets

Experiments were conducted on a multi-domain machine translation task. The translation memory consists of parallel sentences from the relevant domains. Baselines include simple similarity search (e.g., based on BM25 or sentence embeddings).
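For reference, a similarity-search baseline of the BM25 kind can be sketched in a few lines. This is a generic Okapi BM25 ranking with the commonly used defaults k1 = 1.2 and b = 0.75 and a smoothed IDF. It is an illustration only, not the configuration used in the paper, and it assumes a non-empty memory and whitespace tokenization.

```python
import math
from collections import Counter


def bm25_rank(query, memory, k1=1.2, b=0.75):
    """Return TM indices sorted by descending BM25 score against the query."""
    docs = [m.lower().split() for m in memory]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many TM entries each term occurs.
    df = Counter(t for d in docs for t in set(d))
    query_terms = set(query.lower().split())
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    # sorted() is stable, so ties keep their original TM order.
    return sorted(range(n_docs), key=lambda i: -scores[i])
```

Each entry is scored independently of the others, so the top-k list can contain near-duplicates. That independence is the property the coverage-based selection is meant to address.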

4.2 Performance Metrics & Results

The primary evaluation uses standard MT metrics such as BLEU and TER. The proposed submodular-optimization retrieval methods consistently outperform the baseline retrieval strategies. For example, one variant achieved a gain of +1.5 BLEU points over the BM25 retrieval baseline in a technical domain.

4.3 Analysis of Coverage and Translation Quality

A strong correlation was observed between the optimized coverage score F(R) and final translation quality. This supports the core hypothesis that better source coverage leads to better target coverage, despite known linguistic challenges such as lexical variation and syntactic divergence.

Key Performance Snapshot

Baseline (BM25): BLEU = 42.1

Proposed method (submodular optimization): BLEU = 43.6

Improvement: +1.5 BLEU points

5. Key Insights

6. Original Analysis: Core Insight, Logical Flow, Strengths & Flaws, Actionable Insights

Core Insight: The paper's strongest argument is that retrieval-augmented MT has focused heavily on the neural architecture of the combiner (the decoder) while neglecting the selector (the retriever). Bouthors et al. identify this upstream component as a decisive lever. Their framing of example selection as a submodular set-cover problem is elegant: it borrows a well-understood framework from operations research and information retrieval (echoing its success in document summarization, as in Lin & Bilmes, 2011) and applies it directly to the MT setting. This is not an incremental tweak; it is a fundamental rethink of the weakest link in the retrieval-augmented pipeline.

Logical Flow: The argument is tight and convincing. It starts from the observed sensitivity of the multi-Levenshtein Transformer to its inputs, establishes coverage as the key desideratum, recognizes the combinatorial explosion in choosing the best set, and then offers submodularity as the mathematical tool that makes the problem tractable. The link between improved coverage scores and improved BLEU scores forms a clean chain of evidence. It suggests that careful, theory-guided engineering of the retrieval stage translates directly into better downstream performance.

Strengths & Flaws: The main strength is the successful application of a well-founded, non-neural theoretical framework to a core problem in modern NLP, yielding clear gains. The method is efficient and reproducible. The main flaw, and one the authors openly acknowledge, is the underlying assumption that source coverage implies target coverage. This glosses over the hard problem of translation divergence, a well-documented challenge in which source- and target-language structures do not align (Dorr, 1994). For language pairs with high syntactic or morphological divergence, maximizing source n-gram coverage could retrieve examples that are collectively misleading. The evaluation, while showing gains, does not cover a broad enough range of typologically diverse language pairs to stress-test this assumption.

Actionable Insights: For practitioners, the immediate takeaway is to stop treating retrieval as a simple similarity search. Implement a greedy submodular coverage maximizer for your TM lookup; it is simple and comes with an approximation guarantee. For researchers, this work opens several directions: 1) Integration with Dense Retrieval: combine submodular objectives with the training of modern dense retrievers (e.g., DPR, Karpukhin et al., 2020) to learn representations optimized for collective coverage, not just pairwise similarity. 2) Target-Aware Coverage: develop joint or predictive models of source-target coverage to mitigate the divergence problem. 3) Dynamic k: explore ways to determine the best number of examples k per sentence rather than using a fixed value. This paper provides the foundational tools; the next step is to build more linguistically aware systems on top of them.

7. Technical Details & Mathematical Formulation

The core optimization problem is defined as:

$\text{argmax}_{R \subseteq V, |R| \leq k} \, F(R)$

where V is the set of all examples in the TM and F is a submodular coverage function. A common instantiation is:

$F(R) = \sum_{g \in G(S)} w_g \, \min\{1, \sum_{e \in R} \mathbb{I}(g \in e)\}$

Here, G(S) is the set of features (e.g., tokens, n-grams) of the source sentence S, w_g is the weight of feature g, and $\mathbb{I}$ is the indicator function. This function sums the weights of the source features covered by at least one example in R; with unit weights it simply counts them. Because F is monotone and submodular, the greedy algorithm, which iteratively adds the example with the largest marginal gain $F(R \cup \{e\}) - F(R)$, achieves a $(1 - 1/e) \approx 0.63$ approximation guarantee for this NP-hard problem.
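The greedy procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the feature set (unigrams and bigrams over whitespace tokens) and the weighting w_g = n-gram length are assumptions made for the example, and `features` and `greedy_select` are hypothetical names.

```python
def features(sentence, max_n=2):
    """All 1..max_n-grams of a lowercased, whitespace-tokenized sentence."""
    toks = sentence.lower().split()
    return {tuple(toks[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(toks) - n + 1)}


def greedy_select(source, memory, k, max_n=2):
    """Greedily pick up to k TM entries maximizing weighted source coverage."""
    src_feats = features(source, max_n)
    # Assumption: w_g = n-gram length, so longer matches count more.
    weights = {g: float(len(g)) for g in src_feats}
    # Only features shared with the source can contribute to F.
    mem_feats = [features(m, max_n) & src_feats for m in memory]
    chosen, covered, score = [], set(), 0.0
    for _ in range(k):
        best_i, best_gain = None, 0.0
        for i, feats in enumerate(mem_feats):
            if i in chosen:
                continue
            # Marginal gain F(R ∪ {e}) - F(R): weight of newly covered features.
            gain = sum(weights[g] for g in feats - covered)
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:  # no remaining example adds coverage
            break
        chosen.append(best_i)
        covered |= mem_feats[best_i]
        score += best_gain
    return [memory[i] for i in chosen], score
```

For the source `"a b c d"` and the memory `["a b", "a b x", "c d", "b c"]` with k = 2, the sketch picks `"a b"` and then `"c d"`. The near-duplicate `"a b x"` has zero marginal gain once `"a b"` is selected, which is the behavior a per-example similarity ranking cannot reproduce. The loop stops early when no candidate adds coverage, so fewer than k examples may be returned.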

8. Analysis Framework: Case Study

Scenario: Translating a technical source sentence: "The default actuator initialization procedure must be completed prior to attempting calibration."

Baseline retrieval (top-3 by cosine similarity):

  1. "Complete the initialization procedure before starting operation."
  2. "Actuator calibration is sensitive."
  3. "Default settings are often sufficient."

Analysis: These are individually similar to the source but collectively redundant on "initialization", and they miss key terms such as "must be completed" and "attempting".

Proposed submodular coverage retrieval (k=3):

  1. "The initialization procedure must be fully executed."
  2. "Do not attempt calibration prior to system readiness."
  3. "Actuator defaults are set in the procedure."

Analysis: This set provides broader coverage: Sentence 1 covers "initialization procedure must", Sentence 2 covers "attempting calibration" and "prior", and Sentence 3 covers "actuator defaults". The collective coverage of source concepts is higher, giving the edit-based translator a richer and more diverse context.
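The coverage difference in this scenario can be made concrete with a simple token-overlap count. The sketch below uses English renderings of the scenario's sentences and exact-match unigrams after lowercasing; both choices are assumptions for illustration, and exact matching ignores inflectional variants such as "attempt" versus "attempting".

```python
import re


def tokens(sentence):
    """Lowercased alphabetic tokens of a sentence, as a set."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def unigram_coverage(source, examples):
    """Fraction of source tokens that appear in at least one example."""
    src = tokens(source)
    covered = set()
    for ex in examples:
        covered |= src & tokens(ex)
    return len(covered) / len(src)


SOURCE = ("The default actuator initialization procedure must be completed "
          "prior to attempting calibration.")
BASELINE = [
    "Complete the initialization procedure before starting operation.",
    "Actuator calibration is sensitive.",
    "Default settings are often sufficient.",
]
PROPOSED = [
    "The initialization procedure must be fully executed.",
    "Do not attempt calibration prior to system readiness.",
    "Actuator defaults are set in the procedure.",
]

if __name__ == "__main__":
    print("baseline:", unigram_coverage(SOURCE, BASELINE))
    print("proposed:", unigram_coverage(SOURCE, PROPOSED))
```

Under these assumptions the baseline set covers 6 of the 12 distinct source tokens (0.50) and the proposed set covers 9 of 12 (0.75). The exact numbers depend on the tokenization and on the wording of the renderings, so they should be read as indicative only.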

9. Future Applications & Research Directions

10. References

  1. Bouthors, M., Crego, J., & Yvon, F. (2023). Multi-Levenshtein Transformer. Proceedings of ACL.
  2. Dorr, B. J. (1994). Machine translation divergences: A formal description and proposed solution. Computational Linguistics, 20(4), 597-633.
  3. Farajian, M. A., et al. (2017). Multi-domain neural machine translation through unsupervised adaptation. Proceedings of WMT.
  4. Gu, J., et al. (2018). Search engine guided neural machine translation. Proceedings of AAAI.
  5. Gu, J., et al. (2019). Improved lexically constrained decoding for translation with limited resources. Proceedings of NAACL.
  6. Karpukhin, V., et al. (2020). Dense passage retrieval for open-domain question answering. Proceedings of EMNLP.
  7. Koehn, P., & Senellart, J. (2010). Convergence of translation memory and statistical machine translation. Proceedings of AMTA.
  8. Lin, H., & Bilmes, J. (2011). A class of submodular functions for document summarization. Proceedings of ACL.
  9. Moslem, Y., et al. (2023). Adaptive machine translation with large language models. Proceedings of EACL.
  10. Nagao, M. (1984). A framework of a mechanical translation between Japanese and English by analogy principle. Artificial and Human Intelligence.