Research results
Mining Folios for Parallel Sentences
Two available datasets. As of now, 84,000 publishes two datasets: Parallel folios. This takes the form of their translations in XML...
Segmenting Long Documents
The need for segmentation. Modern generative language models are trained on short sentences. While there are summarization models, they only accept...
No Language Left Behind
What is NLLB? In July 2022, FAIR (Facebook AI Research) released a large multilingual transformer model that they call No Language Left Behind, or NLLB...
Injecting Context During Decoding
Ambiguity and context in classical Tibetan. As a written language, Tibetan is a simple language in every way: syntactically, morphologically,...
Sequencing Long Text From Sentence Fragments
What is sequencing? There are two datasets available from 84,000: The raw English translations. This is English text only, it is...