e5-large-v2 requirements for training on non-English languages?

by wilfoderek - opened May 29, 2023

May 29, 2023

Friends, congratulations on the amazing work! My name is Wilfredo and I would like to train this model for non-English languages. What further modifications are needed to reach that goal?
And could you please describe the hardware needed to train this model?

intfloat

Owner May 30, 2023

Hi @wilfoderek , thanks for your interest.

The vocabulary of this model is mostly English, so you need to change it to a multilingual model (e.g., multilingual-bert / xlm-roberta). Also, you need to curate a collection of multilingual datasets for training.
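To illustrate why an English-centric vocabulary is a problem, here is a minimal sketch of greedy longest-match subword splitting, assuming a BERT-style WordPiece tokenizer. The `wordpiece` helper and `TOY_VOCAB` are hypothetical and made up for illustration; they are not the model's actual tokenizer or vocabulary.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word (WordPiece style)."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            cand = word[start:end] if start == 0 else "##" + word[start:end]
            if cand in vocab:
                match = cand
                break
            end -= 1
        if match is None:
            # No piece covers this position: the whole word becomes unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, skewed toward English on purpose.
TOY_VOCAB = {"train", "##ing", "model", "##s", "en", "##tre", "##na", "##r"}

english = wordpiece("training", TOY_VOCAB)   # splits into two meaningful pieces
spanish = wordpiece("entrenar", TOY_VOCAB)   # splits into four short fragments
unseen = wordpiece("обучение", TOY_VOCAB)    # script not covered by the vocabulary

print(english, spanish, unseen)
```

In this toy setting the English word maps to a few meaningful pieces, while the Spanish word is broken into short fragments and the Cyrillic word cannot be represented at all, which is the kind of gap a multilingual vocabulary is meant to close.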

We have released a multilingual model at https://huggingface.co/intfloat/multilingual-e5-base , which you may want to check out.

For hardware requirements, as described in our paper, the large-size model requires 64 V100 GPUs for roughly 4 days.
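As a rough back-of-the-envelope sketch, the figure above can be converted into GPU-hours. The `days_on` helper is hypothetical and assumes perfectly linear scaling across GPUs, which real training runs rarely achieve; it also ignores memory limits and batch-size effects.

```python
NUM_GPUS = 64        # V100 GPUs, as stated above
TRAINING_DAYS = 4    # "roughly 4 days", so treat the result as approximate
HOURS_PER_DAY = 24

# Total compute budget expressed in V100 GPU-hours.
gpu_hours = NUM_GPUS * TRAINING_DAYS * HOURS_PER_DAY


def days_on(num_gpus):
    """Estimated wall-clock days on `num_gpus` V100s, assuming ideal linear scaling."""
    return gpu_hours / (num_gpus * HOURS_PER_DAY)


print(f"about {gpu_hours} V100 GPU-hours")
print(f"about {days_on(8):.0f} days on 8 V100s under ideal scaling")
```

Under these assumptions the same budget on a single 8-GPU node would take about a month, so this is an optimistic lower bound rather than a plan.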

wilfoderek

Jun 20, 2023

Thank you for your quick answer.
