{"id":12,"title":"IndicLID","area":"XLIT","published_on":"2023-05-04","conference":"ACL","description":"IndicLID is a publicly available language identification model for all 22 Indian languages in both native-script and romanized text. It is the first language identifier (LID) for romanized text in Indian languages and can predict 47 classes (24 native-script classes and 21 roman-script classes, plus English and Others).","paper_link":"https://aclanthology.org/2023.acl-short.71","colab_link":"https://colab.research.google.com/drive/1pLMeaGhYgfNRmYHPHkvAmcR-8xMMZOme?usp=sharing","website_link":"https://github.com/AI4Bharat/IndicLID","github_link":"https://github.com/AI4Bharat/IndicLID","service_id":null,"hf_link":null,"installation_steps_json":[{"instruction":"Setup and Installation","codeString":null,"type":"heading"},{"instruction":"1. Install Required Python Packages","codeString":"pip3 install fasttext\npip3 install transformers","type":"instruction"},{"instruction":"2. Clone the GitHub Repository and Navigate to the Inference Directory","codeString":"git clone https://github.com/AI4Bharat/IndicLID.git\ncd IndicLID/Inference","type":"instruction"},{"instruction":"3. Create a Directory for Models and Navigate to It","codeString":"mkdir models\ncd models","type":"instruction"},{"instruction":"4. Download the IndicLID Model Files","codeString":"wget https://github.com/AI4Bharat/IndicLID/releases/download/v1.0/indiclid-bert.zip\nwget https://github.com/AI4Bharat/IndicLID/releases/download/v1.0/indiclid-ftn.zip\nwget https://github.com/AI4Bharat/IndicLID/releases/download/v1.0/indiclid-ftr.zip","type":"instruction"},{"instruction":"5. Unzip the Downloaded Model Files","codeString":"unzip indiclid-bert.zip\nunzip indiclid-ftn.zip\nunzip indiclid-ftr.zip","type":"instruction"},{"instruction":"6. Navigate Back to the Inference Directory","codeString":"cd ..","type":"instruction"}],"usage_steps_json":[{"instruction":"Running the IndicLID Model","codeString":null,"type":"heading"},{"instruction":"1. Run the Following Code to Perform Language Identification Using the IndicLID Model","codeString":"# Import the IndicLID class\nfrom ai4bharat.IndicLID import IndicLID\n\n# Initialize the IndicLID model with specified thresholds\nIndicLID_model = IndicLID(input_threshold=0.5, roman_lid_threshold=0.6)\n\n# Define the test samples\n# These samples include both native script and Romanized script text\n# Modify the samples as needed to test different inputs\n\n# Test samples for prediction\ntest_samples = [\n   'आज के दिन का मौसम अत्यंत सुंदर है, जहां सदैव छाए हुए बादल, गुलाबी रंगीन शाम, और हल्की हवा के साथ प्राकृतिक सौंदर्य का आनंद लेने का एक सुनहरा अवसर है',\n   'aaj key din ka mausam atyant sundar hai, jahan sadaiv chae hue baadal, gulabi rangeen shaam, aur halki havaa key saath praakritik saundarya kaa anand lene kaa aeka sunhara avsar haye',\n]\n\n# Set the batch size for predictions\nbatch_size = 1\n\n# Run the batch prediction\noutputs = IndicLID_model.batch_predict(test_samples, batch_size)\n\n# Print the outputs\n# This will display the language identification results for each sample\nprint(outputs)","type":"instruction"}],"testimonials_json":null,"latest":false,"paper_award":null,"license":[],"type":"Model","hfData":null,"services":{}}