
Meta Releases LLaMA Foundation Language Models to Researchers.

Meta has released a collection of foundation language models dubbed LLaMA, which is short for “Large Language Model Meta AI.”

LLaMA is an auto-regressive language model based on the transformer architecture and was developed by Meta’s Fundamental AI Research (FAIR) team. It comes in four sizes: 7B, 13B, 33B, and 65B parameters. For comparison, GPT-3.5, the model ChatGPT is based on, has roughly 175B parameters, making even the largest LLaMA model several times smaller.

Meta trained LLaMA on tokens, which are pieces of words instead of full words, saying this makes the models easier to retrain and fine-tune for specific potential use cases: “We trained LLaMA 65B and LLaMA 33B on 1.4 trillion tokens. Our smallest model, LLaMA 7B, is trained on one trillion tokens.” The company chose text from the 20 most spoken languages and focused on those with Latin and Cyrillic alphabets.
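To make the token idea concrete, here is a minimal sketch of subword tokenization. LLaMA's tokenizer is a byte-pair-encoding (BPE) model built with the SentencePiece library; the tokenizer file path below is hypothetical and stands in for whatever tokenizer model a researcher has been granted access to.

```python
# Minimal sketch of subword (BPE) tokenization as used for LLaMA, via SentencePiece.
# Assumes a tokenizer model file is available locally; the path is hypothetical.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("tokenizer.model")  # hypothetical path to a SentencePiece tokenizer model

text = "Foundation models are trained on unlabeled data."
pieces = sp.encode_as_pieces(text)  # subword pieces rather than whole words
ids = sp.encode_as_ids(text)        # the corresponding integer token IDs

print(pieces)
print(len(ids), "tokens")
```

A sentence split this way yields pieces of words (prefixes, suffixes, common fragments), which is what the trillion-token training counts above refer to.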

In a company blog post, Meta says that smaller models like LLaMA can enable those in the research community who lack access to large amounts of infrastructure to study these models: “Training smaller foundation models like LLaMA is desirable in the large language model space because it requires far less computing power and resources to test new approaches, validate others’ work, and explore new use cases.”

Like ChatGPT and Bard, LLaMA is not free from the problems plaguing LLMs, including hallucinations, bias, and generating harmful content. Meta asserts that full research access to such models has been limited by the resources required to train and run them, which has hindered progress in understanding them and in mitigating these known issues.

LLaMA is being released under a noncommercial license focused on research use cases, and access will be granted on a case-by-case basis to academic researchers, civil and governmental organizations, and industry research labs, according to Meta.

Meta hopes that by sharing the code for LLaMA, researchers can test new approaches to limiting these problems in LLMs. In its research paper, the company provides a set of evaluations on benchmarks that measure model bias and toxicity, to show LLaMA’s limitations and support further research in this area.

The company noted that these foundation models were trained on a large set of unlabeled data, making them ideal for fine-tuning for different tasks. The FAIR team trained the model with publicly available data from CCNet, C4, GitHub, Wikipedia, books, ArXiv, and Stack Exchange, with 67% of the total data coming from CCNet.

Meta claims its LLaMA 13B model can outperform GPT-3 on benchmarks such as BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC, and OpenBookQA while running on a single GPU. That could set the stage for applications built on this model running on consumer-level hardware in the future, as in the sketch below.
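As an illustration only, here is a minimal sketch of what a single-GPU setup might look like once weights have been obtained. It uses the Hugging Face Transformers library with 8-bit quantization (via bitsandbytes) to fit a 13B-class model in consumer VRAM; the model path and the availability of converted weights are assumptions, since Meta distributes LLaMA only through its access-request process.

```python
# Hypothetical sketch: running a 13B-class model on a single consumer GPU
# using 8-bit quantization. The model path is an assumption; LLaMA weights
# must be obtained through Meta's research access program.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-13b"  # hypothetical local path to converted weights

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",   # place layers on the available GPU automatically
    load_in_8bit=True,   # 8-bit weights (bitsandbytes) to fit in roughly 16 GB of VRAM
)

prompt = "Foundation language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```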

Source: www.enterpriseai.news
