Llama2 Models
Llama2 is a family of pretrained and fine-tuned large language models (LLMs) developed by Meta AI. The models are trained on a massive dataset of text and code, and can be used for a variety of tasks, including:
- Natural language understanding (NLU)
- Natural language generation (NLG)
- Machine translation
- Text summarization
- Question answering
- Code generation
The Llama2 models are available under Meta's Llama 2 Community License, which permits free use, modification, and redistribution, subject to the license's acceptable-use terms.
Model Architecture
The Llama2 models are based on the Transformer architecture, a neural network architecture that has proven very effective for NLP tasks. The models are trained with a causal (autoregressive) language-modeling objective, which involves predicting the next token in a sequence of text given the tokens that precede it.
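As an illustrative sketch (not EasyDeL or Llama2 code), the causal language-modeling objective can be shown in miniature with a toy bigram model that predicts the most frequent follower of the previous token:

```python
from collections import Counter, defaultdict

# Toy corpus: the model learns which token most often follows each token.
corpus = "the cat sat on the mat the cat ran".split()

follower_counts = defaultdict(Counter)
for prev_tok, next_tok in zip(corpus, corpus[1:]):
    follower_counts[prev_tok][next_tok] += 1

def predict_next(token):
    # Causal LM objective in miniature: predict the next token
    # given only the tokens that came before it.
    return follower_counts[token].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once
```

A real Transformer replaces the frequency table with learned attention layers, but the training signal is the same next-token prediction.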
Model Sizes
The Llama2 models come in a variety of sizes, ranging from 7 billion to 70 billion parameters. The larger models have more capacity to learn complex patterns in language, but they are also more computationally expensive to train and deploy.
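As a rough back-of-the-envelope sketch of where those parameter counts come from (the 12 * d_model^2-per-layer rule of thumb ignores embeddings, normalization, and the exact SwiGLU feed-forward width; the layer count and hidden size below are the published Llama-2-7B settings):

```python
# Rough transformer parameter estimate: each layer carries roughly
# 4*d^2 attention weights plus ~8*d^2 feed-forward weights.
def approx_params(n_layers, d_model):
    return 12 * n_layers * d_model ** 2

# Llama-2-7B uses 32 layers with hidden size 4096.
print(approx_params(32, 4096) / 1e9)  # ≈ 6.44 billion, close to the 7B label
```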
Fine-tuning
The Llama2 models are pretrained on a massive dataset of text and code, but they can be further fine-tuned on a specific task to improve their performance. Fine-tuning involves training the model on a dataset of labeled data for the specific task.
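As a conceptual sketch of fine-tuning (illustrative only; real fine-tuning runs through a framework such as JAX/EasyDeL), the idea is to start from weights learned during pretraining and nudge them with gradient steps on a small task-specific labeled dataset:

```python
# "Pretrained" starting point for a one-weight model y = w * x.
pretrained_w = 0.5
task_data = [(1.0, 2.0), (2.0, 4.0)]   # labeled pairs (x, y) for the task y = 2*x

w = pretrained_w
lr = 0.1
for _ in range(200):
    for x, y in task_data:
        grad = 2 * (w * x - y) * x     # d/dw of the squared error (w*x - y)^2
        w -= lr * grad                 # gradient step toward the task solution

print(round(w, 3))  # converges toward 2.0, the task-specific optimum
```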
Use Cases
The Llama2 models can be used for a variety of tasks, including:
- Natural language understanding (NLU): The Llama2 models can be used to understand the meaning of text, such as identifying the entities and relationships in a sentence.
- Natural language generation (NLG): The Llama2 models can be used to generate text, such as writing different kinds of creative content, like poems, code, scripts, musical pieces, email, letters, etc.
- Machine translation: The Llama2 models can be used to translate text from one language to another.
- Text summarization: The Llama2 models can be used to summarize a text document into a shorter, more concise version.
- Question answering: The Llama2 models can be used to answer questions about a text document.
- Code generation: The Llama2 models can be used to generate code, such as Python scripts or Java classes.
Availability
The Llama2 models are available through the Hugging Face Hub. The models are also available in the TensorFlow Hub, the PyTorch Hub, and EasyDeL.
Conclusion
The Llama2 models are a powerful family of LLMs that can be used for a variety of tasks. The models are open source and available for free, making them a valuable resource for researchers and developers.
How to Use/Load Them in EasyDeL
from easydel import AutoEasyDeLModelForCausalLM
model, params = AutoEasyDeLModelForCausalLM.from_pretrained(
'meta-llama/Llama-2-7b',
# other kwargs
)
Also keep in mind that the returned config includes a .get_partition_rules(fsdp=True) method, which provides the sharding rules used to partition the model parameters.
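To illustrate the shape of the data such a method returns (pairs of a parameter-name regex and a sharding spec; the real method returns jax.sharding.PartitionSpec objects, and the patterns and axis names below are hypothetical stand-ins, not EasyDeL's actual rules):

```python
import re

# Hypothetical partition rules: (parameter-name regex, sharding spec).
# Strings stand in for PartitionSpec objects in this sketch.
partition_rules = (
    (".*embed_tokens.*", ("tp", "fsdp")),
    (".*self_attn.*", ("fsdp", "tp")),
    (".*", ("fsdp",)),
)

def rule_for(param_name, rules):
    # First matching regex wins, so the catch-all rule goes last.
    for pattern, spec in rules:
        if re.fullmatch(pattern, param_name):
            return spec
    raise KeyError(param_name)

print(rule_for("model.layers.0.self_attn.q_proj", partition_rules))
```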
Use With JaxServer
from easydel.serve import JAXServer, JAXServerConfig
import jax
from transformers import AutoTokenizer
from easydel import AutoEasyDeLModelForCausalLM
model, params = AutoEasyDeLModelForCausalLM.from_pretrained(
'meta-llama/Llama-2-7b',
# other kwargs
)
DEFAULT_SYSTEM_PROMPT = "You are a helpful, respectful and honest assistant and act as wanted"
class Llama2JaxServer(JAXServer):
    def sample_gradio_chat(self, prompt, history, max_new_tokens, system, greedy):
        system = None if system == "" else system
        string = self.prompt_llama2_model(
            message=prompt,
            chat_history=history or [],
            system_prompt=system or DEFAULT_SYSTEM_PROMPT
        )
        if not self.server_config.stream_tokens_for_gradio:
            response = ""
            for response, _ in self.sample(
                    string=string,
                    greedy=greedy,
                    max_new_tokens=max_new_tokens,
            ):
                ...
            history.append([prompt, response])
        else:
            history.append([prompt, ""])
            for response, _ in self.sample(
                    string=string,
                    greedy=greedy,
                    max_new_tokens=max_new_tokens
            ):
                history[-1][-1] = response
                yield "", history
        return "", history

    def sample_gradio_instruct(self, prompt, system, max_new_tokens, greedy):
        string = self.prompt_llama2_model(system_prompt=DEFAULT_SYSTEM_PROMPT, message=prompt, chat_history=[])
        if not self.server_config.stream_tokens_for_gradio:
            response = ""
            for response, _ in self.sample(
                    string=string,
                    greedy=greedy,
                    max_new_tokens=max_new_tokens,
            ):
                pass
        else:
            response = ""
            for response, _ in self.sample(
                    string=string,
                    greedy=greedy,
                    max_new_tokens=max_new_tokens,
                    stream=True
            ):
                yield "", response
        return "", response

    @staticmethod
    def prompt_llama2_model(message: str, chat_history,
                            system_prompt: str) -> str:
        do_strip = False
        texts = [f'<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n']
        for user_input, response in chat_history:
            user_input = user_input.strip() if do_strip else user_input
            do_strip = True
            texts.append(f'{user_input} [/INST] {response.strip()} </s><s>[INST] ')
        message = message.strip() if do_strip else message
        texts.append(f'{message} [/INST]')
        return "".join(texts)
server = Llama2JaxServer.from_parameters(
params=params,
model=model,
config_model=model.config,
add_params_field=True,
tokenizer=AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b'),
verbose=False,
do_memory_log=True,
server_config=JAXServerConfig()
)
server.fire() # Launch FastAPI functions
shared_urls = server.launch(
share_chat=True,
share_inst=True
)
Done 😇. This method can be used for all of the Llama2 models.