
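We introduce NV-Embed-v2, a generalist embedding model that ranks No. 1 on the Massive Text Embedding Benchmark (MTEB benchmark) (as of August 30, 2024) with a score of 72.31 across 56 text embedding tasks. It also ranks No. 1 in the retrieval sub-category (a score of 62.65 across 15 tasks), which is essential to the development of RAG technology.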

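NV-Embed-v2 introduces several new designs, including having the LLM attend to latent vectors to obtain better pooled embedding output, and demonstrates a two-stage instruction tuning method that improves accuracy on both retrieval and non-retrieval tasks. In addition, NV-Embed-v2 incorporates a novel hard-negative mining method that takes the positive relevance scores into account to better remove false negatives.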
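The paragraph above says only that positive relevance scores are taken into account when mining hard negatives; it does not give the exact rule. One plausible reading, sketched here purely for illustration, is to discard any candidate negative whose score comes too close to the positive's score, since such a candidate is likely a mislabeled positive. The function name `filter_hard_negatives`, the `max_ratio` threshold of 0.95, and the toy scores are all assumptions, not values taken from the text.

```python
from typing import List, Tuple


def filter_hard_negatives(
    positive_score: float,
    candidates: List[Tuple[str, float]],
    max_ratio: float = 0.95,  # assumed threshold, not taken from the text
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Keep the top_k highest-scoring candidates whose score stays below
    max_ratio * positive_score; candidates at or above that ceiling are
    treated as likely false negatives and dropped."""
    ceiling = positive_score * max_ratio
    kept = [(doc, score) for doc, score in candidates if score < ceiling]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy relevance scores for one query (hypothetical values).
candidates = [
    ("doc_a", 0.91),  # almost as relevant as the positive: likely a false negative
    ("doc_b", 0.70),
    ("doc_c", 0.55),
    ("doc_d", 0.20),
]
hard_negatives = filter_hard_negatives(positive_score=0.93, candidates=candidates)
print(hard_negatives)
```

With these toy numbers, `doc_a` is dropped because it scores above 95% of the positive, and the two hardest remaining candidates are kept.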

For more technical details, refer to our paper: NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models.

Model Details

  • Base decoder-only LLM: Mistral-7B-v0.1
  • Pooling type: Latent-Attention
  • Embedding dimension: 4096

How to Use

Required Packages

If you run into issues, try installing the following Python packages:

pip uninstall -y transformer-engine
pip install torch==2.2.0
pip install transformers==4.42.4
pip install flash-attn==2.2.0
pip install sentence-transformers==2.7.0

Below are examples of how to encode queries and passages using HuggingFace Transformers and Sentence-Transformers.

HuggingFace Transformers

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Each query needs to be accompanied by a corresponding instruction describing the task.
task_name_to_instruct = {"example": "Given a question, retrieve passages that answer the question",}

query_prefix = "Instruct: "+task_name_to_instruct["example"]+"\nQuery: "
queries = ['are judo throws allowed in wrestling?', 'how to become a radiology technician in michigan?']

# No instruction needed for retrieval passages
passage_prefix = ""
passages = ["Since you're reading this, you are probably someone from a judo background or someone who is just wondering how judo techniques can be applied under wrestling rules. So without further ado, let's get to the question. Are Judo throws allowed in wrestling? Yes, judo throws are allowed in freestyle and folkstyle wrestling. You only need to be careful to follow the slam rules when executing judo throws. In wrestling, a slam is lifting and returning an opponent to the mat with unnecessary force.","Below are the basic steps to becoming a radiologic technologist in Michigan:Earn a high school diploma. As with most careers in health care, a high school education is the first step to finding entry-level employment. Taking classes in math and science, such as anatomy, biology, chemistry, physiology, and physics, can help prepare students for their college studies and future careers.Earn an associate degree. Entry-level radiologic positions typically require at least an Associate of Applied Science. Before enrolling in one of these degree programs, students should make sure it has been properly accredited by the Joint Review Committee on Education in Radiologic Technology (JRCERT).Get licensed or certified in the state of Michigan."
]

# load model with tokenizer
model = AutoModel.from_pretrained('nvidia/NV-Embed-v2', trust_remote_code=True)

# get the embeddings
max_length = 32768
query_embeddings = model.encode(queries, instruction=query_prefix, max_length=max_length)
passage_embeddings = model.encode(passages, instruction=passage_prefix, max_length=max_length)

# normalize embeddings
query_embeddings = F.normalize(query_embeddings, p=2, dim=1)
passage_embeddings = F.normalize(passage_embeddings, p=2, dim=1)

# get the embeddings with DataLoader (splitting the datasets into multiple mini-batches)
# batch_size=2
# query_embeddings = model._do_encode(queries, batch_size=batch_size, instruction=query_prefix, max_length=max_length, num_workers=32, return_numpy=True)
# passage_embeddings = model._do_encode(passages, batch_size=batch_size, instruction=passage_prefix, max_length=max_length, num_workers=32, return_numpy=True)

scores = (query_embeddings @ passage_embeddings.T) * 100
print(scores.tolist())
# [[87.42693328857422, 0.46283677220344543], [0.965264618396759, 86.03721618652344]]
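
The scoring step in the example above is a cosine similarity scaled by 100: once every embedding is L2-normalized, the matrix product of queries and passages gives the cosine of the angle between each pair. The following is a minimal dependency-free sketch of that arithmetic on made-up two-dimensional vectors; the helper names `l2_normalize` and `score_matrix` and the toy vectors are illustrative assumptions, not part of the model's API.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def score_matrix(queries: List[List[float]], passages: List[List[float]]) -> List[List[float]]:
    """Return 100 * cosine similarity for every (query, passage) pair."""
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in passages]
    return [[100 * sum(a * b for a, b in zip(qv, pv)) for pv in p] for qv in q]


# Toy 2-dimensional "embeddings" (hypothetical values).
toy_queries = [[3.0, 4.0], [1.0, 0.0]]
toy_passages = [[6.0, 8.0], [0.0, 2.0]]

toy_scores = score_matrix(toy_queries, toy_passages)
print([[round(s, 1) for s in row] for row in toy_scores])
```

The first toy query points in the same direction as the first toy passage, so that pair scores about 100 regardless of vector length, while the second query is orthogonal to the second passage and scores 0.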

Sentence-Transformers

import torch
from sentence_transformers import SentenceTransformer

# Each query needs to be accompanied by a corresponding instruction describing the task.
task_name_to_instruct = {"example": "Given a question, retrieve passages that answer the question",}

query_prefix = "Instruct: "+task_name_to_instruct["example"]+"\nQuery: "
queries = ['are judo throws allowed in wrestling?', 'how to become a radiology technician in michigan?']

# No instruction needed for retrieval passages
passages = ["Since you're reading this, you are probably someone from a judo background or someone who is just wondering how judo techniques can be applied under wrestling rules. So without further ado, let's get to the question. Are Judo throws allowed in wrestling? Yes, judo throws are allowed in freestyle and folkstyle wrestling. You only need to be careful to follow the slam rules when executing judo throws. In wrestling, a slam is lifting and returning an opponent to the mat with unnecessary force.","Below are the basic steps to becoming a radiologic technologist in Michigan:Earn a high school diploma. As with most careers in health care, a high school education is the first step to finding entry-level employment. Taking classes in math and science, such as anatomy, biology, chemistry, physiology, and physics, can help prepare students for their college studies and future careers.Earn an associate degree. Entry-level radiologic positions typically require at least an Associate of Applied Science. Before enrolling in one of these degree programs, students should make sure it has been properly accredited by the Joint Review Committee on Education in Radiologic Technology (JRCERT).Get licensed or certified in the state of Michigan."
]

# load model with tokenizer
model = SentenceTransformer('nvidia/NV-Embed-v2', trust_remote_code=True)
model.max_seq_length = 32768
model.tokenizer.padding_side="right"

def add_eos(input_examples):
    input_examples = [input_example + model.tokenizer.eos_token for input_example in input_examples]
    return input_examples

# get the embeddings
batch_size = 2
query_embeddings = model.encode(add_eos(queries), batch_size=batch_size, prompt=query_prefix, normalize_embeddings=True)
passage_embeddings = model.encode(add_eos(passages), batch_size=batch_size, normalize_embeddings=True)

scores = (query_embeddings @ passage_embeddings.T) * 100
print(scores.tolist())
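
To make the string handling in the Sentence-Transformers example easier to follow, here is a small sketch of the text each input is expected to become before tokenization: the end-of-sequence token is appended by `add_eos`, and the `prompt` is prepended to queries only. It runs without loading the model. The `EOS_TOKEN` value `"</s>"` is an assumed placeholder (real code should read `model.tokenizer.eos_token`), and `apply_prompt` is a hypothetical stand-in for the prompt handling inside `encode`.

```python
from typing import List

EOS_TOKEN = "</s>"  # assumed placeholder; use model.tokenizer.eos_token in real code


def add_eos(texts: List[str], eos_token: str = EOS_TOKEN) -> List[str]:
    """Append the end-of-sequence token to every input text."""
    return [text + eos_token for text in texts]


def apply_prompt(texts: List[str], prompt: str = "") -> List[str]:
    """Hypothetical stand-in for prepending the `prompt` argument of encode()."""
    return [prompt + text for text in texts]


instruction = "Given a question, retrieve passages that answer the question"
query_prefix = "Instruct: " + instruction + "\nQuery: "

# Queries get both the instruction prefix and the EOS token.
final_queries = apply_prompt(add_eos(["are judo throws allowed in wrestling?"]), query_prefix)
# Passages get the EOS token only, since no instruction is needed for them.
final_passages = apply_prompt(add_eos(["Yes, judo throws are allowed."]))

print(repr(final_queries[0]))
print(repr(final_passages[0]))
```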

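Instruction Templates for MTEB Benchmarks

For the MTEB retrieval, STS, and summarization sub-tasks, use the instruction prefix templates in instructions.json. For classification, clustering, and reranking, use the instructions provided in Table 7 of the NV-Embed paper.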

instructions.json

{
  "ClimateFEVER": {"query": "Given a claim about climate change, retrieve documents that support or refute the claim", "corpus": ""},
  "HotpotQA": {"query": "Given a multi-hop question, retrieve documents that can help answer the question", "corpus": ""},
  "FEVER": {"query": "Given a claim, retrieve documents that support or refute the claim", "corpus": ""},
  "MSMARCO": {"query": "Given a web search query, retrieve relevant passages that answer the query", "corpus": ""},
  "DBPedia": {"query": "Given a query, retrieve relevant entity descriptions from DBPedia", "corpus": ""},
  "NQ": {"query": "Given a question, retrieve passages that answer the question", "corpus": ""},
  "QuoraRetrieval": {"query": "Given a question, retrieve questions that are semantically equivalent to the given question", "corpus": "Given a question, retrieve questions that are semantically equivalent to the given question"},
  "SCIDOCS": {"query": "Given a scientific paper title, retrieve paper abstracts that are cited by the given paper", "corpus": ""},
  "TRECCOVID": {"query": "Given a query on COVID-19, retrieve documents that answer the query", "corpus": ""},
  "Touche2020": {"query": "Given a question, retrieve passages that answer the question", "corpus": ""},
  "SciFact": {"query": "Given a scientific claim, retrieve documents that support or refute the claim", "corpus": ""},
  "NFCorpus": {"query": "Given a question, retrieve relevant documents that answer the question", "corpus": ""},
  "ArguAna": {"query": "Given a claim, retrieve documents that support or refute the claim", "corpus": ""},
  "FiQA2018": {"query": "Given a financial question, retrieve relevant passages that answer the query", "corpus": ""},
  "STS": {"text": "Retrieve semantically similar text"},
  "SUMM": {"text": "Given a news summary, retrieve other semantically similar summaries"}
}
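
As a rough illustration of how these templates could be consumed, the sketch below parses a three-entry excerpt of the file and builds the query prefix in the same `"Instruct: ...\nQuery: "` form used in the encoding examples. The helper names `instruction_for` and `query_prefix_for` are hypothetical, and the excerpt is inlined as a string only so the snippet runs on its own; real code would read instructions.json from disk.

```python
import json
from typing import Dict

# Excerpt of instructions.json, inlined so the sketch is self-contained.
INSTRUCTIONS_JSON = """
{
  "NQ": {"query": "Given a question, retrieve passages that answer the question", "corpus": ""},
  "QuoraRetrieval": {
    "query": "Given a question, retrieve questions that are semantically equivalent to the given question",
    "corpus": "Given a question, retrieve questions that are semantically equivalent to the given question"
  },
  "STS": {"text": "Retrieve semantically similar text"}
}
"""


def instruction_for(templates: Dict[str, Dict[str, str]], task: str, role: str) -> str:
    """Return the raw instruction for a task; role is "query", "corpus" or "text".
    A missing role yields an empty string, meaning no instruction."""
    return templates[task].get(role, "")


def query_prefix_for(templates: Dict[str, Dict[str, str]], task: str) -> str:
    """Build the query prefix in the form used by the encoding examples."""
    instruction = instruction_for(templates, task, "query")
    return f"Instruct: {instruction}\nQuery: " if instruction else ""


templates = json.loads(INSTRUCTIONS_JSON)
nq_prefix = query_prefix_for(templates, "NQ")
print(repr(nq_prefix))
print(repr(instruction_for(templates, "NQ", "corpus")))
```

Note that most retrieval tasks leave `corpus` empty, whereas QuoraRetrieval supplies the same instruction for both sides, and the STS and SUMM entries use a single `text` key instead.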

How to Enable Multi-GPU (note: this applies to HuggingFace Transformers)

from transformers import AutoModel
from torch.nn import DataParallel

embedding_model = AutoModel.from_pretrained("nvidia/NV-Embed-v2")
for module_key, module in embedding_model._modules.items():
    embedding_model._modules[module_key] = DataParallel(module)