  • The emergence and growth of large models has been driven by larger datasets, greater compute, and algorithmic improvements. These models show remarkable performance across tasks such as natural language processing, computer vision, and speech recognition. They typically adopt deep neural network architectures such as Transformer, BERT, and GPT (Generative Pre-trained Transformer). Their strength lies in capturing more complex, abstract features and relationships in data: by learning at massive parameter scale they generalize better across tasks and can perform reasonably well even without large amounts of domain-specific training data. They also face challenges, however, including enormous compute requirements, high training cost, dependence on large-scale data, and limited interpretability, so their application and development involve trade-offs among performance, cost, and ethics. InternLM-7B comprises a 7-billion-parameter base model and a chat model tailored for practical scenarios. It has two notable properties: (1) it is trained on trillions of high-quality tokens, building a strong knowledge base; (2) it supports an 8k-token context window, allowing longer input sequences and stronger reasoning. A minimal loading sketch follows this paragraph.
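    As a rough illustration only, here is a minimal sketch of loading the chat model with Hugging Face transformers. The hub ID internlm/internlm-chat-7b, the trust_remote_code requirement, and the chat() helper are assumptions based on the public InternLM release, not something stated in this article, and the snippet assumes a GPU large enough for fp16 weights:

      # Minimal sketch: load the InternLM-7B chat model with Hugging Face transformers.
      # Assumes the hub ID "internlm/internlm-chat-7b" and a GPU with enough memory for fp16 weights.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-chat-7b", trust_remote_code=True)
      model = AutoModelForCausalLM.from_pretrained(
          "internlm/internlm-chat-7b",
          torch_dtype=torch.float16,  # fp16 weights to fit a single modern GPU
          trust_remote_code=True,
      ).cuda().eval()

      # The model's remote code exposes a chat() helper that keeps a running history.
      response, history = model.chat(tokenizer, "Hello, please introduce yourself.", history=[])
      print(response)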

  • InternLM is an open-source, lightweight training framework designed to support large-model training without heavy dependencies. With a single codebase it supports pre-training on large clusters with thousands of GPUs as well as fine-tuning on a single GPU, while delivering strong performance optimizations: when training on 1024 GPUs, InternLM achieves nearly 90% scaling efficiency. Based on the InternLM training framework, the Shanghai AI Laboratory has released two open-source pre-trained models: InternLM-7B and InternLM-20B. Lagent is a lightweight, open-source agent framework built on large language models; it lets users quickly turn an LLM into several types of agents and provides a set of typical tools to empower the model. Using the Lagent framework makes it easier to exploit the full capability of InternLM.

  • A sample training configuration file for the 7B demo is shown below (a worked batch-size calculation follows the config):

    • JOB_NAME = "7b_train"
      SEQ_LEN = 2048
      HIDDEN_SIZE = 4096
      NUM_ATTENTION_HEAD = 32
      MLP_RATIO = 8 / 3
      NUM_LAYER = 32
      VOCAB_SIZE = 103168
      MODEL_ONLY_FOLDER = "local:llm_ckpts/xxxx"
      # Ckpt folder format:
      # fs: 'local:/mnt/nfs/XXX'
      SAVE_CKPT_FOLDER = "local:llm_ckpts"
      LOAD_CKPT_FOLDER = "local:llm_ckpts/49"
      # boto3 Ckpt folder format:
      # import os
      # BOTO3_IP = os.environ["BOTO3_IP"] # boto3 bucket endpoint
      # SAVE_CKPT_FOLDER = f"boto3:s3://model_weights.{BOTO3_IP}/internlm"
      # LOAD_CKPT_FOLDER = f"boto3:s3://model_weights.{BOTO3_IP}/internlm/snapshot/1/"
      CHECKPOINT_EVERY = 50
      ckpt = dict(
          enable_save_ckpt=False,  # enable ckpt save.
          save_ckpt_folder=SAVE_CKPT_FOLDER,  # Path to save training ckpt.
          # load_ckpt_folder=LOAD_CKPT_FOLDER,  # Ckpt path to resume training (load weights and scheduler/context states).
          # load_model_only_folder=MODEL_ONLY_FOLDER,  # Path to initialize with given model weights.
          load_optimizer=True,  # Whether to load optimizer states when continuing training.
          checkpoint_every=CHECKPOINT_EVERY,
          async_upload=True,  # async ckpt upload. (only works for boto3 ckpt)
          async_upload_tmp_folder="/dev/shm/internlm_tmp_ckpt/",  # path for temporary files during asynchronous upload.
          snapshot_ckpt_folder="/".join([SAVE_CKPT_FOLDER, "snapshot"]),  # directory for snapshot ckpt storage.
          oss_snapshot_freq=int(CHECKPOINT_EVERY / 2),  # snapshot ckpt save frequency.
      )
      TRAIN_FOLDER = "/path/to/dataset"
      VALID_FOLDER = "/path/to/dataset"
      data = dict(
          seq_len=SEQ_LEN,
          # micro_num means the number of micro_batch contained in one gradient update
          micro_num=4,
          # packed_length = micro_bsz * SEQ_LEN
          micro_bsz=2,
          # defaults to the value of micro_num
          valid_micro_num=4,
          # defaults to 0, means disable evaluate
          valid_every=50,
          pack_sample_into_one=False,
          total_steps=50000,
          skip_batches="",
          rampup_batch_size="",
          # Datasets with less than 50 rows will be discarded
          min_length=50,
          # train_folder=TRAIN_FOLDER,
          # valid_folder=VALID_FOLDER,
      )
      grad_scaler = dict(
          fp16=dict(
              # the initial loss scale, defaults to 2**16
              initial_scale=2**16,
              # the minimum loss scale, defaults to None
              min_scale=1,
              # the number of steps to increase loss scale when no overflow occurs
              growth_interval=1000,
          ),
          # the multiplication factor for increasing loss scale, defaults to 2
          growth_factor=2,
          # the multiplication factor for decreasing loss scale, defaults to 0.5
          backoff_factor=0.5,
          # the maximum loss scale, defaults to None
          max_scale=2**24,
          # the number of overflows before decreasing loss scale, defaults to 2
          hysteresis=2,
      )
      hybrid_zero_optimizer = dict(
          # Enable low_level_optimizer overlap_communication
          overlap_sync_grad=True,
          overlap_sync_param=True,
          # bucket size for nccl communication params
          reduce_bucket_size=512 * 1024 * 1024,
          # grad clipping
          clip_grad_norm=1.0,
      )
      loss = dict(
          label_smoothing=0,
      )
      adam = dict(
          lr=1e-4,
          adam_beta1=0.9,
          adam_beta2=0.95,
          adam_beta2_c=0,
          adam_eps=1e-8,
          weight_decay=0.01,
      )
      lr_scheduler = dict(
          total_steps=data["total_steps"],
          init_steps=0,  # optimizer_warmup_step
          warmup_ratio=0.01,
          eta_min=1e-5,
          last_epoch=-1,
      )
      beta2_scheduler = dict(
          init_beta2=adam["adam_beta2"],
          c=adam["adam_beta2_c"],
          cur_iter=-1,
      )
      model = dict(
          checkpoint=False,  # The proportion of layers for activation checkpointing; the optional values are True/False/[0-1]
          num_attention_heads=NUM_ATTENTION_HEAD,
          embed_split_hidden=True,
          vocab_size=VOCAB_SIZE,
          embed_grad_scale=1,
          parallel_output=True,
          hidden_size=HIDDEN_SIZE,
          num_layers=NUM_LAYER,
          mlp_ratio=MLP_RATIO,
          apply_post_layer_norm=False,
          dtype="torch.float16",  # Support: "torch.float16", "torch.half", "torch.bfloat16", "torch.float32", "torch.tf32"
          norm_type="rmsnorm",
          layer_norm_epsilon=1e-5,
          use_flash_attn=True,
          num_chunks=1,  # if num_chunks > 1, interleaved pipeline scheduler is used.
      )
      """
      zero1 parallel:
          1. if zero1 <= 0, the size of the zero process group is equal to the size of the dp process group,
             so parameters will be divided within the range of dp.
          2. if zero1 == 1, zero is not used, and all dp groups retain the full amount of model parameters.
          3. zero1 > 1 and zero1 <= dp world size, the world size of zero is a subset of dp world size.
          For smaller models, it is usually a better choice to split the parameters within nodes with a setting <= 8.
      pipeline parallel (dict):
          1. size: int, the size of pipeline parallel.
          2. interleaved_overlap: bool, enable/disable communication overlap when using interleaved pipeline scheduler.
      tensor parallel: tensor parallel size, usually the number of GPUs per node.
      """
      parallel = dict(
          zero1=8,
          pipeline=dict(size=1, interleaved_overlap=True),
          sequence_parallel=False,
      )
      cudnn_deterministic = False
      cudnn_benchmark = False
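
    To make the data settings above concrete, here is a small worked example. It assumes a single node with 8 GPUs, so that with tensor and pipeline parallel sizes of 1 the data-parallel size is 8; the GPU count is an assumption for illustration, not part of the config:

      # Rough sketch: tokens processed per optimizer step for the 7B demo config above.
      # Assumes 8 GPUs total with tensor parallel = 1 and pipeline parallel = 1,
      # so data-parallel size = 8 (the GPU count itself is an assumption for illustration).
      SEQ_LEN = 2048
      micro_bsz = 2
      micro_num = 4          # micro-batches accumulated per gradient update
      dp_size = 8            # assumed: world_size // (tensor * pipeline) = 8 // (1 * 1)

      packed_length = micro_bsz * SEQ_LEN                   # 4096 tokens per micro-batch
      tokens_per_rank_per_step = micro_num * packed_length  # 16384 tokens per GPU per step
      tokens_per_step = tokens_per_rank_per_step * dp_size  # 131072 tokens per optimizer step
      print(packed_length, tokens_per_rank_per_step, tokens_per_step)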
      
  • A sample training configuration file for the 30B demo is shown below (a short sketch of how the parallel settings map onto process groups follows the config):

    • JOB_NAME = "30b_train"
      SEQ_LEN = 2048
      HIDDEN_SIZE = 6144
      NUM_ATTENTION_HEAD = 48
      MLP_RATIO = 8 / 3
      NUM_LAYER = 60
      VOCAB_SIZE = 103168
      MODEL_ONLY_FOLDER = "local:llm_ckpts/xxxx"
      # Ckpt folder format:
      # fs: 'local:/mnt/nfs/XXX'
      SAVE_CKPT_FOLDER = "local:llm_ckpts"
      LOAD_CKPT_FOLDER = "local:llm_ckpts/49"
      # boto3 Ckpt folder format:
      # import os
      # BOTO3_IP = os.environ["BOTO3_IP"] # boto3 bucket endpoint
      # SAVE_CKPT_FOLDER = f"boto3:s3://model_weights.{BOTO3_IP}/internlm"
      # LOAD_CKPT_FOLDER = f"boto3:s3://model_weights.{BOTO3_IP}/internlm/snapshot/1/"
      CHECKPOINT_EVERY = 50
      ckpt = dict(
          enable_save_ckpt=False,  # enable ckpt save.
          save_ckpt_folder=SAVE_CKPT_FOLDER,  # Path to save training ckpt.
          # load_ckpt_folder=LOAD_CKPT_FOLDER,  # Ckpt path to resume training (load weights and scheduler/context states).
          # load_model_only_folder=MODEL_ONLY_FOLDER,  # Path to initialize with given model weights.
          load_optimizer=True,  # Whether to load optimizer states when continuing training.
          checkpoint_every=CHECKPOINT_EVERY,
          async_upload=True,  # async ckpt upload. (only works for boto3 ckpt)
          async_upload_tmp_folder="/dev/shm/internlm_tmp_ckpt/",  # path for temporary files during asynchronous upload.
          snapshot_ckpt_folder="/".join([SAVE_CKPT_FOLDER, "snapshot"]),  # directory for snapshot ckpt storage.
          oss_snapshot_freq=int(CHECKPOINT_EVERY / 2),  # snapshot ckpt save frequency.
      )
      TRAIN_FOLDER = "/path/to/dataset"
      VALID_FOLDER = "/path/to/dataset"
      data = dict(
          seq_len=SEQ_LEN,
          # micro_num means the number of micro_batch contained in one gradient update
          micro_num=4,
          # packed_length = micro_bsz * SEQ_LEN
          micro_bsz=2,
          # defaults to the value of micro_num
          valid_micro_num=4,
          # defaults to 0, means disable evaluate
          valid_every=50,
          pack_sample_into_one=False,
          total_steps=50000,
          skip_batches="",
          rampup_batch_size="",
          # Datasets with less than 50 rows will be discarded
          min_length=50,
          # train_folder=TRAIN_FOLDER,
          # valid_folder=VALID_FOLDER,
      )
      grad_scaler = dict(
          fp16=dict(
              # the initial loss scale, defaults to 2**16
              initial_scale=2**16,
              # the minimum loss scale, defaults to None
              min_scale=1,
              # the number of steps to increase loss scale when no overflow occurs
              growth_interval=1000,
          ),
          # the multiplication factor for increasing loss scale, defaults to 2
          growth_factor=2,
          # the multiplication factor for decreasing loss scale, defaults to 0.5
          backoff_factor=0.5,
          # the maximum loss scale, defaults to None
          max_scale=2**24,
          # the number of overflows before decreasing loss scale, defaults to 2
          hysteresis=2,
      )
      hybrid_zero_optimizer = dict(
          # Enable low_level_optimizer overlap_communication
          overlap_sync_grad=True,
          overlap_sync_param=True,
          # bucket size for nccl communication params
          reduce_bucket_size=512 * 1024 * 1024,
          # grad clipping
          clip_grad_norm=1.0,
      )
      loss = dict(
          label_smoothing=0,
      )
      adam = dict(
          lr=1e-4,
          adam_beta1=0.9,
          adam_beta2=0.95,
          adam_beta2_c=0,
          adam_eps=1e-8,
          weight_decay=0.01,
      )
      lr_scheduler = dict(
          total_steps=data["total_steps"],
          init_steps=0,  # optimizer_warmup_step
          warmup_ratio=0.01,
          eta_min=1e-5,
          last_epoch=-1,
      )
      beta2_scheduler = dict(
          init_beta2=adam["adam_beta2"],
          c=adam["adam_beta2_c"],
          cur_iter=-1,
      )
      model = dict(
          checkpoint=False,  # The proportion of layers for activation checkpointing; the optional values are True/False/[0-1]
          num_attention_heads=NUM_ATTENTION_HEAD,
          embed_split_hidden=True,
          vocab_size=VOCAB_SIZE,
          embed_grad_scale=1,
          parallel_output=True,
          hidden_size=HIDDEN_SIZE,
          num_layers=NUM_LAYER,
          mlp_ratio=MLP_RATIO,
          apply_post_layer_norm=False,
          dtype="torch.float16",  # Support: "torch.float16", "torch.half", "torch.bfloat16", "torch.float32", "torch.tf32"
          norm_type="rmsnorm",
          layer_norm_epsilon=1e-5,
          use_flash_attn=True,
          num_chunks=1,  # if num_chunks > 1, interleaved pipeline scheduler is used.
      )
      """
      zero1 parallel:
          1. if zero1 <= 0, the size of the zero process group is equal to the size of the dp process group,
             so parameters will be divided within the range of dp.
          2. if zero1 == 1, zero is not used, and all dp groups retain the full amount of model parameters.
          3. zero1 > 1 and zero1 <= dp world size, the world size of zero is a subset of dp world size.
          For smaller models, it is usually a better choice to split the parameters within nodes with a setting <= 8.
      pipeline parallel (dict):
          1. size: int, the size of pipeline parallel.
          2. interleaved_overlap: bool, enable/disable communication overlap when using interleaved pipeline scheduler.
      tensor parallel: tensor parallel size, usually the number of GPUs per node.
      """
      parallel = dict(
          zero1=-1,
          tensor=4,
          pipeline=dict(size=1, interleaved_overlap=True),
          sequence_parallel=False,
      )
      cudnn_deterministic = False
      cudnn_benchmark = False
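
    As a rough illustration of how these parallel settings decompose the GPUs into process groups (the total GPU count of 32 is an assumption made for the sake of the example; the config itself does not fix it):

      # Sketch: how the 30B demo's parallel settings split an assumed 32-GPU job into process groups.
      # world_size = 32 is an assumption for illustration; tensor/pipeline/zero1 come from the config above.
      world_size = 32
      tensor = 4                     # tensor parallel size (from the config)
      pipeline = 1                   # pipeline parallel size (from the config)
      zero1 = -1                     # from the config

      dp_size = world_size // (tensor * pipeline)    # 32 // 4 = 8 data-parallel ranks
      # Per the docstring above: zero1 <= 0 means the ZeRO group equals the data-parallel group.
      zero1_size = dp_size if zero1 <= 0 else zero1  # -> 8
      print(dp_size, zero1_size)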
      

Source: 30B Demo — InternLM 0.2.0 documentation


