- Mixer cost optimization: a. prepare the latency-optimization report document
- Playback governance: the Mixer→Ranking request old-protocol change is fully rolled out; the Ranking-side code still needs cleanup
- SilverTorch research: a. optimize the GPU version b. develop a PyTorch + Triton version c. Benchmark under development
- How libtorch works
- Build libtorch with GPU support and try to get it running end to end
A persistent, puzzling link error turned out to be an ABI problem: CXX11_ABI=1 (_GLIBCXX_USE_CXX11_ABI=1) was not enabled at compile time.
NOTE
g:platform Action details (uncached result): http://ams-bazel-remote-browser.development.polaris:7984/hardlinking/blobs/sha256/historical_execute_response/61e0eb06a487498fcee8f8ac08567e74a587922d589c029c63ad93913f950f7e-940/

ld.lld: error: undefined symbol: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&)
>>> referenced by SymIntArrayRef.h:80 (external/libtorch/include/c10/core/SymIntArrayRef.h:80)
>>> rank_by_diversity_op.o:(preranking::RunTorchModel()) in archive bazel-out/k8-opt/bin/ad_platform/preranking/kernels/librank_by_diversity_op.lo
>>> referenced by SymIntArrayRef.h:80 (external/libtorch/include/c10/core/SymIntArrayRef.h:80)
>>> rank_by_diversity_op.o:(preranking::RankByDiversityOp::RunTorchDiversityRank(preranking::DiversityRankContext*, tflow::OpKernelContext*)) in archive bazel-out/k8-opt/bin/ad_platform/preranking/kernels/librank_by_diversity_op.lo
>>> referenced by TensorBase.h:627 (external/libtorch/include/ATen/core/TensorBase.h:627)
>>> rank_by_diversity_op.o:(preranking::RankByDiversityOp::RunTorchDiversityRank(preranking::DiversityRankContext*, tflow::OpKernelContext*)) in archive bazel-out/k8-opt/bin/ad_platform/preranking/kernels/librank_by_diversity_op.lo
>>> referenced 8 more times

ld.lld: error: undefined symbol: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&)
>>> referenced by TensorDataContainer.h:307 (external/libtorch/include/torch/csrc/api/include/torch/detail/TensorDataContainer.h:307)
>>> rank_by_diversity_op.o:(torch::detail::TensorDataContainer::fill_tensor(at::Tensor&) const) in archive bazel-out/k8-opt/bin/ad_platform/preranking/kernels/librank_by_diversity_op.lo
>>> referenced by TensorDataContainer.h:314 (external/libtorch/include/torch/csrc/api/include/torch/detail/TensorDataContainer.h:314)
>>> rank_by_diversity_op.o:(torch::detail::TensorDataContainer::fill_tensor(at::Tensor&) const) in archive bazel-out/k8-opt/bin/ad_platform/preranking/kernels/librank_by_diversity_op.lo

ld.lld: error: undefined symbol: torch::jit::load(std::string const&, std::optional<c10::Device>, bool)
>>> referenced by preranking_torch_model.cc:37 (ad_platform/preranking/pytorch_demo/preranking_torch_model.cc:37)
>>> preranking_torch_model.o:(std::once_flag::_Prepare_execution::_Prepare_execution<void std::call_once<ad_platform::preranking::TorchModelManager::Init(std::string const&, std::string const&)::'lambda'()>(std::once_flag&, ad_platform::preranking::TorchModelManager::Init(std::string const&, std::string const&)::'lambda'()&&)::'lambda'()>(ad_platform::preranking::TorchModelManager::Init(std::string const&, std::string const&)::'lambda'()&)::'lambda'()::_FUN()) in archive bazel-out/k8-opt/bin/ad_platform/preranking/pytorch_demo/libpreranking_torch_model.a

ld.lld: error: undefined symbol: c10::ListType::get(std::string const&, c10::Type::SingletonOrSharedTypePtr<c10::Type>)
>>> referenced by jit_type.h:2002 (external/libtorch/include/ATen/core/jit_type.h:2002)
>>> preranking_torch_model.o:(c10::Dict<std::string, c10::List<c10::IValue> >::Dict()) in archive bazel-out/k8-opt/bin/ad_platform/preranking/pytorch_demo/libpreranking_torch_model.a

ld.lld: error: undefined symbol: torch::jit::Object::find_method(std::string const&) const
>>> referenced by object.h:108 (external/libtorch/include/torch/csrc/jit/api/object.h:108)
>>> preranking_torch_model.o:(torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::string, c10::IValue, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, c10::IValue> > > const&)) in archive bazel-out/k8-opt/bin/ad_platform/preranking/pytorch_demo/libpreranking_torch_model.a

ld.lld: error: undefined symbol: torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::string, c10::IValue, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, c10::IValue> > > const&) const
>>> referenced by module.h:116 (external/libtorch/include/torch/csrc/jit/api/module.h:116)
>>> preranking_torch_model.o:(torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::string, c10::IValue, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, c10::IValue> > > const&)) in archive bazel-out/k8-opt/bin/ad_platform/preranking/pytorch_demo/libpreranking_torch_model.a

collect2: error: ld returned 1 exit status
Target //ad_platform/preranking:preranking_package failed to build
INFO: Elapsed time: 42.670s, Critical Path: 7.57s
INFO: 3 processes: 1 remote cache hit, 2 internal.
ERROR: Build did NOT complete successfully
INFO: Streaming build results to: http://ams-bazel-remote-bes.development.polaris:8088/invocation/02fcfb72-b5ae-4662-822e-e75e01066924
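A minimal sketch for triaging a linker log like the one above, assuming Python 3.9+ and only the standard library. It extracts the `ld.lld: error: undefined symbol:` lines and checks how each symbol spells its string type. Demanglers render the new (C++11) libstdc++ string as `std::__cxx11::basic_string<...>` (or tag functions with `[abi:cxx11]`), while the old ABI shows up as plain `std::string`. Every undefined symbol in this log uses the plain spelling, which fits the note that the consuming code was built without CXX11_ABI=1. The helper names (`undefined_symbols`, `string_abi_spelling`, `summarize`) are hypothetical, not part of any existing tool.

```python
import re

# Hypothetical helpers; only the "ld.lld: error: undefined symbol:" line format comes from the log.
UNDEFINED_RE = re.compile(r"ld\.lld: error: undefined symbol: (.+)")
NEW_ABI_RE = re.compile(r"__cxx11|\[abi:cxx11\]")
# Plain std::string (the \b excludes std::string_view) or an untagged std::basic_string<...>.
OLD_ABI_RE = re.compile(r"std::string\b|std::basic_string<")


def undefined_symbols(log_text: str) -> list[str]:
    """Return demangled undefined symbols reported by ld.lld, deduplicated, in first-seen order."""
    seen: dict[str, None] = {}
    for line in log_text.splitlines():
        match = UNDEFINED_RE.search(line)
        if match:
            seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def string_abi_spelling(symbol: str) -> str:
    """Classify a demangled symbol: "cxx11" (new ABI), "old" (pre-C++11 ABI) or "none"."""
    if NEW_ABI_RE.search(symbol):
        return "cxx11"
    if OLD_ABI_RE.search(symbol):
        return "old"
    return "none"


def summarize(log_text: str) -> dict[str, int]:
    """Count unique undefined symbols by how they spell std::string."""
    counts = {"cxx11": 0, "old": 0, "none": 0}
    for symbol in undefined_symbols(log_text):
        counts[string_abi_spelling(symbol)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "ld.lld: error: undefined symbol: torch::jit::load(std::string const&, std::optional<c10::Device>, bool)\n"
        ">>> referenced by preranking_torch_model.cc:37\n"
        "ld.lld: error: undefined symbol: torch::jit::load(std::string const&, std::optional<c10::Device>, bool)\n"
    )
    print(summarize(sample))  # {'cxx11': 0, 'old': 1, 'none': 0}
```

If the "old" count dominates while the prebuilt libtorch exports `__cxx11` symbols, rebuilding the consumer with `-D_GLIBCXX_USE_CXX11_ABI=1` (or using a libtorch build that matches the consumer's ABI) is the usual direction to look.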
- Benchmark (torch vs. C++)
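A minimal timing harness for the torch vs. C++ comparison, sketched in Python with only the standard library. It assumes each implementation can be wrapped as a zero-argument callable (for example through a Python binding of the C++ path; that wrapping is an assumption, not something these notes describe). `bench` and `compare` are hypothetical names. Warm-up calls run first so one-time costs such as lazy initialization stay out of the numbers; the harness then reports mean, median and nearest-rank p99 latency in microseconds. For GPU code the callable must synchronize (e.g. call `torch.cuda.synchronize()`) before returning, otherwise only the kernel launch is timed.

```python
import statistics
import time
from typing import Callable


def bench(fn: Callable[[], object], warmup: int = 10, iters: int = 100) -> dict[str, float]:
    """Time fn() after `warmup` untimed calls; return latency stats in microseconds."""
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        fn()  # untimed: absorbs lazy init, cache warm-up, JIT compilation, etc.
    samples: list[float] = []
    for _ in range(iters):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    samples.sort()
    p99_rank = (99 * len(samples) + 99) // 100  # nearest rank: ceil(0.99 * n)
    return {
        "mean_us": statistics.fmean(samples),
        "p50_us": statistics.median(samples),
        "p99_us": samples[p99_rank - 1],
    }


def compare(candidates: dict[str, Callable[[], object]], **kwargs: int) -> dict[str, dict[str, float]]:
    """Run bench() on each named candidate with identical settings."""
    return {name: bench(fn, **kwargs) for name, fn in candidates.items()}


if __name__ == "__main__":
    data = list(range(10_000))

    def python_loop() -> int:
        total = 0
        for x in data:
            total += x
        return total

    # Stand-ins only; the real benchmark would wrap the torch path and the C++ path.
    results = compare({"builtin_sum": lambda: sum(data), "python_loop": python_loop}, iters=50)
    for name, stats in results.items():
        print(f"{name}: p50={stats['p50_us']:.1f}us p99={stats['p99_us']:.1f}us")
```

Reporting p50 and p99 rather than only the mean keeps the comparison aligned with latency-oriented goals, where tail latency usually matters more than the average.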
202601091007-Install codebuddy on the dev machine
202601091006-Install opencode on the dev machine
- Mixer latency-optimization progress report
- GPR simulation
- mp gprsim: support access to a standalone token server
20260102-Determine whether a binary or .so uses the CXX11 ABI
20260102-Rebuild libtorch
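For the note on telling whether a binary or .so uses the CXX11 ABI, one heuristic is to list its dynamic symbols demangled (`nm -D -C`) and look for new-ABI spellings (`std::__cxx11::`, `[abi:cxx11]`) versus plain `std::string`. Below is a sketch of that check in Python, under stated assumptions: GNU binutils `nm` is on `PATH`, only the dynamic symbol table is inspected, and a library whose symbols never mention strings yields `unknown`. `abi_verdict`, `nm_symbols` and the script name `check_abi.py` are hypothetical.

```python
import re
import shutil
import subprocess
import sys

# Heuristic markers; helper names are hypothetical, not an existing tool.
NEW_ABI_RE = re.compile(r"std::__cxx11::|\[abi:cxx11\]")
# Plain std::string (the \b excludes std::string_view) or an untagged std::basic_string<...>.
OLD_STRING_RE = re.compile(r"std::string\b|std::basic_string<")


def abi_verdict(demangled_lines: list[str]) -> str:
    """Return "cxx11", "old", "mixed" or "unknown" from demangled `nm` output lines."""
    new = old = 0
    for line in demangled_lines:
        if NEW_ABI_RE.search(line):
            new += 1
        elif OLD_STRING_RE.search(line):
            old += 1
    if new and old:
        return "mixed"
    if new:
        return "cxx11"
    if old:
        return "old"
    return "unknown"


def nm_symbols(path: str) -> list[str]:
    """Dynamic symbols of a .so or dynamically linked binary, demangled via `nm -D -C`."""
    nm = shutil.which("nm")
    if nm is None:
        raise FileNotFoundError("nm (GNU binutils) not found on PATH")
    result = subprocess.run([nm, "-D", "-C", path], check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Usage: python check_abi.py path/to/lib1.so path/to/binary ...
    for path in sys.argv[1:]:
        print(f"{path}: {abi_verdict(nm_symbols(path))}")
```

Running it on both the libtorch shared libraries and the consuming target should show whether they disagree; "cxx11" on one side and "old" on the other points at the same mismatch as the linker errors. A "mixed" verdict is not necessarily a bug, since a library can legitimately reference both ABIs.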