Product Introduction — LiCO Document 5.1.0 documentation
5.1.0
Preface
Product Introduction
Operating Environment
Installation Guide
Prerequisites and Assumptions
Special Notes
Deploy the Cluster Environment
Install the Operating System
Install the OS on the Management Node
Deploy the OS on Other Cluster Nodes
Configure Environment Variables
Obtain the Local Repository
Install Lenovo xCAT
Prepare the OS for Other Nodes
Set xCAT Node Information
Add hosts Resolution
Configure DHCP and DNS Services
Install the OS on Nodes over the Network
Checkpoint A
Install Cluster Base Software
Base Software List
Set Up the Local Repository for the Management Node
Configure the Local Repository for Compute and Login Nodes
Configure LiCO Dependency Repositories
Install Slurm
Configure NFS
Configure NTP
Install CUDA and cuDNN
Configure Slurm
Install Ganglia
Install MPI
Install Singularity
Checkpoint B
Install Other Components
Component List
Install RabbitMQ
Install PostgreSQL
Install InfluxDB
Install Confluent
Configure User Authentication
Install openldap-server
Install libuser
Install openldap-client
Install nss-pam-ldapd
Install the Gmond GPU Plugin
Install LiCO
Component List
Obtain the Installation Packages
Configure the Local Repository
Install the Nodes
Install the Management Node
Install the Login Node
Install the Compute Nodes
Configure LiCO
Initialize the System
Initialize Users
Import System Images
Start
Configuration Guide
Configure Service Accounts
Configure Cluster Nodes
Machine Room Information
Logical Group Information
Machine Room Row Information
Rack Information
Chassis Information
Node Information
Configure LiCO Services
Basic Configuration
Database Configuration
Login Configuration
Storage Configuration
Scheduler Configuration
Alert Configuration
Cluster Configuration
Feature Configuration
Configure LiCO Components
lico-vnc-mond
lico-env
lico-portal
lico-ganglia-mond
lico-confluent-proxy
lico-confluent-mond
lico-wechat-agent
HOWTOs
How to Quickly Install LiCO
How to Install InfiniBand Drivers
Install an IB Adapter
Install an OPA Adapter
How to Configure VNC
How to Configure Confluent
How to Manually Create the InfluxDB Database
How to Resolve Common Slurm Issues
How to Upgrade the Operating System
Appendix
Recommended Configurations
ganglia
Management Node
Other Nodes
slurm
slurm.conf
gres.conf
Common Commands
Set the LDAP Administrator Password
Modify User Roles
Restore Users
Import Users
Security Improvements
Binding Settings
Firewall Settings
Create and Import System Images
Create an Image
Import System Images
Cluster Service Summary
Physical View List
Chassis Model List
System Product List
Upgrade LiCO
Upgrade Steps
Uninstall lico-core
Delete Configuration Files
Upgrade Cluster Base Software
Upgrade LiCO Components
Configure LiCO Services
Initialize LiCO and Users
Start LiCO
Product Introduction
Lenovo Intelligent Computing Orchestration (hereafter LiCO) is HPC/AI infrastructure management software. Its functions include cluster management, cluster monitoring, job scheduling, cluster user management, account management, and file system management. With LiCO, resources in a supercomputing cluster can be scheduled in a unified way, supporting both HPC jobs and AI jobs.
The software can be accessed easily through a web browser for management operations, and users can also log in to the cluster's login node with other terminal tools for command-line operation.
Operating Environment
Server
Operating system:
Red Hat 7.4 / CentOS 7.4
SLES 12 SP3
Client
Browser: Chrome >= 62.0 or Firefox >= 56.0 recommended
Display resolution: 1280×800 recommended
© Copyright 2018, Lenovo.
Implementing Model Training in LiCO (Expert Mode)
Author: DTTRA
Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. Please include a link to the original source and this notice when reproducing it.
Original link: https://blog.csdn.net/DTTRA/article/details/118761610
To obtain a more robust deep learning model, some large training runs cannot be carried out on a personal computer. This article uses training a TensorFlow model in LiCO as an example to give readers a reference; if there are errors in this document, readers are welcome to point them out.
Contents
1. Install Docker on Ubuntu
2. Pull an image with Docker
3. Build your own container
4. Rename the container and generate an image
5. Publish the custom image
6. Pull the image in LiCO
7. Run deep learning model training
Note: the following is the main body of the article; the example below is provided for reference.
1. Install Docker on Ubuntu (reference: https://blog.csdn.net/qq_40663357/article/details/83307338)
(1) Download
1. Install the packages needed for Docker's apt source:
apt-get install apt-transport-https ca-certificates curl software-properties-common
2. Add Docker's official GPG key:
curl -s https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
3. Add the Docker source. If docker.list does not exist, create it yourself:
cd /etc/apt/sources.list.d
vim docker.list
Clear the original file and add the following line:
deb https://get.docker.io/ubuntu docker main
(2) Install
1. Install Docker:
apt install docker.io
2. Check the Docker version:
docker version
2. Pull an Image with Docker
1. List the current Docker images:
sudo docker images
This shows all existing images.
2. Pull an image
Go to the Docker Hub website: https://hub.docker.com/
Enter the base image you need in the search box, for example: tensorflow
Click the copy button on the Docker Pull Command at the lower right and paste the command into the Ubuntu command line; you can also click Tags to find the particular version you want.
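For reference, the command copied from Docker Hub typically looks like the following (a minimal sketch; the exact repository and tag depend on what you selected):
docker pull tensorflow/tensorflow:latest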
After the download completes, check the images again:
docker images
You should now see that the tensorflow/tensorflow:latest image has been downloaded.
3. Build Your Own Container
1. Start a container from the tensorflow/tensorflow image:
docker run -it tensorflow/tensorflow:latest /bin/bash
Note: remember the container ID, 5e7ae5f8288d in this example.
2. Add your own modules inside the original container, for example install opencv-python:
pip3 install opencv-python
Container 5e7ae5f8288d now contains the module we need.
4. Rename the Container and Generate an Image
1. List all containers:
docker ps -a
You can see that container 5e7ae5f8288d is the tensorflow/tensorflow container with our custom modules.
2. Save the customized container as an image:
docker commit 5e7ae5f8288d 1160966815/tensorflow1:latest
Here 5e7ae5f8288d is the container ID and 1160966815/tensorflow1:latest is the custom image name.
Note: the custom name must match your own Docker Hub account name. My account name is 1160966815, so the image is named 1160966815/tensorflow1:latest (the reason for this will be explained later).
5. Publish the Custom Image (reference: https://www.cnblogs.com/fanqisoft/p/11315392.html)
1. Register a username on Docker Hub (Docker's official image registry).
2. Use the docker login command to log in with your username and password.
(The first login requires entering your username and password.)
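As an illustration, logging in from the command line can look like this (a sketch; substitute your own Docker Hub username and enter your password when prompted):
docker login -u 1160966815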
3. Use docker push <image name> to upload the local image to the registry:
docker push <image name>
For example:
docker push 1160966815/tensorflow1
You can now find your published image on Docker Hub.
6. Pull the Image in LiCO
Open the LiCO server login URL.
After logging in to LiCO, enter expert mode.
In the expert-mode command line, download the TensorFlow image you published earlier:
singularity build tensorflow.sif docker://1160966815/tensorflow1
(Here tensorflow.sif is a file name of your own choosing.)
The image is now downloaded and can be viewed in the file manager.
The tensorflow.sif file is the image file that was just downloaded.
7. Run Deep Learning Model Training
1. Upload the training files
Open the file manager, right-click, and choose Upload files to upload local files.
2. Start training
Go to job submission, choose General - Common Job, click Use, give the job a custom name, select your working directory, and fill in the run script.
Run script: the first path is the path to the image, the second path is the path to your own run code, and python indicates that the file being run is a .py file.
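As a sketch of what such a run script can look like (the file names and paths below are placeholders for illustration; adjust them to where the image and your training code actually live in your LiCO workspace):
singularity exec /home/demo/tensorflow.sif python /home/demo/code/train.py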
Select the resource options according to your needs and click Submit to start training.
References
Install Docker on Ubuntu: https://blog.csdn.net/qq_40663357/article/details/83307338
Docker Hub: https://hub.docker.com/
Publish a custom image: https://www.cnblogs.com/fanqisoft/p/11315392.html
Lenovo Intelligent Computing Orchestration (LiCO)
Product Guide
Author: Ana Irimiea
Updated: 8 Feb 2024
Form Number: LP0858
PDF size: 42 pages, 5.0 MB
Table of Contents
Introduction
Did You Know?
What's new in LiCO 7.2
Part numbers
Features for LiCO users
Additional features for LiCO HPC/AI Users
Features for LiCO Administrators
Subscription and Support
Validated software components
Validated hardware components
Supported servers (LiCO HPC/AI version)
LiCO Implementation services
Client PC requirements
Related links
Related product families
Trademarks
Abstract
Lenovo Intelligent Computing Orchestration (LiCO) is a software solution that simplifies the use of clustered computing resources for Artificial Intelligence (AI) model development and training, and HPC workloads.
This product guide provides essential presales information to understand LiCO and its key features, specifications and compatibility. This guide is intended for technical specialists, sales specialists, sales engineers, IT architects, and other IT professionals who want to learn more about LiCO and consider its use in HPC solutions.
Change History
Changes in the February 8, 2024 update:
Updated for LiCO 7.2:
Updated login images
Cloud Tools menu
Adding a template to a workflow in LiCO
HPC runtime module list
Updated all the features under the What's new in LiCO 7.2 section
Added a new supported GPU under the Validated hardware components section:
NVIDIA L40
Added new supported servers under the Supported servers (LiCO HPC/AI version) section:
Lenovo ThinkSystem SR860 V3, SR850 V3, SR590 V2
Lenovo WenTian WR5220 G3 (C4C type)
Lenovo ThinkStation P620 (without out-of-band monitoring)
Introduction
Lenovo Intelligent Computing Orchestration (LiCO) is a software solution that simplifies the use of clustered computing resources for Artificial Intelligence (AI) model development and training, and HPC workloads. LiCO interfaces with an open-source software orchestration stack, enabling the convergence of AI onto an HPC or Kubernetes-based cluster.
The unified platform simplifies interaction with the underlying compute resources, enabling customers to take advantage of popular open-source cluster tools while reducing the effort and complexity of using them for HPC and AI.
Figure 1. LiCO 7.2 login
Did You Know?
LiCO enables a single cluster to be used for multiple AI workloads simultaneously, with multiple users accessing the available cluster resources at the same time. Running more workloads can increase utilization of cluster resources, driving more user productivity and value from the environment.
What's new in LiCO 7.2
Lenovo recently announced LiCO Version 7.2, improving functionality for AI users, HPC users, and HPC administrators of LiCO, including:
Support for OpenHPC v2.6.2
Support for NVIDIA L40
Support for Lenovo ThinkSystem SR860 V3, SR850 V3, SR590 V2, and Lenovo WenTian WR5220 G3
Support for the ThinkStation P620 Tower Workstation
Support for JupyterLab, TensorBoard, and LAMMPS
Support for the EasyBuild tool
Support for non-Lenovo hardware in the cluster
Cloud-agnostic Hybrid HPC support
Part numbers
The following table lists the ordering information for LiCO.
Note: Lenovo K8S AI LiCO Software updates reached end of life (EOL) in June 2023. The last update is LiCO 6.4.
Table 1. LiCO HPC/AI version ordering information
Description | LFO | Software CTO | Feature code
Lenovo HPC AI LiCO Software 90 Day Evaluation License | 7S090004WW | 7S09CTO2WW | B1YC
Lenovo HPC AI LiCO Webportal w/1 yr S&S | 7S09002BWW | 7S09CTO6WW | S93A
Lenovo HPC AI LiCO Webportal w/3 yr S&S | 7S09002CWW | 7S09CTO6WW | S93B
Lenovo HPC AI LiCO Webportal w/5 yr S&S | 7S09002DWW | 7S09CTO6WW | S93C
Table 2. LiCO K8S/AI ordering information (Kubernetes)
Description | LFO | Software CTO | Feature code
Lenovo K8S AI LiCO Software Evaluation License (90 days) | 7S090006WW | 7S09CTO3WW | S21M
Lenovo K8S AI LiCO Software 4GPU w/1Yr S&S | 7S090007WW | 7S09CTO4WW | S21N
Lenovo K8S AI LiCO Software 4GPU w/3Yr S&S | 7S090008WW | 7S09CTO4WW | S21P
Lenovo K8S AI LiCO Software 4GPU w/5Yr S&S | 7S090009WW | 7S09CTO4WW | S21Q
Lenovo K8S AI LiCO Software 16GPU upgrade w/1Yr S&S | 7S09000AWW | 7S09CTO4WW | S21R
Lenovo K8S AI LiCO Software 16GPU upgrade w/3Yr S&S | 7S09000BWW | 7S09CTO4WW | S21S
Lenovo K8S AI LiCO Software 16GPU upgrade w/5Yr S&S | 7S09000CWW | 7S09CTO4WW | S21T
Lenovo K8S AI LiCO Software 64GPU upgrade w/1Yr S&S | 7S09000DWW | 7S09CTO4WW | S21U
Lenovo K8S AI LiCO Software 64GPU upgrade w/3Yr S&S | 7S09000EWW | 7S09CTO4WW | S21V
Lenovo K8S AI LiCO Software 64GPU upgrade w/5Yr S&S | 7S09000FWW | 7S09CTO4WW | S21W
Features for LiCO users
Topics in this section:
LiCO versions
Benefits to users
Features for users
Lenovo Accelerated AI
Cloud Tools
Workflow
Admin
LiCO versions
Note: There are two distinct versions of LiCO, LiCO HPC/AI (Host) and LiCO K8S/AI, to give clients a choice of which underlying orchestration stack is used, particularly when converging AI workloads onto an existing cluster. The user functionality is common across both versions, with minor environmental differences associated with the underlying orchestration being used.
A summary of the differences for user access is as follows:
LiCO K8S/AI version:
AI framework containers are docker-based and managed outside LiCO in the customer’s docker repository
Custom job submission templates are defined with YAML
Does not include HPC standard job submission templates
LiCO HPC/AI version:
AI framework containers are Singularity-based and managed inside the LiCO interface
Custom job submission templates are defined as batch scripts (for SLURM, LSF, PBS); see the sketch after this list
Includes HPC standard job submission templates
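As an illustration of the batch-script style used for HPC/AI custom templates, a minimal Slurm script might look like the following (a sketch only; the partition, resource values, module name, and application binary are assumptions to adapt to your own cluster):
#!/bin/bash
#SBATCH --job-name=demo_job
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:1
module load openmpi4        # load the MPI runtime the application was built against
mpirun ./my_app             # launch the application across the allocated tasks
A LiCO custom template essentially wraps a script of this shape, exposing selected values (job name, queue, node count, and so on) as parameters in the web form.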
Benefits to users
LiCO provides users the following benefits:
A web-based portal to deploy, monitor and manage AI development and training jobs on a distributed cluster
Container-based deployment of supported AI frameworks for easy software stack configuration
Direct browser access to Jupyter Notebook instances running on the cluster
Standard and customized job templates to provide an intuitive starting point for less experienced users
Lenovo Accelerated AI pre-defined training and inference templates for many common AI use cases
Lenovo end-to-end workflow for Image Classification, Object Detection, Instance Segmentation, Image GAN, Text Classification, Seq2seq and Memory Network
Workflow to define multiple job submissions as an automated workflow to deploy in a single action
TensorBoard visualization tools integrated into the interface (TensorFlow-based)
Management of private space on shared storage through the GUI
Monitoring of job progress and log access
Features for users
Those designated as LiCO users have access to dashboards related primarily to HPC and AI development and training tasks. Users can submit jobs to the cluster, and monitor their results through the dashboards. The following menus are available to users:
Home menu for users – provides an overview of the resources available in the cluster. Jobs and job status are also given, indicating the runtime for the current job, and the order of jobs deployed. Users may click on jobs to access the associated logs and job files. The figure below displays the home menu.
Figure 2. User Home Menu
Job Templates – allows users to set up a job and submit it to the cluster. The user first picks a job template. After selecting the template, the user gives the job a name and inputs the relevant parameters, chooses the resources to be requested on the cluster and submits it.
Users can take advantage of Lenovo Accelerated AI templates and industry-standard AI templates, submit generic jobs via the Common Job template, and create their own templates requesting specified parameters.
Job Templates available in LiCO:
Figure 3. HPC job templates available in LiCO
Figure 4. AI job templates available in LiCO
The figure below displays a job template for training with TensorFlow on a single node.
Figure 5. AI Job Template
LiCO also provides TensorBoard monitoring when running certain TensorFlow workloads, as shown in the following figure.
Figure 6. LiCO and TensorBoard monitoring
Jobs menu – displays a dashboard listing jobs and their statuses. In addition, users can select the job and see results and logs pertaining to the job in progress (or after completion). Tags and comments can be added to completed jobs for easier filtering.
Reports – displays a dashboard for obtaining reports on expenses. Currently, Expense Reports are supported, displaying job and storage billing statistics.
Cloud Tools menu – enables users to create, run and view Jupyter Notebook and JupyterLab instances on the cluster from LiCO for model experimentation and development. Users can also launch a CVAT labelling environment, TigerVNC, and the RStudio development environment. See the Cloud Tools section for more information.
Lenovo Accelerated DL – provides users with the ability to label data, optimize hyperparameters, as well as test and publish trained models from within an end-to-end workflow in LiCO. LiCO supports Text Classification, Image Classification, Object Detection, and Instance Segmentation workflows. See the Lenovo Accelerated AI section for more information.
Workflow menu – allows users to create multi-step jobs that execute as a single action. Workflows can contain serially-executed steps as well as multiple jobs to execute in parallel within a step to take full advantage of cluster resources. See the Workflow section for more information.
Admin menu – allows users to access a number of capabilities not directly associated with deploying workloads to the cluster, including access to shared storage space on the cluster through a drag-and-drop interface and access to provision API and git interfaces. See the Admin section for more information.
Lenovo Accelerated AI
Lenovo Accelerated AI provides a set of templates that aim to make AI training and inference simpler, more accessible, and faster to implement. The Accelerated AI templates differ from the other templates in LiCO in that they do not require the user to input a program; rather, they simply require a workspace (with associated directories) and a labelled dataset.
Lenovo Accelerated DL is based on the LeTrain project. LeTrain is a distributed training engine based on TensorFlow and optimized by Lenovo. Its goal is to make distributed training as easy as single GPU training and achieve linear scaling performance.
Lenovo Accelerated DL provides an end-to-end workflow for Text Classification, Image Classification, Object Detection, and Instance Segmentation, with training based on Lenovo Accelerated AI pre-defined models. A user can import an unprocessed, unlabeled data set of images, label them, train multiple instances with a grid of parameter values, test the output models for validation, and publish to a git repository for use in an application environment. Additionally, users can initiate the workflow steps from a REST API call to take advantage of LiCO as part of a DevOps toolchain.
The following figure illustrates the main features of the Lenovo Accelerated DL workflow:
Figure 7. Lenovo Accelerated DL main features workflow
The following use cases are supported with Lenovo Accelerated AI templates:
Image Classification
Object Detection
Instance Segmentation
Medical Image Segmentation
Seq2Seq
Memory Network
Image GAN
Text Classification
The following figure displays the Lenovo Accelerated AI templates.
Figure 8. Lenovo Accelerated Deep Learning (DL) computer vision (CV) templates
Figure 9. Lenovo Accelerated Deep Learning (DL) natural language processing (NLP) templates
Each Lenovo Accelerated AI use-case is supported by both a training and inference template. The training templates provide parameter inputs such as batch size and learning rate. These parameter fields are pre-populated with default values, but are tunable by those with data science knowledge. The templates also provide visual analytics with TensorBoard; the TensorBoard graphs continually update in-flight as the job runs, and the final statistics are available after the job has completed.
In LiCO the Image Classification and Object Detection templates include the ability to select a topology based on the characteristics of a target inference device, such as an IoT Device, Edge Server, or Data Center server.
The following figure displays the embedded TensorBoard interface for a job. TensorBoard provides visualizations for TensorFlow jobs running in LiCO, whether through Lenovo Accelerated AI templates or the standard TensorFlow AI templates.
Figure 10: TensorBoard in LiCO
LiCO also provides inference templates which allow users to predict with new data based on models that have been trained with Lenovo Accelerated AI templates. For the inference templates, users only need to provide a workspace, an input directory (the location of the data on which inference will be performed), an output directory, and the location of the trained model. The job will run, and upon completion, the output directory will contain the analyzed data. For visual templates such as Object Detection, images can be previewed directly from within LiCO’s Manage Files interface.
The following two figures display an input file to the Object Detection inference template, as well as the corresponding output.
Figure 11: JPG file containing image of cat for input into inference job
Figure 12: LiCO output displaying the section of the JPG containing the cat image
Cloud Tools
LiCO includes the capability to create and deploy instances of Jupyter, RStudio Server and TigerVNC on the cluster. Users may create multiple instances to customize for different software environments and projects. At the launch of an instance, the user can define the compute resources required (CPU and GPU) to better optimize the performance of the task and the resource usage on the cluster.
Once a Jupyter, TigerVNC or an RStudio Server instance is created, the user can deploy it to the cluster and use the environment directly from their browser in a new tab. The user can leverage the interface directly to upload, download and run code as they normally would, utilizing the shared storage space used for LiCO.
Note: RStudio Server does not support the Chinese version.
Figure 13. Cloud Tools menu
Figure 14. Jupyter instance accessible in new browser tab
Figure 15. Integrated RStudio Server environment
Figure 16. Settings definition for an RStudio Server instance
LiCO includes the capability to launch a CVAT labelling environment for image annotation. Users may create and edit multiple CVAT instances for different projects, login to the CVAT web panel to label images and export the labelling as a dataset. The dataset which is created through CVAT can be managed through the dataset management page in LiCO.
Figure 17. CVAT instance accessible in a new browser tab
Opened instances of RStudio Server, Jupyter Notebook, JupyterLab, TigerVNC and CVAT can be shared using the online platform URL of that instance.
Figure 18. Users can share the Cloud Tools with other or non-HPC users
Workflow
LiCO provides the ability to define multiple job submissions into a single execution action, called a Workflow. Steps are created to execute job submissions in serial, and within each step multiple job submissions may be executed in parallel. Workflow uses LiCO job submission templates to define the jobs for each step, and any template available including custom templates can be used in a workflow.
Figure 19. Defining a workflow in LiCO
Figure 20. Adding a template to a workflow in LiCO
LiCO workflows allow users to automate the deployment of multiple jobs that may be required for a project, so the user can execute and monitor as a single action. Workflows can be easily copied and edited, allowing users to quickly customize existing workflows for multiple projects.
Admin
The Admin tab for the user provides access to container and VNC management.
The Admin tab also enables users to publish a trained model to a git repository or as a docker container image.
LiCO can bill users for jobs and storage instances. Users can download their daily and monthly bills generated automatically on the system.
Some open application programming interfaces (APIs) are available in the API key sub-tab of the Admin tab.
Figure 21. API key page
Additional features for LiCO HPC/AI Users
In addition to the user features above, the LiCO HPC/AI version contains a number of features to simplify HPC workload deployment with a minimal learning curve for users compared with console-based scripting and execution. HPC users can submit jobs easily through standard or custom templates, utilize containers, pre-define runtime modules and environment variables for submission, and, since LiCO 6.3, take advantage of advanced features such as Energy Aware Runtime and Intel oneAPI tools and optimizations, all from within the LiCO interface.
Topics in this section:
Energy Aware Runtime
Intel oneAPI
HPC Runtime Module Management
Container-based HPC workload deployment
Singularity Container Image Management
Reports
System tools
Energy Aware Runtime
Energy Aware Runtime (EAR) is software technology designed to provide a solution for running MPI applications with higher energy efficiency. Developed in collaboration with Barcelona Supercomputing Center as part of the BSC-Lenovo Cooperation project, EAR is supported for use with the SLURM scheduler through a SPANK plugin. LiCO exposes EAR deployment options within the standard MPI template, allowing users to take advantage of the capability for MPI workloads.
Once the workload has been profiled through a learning phase, EAR will minimize CPU frequency to reduce energy consumption while maintaining a set threshold of performance. This is particularly helpful where MPI applications may not take significant advantage of higher clock frequencies, so the frequency can be reduced to save energy while maintaining expected performance.
Users can select EAR options at job submission in the standard MPI template, either to run the default set by the administrator, minimum time to solution, or minimum energy. Administrators can set the policies and thresholds for EAR usage within the LiCO Administrator portal, as well as which users are authorized to use EAR.
Figure 22. Selection of Energy Policy in MPI template
Figure 23. Administrator portal EAR power policy management
The software technology for EAR is supported separately by Energy Aware Solutions S.L. For more information see https://www.eas4dc.com.
Intel oneAPI
Intel oneAPI is an open, standards-based programming model designed for all industries, providing a uniform development experience for developers targeting CPU, GPU, and FPGA accelerators. Because it builds on industry standards and developers' existing programming models, the oneAPI open standard can be used across varied architectures and hardware from different suppliers. Using Intel oneAPI improves the performance of MPI, OpenMP, TensorFlow, PyTorch, and other programs.
LiCO features templates based on Intel oneAPI – optimized to run on Intel processors – that were developed, tested and validated in collaboration with Intel.
Note: This function is unavailable if Intel oneAPI is not installed.
Figure 24. LiCO Templates for leveraging Intel oneAPI technology
Intel Neural Compressor
Intel Neural Compressor performs model compression to reduce the model size and increase the speed of deep learning inference for deployment on CPUs or GPUs. This open-source Python library automates popular model compression technologies, such as quantization, pruning, and knowledge distillation across multiple deep learning frameworks.
The Python library is integrated in LiCO and used for exporting an image classification model. Intel Neural Compressor FP32, BF16 and INT8 models can be selected.
Figure 25. Export an Image Classification – Intel Neural Compressor job
Intel MPI
Intel MPI Library is a multifabric message-passing library that implements the open-source MPICH specification. Use the library to create, maintain, and test advanced, complex applications that perform better on high-performance computing (HPC) clusters based on Intel processors.
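As a rough sketch of how an Intel MPI program is typically built and launched from a shell (the source file name and rank count are illustrative; mpiicc and mpirun are the compiler wrapper and launcher shipped with the Intel MPI Library):
mpiicc -o hello_mpi hello_mpi.c    # compile with the Intel MPI C compiler wrapper
mpirun -n 4 ./hello_mpi            # launch 4 MPI ranks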
Intel OpenMP
Using the OpenMP pragmas requires an OpenMP-compatible compiler and thread-safe libraries. A perfect choice is the Intel C++ Compiler version 7.0 or newer. (The Intel Fortran compiler also supports OpenMP.) Adding the appropriate OpenMP command-line option to the compiler instructs it to pay attention to the OpenMP pragmas and to insert threads.
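As a hedged example, with the classic Intel C/C++ compiler the option is typically -qopenmp (an assumption to verify against your installed toolchain; GCC uses -fopenmp instead):
icc -qopenmp -O2 omp_demo.c -o omp_demo    # compile with OpenMP pragmas enabled
export OMP_NUM_THREADS=8                   # choose the thread count at run time
./omp_demo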
Intel MPITune
The MPITune utility allows users to automatically adjust Intel MPI Library parameters, such as collective operation algorithms, to their cluster configuration or application. The tuner iteratively launches a benchmarking application with different configurations to measure performance and stores the results of each launch. Based on these results, the tuner generates optimal values for the parameters being tuned.
Intel VTune Profiler
Intel VTune Profiler optimizes application performance, system performance, and system configuration for HPC, cloud, IoT, media, storage, and more. Intel VTune Profiler, which includes a performance snapshot analyzer, enables users to analyze serial and multi-threaded applications on hardware platforms (CPU, GPU, FPGA) and to analyze local and remote targets.
In LiCO, users can submit an Intel VTune Profiler job and administrators can perform a platform analysis.
For Intel MPI, Intel OpenMP, and Intel Distribution for Python jobs, users can select the VTune Analysis Type.
Figure 26. Intel VTune Profiler integration
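For orientation, the command-line equivalent of such an analysis looks roughly like the following (a sketch; the analysis type and target binary are illustrative, and in LiCO the analysis type is chosen in the job template rather than typed by hand):
vtune -collect hotspots -result-dir vtune_results ./my_app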
Intel Distribution for GDB
The Intel Distribution for GDB application debugger is a companion tool to Intel compilers and libraries. It delivers a unified debugging experience that allows users to efficiently and simultaneously debug cross-platform parallel and threaded applications developed in C, C++, SYCL, OpenMP, or Fortran.
When submitting an Intel MPI or OpenMP job in LiCO, while setting the Template Parameters, users can set the Remotely Debug option to Intel Distribution for GDB. With this setting the running program can be debugged.
Intel Extension for TensorFlow
Intel® Extension for TensorFlow* is a heterogeneous, high performance deep learning extension plugin based on TensorFlow PluggableDevice interface, aiming to bring Intel CPU or GPU devices into TensorFlow open source community for AI workload acceleration. It allows users to flexibly plug an XPU into TensorFlow on-demand, exposing the computing power inside Intel's hardware.
The Intel Extension for TensorFlow job template in LiCO supports running programs on one or more nodes using CPUs or Intel GPUs. For distributed training, using CPU for training currently only supports PS Worker distributed architecture, and using Intel GPU for training currently only supports Horovod distributed training framework.
Intel Extension for PyTorch
Intel® Extension for PyTorch* extends PyTorch* with up-to-date features optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, through PyTorch* xpu device, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs with PyTorch*.
LiCO supports running Intel Extension for PyTorch programs on HPC clusters. To use an Intel GPU for your PyTorch program, if your program contains code like torch.device("cpu"), you should make the corresponding simple modifications before running the job.
Intel Distribution for Python
The Intel Distribution for Python achieves fast math-intensive workload performance without code changes for data science and machine learning problems. Intel Distribution for Python is included as part of the Intel oneAPI AI Analytics Toolkit, which provides accelerated machine learning and data analytics pipelines with optimized deep-learning frameworks and high-performing Python libraries.
Intel Distribution of Modin
The Intel Distribution of Modin is a performant, parallel, and distributed dataframe system that is designed around enabling data scientists to be more productive with the tools that they love. This library is fully compatible with the pandas API. It is powered by OmniSci in the back end and provides accelerated analytics on Intel platforms.
In LiCO, Intel Distribution of Modin templates are available for a single node or multi nodes.
Intel Distribution of Modin Single Node
Intel Distribution of Modin Multi Node
Model Zoo for Intel Architecture
Model Zoo for Intel Architecture contains Intel optimizations for running deep learning workloads on Intel Xeon Scalable processors. In LiCO, Image Recognition and Object Detection jobs are available with both TensorFlow and PyTorch. Multiple models can be selected when the user submits the job.
TensorFlow Image Recognition of Intel Model Zoo
TensorFlow Object Detection of Intel Model Zoo
PyTorch Image Recognition of Intel Model Zoo
PyTorch Object Detection of Intel Model Zoo
HPC Runtime Module Management
LiCO HPC/AI version allows the user to pre-define modules and environmental variables to load at the time of job execution through Job submission templates. These user-defined modules eliminate the step of needing to manually load required modules before job submission, further simplifying the process of running HPC workloads on the cluster. Through the Runtime interface, users can choose from the modules available on the system, define their loading order, and specify environmental variables for repeatable, reliable job deployment.
Figure 27. HPC runtime module list
Figure 28. MPI job template with custom module setup
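For example, the module and environment setup captured by a Runtime entry corresponds to shell commands such as the following (the module names are illustrative OpenHPC-style examples, not a prescription):
module purge                  # start from a clean environment
module load gnu9 openmpi4     # load modules in the defined order
export OMP_NUM_THREADS=4      # environment variable applied at job execution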
Container-based HPC workload deployment
Additional standard templates are provided to support deployment of containerized HPC workloads through Singularity or CharlieCloud. These templates simplify deploying containers for HPC workloads by eliminating the need to create custom runtimes and custom templates for these workloads unless needed for more granularity.
In addition to providing a certain number of basic container images, LiCO also allows users to upload customized container images. LiCO 5.2.0 and later versions support running jobs on NGC images.
Figure 29. CharlieCloud and Singularity standard job templates
Singularity Container Image Management
LiCO HPC/AI version provides both users and administrators with the ability to build, upload and manage application environment images through Singularity containers. These images can support users with AI frameworks and HPC workloads, as well as others. Singularity containers may be built from Docker containers, imported from NVIDIA GPU Cloud (NGC), or other image repositories such as the Intel Container Portal. Containers created by administrators are available to all users, and users can create container images for their individual use as well. Users looking to deploy a custom image can also create a custom template that will deploy the container and run workloads in that environment.
Figure 30. Singularity container management through the Administrator portal
Figure 31. Singularity container building within LiCO
Reports
LiCO HPC/AI version provides expanded billing capabilities and provides the user access to monitor charges incurred for a date range via the Expense Reports subtab. Users can also download daily or monthly billing reports as a .xlsx file from the Admin tab.
Figure 32. LiCO User view of Expense Reports
System tools
The system tools option for the user provides access to their storage space on the cluster. The user can upload, download, cut/copy/paste, preview and edit files on the cluster storage space from within the LiCO portal. The text editor within LiCO allows syntax-aware display and editing based on the file extension. A multiple-file editor option is also available.
Figure 33. Cluster storage access
Figure 34. Text file editor
Features for LiCO Administrators
Topics in this section:
Features for LiCO K8S/AI version administrators
Features for LiCO HPC/AI version Administrators
Features for LiCO Operators
LiCO Deployment
LiCO Upgrade
Features for LiCO K8S/AI version administrators
For administrators of a Kubernetes-based LiCO environment, LiCO provides the ability to monitor activity, create and manage users, monitor LiCO-initiated activity, generate job and operational reports, enable container access for LiCO users, and view the software license currently installed in LiCO. The LiCO K8S/AI version does not provide resource monitoring for the administrator; resources can be monitored at the Kubernetes level with a tool such as Kubernetes Dashboard. The following menus are available to administrators in LiCO K8S/AI:
Home menu for Administrators – provides an at-a-glance view of LiCO jobs running and operational messages. For monitoring and managing cluster resources, the administrator can use a tool such as Kubernetes dashboard, Grafana, or other Kubernetes monitoring tools.
User Management menu – provides dashboards to create, import and export LiCO users, and includes administrative actions to edit, suspend, or delete
Monitor menu – provides a view of LiCO jobs running, allocating to the Kubernetes cluster, and completed jobs. This menu also allows the administrator to query and filter operational logs.
Reports menu – allows administrators the ability to generate reports on jobs, for a given time interval. Administrators may export these reports as a spreadsheet, in a PDF, or in HTML. The reports menu also allows the administrator to view cluster utilization for a given date range.
Admin menu – provides the administrator with the ability to map container images for use in job submission templates, and to download operations and web logs for LiCO.
Settings menu – allows the administrator to view the currently active license for LiCO, including the license key, license tier and expiration date of the license.
Platform Analysis menu – allows the administrator to analyze and optimize program performance.
Figure 35. LiCO K8S/AI Administrator Home Menu
Features for LiCO HPC/AI version Administrators
For cluster administrators, LiCO provides a sophisticated monitoring solution, built on OpenHPC tooling. The following menus are available to administrators:
Home menu for administrators – provides dashboards giving a global overview of the health of the cluster. Utilization is given for the CPUs, GPUs, memory, storage, and network. Node status is given, indicating which nodes are being used for I/O, compute, login, and management. Job status is also given, indicating runtime for the current job, and the order of jobs in the queue. The Home menu is shown in the following figure.
Figure 36. LiCO HPC/AI Administrator Home Menu
User Management menu – provides dashboards to control user groups and users, determining permissions and access levels (based on LDAP) for the organization. Administrators can also control and provision billing groups for accurate accounting.
Monitor menu – provides dashboards for interactive monitoring and reporting on cluster nodes, including a list of the nodes, or a physical look at the node topology. Administrators may also use the Monitor menu to drill down to the component level, examining statistics on cluster CPUs, GPUs, networking, jobs, and operations. Administrators can access alerts that indicate when these statistics reach unwanted values (for instance, GPU temperature reaching critical levels). These alerts are created using the Setting menu. Additionally, a large screen view is available to display a high-level summary of cluster status, and a cluster view was added since LiCO 6.2 for a focused view of compute resource utilization across the cluster. The figures below display the component and alert dashboards.
Figure 37. LiCO HPC/AI Administrator Component dashboard
Figure 38. LiCO HPC/AI Administrator Alert dashboard
Figure 39. LiCO HPC/AI GPU View dashboard
Hybrid HPC – bursting to the cloud is now possible. LiCO 7.0 supports hybrid cloud integration leveraging Microsoft Azure, allowing customers to add Microsoft Azure capacity as supercomputer resources that are included in scheduling considerations. Hybrid HPC enables users to leverage the public cloud to dynamically scale the computing resources of the local HPC cluster.
Figure 40. LiCO Hybrid HPC for Microsoft Azure
LiCO version 7.2 introduces enhanced cloud integration capabilities through the incorporation of Covalent, an open-source Pythonic workflow orchestration platform. This integration allows LiCO to seamlessly operate in a Hybrid Cloud environment, supporting popular cloud providers such as AWS, Azure, and Google Cloud.
Figure 41. LiCO Hybrid HPC
Figure 42. LiCO integration with Covalent for Multi-Cloud support
Reports menu – allows administrators the ability to generate reports on jobs, cluster utilization, alerts, and view current charges and cluster utilization.
Admin menu – Provides the administrator with the capability to create Singularity images for use by all users, generate billing spreadsheets, examine processes and assets, monitor VNC sessions, and download web logs. The administrators can also publish announcements to users using the Notice function.
Settings menu – allows administrators to set up automated notifications and alerts. Administrators may enable the notifications to reach users and interested parties via email, SMS, and WeChat. Administrators may also enable notifications and alerts via uploaded scripts.
The Settings menu also allows administrators to create and modify queues. These queues allow administrators to subdivide hardware based on different types or needs. For example, one queue may contain systems that are exclusively machines with GPUs, while another queue may contain systems that only contain CPUs. This allows the user running the job to select the queue that is more applicable to their requirement. Within the Settings menu, administrators can also set the status of queues, bringing them up or down, draining them, or marking them inactive. Administrators can also limit which queues are available to users by user group.
Starting with LiCO 7.1, administrators can configure Quality of Service (QOS) for each job submitted to Slurm from the Scheduler sub tab. Administrators can edit and delete a limitation, create a new limitation, and then associate it with the corresponding Billing Group. This function requires the SLURM account to be configured on the cluster.
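As a rough sketch of the underlying Slurm accounting commands involved (the QOS name, limit, and billing group account name below are assumptions for illustration; in LiCO this is driven from the Scheduler sub tab):
sacctmgr add qos gpu_small                                      # create a new QOS
sacctmgr modify qos gpu_small set MaxTRESPerUser=gres/gpu=2     # attach a per-user GPU limit to it
sacctmgr modify account demo_billing_group set qos+=gpu_small   # associate the QOS with a billing group account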
Platform Analysis menu – allows the administrator to analyze and optimize program performance. Administrators can determine the cause of poor performance by finding software and hardware performance bottlenecks and identifying program hotspots. After that, developers can optimize programs according to the causes.
Figure 43. Platform analysis tools for HPC cluster administrator and end user
License menu – displays the software licenses active in LiCO including the number of licensed processing entitlements and the expiration date of the license.
Features for LiCO Operators
For the purpose of monitoring clusters but not overseeing user access, LiCO provides the Operator designation. LiCO Operators have access to a subset of the dashboards provided to Administrators; namely, the dashboards contained in the Home, Monitor, and Reports menus:
Home menu for operators – provides dashboards giving a global overview of the health of the cluster. Utilization is given for the CPUs, GPUs, memory, storage, and network. Node status is given, indicating which nodes are being used for I/O, compute, login, and management. Job status is also given, indicating runtime for the current job, and the order of jobs in the queue.
Monitor menu – Dashboard that enables interactive monitoring and reporting on cluster nodes, including a list of the nodes, or a physical look at the node topology. Operators may also use the Monitor menu to drill down to the component level, examining statistics on cluster CPUs, GPUs, jobs, and operations. Operators can access alarms that indicate when these statistics reach unwanted values (for instance, GPU temperature reaching critical levels.) These alarms are created by Administrators using the Settings menu (for more information on the Settings menu, see the Features for LiCO Administrators section.)
Reports menu – allows operators the ability to generate reports on jobs, alerts, or actions for a given time interval. Operators may export these reports as a spreadsheet, in a PDF, or in HTML.
LiCO Deployment
Docker containerized deployment
LiCO can be deployed in Docker containers with all the supported operating systems.
In LiCO we offer containerized deployment as a preferred method. This approach involves running the HPC cluster drivers, monitoring software, scheduler and applications on the host operating system, while the container instance contains the LiCO web portal, including the back-end LiCO service.
The container version of our product supports all the schedulers supported in the deployment.
Benefits of Containerized Deployment
Simplified deployment process: Containerized deployment enables LiCO to work efficiently with the latest OpenHPC package, streamlining updates and maintaining compatibility with the latest features while keeping the number of packages to be deployed minimal.
Simplified Test Systems: Containerized deployment reduces the complexity of maintaining and testing multiple systems for different OS versions, leading to a more efficient and cost-effective deployment process.
Reduced Operating System Dependency: Containerization reduces the dependency on specific operating systems, enhancing portability and making it easier to deploy LiCO across diverse environments.
Supported Operating Systems
The container version of our product supports the following operating systems:
Rocky Linux 8 (Default Base OS)
RHEL (Red Hat Enterprise Linux) 8
Ubuntu (Only in Container Mode)
SUSE (Only in Container Mode)
By default, the container version of our product uses Rocky Linux 8 as the base operating system for container images.
Customers with specific security considerations for the operating system can opt for host OS deployment. For Rocky and RHEL operating systems, we offer support for both host OS deployment and container deployment. Customers can choose the mode that best suits their requirements.
For any further assistance or inquiries, please review the Deploy LiCO in container section from the installation guides available at https://support.lenovo.com/us/en/solutions/HT507011
LiCO Upgrade
Below you can find the instructions for upgrading LiCO from version 7.x (where "x" represents the source version) to version 7.y (where "y" represents the latest available version) on different hosting environments.
This upgrade guide outlines the steps for transitioning from LiCO version 7.x to version 7.y on different hosting environments. Follow the instructions carefully to ensure a successful upgrade while minimizing disruptions to your LiCO system. If you encounter any issues during the upgrade process, consult the official LiCO documentation or seek assistance from L3 support.
This guide is intended for system administrators, developers, or individuals responsible for managing LiCO installations.
Versions:
Source LiCO version: 7.x
Target LiCO version: 7.y
Upgrade Scenarios:
LiCO v7.x on Host > v7.y on Host:
In this scenario, you are upgrading LiCO on the host machine directly from version 7.x to version 7.y. No manual changes are required, and you can use an auto-upgrade script or guide provided by the LiCO team.
LiCO v7.x on Host > v7.y on Container:
For this upgrade, you will be migrating from the host-based installation of LiCO version 7.x to a containerized version 7.y. The process involves removing LiCO packages, retaining configuration files and database files, installing Docker, building or downloading the LiCO v7.y container image, and following the installation guide to configure LiCO v7.y.
LiCO v7.x on Container > v7.y on Container:
If you already have LiCO version 7.x running in a container and want to upgrade to a newer version 7.y, you need to replace the LiCO container image with the one corresponding to version 7.y. Afterward, follow the installation guide to configure LiCO v7.y.
LiCO v7.x on Container > v7.y on Host:
This upgrade scenario is not recommended and should only be attempted with the assistance of L3 support. The process involves migrating from LiCO version 7.x running in a container to a host-based installation of LiCO version 7.y.
Important Notes:
Always back up your configuration files and database files before proceeding with any upgrade.
Make sure to follow the official LiCO installation and upgrade guides provided by the vendor for each specific version.
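Regarding the first note above, a minimal backup sketch before starting any upgrade might look like the following; the configuration path and database name are assumptions and should be adapted to your installation:
# Paths and database name below are assumptions -- adjust them to your environment
tar czf /root/lico-config-backup-$(date +%F).tar.gz /etc/lico
sudo -u postgres pg_dump lico > /root/lico-db-backup-$(date +%F).sql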
LiCO GUI Installer is a tool that simplifies HPC cluster deployment and LiCO setup. It runs on the management node and can use Confluent to deploy the OS on the compute nodes.
The user can define the following node types:
head node - currently only a single head node is supported (this is the same machine on which the installer runs)
login nodes - one or more
compute nodes - one or more
The compute nodes that have at least one GPU defined in the config file are treated as GPU nodes, and NVIDIA drivers will be installed on them.
You can download the LiCO Installation GUI from here and follow this guide to deploy the HPC cluster and LiCO.
Figure 44. LiCO GUI Installer
Diskless installation
A diskless boot system (otherwise known as a PXE boot setup) is a computer system without hard drives. Instead, each computer uses network-attached storage drives on a server to store data.
LiCO supports the option of a diskless installation. You can follow this guide to deploy the HPC cluster with the diskless option.
Subscription and Support
LiCO HPC/AI is enabled through a per-CPU and per-GPU subscription and support entitlement model which, once entitled for all the processors contained within the cluster, gives the customer access to LiCO package updates and Lenovo support for the length of the acquired term.
LiCO K8S/AI is enabled through tiered subscription and support entitlement licensing based on the number of GPU accelerators being accessed by running LiCO workloads (tiers are up to 4 GPU in use, up to 16 GPU in use, and up to 64 GPU in use). Additional licensing beyond 64 GPUs can be provided by contacting your Lenovo sales representative.
Lenovo will provide interoperability support for all software tools defined as validated with LiCO, and development support (Level 3) for specific Lenovo-supported tools only. Open source and supported-vendor bugs/issues will be logged and tracked with their respective communities or companies if desired, with no guarantee from Lenovo for bug fixes. Full support details are provided at the support links below for each respective version of LiCO. Additional support options may be available; please contact your Lenovo sales representative for more information.
LiCO can be acquired as part of a Lenovo Scalable Infrastructure (LeSI) solution or for “roll your own” (RYO) solutions outside of the LeSI framework, and LiCO software package updates are provided directly through the Lenovo Electronic Delivery system. More information on LeSI is available in the LeSI product guide, available from https://lenovopress.com/lp0900.
Lenovo provides support in English globally and in Chinese for China (24x7).
Support response times are as follows:
Severity 1 issues: response within 1 business day
Other issues: response within 3 business days
Each LiCO release has a 1-year lifecycle; customers should upgrade to the latest version once their release is out of support.
The following table lists end of support for LiCO versions.
Table 3. End of support list
Version       Date
LiCO 7.2      2/06/2025
LiCO 7.1      6/28/2024
LiCO 7.0      12/20/2023
LiCO 6.4      6/24/2023
LiCO 6.3.1    3/29/2023
LiCO 6.3      12/15/2022
LiCO 6.2      6/2/2022
LiCO 6.1      12/15/2021
LiCO 6.0      8/3/2021
LiCO 5.5.0    4/15/2021
LiCO 5.4.0    11/5/2020
LiCO 5.3.1    6/18/2020
LiCO 5.3.0    4/12/2020
LiCO 5.2.1    1/9/2020
LiCO 5.2.0    11/21/2019
LiCO 5.1.0    5/3/2019
Validated software components
LiCO’s software packages are dependent on a number of software components that need to be installed prior to LiCO in order to function properly. Each LiCO software release is validated against a defined configuration of software tools and Lenovo systems, to make deployment more straightforward and enable support. Other management tools, hardware systems and configurations outside the defined stack may be compatible with LiCO, though not formally supported; to determine compatibility with other solutions, please check with your Lenovo sales representative.
The following software components are validated by Lenovo as part of the overall LiCO software solution entitlement:
LiCO HPC/AI version support
Lenovo Development Support (L1-L3)
Graphical User Interface: LiCO
System Management & Provisioning: Confluent
Lenovo LiCO HPC/AI Configuration Support (L1 only)
Job Scheduling & Orchestration: SLURM, OpenPBS, Torque/Maui (HPC only)
System Monitoring: Icinga v2
Container Support (AI): Singularity, CharlieCloud, NGC
AI Frameworks (AI): Caffe, Intel-Caffe, TensorFlow, MxNet, Neon, Chainer, Pytorch, Scikit-learn, PaddlePaddle, NVIDIA TensorRT, TensorBoard
Visualization: Grafana
The following software components are validated for compatibility with LiCO HPC/AI:
Supported by their respective software provider
Operating System: RHEL 8.6, Rocky Linux 8.6, SUSE SLES 15 SP3, CentOS 7.9, Ubuntu 22.04 LTS
File Systems: IBM Spectrum Scale (GPFS), Lustre
Job Scheduling & Orchestration: IBM Spectrum LSF v10, Altair PBS Pro
Development Tools: GNU compilers, Intel Cluster Toolkit
LiCO K8S/AI version support
Lenovo Development Support (L1-L3)
Graphical User Interface: LiCO
Lenovo LiCO K8S/AI Configuration Support (L1 only)
AI Frameworks (AI): Caffe, Intel-Caffe, TensorFlow, MxNet, Neon, Chainer, Pytorch, Scikit-learn, PaddlePaddle
Validated hardware components
Supported GPUs
NVIDIA L40, NVIDIA H100, NVIDIA A100, NVIDIA A30, NVIDIA A40, NVIDIA T4, NVIDIA V100, NVIDIA RTX8000, NVIDIA RTX6000
Intel Flex Series 140 GPU, Intel Flex Series 170 GPU
NVIDIA H100 Multi-Instance GPU (MIG), NVIDIA A100 Multi-Instance GPU (MIG)
Note: Subject to specific ThinkSystem platform support; not all GPUs are available on all systems.
Supported Networks
Intel OmniPath 100
Mellanox Infiniband (FDR, EDR, HDR, NDR)
Gb Ethernet (1, 10, 25, 40, 50, 100)
Supported servers (LiCO HPC/AI version)
LiCO seamlessly integrates with both Lenovo servers and workstations, offering robust support for Lenovo hardware within the cluster. Additionally, LiCO extends its compatibility beyond Lenovo infrastructure, providing full support for non-Lenovo hardware within the cluster environment. This versatility ensures optimal performance and flexibility, allowing organizations to leverage LiCO's capabilities across a diverse range of hardware configurations for efficient and scalable computing orchestration.
The following Lenovo systems are supported to run with LiCO HPC/AI. These systems must run one of the supported operating systems as well as the validated software stack, as described in the Validated software components section.
ThinkSystem SR860 V3 – The Lenovo ThinkSystem SR860 V3 is a 4-socket server that features a 4U rack design with support for up to eight high-performance GPUs. The server offers technology advances, including 4th Gen Intel Xeon Scalable processors, 4800 MHz DDR5 memory, and PCIe 5.0. For more information, see the SR860 V3 product guide.
ThinkSystem SR850 V3 – The Lenovo ThinkSystem SR850 V3 is a 4-socket server that is densely packed into a 2U rack design. The server offers technology advances, including 4th Gen Intel Xeon Scalable processors, 4800 MHz DDR5 memory, and PCIe Gen 5. For more information, see the SR850 V3 product guide.
ThinkSystem SR675 V3 – The Lenovo ThinkSystem SR675 V3 is a versatile GPU-rich 3U rack server that supports eight double-wide GPUs including the new NVIDIA H100 and L40 Tensor Core GPUs, or the NVIDIA HGX H100 4-GPU offering with NVLink and Lenovo Neptune hybrid liquid-to-air cooling. The server is based on the new AMD EPYC 9004 Series processors (formerly codenamed "Genoa"). For more information, see the SR675 V3 product guide.
ThinkSystem SD665 V3 – The ThinkSystem SD665 V3 Neptune DWC server is the next-generation high-performance server based on the fifth generation Lenovo Neptune™ direct water cooling platform. For more information, see the SD665 V3 product guide.
ThinkSystem SR655 V3 – The Lenovo ThinkSystem SR655 V3 is a 1-socket 2U server that features the AMD EPYC 9004 "Genoa" family of processors. With up to 96 cores per processor and support for the new PCIe 5.0 standard for I/O, the SR655 V3 offers the ultimate in one-socket server performance in a 2U form factor. For more information, see the SR655 V3 product guide.
ThinkSystem SR635 V3 – The Lenovo ThinkSystem SR635 V3 is a 1-socket 1U server that features the AMD EPYC 9004 "Genoa" family of processors. With up to 96 processor cores and support for the new PCIe 5.0 standard for I/O, the SR635 V3 offers the ultimate in one-socket server performance in a 1U form factor. For more information, see the SR635 V3 product guide.
ThinkSystem SD650-I V3 – The ThinkSystem SD650-I V3 Neptune DWC server is the next-generation high-performance server based on the fifth generation Lenovo Neptune™ direct water cooling platform. For more information, see the SD650-I V3 product guide.
ThinkSystem SD650 V3 – The ThinkSystem SD650 V3 Neptune DWC server is the next-generation high-performance server based on the fifth generation Lenovo Neptune™ direct water cooling platform. For more information, see the SD650 V3 product guide.
ThinkSystem SR650 V3 – The Lenovo ThinkSystem SR650 V3 is an ideal 2-socket 2U rack server for small businesses up to large enterprises that need industry-leading reliability, management, and security, as well as maximizing performance and flexibility for future growth. For more information, see the SR650 V3 product guide.
ThinkSystem SR630 V3 – The Lenovo ThinkSystem SR630 V3 is an ideal 2-socket 1U rack server for small businesses up to large enterprises that need industry-leading reliability, management, and security, as well as maximizing performance and flexibility for future growth. For more information, see the SR630 V3 product guide.
ThinkSystem SR665 V3 – The Lenovo ThinkSystem SR665 V3 is a 2-socket 2U server that features the AMD EPYC 9004 "Genoa" family of processors. With up to 96 cores per processor and support for the new PCIe 5.0 standard for I/O, the SR665 V3 offers the ultimate in two-socket server performance in a 2U form factor. For more information, see the SR665 V3 product guide.
ThinkSystem SR645 V3 – The Lenovo ThinkSystem SR645 V3 is a 2-socket 1U server that features the AMD EPYC 9004 "Genoa" family of processors. With up to 96 cores per processor and support for the new PCIe 5.0 standard for I/O, the SR645 V3 offers the ultimate in two-socket server performance in a 1U form factor. For more information, see the SR645 V3 product guide.
ThinkSystem SR670 V2 – The Lenovo ThinkSystem SR670 V2 is a versatile GPU-rich 3U rack server that supports eight double-wide GPUs including the new NVIDIA A100 and A40 Tensor Core GPUs, or the NVIDIA HGX A100 4-GPU offering with NVLink and Lenovo Neptune hybrid liquid-to-air cooling. The server is based on the new third-generation Intel Xeon Scalable processor family (formerly codenamed "Ice Lake"). The server delivers optimal performance for Artificial Intelligence (AI), High Performance Computing (HPC) and graphical workloads across an array of industries. For more information, see the SR670 V2 product guide.
ThinkSystem SD650 V2 – The ThinkSystem SD650 V2 server is the next-generation high-performance server based on Lenovo's fourth generation Lenovo Neptune™ direct water cooling platform. With two third-generation Intel Xeon Scalable processors, the ThinkSystem SD650 V2 server combines the latest Intel processors and Lenovo's market-leading water cooling solution, which results in extreme performance in an extreme dense packaging, supporting your application From Exascale to Everyscale™. For more information, see the SD650 V2 product guide.
ThinkSystem SD650-N V2 – The ThinkSystem SD650-N V2 server is the next-generation high-performance GPU-rich server based on Lenovo's fourth generation Lenovo Neptune™ direct water cooling platform. With four NVIDIA A100 SXM4 GPUs and two third-generation Intel Xeon Scalable processors, the ThinkSystem SD650-N V2 server combines advanced NVIDIA acceleration technology with the latest Intel processors and Lenovo's market-leading water cooling solution, which results in extreme performance in an extreme dense packaging supporting your accelerated application From Exascale to Everyscale™. For more information, see the SD650-N V2 product guide.
ThinkSystem SR650 V2 – The Lenovo ThinkSystem SR650 V2 is an ideal 2-socket 2U rack server for small businesses up to large enterprises that need industry-leading reliability, management, and security, as well as maximizing performance and flexibility for future growth. The SR650 V2 is a very configuration-rich offering, supporting 28 different drive bay configurations in the front, middle and rear of the server and 5 different slot configurations at the rear of the server. This level of flexibility ensures that you can configure the server to meet the needs of your workload. For more information, see the SR650 V2 product guide.
ThinkSystem SR630 V2 – The Lenovo ThinkSystem SR630 V2 is an ideal 2-socket 1U rack server designed to take full advantage of the features of the 3rd generation Intel Xeon Scalable processors, such as the full performance of 270W 40-core processors, support for 3200 MHz memory and PCIe Gen 4.0 support. The server also offers onboard NVMe PCIe ports that allow direct connections to 12x NVMe SSDs, which results in faster access to store and access data to handle a wide range of workloads. For more information, see the SR630 V2 product guide.
ThinkSystem SD530 – The Lenovo ThinkSystem SD530 is an ultra-dense and economical two-socket server in a 0.5U rack form factor. With up to four SD530 server nodes installed in the ThinkSystem D2 enclosure, and the ability to cable and manage up to four D2 enclosures as one asset, you have an ideal high-density 2U four-node (2U4N) platform for enterprise and cloud workloads. The SD530 also supports a number of high-end GPU options with the optional GPU tray installed, making it an ideal solution for AI Training workloads. For more information, see the SD530 product guide.
ThinkSystem SD650 – The Lenovo ThinkSystem SD650 direct water cooled server is an open, flexible and simple data center solution for users of technical computing, grid deployments, analytics workloads, and large-scale cloud and virtualization infrastructures. The direct water cooled solution is designed to operate by using warm water, up to 50°C (122°F). Chillers are not needed for most customers, meaning even greater savings and a lower total cost of ownership. The ThinkSystem SD650 is designed to optimize density and performance within typical data center infrastructure limits, being available in a 6U rack mount unit that fits in a standard 19-inch rack and houses up to 12 water-cooled servers in 6 trays. For more information, see the SD650 product guide.
ThinkSystem SR630 – Lenovo ThinkSystem SR630 is an ideal 2-socket 1U rack server for small businesses up to large enterprises that need industry-leading reliability, management, and security, as well as maximizing performance and flexibility for future growth. The SR630 server is designed to handle a wide range of workloads, such as databases, virtualization and cloud computing, virtual desktop infrastructure (VDI), infrastructure security, systems management, enterprise applications, collaboration/email, streaming media, web, and HPC. For more information, see the SR630 product guide.
ThinkSystem SR650 – The Lenovo ThinkSystem SR650 is an ideal 2-socket 2U rack server for small businesses up to large enterprises that need industry-leading reliability, management, and security, as well as maximizing performance and flexibility for future growth. The SR650 server is designed to handle a wide range of workloads, such as databases, virtualization and cloud computing, virtual desktop infrastructure (VDI), enterprise applications, collaboration/email, and business analytics and big data. For more information, see the SR650 product guide.
ThinkSystem SR670 – The Lenovo ThinkSystem SR670 is a purpose-built 2 socket 2U accelerated server, supporting up to 8 single-wide or 4 double-wide GPUs and designed for optimal performance required by both Artificial Intelligence and High Performance Computing workloads. Supporting the latest NVIDIA GPUs and Intel Xeon Scalable processors, the SR670 supports hybrid clusters for organizations that may want to consolidate infrastructure, improving performance and compute power, while maintaining optimal TCO. For more information, see the SR670 product guide.
ThinkSystem SR950 – The Lenovo ThinkSystem SR950 is Lenovo’s flagship server, suitable for mission-critical applications that need the most processing power possible in a single server. The powerful 4U ThinkSystem SR950 can expand from two to as many as eight Intel Xeon Scalable Family processors. The modular design of SR950 speeds upgrades and servicing with easy front or rear access to all major subsystems that ensures maximum performance and maximum server uptime. For more information, see the SR950 product guide.
ThinkSystem SR655 – The Lenovo ThinkSystem SR655 is a 1-socket 2U server that features the AMD EPYC 7002 "Rome" family of processors. With up to 64 cores per processor and support for the new PCIe 4.0 standard for I/O, the SR655 offers the ultimate in single-socket server performance. ThinkSystem SR655 is a multi-GPU optimized rack server, providing support for up to 6 low-profile GPUs or 3 double-wide GPUs. For more information, see the SR655 product guide.
ThinkSystem SR635 – The Lenovo ThinkSystem SR635 is a 1-socket 1U server that features the AMD EPYC 7002 "Rome" family of processors. With up to 64 cores per processor and support for the new PCIe 4.0 standard for I/O, the SR635 offers the ultimate in single-socket server performance. For more information, see the SR635 product guide.
ThinkSystem SR645 – The Lenovo ThinkSystem SR645 is a 2-socket 1U server that features the AMD EPYC 7002 "Rome" family of processors. With up to 64 cores per processor and support for the new PCIe 4.0 standard for I/O, the SR645 offers the ultimate in two-socket server performance in a space-saving 1U form factor. For more information, see the SR645 product guide.
ThinkSystem SR665 – The Lenovo ThinkSystem SR665 is a 2-socket 2U server that features the AMD EPYC 7002 "Rome" family of processors. With support for up to 8 single-wide or 3 double-wide GPUs, up to 64 cores per processor and support for the new PCIe 4.0 standard for I/O, the SR665 offers the ultimate in two-socket server performance in a 2U form factor. ThinkSystem SR665 is a multi-GPU optimized rack server, providing support for up to 8 low-profile GPUs or 3 double-wide GPUs. For more information, see the SR665 product guide.
ThinkSystem SR850 – The Lenovo ThinkSystem SR850 is a 4-socket server that features a streamlined 2U rack design that is optimized for price and performance, with best-in-class flexibility and expandability. The SR850 now supports second-generation Intel Xeon Scalable Family processors, up to a total of four, each with up to 28 cores. The ThinkSystem SR850’s agile design provides rapid upgrades for processors and memory, and its large, flexible storage capacity helps to keep pace with data growth. For more information, see the SR850 product guide.
China only:
ThinkServer SR660 V2 - The Lenovo ThinkServer SR660 V2 is an ideal 2-socket 2U rack server for SMBs, large enterprises, and cloud service providers that need industry-leading performance and flexibility for future growth. The SR660 V2 is based on the new 3rd generation Intel Xeon Scalable processor, with the new Intel Optane Persistent Memory 200 Series, low-latency NVMe SSDs, and powerful GPUs to support most customer workloads, such as databases, virtualization and cloud computing, virtual desktop infrastructure (VDI), infrastructure security, systems management, enterprise applications, collaboration/email, streaming media, web, and HPC. For more information, see the SR660 V2 product guide.
ThinkServer SR590 V2 – The Lenovo ThinkServer SR590 V2 is an ideal 2-socket 2U rack server for small businesses up to large enterprises that need industry-leading reliability, management, and security, as well as maximizing performance and flexibility for future growth. The SR590 V2 is based on the 3rd generation Intel Xeon Scalable processor family (formerly codenamed "Ice Lake") and the Intel Optane Persistent Memory 200 Series. For more information, see the SR590 V2 product guide.
WenTian WR5220 G3 – Lenovo WenTian WR5220 G3 is designed for large enterprises, SMBs, and cloud service providers. It is a 2-socket 2U rack server with excellent performance and high scalability. It is based on the 4th or 5th generation Intel Xeon Scalable processor family (codenamed "Sapphire Rapids" and "Emerald Rapids"), which can reach up to 385W TDP*; it also supports high-performance, high-frequency DDR5 memory, low-latency NVMe SSDs, and strong GPU performance to meet most customer workloads, such as databases, virtualization and cloud computing, AI, high-performance computing, virtual desktop infrastructure, infrastructure security, system management, enterprise applications, collaboration/email, streaming media, etc. For more information, see the WR5220 G3 product guide.
Workstations:
ThinkStation P620 – The ThinkStation P620 workstation tower is equipped with abundant storage and memory capacity, numerous expansion slots, enterprise-class AMD Ryzen PRO manageability, and security features. With unprecedented visual computing powered by NVIDIA® professional graphics support, this eminently configurable workstation is equipped with up to two NVIDIA® RTX™ A6000 graphics cards with NVLink.
Additional Lenovo ThinkSystem and System x servers and workstations may be compatible with LiCO. Contact your Lenovo sales representative for more information.
LiCO Implementation services
Customers who do not have the cluster management software stack required to run with LiCO may engage Lenovo Professional Services to install LiCO and the necessary open-source software. Lenovo Professional Services can provide comprehensive installation and configuration of the software stack, including operation verification, as well as post-installation documentation for reference. Contact your Lenovo sales representative for more information.
Client PC requirements
A web browser is used to access LiCO's monitoring dashboards. To fully utilize LiCO’s monitoring and visualization capabilities, the client PC should meet the following specifications:
Hardware: CPU of 2.0 GHz or above and 8 GB or more of RAM
Display resolution: 1280 x 800 or higher
Browser: Chrome (v62.0 or higher) or Firefox (v56.0 or higher) is recommended
Related links
For more information, see the following resources:
LiCO website: https://www.lenovo.com/us/en/data-center/software/lico/
LiCO HPC/AI (Host) Support website: https://support.lenovo.com/us/en/solutions/HT507011
LiCO K8S/AI (Kubernetes) Support website: https://support.lenovo.com/us/en/solutions/HT509422
Technical LiCO Documentation: https://hpc.lenovo.com/users/lico/
Lenovo HPC & AI Software Stack Product Guide: https://lenovopress.lenovo.com/lp1651-lenovo-hpc-ai-software-stack
Lenovo DCSC configurator: https://dcsc.lenovo.com
Lenovo AI website: https://www.lenovo.com/us/en/data-center/solutions/analytics-ai/
Lenovo HPC website: https://www.lenovo.com/us/en/data-center/solutions/hpc/
LeSI website: https://www.lenovo.com/us/en/p/data-center/servers/high-density/lenovo-scalable-infrastructure/wmd00000276
OpenHPC User Resources: https://github.com/openhpc/ohpc/wiki/User-Resources
Intel oneAPI: https://software.intel.com/content/www/us/en/develop/tools.html
Altair PBS Professional Documentation: https://www.altair.com/pbs-professional/
Lenovo Compute Orchestration in HPC Data Centers with Slurm: https://lenovopress.lenovo.com/lp1701-lenovo-compute-orchestration-in-hpc-data-centers-with-slurm
Related product families
Product families related to this document are the following:
Artificial Intelligence
High Performance Computing
Trademarks
Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.
The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
From Exascale to Everyscale
Lenovo Neptune®
System x®
ThinkServer®
ThinkStation®
ThinkSystem®
The following terms are trademarks of other companies:
Intel®, Intel Optane™, Xeon®, and VTune™ are trademarks of Intel Corporation or its subsidiaries.
Linux® is the trademark of Linus Torvalds in the U.S. and other countries.
Microsoft® and Azure® are trademarks of Microsoft Corporation in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Full Change History
Changes in the February 8, 2024 update:
Updated for LiCO 7.2:
Updated login images
Cloud Tools menu
Adding a template to a workflow in LiCO
HPC runtime module list
Updated all the features under - What's new in LiCO 7.2 section
Added new supported GPU under - Validated hardware components section
NVIDIA L40
Added new servers support under - Supported servers (LiCO HPC/AI version) section
Lenovo ThinkSystem SR860 V3, SR850 V3, SR590 V2
Lenovo WenTian WR5220 G3 (C4C type)
Lenovo ThinkStation P620 (without out-of-band monitoring)
Changes in the August 24, 2023 update:
Updated for LiCO 7.1:
Updated login images
Updated Jupyter notebook to official name Jupyter Notebook
Rebranding on features occurred in - Features for LiCO users section and in Additional features for LiCO HPC/AI Users section:
Merged Intel Optimization for TensorFlow2 Single Node and Intel Optimization for TensorFlow2 Multi Node, changed them to Intel Extension for TensorFlow.
Changed Intel Optimization for PyTorch Single Node to Intel Extension for PyTorch.
Changes in the March 21, 2023 update:
Updated URLs and minor changes for PyTorch - Additional features for LiCO HPC/AI Users section
Changes in the February 22, 2023 update:
Updated for LiCO 7.0:
New features
User Home Menu & job templates
New features to Lenovo Accelerated AI
New Deep Learning (DL) computer vision (CV) templates
Removed AI Studio feature
New Intel features
New features with Hybrid HPC with Microsoft Azure
New LiCO deployment section
New table subscription and support
Changes in the August 8, 2022 update:
Updated for LiCO 6.4:
Support NVIDIA Multi-Instance GPU(MIG) on Slurm, LSF and PBS
Expanded support for Intel oneAPI tools and templates (HPC/AI version)
Lenovo Accelerated AI and AI Studio support configurable Early Stopping strategy
Integrated RStudio Server
Integrated CVAT labelling tool
Add Platform Analysis tools for HPC cluster administrator and end user.
Updated LiCO HPC/AI version ordering information
Changes in the June 1, 2021 update:
Updated for LiCO 6.2
Support for new ThinkSystem V2 servers (SR670 V2, SR650 V2, SR630 V2, SD650 V2, SD650-N V2)
Lenovo Accelerated AI for Text Classification
Trained model packaging into a docker container image
Intel OneAPI tools and templates (HPC/AI version)
Cluster View for more detailed resource monitoring (HPC/AI version)
Changes in the December 15, 2020 update:
Updated for LiCO 6.1
Improved text editor with syntax-aware formatting based on extension
Ability to add tags and comments to completed jobs for easy filtering
Integrated Singularity container image builder (HPC/AI)
Support for CharlieCloud (HPC)
Support for NVIDIA A100 (HPC/AI)
Changes in the August 4, 2020 update:
Updated for LiCO 6.0
Workflow feature to pre-define multiple job steps
Infiniband Monitoring for administrators (HPC/AI version)
Estimated job start times (HPC/AI version)
Support for ThinkSystem SR850, SR635, SR645, SR665
Support for NVIDIA RTX 8000
Changes in the April 15, 2020 update:
Updated for LiCO 5.5
TensorFlow 2 standard template support
Cut/Copy/Paste/Duplicate files and folders from within the LiCO storage interface
User notification via email when a job completes or is cancelled
Added CPU and GPU utilization monitoring for K8S/AI running jobs
Expanded billing support to include memory, GPU, and storage utilization (HPC/AI version)
Ability to export daily and monthly billing reports for administrators and users (HPC/AI version)
Cluster utilization monitoring for administrators
Changes in the November 5, 2019 update:
Updated for LiCO 5.4
Jupyter notebook access from the cluster
“Favorites” tab for quick access to frequently used job submission templates
Import/Export of custom job submission templates for ease of sharing between users
Job submission template support for PyTorch and scikit-learn
Additional version of LiCO to support AI workloads on a Kubernetes-based cluster
Changes in the April 16, 2019 update:
Updated for LiCO 5.3
End-to-end AI training workflows for Image Classification, Object Detection, and Instance Segmentation
Option to copy existing jobs into the original template, with existing parameters pre-filled and modifiable
Enablement on the Lenovo ThinkSystem SR950
Support for Keras, Chainer AI framework, and latest MxNet optimizations for Intel CPU training
Integration support for HBase and MongoDB BigData sources
Integration support for trained AI model publishing to git repositories
REST interface to instantiate LiCO AI training functions from DevOps tools
Changes in the February 13, 2019 update:
Corrected the link to the support page - Related links section
Changes in the November 12, 2018 update:
Updates for LiCO Version 5.2
Queue management functionality, providing the ability to create and manage workload queues from within the GUI
Enablement on the Lenovo ThinkSystem SD650 and SR670 systems
Exclusive mode, to select whether to dedicate or share systems when requesting resources
Support for NVIDIA GPU Cloud (NGC) Container images
Lenovo Accelerated AI templates to provide easy-to-use training and inference functionality for a variety of AI use cases
Enhancements to storage management within LiCO
Changes in the August 7, 2018 update:
Revised list of validated software components
Added information on LiCO installation through Lenovo Professional Services
First published: 26 March 2018
LiCO 7.0.0 Installation Guide (for EL7.9)
Chapter 1. Overview
Typical cluster deployment
This guide is based on a typical cluster deployment that includes a management node, a login node, and compute nodes.
The table below describes the elements in the cluster.
Table 1. Elements in a typical cluster
Management node – The core of the HPC/AI cluster. It carries the main functions such as cluster management, monitoring, scheduling, policy management, and user and account management.
Compute node – Performs the computing tasks.
Login node – Connects the cluster to the external network or cluster. Users must use a login node to log in, upload application data, develop and compile programs, and submit scheduled jobs.
File service – Provides shared storage. It is connected to the cluster nodes through a high-speed network. Setting up the file service is outside the scope of this guide; a simple NFS setup is used.
Node BMC interface – Used to access the node BMC system.
Node eth interface – Used to manage the nodes in the cluster. It can also be used to transfer computing data.
High-speed network interface – Optional. Used to support the file service. It can also be used to transfer computing data.
Note: LiCO also supports cluster deployments that contain only a management node and compute nodes. In that case, all LiCO modules that would otherwise be installed on the login node must be installed on the management node.
Chapter 2. Deploy the cluster environment
Skip this chapter if a cluster environment already exists.
Install the operating system
Install the official release of CentOS 7.9 (a minimal installation is sufficient). Configure memory locking and restart the operating system:
echo '* soft memlock unlimited' >> /etc/security/limits.conf
echo '* hard memlock unlimited' >> /etc/security/limits.conf
reboot
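After the node is back up, a quick check (not part of the original guide) confirms that the memlock limits took effect:
ulimit -l    # expected output: unlimited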
Deploy the operating system on the other nodes in the cluster
Configure environment variables
Step 1. Log in to the management node.
Step 2. Edit /root/lico_env.local and update the environment variables listed in the file:
# Management node hostname
sms_name="head"
# IP address of management node in the cluster intranet
sms_ip="192.168.0.2"
# Network interface card MAC address corresponding to the management node IP
sms_mac='b8:59:9f:2b:a2:e2'
# Management node BMC address.
sms_bmc='192.168.1.2'
# set the dns server
dns_server="192.168.10.10"
# set the ipv4 gateway
ipv4_gateway="192.168.0.1"
# Set the domain name
domain_name="hpc.com"
# Set OpenLDAP domain name
lico_ldap_domain_name="dc=hpc,dc=com"
# set OpenLDAP domain component
lico_ldap_domain_component="hpc"
# original OS repository directory
repo_backup_dir="/install/custom/backup"
# OS image pathway
iso_path="/isos"
# Local repository directory for OS
os_repo_dir="/install/custom/server"
sdk_repo_dir="/install/custom/sdk"
# Local repository directory for confluent
confluent_repo_dir="/install/custom/confluent"
# link name of repository directory for Lenovo OpenHPC
link_ohpc_repo_dir="/install/custom/ohpc"
# link name of repository directory for LiCO
link_lico_repo_dir="/install/custom/lico"
# link name of repository directory for LiCO-dep
link_lico_dep_repo_dir="/install/custom/lico-dep"
# Local repository directory for Lenovo OpenHPC, please change it
# according to this version.
ohpc_repo_dir="/install/custom/ohpc-1.3.9"
# LiCO repository directory for LiCO, please change it according to this version.
lico_repo_dir="/install/custom/lico-7.0.0"
# LiCO repository directory for LiCO-dep, please change it according to this version.
lico_dep_repo_dir="/install/custom/lico-dep-7.0.0"
# icinga api listener port
icinga_api_port=5665
# If the Confluent automatic discovery mode is enabled, skip the following configurations.
# Total compute nodes
num_computes="2"
# Prefix of compute node hostname.
# Change the configuration according to actual conditions.
compute_prefix="c"
# Compute node hostname list.
# Change the configuration according to actual conditions.
c_name[0]=c1
c_name[1]=c2
# Compute node IP list.
# Change the configuration according to actual conditions.
c_ip[0]=192.168.0.6
c_ip[1]=192.168.0.16
# Network interface card MAC address corresponding to the compute node IP.
# Change the configuration according to actual conditions.
c_mac[0]=fa:16:3e:73:ec:50
c_mac[1]=fa:16:3e:27:32:c6
# Compute node BMC address list.
c_bmc[0]=192.168.1.6
c_bmc[1]=192.168.1.16
# Total login nodes. If there is no login node in the cluster, or the management node
# and the login node is the same node, the number of logins must be "0".
# And the 'l_name', 'l_ip', 'l_mac', and 'l_bmc' lines need to be removed.
num_logins="1"
# Login node hostname list.
# Change the configuration according to actual conditions.
l_name[0]=l1
# Login node IP list.
# Change the configuration according to actual conditions.
l_ip[0]=192.168.0.15
# Network interface card MAC address corresponding to the login node IP.
# Change the configuration according to actual conditions.
l_mac[0]=fa:16:3e:2c:7a:47
# Login node BMC address list.
l_bmc[0]=192.168.1.15
Step 3. Save the changes to lico_env.local and reload the environment variables:
chmod 600 lico_env.local
source lico_env.local
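As a quick check (not part of the original guide), you can print a few of the variables to confirm the file was loaded:
echo "management node: ${sms_name} (${sms_ip})"
echo "compute nodes:   ${c_name[*]}"
echo "login nodes:     ${l_name[*]}"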
After the cluster environment is set up, configure a public-network IP address on the login or management node so that the LiCO web portal can be reached from the external network.
Create local repositories
Create a local repository for installing the operating system.
For CentOS
Step 1. Run the following command to create a directory for storing the ISO:
mkdir -p ${iso_path}
Step 2. Download the CentOS-7-x86_64-Everything-2009.iso and sha256sum.txt files from http://isoredirect.centos.org/centos/7/isos/x86_64/.
Step 3. Copy the files to ${iso_path}.
Step 4. Run the following commands to compute the checksum of the ISO file and make sure it matches the value in sha256sum.txt:
cd ${iso_path}
sha256sum CentOS-7-x86_64-Everything-2009.iso
cd ~
Step 5. Run the following commands to mount the image:
mkdir -p ${os_repo_dir}
mount -o loop ${iso_path}/CentOS-7-x86_64-Everything-2009.iso ${os_repo_dir}
Step 6. Run the following commands to configure the local repository:
cat << eof > ${iso_path}/EL7-OS.repo
[EL7-OS]
name=el7-centos
enabled=1
gpgcheck=0
type=rpm-md
baseurl=file://${os_repo_dir}
eof
cp -a ${iso_path}/EL7-OS.repo /etc/yum.repos.d/
Step 7. Run the following commands to disable the default CentOS repositories:
yum install --disablerepo=CentOS* -y yum-utils
yum-config-manager --disable CentOS\*
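A quick check (not part of the original guide) that the local repository is active and the default CentOS repositories are disabled:
yum repolist enabled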
For RHEL
Step 1. Run the following command to create a directory for storing the ISO:
mkdir -p ${iso_path}
Step 2. Copy the RHEL-7.9-20200917.0-Server-x86_64-dvd1.iso and RHEL-7.9-20200917.0-Server-x86_64-dvd1.iso.MD5SUM files to the ${iso_path} directory.
Step 3. Run the following commands to verify the ISO file:
cd ${iso_path}
md5sum -c RHEL-7.9-20200917.0-Server-x86_64-dvd1.iso.MD5SUM
cd ~
Step 4. Run the following commands to mount the image:
mkdir -p ${os_repo_dir}
mount -o loop ${iso_path}/RHEL-7.9-20200917.0-Server-x86_64-dvd1.iso ${os_repo_dir}
Step 5. Run the following commands to configure the local repository:
cat << eof > ${iso_path}/RHELS7-OS.repo
[RHELS7-OS]
name=RHELS7-OS
enabled=1
gpgcheck=0
type=rpm-md
baseurl=file://${os_repo_dir}
eof
cp -a ${iso_path}/RHELS7-OS.repo /etc/yum.repos.d/
Install Lenovo Confluent
Step 1. Download the following package:
https://hpc.lenovo.com/downloads/22a/confluent-3.4.0-2-el7.tar.xz
Step 2. Upload the package to the /root directory.
Step 3. Create the local Confluent repository:
yum install -y bzip2 tar
mkdir -p $confluent_repo_dir
cd /root
tar -xvf confluent-3.4.0-2-el7.tar.xz -C $confluent_repo_dir
cd $confluent_repo_dir/lenovo-hpc-el7
./mklocalrepo.sh
cd ~
Step 4. Install Lenovo Confluent:
yum install -y lenovo-confluent tftp-server
systemctl enable confluent --now
systemctl enable tftp.socket --now
systemctl disable firewalld --now
systemctl enable httpd --now
Step 5. Create a Confluent account:
source /etc/profile.d/confluent_env.sh
confetty create /users/
Step 6. Disable SELinux:
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0
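A quick check (not part of the original guide) that SELinux is no longer enforcing:
getenforce    # expected output: Permissive (Disabled after the next reboot)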
Deploy the operating system through Confluent
Note: When deploying the cluster in Confluent automatic discovery mode, follow the guidance on the following websites:
• https://hpc.lenovo.com/users/documentation/confluentdisco.html
• https://hpc.lenovo.com/users/documentation/confluentquickstart_el8.html
It is recommended to create groups named "all", "login", and "compute" in Confluent and to bind the nodes to the appropriate groups; otherwise, the commands in the remaining chapters of this guide may not work.
Specify global behavior
Note: Before specifying global behavior, make sure the BMC username and password are consistent across the nodes; if they are not, change them so that they match.
In Confluent, most configuration is node-oriented and can be inherited from a group. The default group "everything" is automatically added to every node and can be used to set global attributes.
nodegroupattrib everything deployment.useinsecureprotocols=firmware \
console.method=ipmi dns.servers=$dns_server dns.domain=$domain_name \
net.ipv4_gateway=$ipv4_gateway net.ipv4_method="static"
deployment.useinsecureprotocols=firmware enables PXE support (by default, the "HTTPS only" mode is the only one allowed). console.method=ipmi can be skipped, but if specified it instructs Confluent to use IPMI to access the text console, enabling the nodeconsole command. Passwords and similar values can be specified in the same way; it is recommended to use the -p parameter to prompt for the values so that they do not appear in the command history. Note that if no root password is specified, the default behavior is to disable password-based logins:
nodegroupattrib everything -p bmcuser bmcpass crypted.rootpassword
Define nodes in Confluent
Step 1. Define the management node for Confluent, using the variables from the lico_env.local file:
nodegroupdefine all
nodegroupdefine login
nodegroupdefine compute
nodedefine $sms_name
nodeattrib $sms_name net.hwaddr=$sms_mac
nodeattrib $sms_name net.ipv4_address=$sms_ip
nodeattrib $sms_name hardwaremanagement.manager=$sms_bmc
Step 2. Define the compute node configuration for Confluent:
for ((i=0; i<$num_computes; i++)); do
nodedefine ${c_name[$i]};
nodeattrib ${c_name[$i]} net.hwaddr=${c_mac[$i]};
nodeattrib ${c_name[$i]} net.ipv4_address=${c_ip[$i]};
nodeattrib ${c_name[$i]} hardwaremanagement.manager=${c_bmc[$i]};
nodedefine ${c_name[$i]} groups=all,compute;
done
Step 3. Define the login node configuration for Confluent:
for ((i=0; i<$num_logins; i++)); do
nodedefine ${l_name[$i]};
nodeattrib ${l_name[$i]} net.hwaddr=${l_mac[$i]};
nodeattrib ${l_name[$i]} net.ipv4_address=${l_ip[$i]};
nodeattrib ${l_name[$i]} hardwaremanagement.manager=${l_bmc[$i]};
nodedefine ${l_name[$i]} groups=all,login;
done
Prepare name resolution
Note: No specific name resolution solution is mandatory; if no strategy is in place, the following steps set up a basic one.
Step 1. Append the node information to /etc/hosts:
for node_name in $(nodelist); do
noderun -n $node_name echo {net.ipv4_address} {node} {node}.{dns.domain} >> /etc/hosts
done
Step 2. Install and start dnsmasq so that the contents of /etc/hosts are available over DNS:
yum install -y dnsmasq
systemctl enable dnsmasq --now
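As a quick check (not part of the original guide), confirm that the names written to /etc/hosts resolve to the addresses defined in lico_env.local:
getent hosts ${sms_name} ${c_name[0]} ${l_name[0]}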
Initialize the Confluent OS deployment
Set up the OS deployment requirements with the initialize subcommand of the osdeploy command. The -i parameter interactively prompts for the available options:
ssh-keygen -t ed25519
chown confluent /var/lib/confluent
osdeploy initialize -i
Perform the OS deployment
For CentOS
Step 1. Import the installation media:
osdeploy import ${iso_path}/CentOS-7-x86_64-Everything-2009.iso
Step 2. Start the deployment:
nodedeploy all -n centos-7.9-x86_64-default
Step 3. (Optional) Check the deployment progress:
nodedeploy all
For RHEL
Step 1. Import the installation media:
osdeploy import ${iso_path}/RHEL-7.9-20200917.0-Server-x86_64-dvd1.iso
Step 2. Start the deployment:
nodedeploy all -n rhel-7.9-x86_64-default
Step 3. Check the deployment progress:
nodedeploy all
Enable NGINX for the other nodes
Note: If the operating system of the other nodes is CentOS, run the following commands to disable the default CentOS repositories:
nodeshell all "yum install --disablerepo=CentOS* -y yum-utils"
nodeshell all "yum-config-manager --disable CentOS\*"
Disable the firewall for the other nodes
nodeshell all "systemctl disable firewalld --now"
nodeshell all "sed -i 's/enforcing/disabled/' /etc/selinux/config"
nodeshell all "setenforce 0"
Checkpoint A
Check to make sure the installation is complete:
nodeshell all uptime
Note: The output should look like the following:
c1: 05:03am up 0:02, 0 users, load average: 0.20, 0.13, 0.05
c2: 05:03am up 0:02, 0 users, load average: 0.20, 0.14, 0.06
l1: 05:03am up 0:02, 0 users, load average: 0.17, 0.13, 0.05
……
Install infrastructure software for the nodes
Note: In the "Nodes" column, M stands for the management node, L for the login node, and C for the compute nodes.
Software      Component            Version   Service            Nodes     Notes
nfs           nfs-utils            1.3.0     nfs-server         M         /
chrony        chrony               3.4       chronyd            M, C, L   /
slurm         ohpc-slurm-server    1.3.8     munge, slurmctld   M         /
              ohpc-slurm-client    1.3.8     munge, slurmd      C, L      /
icinga2       icinga2              2.13.5    icinga2            M, C, L   /
singularity   singularity-ohpc     3.7.1     /                  M         /
mpi           openmpi3-gnu8-ohpc   3.1.4     /                  M         At least one type of MPI must be installed
              mpich-gnu8-ohpc      3.3.1     /                  M
              mvapich2-gnu8-ohpc   2.3       /                  M
Define a shared directory for the installer
The following steps use /install/installer as an example to show how to define a shared directory for the installer.
Step 1. Share /install/installer from the management node:
yum install -y nfs-utils
systemctl enable nfs-server --now
share_installer_dir="/install/installer"
mkdir -p $share_installer_dir
echo "/install/installer *(rw,async,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -a
Step 2. Distribute /etc/hosts:
cp /etc/hosts $share_installer_dir
scp $share_installer_dir/hosts c1:/etc/    (Note: c1 is the hostname of a compute node; if there are multiple compute nodes, copy this file to all of them.)
Step 3. Enable the httpd service:
cat << eof > /etc/httpd/conf.d/installer.conf
Alias /install /install
AllowOverride None
Require all granted
Options +Indexes +FollowSymLinks
eof
systemctl restart httpd
Note: /install is the base directory for the repositories configured in the lico_env.local file.
Step 4. For CentOS, run the following commands:
cp /etc/yum.repos.d/EL7-OS.repo $share_installer_dir
sed -i '/^baseurl=/d' $share_installer_dir/EL7-OS.repo
echo "baseurl=http://${sms_name}${os_repo_dir}" >>$share_installer_dir/EL7-OS.repo
scp $share_installer_dir/EL7-OS.repo c1:/etc/yum.repos.d/    (Note: c1 is the hostname of a compute node; if there are multiple compute nodes, copy this file to all of them.)
For RHEL, run the following commands:
cp /etc/yum.repos.d/RHELS7-OS.repo $share_installer_dir
sed -i '/^baseurl=/d' $share_installer_dir/RHELS7-OS.repo
echo "baseurl=http://${sms_name}${os_repo_dir}" >>$share_installer_dir/RHELS7-OS.repo
scp $share_installer_dir/RHELS7-OS.repo c1:/etc/yum.repos.d/    (Note: c1 is the hostname of a compute node; if there are multiple compute nodes, copy this file to all of them.)
Step 5. Enable the repositories:
nodeshell all yum clean all
nodeshell all yum makecache
Step 6. Install NFS on the cluster nodes:
nodeshell all yum install -y nfs-utils
Step 7. Configure the shared directory for the cluster nodes:
nodeshell all mkdir -p $share_installer_dir
nodeshell all "echo '${sms_ip}:/install/installer /install/installer \
nfs nfsvers=4.0,nodev,nosuid,noatime 0 0' >> /etc/fstab"
Step 8. Mount the shared directory:
nodeshell all mount /install/installer
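A quick check (not part of the original guide) that the shared directory is mounted on every node:
nodeshell all "df -hT /install/installer"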
Configure memory for the other nodes
Step 1. Run the following commands:
cp /etc/security/limits.conf $share_installer_dir
nodeshell all cp $share_installer_dir/limits.conf /etc/security/limits.conf
nodeshell all reboot
Step 2. Check to make sure the nodes are back up:
nodeshell all uptime
Configure the local yum repository for the management node
Step 1. Download the package from https://hpc.lenovo.com/lico/downloads/5.5/Lenovo-OpenHPC-1.3.9.CentOS_7.x86_64.tar.
Step 2. Upload the package to the /root directory on the management node.
Step 3. Run the following commands to configure the local Lenovo OpenHPC repository:
mkdir -p $ohpc_repo_dir
cd /root
tar xvf Lenovo-OpenHPC-1.3.9.CentOS_7.x86_64.tar -C $ohpc_repo_dir
rm -rf $link_ohpc_repo_dir
ln -s $ohpc_repo_dir $link_ohpc_repo_dir
$link_ohpc_repo_dir/make_repo.sh
Configure the local yum repository for the login and compute nodes
Step 1. Run the following commands to add the local repository:
cp /etc/yum.repos.d/Lenovo.OpenHPC.local.repo $share_installer_dir
sed -i '/^baseurl=/d' $share_installer_dir/Lenovo.OpenHPC.local.repo
sed -i '/^gpgkey=/d' $share_installer_dir/Lenovo.OpenHPC.local.repo
echo "baseurl=http://${sms_name}${link_ohpc_repo_dir}/CentOS_7" \
>> $share_installer_dir/Lenovo.OpenHPC.local.repo
echo "gpgkey=http://${sms_name}${link_ohpc_repo_dir}/CentOS_7\
/repodata/repomd.xml.key" >> $share_installer_dir/Lenovo.OpenHPC.local.repo
Step 2. Run the following commands to distribute the file to the other nodes:
nodeshell all cp $share_installer_dir/Lenovo.OpenHPC.local.repo \
/etc/yum.repos.d/
nodeshell all "echo -e %_excludedocs 1 >> ~/.rpmmacros"
Configure the LiCO dependencies repository
Step 1. Download the following package:
https://hpc.lenovo.com/lico/downloads/7.0/lico-dep-7.0.0.el7.x86_64.tgz
Step 2. Upload the package to the /root directory.
Step 3. Configure the repository for the management node:
mkdir -p $lico_dep_repo_dir
cd /root
tar -xvf lico-dep-7.0.0.el7.x86_64.tgz -C $lico_dep_repo_dir
rm -rf $link_lico_dep_repo_dir
ln -s $lico_dep_repo_dir $link_lico_dep_repo_dir
$link_lico_dep_repo_dir/mklocalrepo.sh
Note: Before running these commands, make sure the local OS repository has already been configured on the management node for the operations above and the ones that follow.
Step 4. (Optional) If the cluster already exists, check your version.
Step 5. Configure the repository for the other nodes:
cp /etc/yum.repos.d/lico-dep.repo $share_installer_dir
sed -i '/^baseurl=/d' $share_installer_dir/lico-dep.repo
sed -i '/^gpgkey=/d' $share_installer_dir/lico-dep.repo
sed -i "/name=lico-dep-local-library/a\baseurl=http://${sms_name}\
${link_lico_dep_repo_dir}/library/" $share_installer_dir/lico-dep.repo
sed -i "/name=lico-dep-local-library/a\gpgkey=http://${sms_name}\
${link_lico_dep_repo_dir}/RPM-GPG-KEY-LICO-DEP-EL7" $share_installer_dir/lico-dep.repo
sed -i "/name=lico-dep-local-standalone/a\baseurl=http://${sms_name}\
${link_lico_dep_repo_dir}/standalone/" $share_installer_dir/lico-dep.repo
sed -i "/name=lico-dep-local-standalone/a\gpgkey=http://${sms_name}\
${link_lico_dep_repo_dir}/RPM-GPG-KEY-LICO-DEP-EL7" $share_installer_dir/lico-dep.repo
nodeshell all cp $share_installer_dir/lico-dep.repo /etc/yum.repos.d
Obtain the LiCO installation package
Step 1. Contact Lenovo sales or support to obtain the LiCO 7.0.0 release package for EL7, lico-release-7.0.0.el7.x86_64.tar.gz, and the LiCO license file.
Step 2. Upload the release package to the management node.
Configure the local repository for LiCO
Step 1. Configure the local repository for the management node:
mkdir -p $lico_repo_dir
tar zxvf lico-release-7.0.0.el7.x86_64.tar.gz -C $lico_repo_dir --strip-components 1
rm -rf $link_lico_repo_dir
ln -s $lico_repo_dir $link_lico_repo_dir
$link_lico_repo_dir/mklocalrepo.sh
Step 2. Configure the local yum repository for the other nodes:
cp /etc/yum.repos.d/lico-release.repo $share_installer_dir
sed -i '/baseurl=/d' $share_installer_dir/lico-release.repo
sed -i "/name=lico-release-host/a\baseurl=http://${sms_name}\
${link_lico_repo_dir}/host/" $share_installer_dir/lico-release.repo
sed -i "/name=lico-release-public/a\baseurl=http://${sms_name}\
${link_lico_repo_dir}/public/" $share_installer_dir/lico-release.repo
Step 3. Distribute the repository file:
nodeshell all cp $share_installer_dir/lico-release.repo /etc/yum.repos.d/
Configure the local Confluent repository
Step 1. Configure the local repository for the other nodes:
cp /etc/yum.repos.d/lenovo-hpc.repo $share_installer_dir
sed -i '/^baseurl=/d' $share_installer_dir/lenovo-hpc.repo
sed -i '/^gpgkey=/d' $share_installer_dir/lenovo-hpc.repo
echo "baseurl=http://${sms_name}${confluent_repo_dir}/lenovo-hpc-el7" \
>> $share_installer_dir/lenovo-hpc.repo
echo "gpgkey=http://${sms_name}${confluent_repo_dir}/lenovo-hpc-el7\
/lenovohpckey.pub" >> $share_installer_dir/lenovo-hpc.repo
Step 2. Distribute the repository file:
nodeshell all cp $share_installer_dir/lenovo-hpc.repo /etc/yum.repos.d/
Install Slurm
Step 1. Install the base package:
yum install -y lenovo-ohpc-base
Step 2. Install Slurm:
yum install -y ohpc-slurm-server
Step 3. Install the Slurm client:
nodeshell all yum install -y ohpc-base-compute ohpc-slurm-client lmod-ohpc
Step 4. (Optional) Prevent non-root logins to the compute nodes:
nodeshell compute "echo 'account required pam_slurm.so' >> /etc/pam.d/sshd"
Note: To allow non-root logins to the compute nodes regardless of whether a Slurm job is running on them, skip this step. If this step is performed, non-root logins to a compute node are allowed only while a Slurm job is running on that node under a particular username; in that case, non-root ssh logins will work for that username for the duration of the job.
Step 5. (Optional) To keep information about previous jobs and use the memory accounting feature, refer to the following information to install and configure Slurm accounting:
https://slurm.schedmd.com/accounting.html
Configure NFS
Configure the user shared directory
The following steps use /home as an example to show how to create a user shared directory.
Step 1. Share /home from the management node:
echo "/home *(rw,async,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -a
Step 2. Unmount any previously mounted /home:
nodeshell all "sed -i '/ \/home /d' /etc/fstab"
nodeshell all umount /home
Step 3. Configure the shared directory for the cluster nodes:
nodeshell all "echo '${sms_ip}:/home /home nfs nfsvers=4.0,nodev,nosuid,noatime \
0 0' >> /etc/fstab"
Step 4. Mount the shared directory:
nodeshell all mount /home
Configure the shared directory for OpenHPC
Step 1. Share /opt/ohpc/pub for OpenHPC from the management node:
echo "/opt/ohpc/pub *(ro,no_subtree_check,fsid=11)" >> /etc/exports
exportfs -a
Step 2. Configure the shared directory for the cluster nodes:
nodeshell all mkdir -p /opt/ohpc/pub
nodeshell all "echo '${sms_ip}:/opt/ohpc/pub /opt/ohpc/pub nfs \
nfsvers=4.0,nodev,noatime 0 0' >> /etc/fstab"
Step 3. Mount the shared directory:
nodeshell all mount /opt/ohpc/pub
Note: This directory is mandatory. If you have already shared this directory from the management node and mounted it on all other nodes, skip this step.
Configure the monitoring directory
Step 1. Share /opt/lico/pub from the management node:
mkdir -p /opt/lico/pub
echo "/opt/lico/pub *(ro,sync,no_subtree_check,no_root_squash)" >> /etc/exports
exportfs -a
Step 2. Configure the shared directory for the cluster nodes:
nodeshell all mkdir -p /opt/lico/pub
nodeshell all "echo '${sms_ip}:/opt/lico/pub /opt/lico/pub nfs nfsvers=4.0,nodev,noatime \
0 0' >> /etc/fstab"
Step 3. Mount the shared directory:
nodeshell all mount /opt/lico/pub
Configure Chrony
Note: Skip this section if the Chrony service has already been configured for the nodes in the cluster.
Step 1. Install Chrony:
yum install -y chrony
Step 2. Unsynchronized time across the cluster can cause unexpected problems. Refer to the following information to configure the chronyd service:
https://chrony.tuxfamily.org/documentation.html
Install the GPU driver
The GPU driver should be installed on every GPU compute node. If only some nodes have GPUs, replace the compute argument in the nodeshell commands with the node range corresponding to the GPU nodes.
Disable the Nouveau driver
Before installing the display driver, disable the Nouveau driver.
Step 1. Configure the operating system to boot to the text console, and then restart the system:
Note: This step is only required if the operating system is configured to boot to a graphical desktop.
nodeshell compute systemctl set-default multi-user.target
Step 2. Add the configuration file:
cat << eof > $share_installer_dir/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
eof
Step 3. Distribute the configuration file:
nodeshell compute cp $share_installer_dir/blacklist-nouveau.conf \
/usr/lib/modprobe.d/blacklist-nouveau.conf
Step 4. Regenerate the kernel initramfs:
nodeshell compute dracut --force
Step 5. Apply the configuration:
nodeshell compute reboot
Install the GPU driver
Step 1. Download the NVIDIA driver from https://us.download.nvidia.com/tesla/520.61.07/NVIDIA-Linux-x86_64-520.61.07.run and copy it to the shared directory $share_installer_dir.
Step 2. Run the following commands:
yum install -y tar bzip2 make automake gcc gcc-c++ pciutils \
elfutils-libelf-devel libglvnd-devel
yum install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
chmod +x $share_installer_dir/NVIDIA-Linux-x86_64-520.61.07.run
$share_installer_dir/NVIDIA-Linux-x86_64-520.61.07.run --add-this-kernel -s
nodeshell compute $share_installer_dir/NVIDIA-Linux-x86_64-520.61.07-custom.run -s
Step 3. Run the following command on the GPU nodes to check whether the GPUs are recognized:
nodeshell compute nvidia-smi
Note: If the command does not show the GPU information, reboot all GPU nodes and then run the command again:
nodeshell compute reboot
Configure the GPU driver to start automatically
Step 1. Add the configuration files:
cat << eof > $share_installer_dir/nvidia-persistenced.service
[Unit]
Description=NVIDIA Persistence Daemon
After=syslog.target
[Service]
Type=forking
PIDFile=/var/run/nvidia-persistenced/nvidia-persistenced.pid
Restart=always
ExecStart=/usr/bin/nvidia-persistenced --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced/*
TimeoutSec=300
[Install]
WantedBy=multi-user.target
eof
cat << eof > $share_installer_dir/nvidia-modprobe-loader.service
[Unit]
Description=NVIDIA ModProbe Service
After=syslog.target
Before=slurmd.service
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-modprobe -u -c=0
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
eof
Step 2. Distribute the configuration files:
nodeshell compute cp $share_installer_dir/nvidia-persistenced.service \
/usr/lib/systemd/system/nvidia-persistenced.service
nodeshell compute cp $share_installer_dir/nvidia-modprobe-loader.service \
/usr/lib/systemd/system/nvidia-modprobe-loader.service
nodeshell compute mkdir -p /var/run/nvidia-persistenced
Step 3. Restart the services:
nodeshell compute systemctl daemon-reload
nodeshell compute systemctl enable nvidia-persistenced --now
nodeshell compute systemctl enable nvidia-modprobe-loader.service --now
Configure Slurm
Step 1. Download slurm.conf from the following website:
https://hpc.lenovo.com/lico/downloads/7.0/examples/conf/
Step 2. Upload slurm.conf to $share_installer_dir and modify it as described in the "Configure slurm.conf" section.
Step 3. Download cgroup.conf from the following website:
https://hpc.lenovo.com/lico/downloads/7.0/examples/conf/
Step 4. Upload cgroup.conf to $share_installer_dir.
Step 5. Distribute the configuration:
cp $share_installer_dir/slurm.conf /etc/slurm/slurm.conf
nodeshell all cp $share_installer_dir/slurm.conf /etc/slurm/slurm.conf
cp $share_installer_dir/cgroup.conf /etc/slurm/cgroup.conf
nodeshell all cp $share_installer_dir/cgroup.conf /etc/slurm/cgroup.conf
cp /etc/munge/munge.key $share_installer_dir
nodeshell all cp $share_installer_dir/munge.key /etc/munge/munge.key
Step 6. (Optional) For GPU nodes only:
• If NVIDIA MIG is enabled on the GPU nodes, configure the GPU nodes. For more information, see: https://gitlab.com/nvidia/hpc/slurm-mig-discovery
• If NVIDIA MIG is disabled or not supported on the GPU nodes, download the example file gres.conf from https://hpc.lenovo.com/lico/downloads/7.0/examples/conf/, edit it as needed, and upload it to /etc/slurm on the GPU nodes.
Step 7. Start the services:
systemctl enable munge
systemctl enable slurmctld
systemctl restart munge
systemctl restart slurmctld
Step 8. Start the services on the other nodes:
nodeshell all systemctl enable munge
nodeshell all systemctl restart munge
nodeshell all systemctl enable slurmd
nodeshell all systemctl restart slurmd
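As a quick sanity check (not part of the original guide), confirm that the controller sees all compute and login nodes; node states such as idle indicate a healthy setup:
sinfo -Nl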
Configure slurm.conf
The typical fields to configure are as follows:
• Cluster name:
ClusterName=mycluster
• Management node name:
SlurmctldHost=c031
• GPU scheduling:
GresTypes=gpu
Note: This entry is needed only when the cluster contains GPU nodes. If the cluster has no GPU nodes, remove it.
• Cluster node definitions:
NodeName=c031 Gres=gpu:4 CPUs=28 RealMemory=200000 State=UNKNOWN
NodeName=c032 Gres=gpu:4 CPUs=28 RealMemory=200000 State=UNKNOWN
– Gres: the number of GPUs.
– CPUs: the number of CPUs on the node.
– RealMemory: the memory size of the node (in MB).
• Partition definitions:
PartitionName=compute Nodes=c0[31-32] Default=YES MaxTime=INFINITE State=UP
PartitionName=compute1 Nodes=c0[31-32] Default=NO MaxTime=INFINITE State=UP
Note:
– Default: indicates whether this partition is the default one. A partition can be selected when submitting a job; if none is selected, the default partition is used.
– Nodes: the list of NodeNames. If the NodeNames do not follow a pattern, Nodes=[nodename1,nodename2,...] is allowed.
• Enforce partition limits:
EnforcePartLimits=ALL
Note: Use this setting if you want the submission to fail immediately when a job requests more resources than the cluster provides; otherwise, the job remains in the queue. For more details on how to configure slurm.conf, see the official Slurm website:
https://slurm.schedmd.com/slurm.conf.html
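Putting the fields above together, a minimal slurm.conf fragment for the example two-node GPU cluster would look like the following (node names, CPU counts, and memory sizes are examples only):
ClusterName=mycluster
SlurmctldHost=c031
GresTypes=gpu
EnforcePartLimits=ALL
NodeName=c031 Gres=gpu:4 CPUs=28 RealMemory=200000 State=UNKNOWN
NodeName=c032 Gres=gpu:4 CPUs=28 RealMemory=200000 State=UNKNOWN
PartitionName=compute Nodes=c0[31-32] Default=YES MaxTime=INFINITE State=UP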
gres.conf
This configuration file describes the GPUs and GPU memory installed on the GPU nodes. Its content may differ from one GPU node to another. Modify the following entry:
Name=gpu File=/dev/nvidia[0-3]
Note: Change the [0-3] in /dev/nvidia[0-3] to match your actual GPU configuration. For example, /dev/nvidia0 indicates one GPU card, while /dev/nvidia[0-1] indicates two GPU cards.
(Optional) Install Icinga 2
Note: Skip this section if LiCO is not used to monitor the cluster.
If IB devices are present and the IB driver needs to be installed, install the IB driver in the operating system before installing Icinga 2, referring to the LeSI 22A_SI best recipe. A USB network adapter can interfere with the IB adapter used by MPI calls, so it is recommended to add "rmmod cdc_ether" to the boot process to remove the USB network adapter.
Step 1. Install icinga2:
yum install -y icinga2
nodeshell all yum install -y icinga2
Step 2. Install the LiCO icinga2 plugins:
yum install -y nagios-plugins-ping lico-icinga-plugin-slurm
Step 3. Enable the API feature:
icinga2 api setup
Step 4. Configure icinga2:
icinga2 node setup --master --disable-confd
echo -e "LANG=en_US.UTF-8" >> /etc/sysconfig/icinga2
systemctl restart icinga2
Step 5. Configure the icinga2 agents for the other nodes:
nodeshell all icinga2 pki save-cert --trustedcert \
/var/lib/icinga2/certs/trusted-parent.crt --host ${sms_name}
for ((i=0;i<$num_computes;i++));do
ticket=`icinga2 pki ticket --cn ${c_name[${i}]}`
nodeshell ${c_name[${i}]} icinga2 node setup --ticket ${ticket} --cn ${c_name[${i}]} \
--endpoint ${sms_name} --zone ${c_name[${i}]} --parent_zone master --parent_host \
${sms_name} --trustedcert /var/lib/icinga2/certs/trusted-parent.crt \
--accept-commands --accept-config --disable-confd
done
for ((i=0;i<$num_logins;i++));do
ticket=`icinga2 pki ticket --cn ${l_name[${i}]}`
nodeshell ${l_name[${i}]} icinga2 node setup --ticket ${ticket} --cn ${l_name[${i}]} \
--endpoint ${sms_name} --zone ${l_name[${i}]} --parent_zone master --parent_host \
${sms_name} --trustedcert /var/lib/icinga2/certs/trusted-parent.crt \
--accept-commands --accept-config --disable-confd
done
nodeshell all "echo -e 'LANG=en_US.UTF-8' >> /etc/sysconfig/icinga2"
nodeshell all systemctl restart icinga2
Step 6. Configure the global template files on the management node:
mkdir -p /etc/icinga2/zones.d/global-templates
echo -e "object CheckCommand \"lico_monitor\" {\n command = [ \"/opt/lico/pub/monitor/\
lico_icinga_plugin/lico-icinga-plugin\" ]\n}" > /etc/icinga2/zones.d/global-templates/commands.conf
echo -e "object CheckCommand \"lico_job_monitor\" {\n command = [\"/opt/lico/pub/monitor/\
lico_icinga_plugin/lico-job-icinga-plugin\" ]\n}" >> /etc/icinga2/zones.d/global-\
templates/commands.conf
echo -e "object CheckCommand \"lico_check_procs\" {\n command =[ \"/opt/lico/pub/monitor\
/lico_icinga_plugin/lico-process-icinga-plugin\" ]\n}" >>/etc/icinga2/zones.d/global-\
templates/commands.conf
echo -e "object CheckCommand \"lico_vnc_monitor\" {\n command =[ \"/opt/lico/pub/monitor/\
lico_icinga_plugin/lico-vnc-icinga-plugin\" ]\n}" >> /etc/icinga2/zones.d/global-\
templates/commands.conf
chown -R icinga:icinga /etc/icinga2/zones.d/global-templates
Step 7. Define the zone files:
mkdir -p /etc/icinga2/zones.d/master
echo -e "object Host \"${sms_name}\" {\n check_command = \"hostalive\"\n \
address = \"${sms_ip}\"\n vars.agent_endpoint = name\n}\n" >> \
/etc/icinga2/zones.d/master/hosts.conf
for ((i=0;i<$num_computes;i++));do
echo -e "object Endpoint \"${c_name[${i}]}\" {\n host = \"${c_name[${i}]}\"\n \
port = \"${icinga_api_port}\"\n log_duration = 0\n}\nobject \
Zone \"${c_name[${i}]}\" {\n endpoints = [ \"${c_name[${i}]}\" ]\n \
parent = \"master\"\n}\n" >> /etc/icinga2/zones.d/master/agent.conf
echo -e "object Host \"${c_name[${i}]}\" {\n check_command = \"hostalive\"\n \
address = \"${c_ip[${i}]}\"\n vars.agent_endpoint = name\n}\n" >> \
/etc/icinga2/zones.d/master/hosts.conf
done
for ((i=0;i<$num_logins;i++));do
echo -e "object Endpoint \"${l_name[${i}]}\" {\n host = \"${l_name[${i}]}\"\n \
port = \"${icinga_api_port}\"\n log_duration = 0\n}\nobject \
Zone \"${l_name[${i}]}\" {\n endpoints = [ \"${l_name[${i}]}\" ]\n \
parent = \"master\"\n}\n" >> /etc/icinga2/zones.d/master/agent.conf
echo -e "object Host \"${l_name[${i}]}\" {\n check_command = \"hostalive\"\n \
address = \"${l_ip[${i}]}\"\n vars.agent_endpoint = name\n}\n" >> \
/etc/icinga2/zones.d/master/hosts.conf
done
echo -e "apply Service \"lico\" {\n check_command = \"lico_monitor\"\n \
max_check_attempts = 5\n check_interval = 1m\n retry_interval = 30s\n assign \
where host.name == \"${sms_name}\"\n assign where host.vars.agent_endpoint\n \
command_endpoint = host.vars.agent_endpoint\n}\n" > \
/etc/icinga2/zones.d/master/service.conf
echo -e "apply Service \"lico-procs-service\" {\n check_command = \"lico_\
check_procs\"\n enable_active_checks = false\n assign where \
host.name == \"${sms_name}\"\n assign where host.vars.agent_endpoint\n \
command_endpoint = host.vars.agent_endpoint\n}\n" >> \
/etc/icinga2/zones.d/master/service.conf
echo -e "apply Service \"lico-job-service\" {\n check_command = \"lico_job_monitor\"\n \
max_check_attempts = 5\n check_interval = 1m\n retry_interval = 30s\n assign \
where host.name == \"${sms_name}\"\n assign where host.vars.agent_endpoint\n \
command_endpoint = host.vars.agent_endpoint\n}\n" >> \
/etc/icinga2/zones.d/master/service.conf
echo -e "apply Service \"lico-vnc-service\" {\n check_command = \"lico_vnc_monitor\"\n \
max_check_attempts = 5\n check_interval = 15s\n retry_interval = 30s\n assign \
where host.name == \"${sms_name}\"\n assign where host.vars.agent_endpoint\n \
command_endpoint = host.vars.agent_endpoint\n}\n" >> \
/etc/icinga2/zones.d/master/service.conf
chown -R icinga:icinga /etc/icinga2/zones.d/master
chmod u+s /opt/lico/pub/monitor/lico_icinga_plugin/lico-vnc-icinga-plugin
systemctl restart icinga2
Step 8. Enable the services:
nodeshell all modprobe ipmi_devintf
nodeshell all systemctl enable icinga2
modprobe ipmi_devintf
systemctl enable icinga2
Step 9. (Optional) Check the configuration:
icinga2 daemon -C
Install MPI
Step 1. Run the following command to install the three MPI stacks (OpenMPI, MPICH, and MVAPICH) on the system:
yum install -y openmpi3-gnu8-ohpc mpich-gnu8-ohpc mvapich2-gnu8-ohpc
Step 2. Set the default module.
Run the following command to set OpenMPI as the default module:
yum install -y lmod-defaults-gnu8-openmpi3-ohpc
Run the following command to set MPICH as the default module:
yum install -y lmod-defaults-gnu8-mpich-ohpc
Run the following command to set MVAPICH as the default module:
yum install -y lmod-defaults-gnu8-mvapich2-ohpc
Note: MVAPICH requires InfiniBand or OPA to be present and working properly. The following packages should be installed to support InfiniBand or OPA:
yum list installed libibmad5 librdmacm1 rdma infinipath-psm dapl-devel \
dapl-utils libibverbs-utils
Dependencies between MPI types
When installing MPI, observe the following dependencies (a module-switching sketch follows this list):
• To use MVAPICH2 (psm2), install mvapich2-psm2-gnu8-ohpc.
• To use OpenMPI (PMIx), install openmpi3-pmix-slurm-gnu8-ohpc.
• openmpi3-gnu8-ohpc is incompatible with openmpi3-pmix-slurm-gnu8-ohpc.
• mvapich2-psm2-gnu8-ohpc is incompatible with mvapich2-gnu8-ohpc.
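Independently of which lmod-defaults package is installed, a user can switch MPI stacks for a single shell session with Lmod. The commands below are a sketch that assumes the OpenHPC gnu8 toolchain and the module names used above (openmpi3, mvapich2, mpich):
# Switch the active MPI stack in the current shell (illustrative)
module list                    # show the currently loaded modules, including the default MPI
module swap openmpi3 mvapich2  # replace the loaded OpenMPI module with MVAPICH2
module swap mvapich2 mpich     # or replace MVAPICH2 with MPICH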
Install Singularity
Singularity is a lightweight container framework for HPC.
Step 1. Run the following command to install Singularity:
yum install -y singularity-ohpc
Step 2. Edit /opt/ohpc/pub/modulefiles/ohpc by adding the following line to the end of the "module try-add" block:
module try-add singularity
Step 3. In the "module del" block, add the following line as the first line:
module del singularity
Step 4. Run the following command:
source /etc/profile.d/lmod.sh
If the default module is changed by installing an lmod-defaults* package, the changes made to /opt/ohpc/pub/modulefiles/ohpc may be lost. In that case, modify /opt/ohpc/pub/modulefiles/ohpc again, or add "module try-add singularity" to the end of /etc/profile.d/lmod.sh.
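For orientation, the relevant part of the ohpc modulefile after the edits in steps 2 and 3 would look roughly like the sketch below. The exact block structure and the other modules listed (autotools, prun, gnu8, openmpi3) depend on the OpenHPC release, so treat this only as a rough picture of where the two added lines go:
# rough sketch of /opt/ohpc/pub/modulefiles/ohpc after the edits (structure varies by release)
if { [module-info mode load] } {
    module try-add autotools
    module try-add prun
    module try-add gnu8
    module try-add openmpi3
    module try-add singularity   ;# added as the last line of the try-add block (step 2)
}
if { [module-info mode remove] } {
    module del singularity       ;# added as the first line of the del block (step 3)
    module del openmpi3
    module del gnu8
    module del prun
    module del autotools
}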
Checkpoint B
Step 1. Run the following command to test whether Slurm is installed correctly:
sinfo
Note:
• The output should look like the following:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 1-00:00:00 2 idle c[1-2]
...
• The state of all nodes should be idle; idle* is not acceptable.
Step 2. Run the following commands to add the test account:
useradd test -m --uid 65530
nodeshell all useradd test -m --uid 65530
Step 3. Use the test account to compile and run the test program shipped with the MPI examples, launching it through Slurm:
su - test
mpicc -O3 /opt/ohpc/pub/examples/mpi/hello.c
srun -n 8 -N 1 -w
prun ./a.out
Note: The output should look like the following:
Master compute host = c1
Resource manager = slurm
Launch cmd = mpiexec.hydra -bootstrap slurm ./a.out
Hello, world (8 procs total)
--> Process # 0 of 8 is alive. -> c1
--> Process # 4 of 8 is alive. -> c2
--> Process # 1 of 8 is alive. -> c1
--> Process # 5 of 8 is alive. -> c2
--> Process # 2 of 8 is alive. -> c1
--> Process # 6 of 8 is alive. -> c2
--> Process # 3 of 8 is alive. -> c1
--> Process # 7 of 8 is alive. -> c2
Step 4. End the test:
exit
Note: To leave the "test" user session, enter "exit" again.
Step 5. Delete the test user:
nodeshell all userdel test
userdel test -r
After the commands complete, you will be back at the root user session on the management node.
Chapter 3. Install LiCO dependencies
Cluster check
Check the environment variables ${sms_name}, ${lico_ldap_domain_name}, and ${lico_repo_dir}:
echo $sms_name;echo $lico_repo_dir;echo $lico_ldap_domain_name
Note:
• The output should look like the following:
head
/install/custom/lico-7.0.0
dc=hpc,dc=com
Check the installer shared directory
Check the shared directory $share_installer_dir:
echo $share_installer_dir
Note:
• The output should look like the following:
/install/installer
Check the LiCO dependency repository:
yum repolist | grep lico-dep-local
Note:
• The output should look like the following:
lico-dep-local-library lico-dep-local-library 234
lico-dep-local-standalone lico-dep-local-standalone 108
Check the LiCO repository:
yum repolist | grep lico-release
lico-release-host lico-release-host 82
lico-release-public lico-release-public 42
Check NFS
Note: If the cluster does not use NFS as its distributed file system, skip this section.
Check the NFS service:
systemctl status nfs-server | grep Active && exportfs -v | grep -E '/home|/opt/ohpc/pub'
Note:
• The output should show that the service is active and that both /opt/ohpc/pub and /home are exported, for example (export options abridged):
Active: active (exited) since Sat 2019-10-12 16:04:21 CST; 2 days ago
/opt/ohpc/pub ...
/home ...
Check the mount points on all other nodes:
nodeshell all "df | grep -E '/home | /opt/ohpc/pub'"
Note:
• The output should look like the following:
c1: 10.1.1.31:/home 485642240 111060992 374581248 23% /home
c1: 10.1.1.31:/opt/ohpc/pub 485642240 111060992 374581248 23% /opt/ohpc/pub
Check Slurm
Check slurmctld:
systemctl status slurmctld | grep Active
Note:
• The output should look like the following:
Active: active (running) since Tue 2018-07-24 19:02:49 CST; 1 months 20 days ago
Check slurmd on the compute nodes:
nodeshell compute "systemctl status slurmd | grep Active"
Note:
• The output should look like the following:
c1: Active: active (running) since Tue 2018-07-24 19:02:49 CST; 1 months 20 days ago
c2: Active: active (running) since Sat 2018-07-21 17:16:59 CST; 1 months 23 days ago
Check MPI and Singularity
module list
Note:
• The output should look like the following:
Currently Loaded Modules:
1) prun/1.3 2) gnu8/8.3.0 3) openmpi3/3.1.4 4) singularity/3.7.1 5) ohpc
Install LiCO dependencies
Note: In the "Install node" column, M stands for "management node", L for "login node", and C for "compute node".

Software    Component         Version   Service           Install node   Remarks
rabbitmq    rabbitmq-server   3.9.10    rabbitmq-server   M
mariadb     mariadb-server    10.3.32   mariadb-server    M
influxdb    influxdb          1.8.10    influxdb          M
confluent   confluent         3.4.0     confluent         M
libuser     libuser           0.62                        M
            python3-libuser   0.62                        M
安装RabbitMQ
LiCO 使用RabbitMQ 作为消息代理。
步骤1. 安装RabbitMQ:
yum install -y rabbitmq-server
步骤2. 启动RabbitMQ 服务:
systemctl enable rabbitmq-server --now
Install MariaDB
LiCO uses MariaDB as its relational database for data storage.
Step 1. Install MariaDB:
yum install -y mariadb-server mariadb-devel
Step 2. Start the MariaDB service:
systemctl enable mariadb --now
Step 3. Configure MariaDB for LiCO:
Note: The username and password entered here are needed later when running lico-password-tool, so record them while installing MariaDB.
mysql
create database lico character set utf8 collate utf8_bin;
create user '
grant ALL on lico.* to '
exit
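The two truncated statements above create a database account for LiCO and grant it privileges on the lico database. A complete, purely illustrative version is sketched below; the username 'lico' and the password placeholder are assumptions and should be replaced with your own values, which you will enter again later in lico-password-tool:
-- Illustrative only: run inside the mysql shell; replace the user name and password with your own
create database lico character set utf8 collate utf8_bin;
create user 'lico'@'localhost' identified by '<your-password>';
grant ALL on lico.* to 'lico'@'localhost';
exit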
Step 4. Configure the MariaDB limits:
sed -i "/\[mysqld\]/a\max-connections=1024" /etc/my.cnf.d/mariadb-server.cnf
mkdir /usr/lib/systemd/system/mariadb.service.d
cat << eof > /usr/lib/systemd/system/mariadb.service.d/limits.conf
[Service]
LimitNOFILE=10000
eof
systemctl daemon-reload
systemctl restart mariadb
Install InfluxDB
LiCO uses InfluxDB as the time-series database for storing monitoring data.
Step 1. Install InfluxDB:
yum install -y influxdb
systemctl enable influxdb --now
Step 2. Create the InfluxDB user (a complete illustrative session follows this list):
• Enter the InfluxDB shell:
influx
• Create the database:
create database lico
• Use the database:
use lico
• When creating the administrator user, make sure to use a string password:
create user
• Exit the InfluxDB shell:
exit
• Enable authentication:
sed -i '/# auth-enabled = false/a\ auth-enabled = true' /etc/influxdb/config.toml
• Restart InfluxDB:
systemctl restart influxdb
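For reference, a full version of the 'create user' statement that the truncated line above stands for might look like the session below. The admin user name and password are placeholders of my own choosing and must match what you later give to lico-password-tool:
# Illustrative InfluxDB shell session (replace the name and password with your own)
influx
> create database lico
> use lico
> create user admin with password 'your-password' with all privileges
> exit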
Configure user authentication
Install OpenLDAP
Note: If OpenLDAP is already configured or another authentication service is already in use in the cluster, skip this section. OpenLDAP is an open-source implementation of the Lightweight Directory Access Protocol. It is recommended to use OpenLDAP to manage users, but LiCO also supports other authentication services compatible with Linux-PAM.
Step 1. Install OpenLDAP:
yum install -y slapd-ssl-config openldap-servers
Step 2. Modify the configuration files:
sed -i "s/dc=hpc,dc=com/${lico_ldap_domain_name}/" /usr/share/openldap-servers/lico.ldif
sed -i "/dc:/s/hpc/${lico_ldap_domain_component}/" /usr/share/openldap-servers/lico.ldif
sed -i "s/dc=hpc,dc=com/${lico_ldap_domain_name}/" /etc/openldap/slapd.conf
slapadd -v -l /usr/share/openldap-servers/lico.ldif -f /etc/openldap/slapd.conf -b \
${lico_ldap_domain_name}
Step 3. Generate the OpenLDAP password hash:
slappasswd
Step 4. Edit /etc/openldap/slapd.conf and set the root password (rootpw) to the hash obtained above:
rootpw
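The truncated rootpw entry above stands for a single line in slapd.conf whose value is the hash printed by slappasswd; the hash shown below is a placeholder:
# /etc/openldap/slapd.conf (illustrative; paste the hash produced by slappasswd)
rootpw {SSHA}xxxxxxxxxxxxxxxxxxxxxxxxxxxx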
Step 5. Change the owner of the configuration files:
chown -R ldap:ldap /var/lib/ldap
chown ldap:ldap /etc/openldap/slapd.conf
Step 6. Edit /etc/sysconfig/slapd and make sure it matches the following configuration:
SLAPD_URLS="ldapi:/// ldap:/// ldaps:///"
SLAPD_OPTIONS="-f /etc/openldap/slapd.conf"
Step 7. Start the OpenLDAP service:
systemctl enable slapd --now
Step 8. Verify that the service has started:
systemctl status slapd
Install libuser
The libuser module is the recommended toolkit for OpenLDAP. Installing it is optional.
Step 1. Install libuser:
yum install -y libuser python3-libuser
Step 2. Download libuser.conf from https://hpc.lenovo.com/lico/downloads/7.0/examples/conf/ to /etc on the management node, then modify it according to the instructions inside the file.
Install the OpenLDAP client
echo "TLS_REQCERT never" >> /etc/openldap/ldap.conf
cp /etc/openldap/ldap.conf $share_installer_dir
nodeshell all cp $share_installer_dir/ldap.conf /etc/openldap/ldap.conf
Install nss-pam-ldapd
nss-pam-ldapd provides a Name Service Switch module and a Pluggable Authentication Module. LiCO uses it to authenticate users.
Step 1. Run the following commands to install nss-pam-ldapd on the management node:
yum install -y nss-pam-ldapd authconfig
authconfig --useshadow --usemd5 --enablemkhomedir --disablecache \
--enablelocauthorize --disablesssd --disablesssdauth \
--enableforcelegacy --enableldap --enableldapauth --disableldaptls \
--ldapbasedn=${lico_ldap_domain_name} --ldapserver="ldap://${sms_name}" --updateall
echo "rootpwmoddn uid=admin,${lico_ldap_domain_name}" >> /etc/nslcd.conf
systemctl enable nslcd
systemctl start nslcd
Step 2. Run the following commands to install nss-pam-ldapd on the other nodes:
nodeshell all yum install -y nss-pam-ldapd authconfig
nodeshell all authconfig --useshadow --usemd5 --enablemkhomedir \
--disablecache --enablelocauthorize --disablesssd \
--disablesssdauth --enableforcelegacy --enableldap --enableldapauth \
--disableldaptls --ldapbasedn="${lico_ldap_domain_name}" \
--ldapserver="ldap://${sms_name}" --updateall
nodeshell all echo "\""rootpwmoddn uid=admin,${lico_ldap_domain_name}"\"" \>\> /etc/nslcd.conf
nodeshell all systemctl enable nslcd
nodeshell all systemctl start nslcd
Note: By default, operating systems deployed by xCAT have SELinux disabled, so running the authconfig command prints an informational message such as "getsebool: SELinux is disabled". You can ignore this message; it does not affect functionality.
Chapter 4. Install LiCO
Install LiCO Core
Note: In the "Install node" column, M stands for "management node", L for "login node", and C for "compute node".
Table 7. LiCO components to be installed

Component                 Software                   Version   Service                                  Install node   Remarks
lico-core                 lico-core                  7.0.0     lico                                     M
lico-portal               lico-portal                7.0.0                                              L
lico-core-extend          lico-confluent-proxy       1.2.1                                              M
                          lico-vnc-proxy             1.3.0     lico-vnc-proxy                           M
lico-env                  lico-ai-scripts            1.3.0                                              M
lico monitor              lico-icinga-mond           1.5.0     lico-icinga-mond                         M
                          lico-icinga-plugin-slurm   1.5.0                                              M
lico alarm notification   lico-sms-agent             1.2.7     lico-sms-agent                           M              Required if alerts are sent by SMS
                          lico-wechat-agent          1.2.7     lico-wechat-agent                        M              Required if alerts are sent by WeChat
                          lico-mail-agent            1.3.8     lico-mail-agent                          M              Required if alerts are sent by email
lico manager              lico-file-manager          2.2.2     lico-file-manager                        M              Basic component
lico-task                 lico-async-task            1.1.2     lico-async-task, lico-async-task-proxy   M, L
Step 1. Do one of the following as needed:
• To use LiCO for cluster monitoring, install the LiCO modules as follows:
For CentOS, configure the following repository before installing lico-async-task:
cat << eof >>${iso_path}/EL7-OS.repo
[EL7-extras]
name=el7-extras
enabled=1
baseurl=http://mirror.centos.org/centos/7/extras/x86_64/
eof
cp -a ${iso_path}/EL7-OS.repo /etc/yum.repos.d/
(Optional) For RHEL, configure the following repository before installing lico-async-task:
cat << eof >>${iso_path}/RHELS7-OS.repo
[EL7-extras]
name=el7-extras
enabled=1
baseurl=http://mirror.centos.org/centos/7/extras/x86_64/
eof
cp -a ${iso_path}/RHELS7-OS.repo /etc/yum.repos.d/
Step 2. Install the LiCO components:
yum clean all
yum makecache
yum install -y python3-cffi
yum install -y lico-core lico-file-manager lico-confluent-proxy \
lico-vnc-proxy lico-icinga-mond lico-async-task lico-service-tool \
lico-ai-scripts
Step 3. (Optional) Provide email, SMS, and WeChat services:
yum install -y lico-mail-agent
yum install -y lico-sms-agent
yum install -y lico-wechat-agent
Step 4. Restart the service:
systemctl restart confluent
Install the LiCO modules on the login nodes:
nodeshell login yum install -y lico-workspace-skeleton lico-portal lico-service-tool
Chapter 5. Configure LiCO
Configure service accounts
Note:
• The usernames and passwords for MariaDB, InfluxDB, Confluent, and LDAP were set earlier in this guide.
• Obtain the icinga2 username and password from the file /etc/icinga2/conf.d/api-users.conf.
On the management node, run the lico-password-tool utility and enter the usernames and passwords for MariaDB, InfluxDB, Confluent, Icinga2, and LDAP as prompted:
lico-password-tool
Configure the service accounts for the other nodes:
nodeshell login mkdir -p /var/lib/lico/tool
cp /var/lib/lico/tool/.db $share_installer_dir
nodeshell login cp $share_installer_dir/.db /var/lib/lico/tool
Configure cluster nodes
Step 1. Import the cluster information into the system:
cp /etc/lico/nodes.csv.example /etc/lico/nodes.csv
Step 2. Edit the cluster information file:
vi /etc/lico/nodes.csv
Note: It is recommended to download this file to your local computer and edit it with Excel or another spreadsheet editor, then upload it to the management node and overwrite the original file.
Room information
The following is an example of the room information table.
Table 8. Room information table

room
name                     location_description
Shanghai Solution Room   Shanghai Zhangjiang

Enter one entry in the name and location_description fields.
Logical group information
Administrators can use logical groups to divide the cluster nodes into different groups. Logical groups do not affect the use of computing resources or permission configuration.
The following is an example of the logical group information table.
Table 9. Logical group information table

group
name
login

Enter at least one logical group name in the name field.
Room row information
A room row refers to the order of racks within a room. Enter the information for the rack rows where the cluster nodes are located.
The following is an example of the row information table.
Table 10. Row information table

row
name   index   belonging_room
row1   1       Shanghai Solution Room

Enter at least one row entry in the following fields:
• name: row name (must be unique within the same room)
• index: row order (must be a positive integer and unique within the same room)
• belonging_room: name of the room where the row is located
Note: The room referenced here must already be present in the room information table.
Rack information
The following is an example of the rack information table.
Table 11. Rack information table

rack
name    column   belonging_row
rack1   1        row1

Enter at least one rack entry in the following fields:
• name: rack name (must be unique within the same room)
• column: rack position column, also called the rack number (must be a positive integer and unique within the same row)
• belonging_row: name of the row where the rack is located
Note: The row referenced here must already be present in the row information table.
Chassis information
If there are chassis in the cluster, enter the chassis information.
The following is an example of the chassis information table.
Table 12. Chassis information table

chassis
name       belonging_rack   location_u_in_rack   machine_type
chassis1   rack1            7                    7X20

The fields in this table are described as follows:
• name: chassis name (must be unique within the same room)
• belonging_rack: rack location name (must be a name configured in the rack information table)
• location_u_in_rack: position of the chassis base in the rack (unit: U). In a standard cabinet, this value should be between 1 and 42; for example, a chassis whose base sits at 5U.
• machine_type: chassis machine type
Node information
Enter the information for all nodes in the cluster in the node information table. Because of its width, the example node information table is shown in two parts (a combined CSV sketch follows the field descriptions).
Table 13. Node information table (part 1)

node
name   nodetype   immip           hostip      machine_type   ipmi_user
head   head       10.240.212.13   127.0.0.1   7X58

Table 14. Node information table (part 2)

ipmi_pwd   belonging_rack   belonging_chassis   location_u   groups
           rack1                                2            login

The fields are described as follows:
• name: node hostname (the domain name is not required)
• nodetype: head for the management node; login for a login node; compute for a compute node
• immip: IP address of the node's BMC system
• hostip: IP address of the node on the host network
• machine_type: product name of the node
• ipmi_user: XCC (BMC) account of the node
• ipmi_pwd: XCC (BMC) password of the node
• belonging_rack: name of the rack where the node is located (the name must be configured in the rack information table). Leave this field empty if the node belongs to a chassis.
• belonging_chassis: name of the chassis where the node is located (the name must be configured in the chassis information table). Leave this field empty if the node belongs to a rack.
• location_u: node location. If the node is in a chassis, enter the chassis slot the node occupies. If the node is in a rack, enter the position of the node base in the rack (unit: U).
• groups: names of the logical groups the node belongs to. A node can belong to multiple logical groups; separate group names with ";". Configure the logical group names in the logical group information table.
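Since /etc/lico/nodes.csv combines the columns of tables 13 and 14 into single CSV rows, a minimal sketch of how a head node row and a compute node row might look is shown below. The second node (c1), the IP addresses, and the BMC credentials admin/Passw0rd are illustrative assumptions; keep the section layout of nodes.csv.example as the authoritative template:
# Illustrative node rows only, using the columns documented in tables 13 and 14
node
name,nodetype,immip,hostip,machine_type,ipmi_user,ipmi_pwd,belonging_rack,belonging_chassis,location_u,groups
head,head,10.240.212.13,127.0.0.1,7X58,admin,Passw0rd,rack1,,2,login
c1,compute,10.240.212.21,10.1.1.21,7X58,admin,Passw0rd,rack1,,4,compute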
Configure generic resources
This module is only used when the scheduler is Slurm. Do one of the following to configure generic resources:
• If no generic resources are configured by default but the cluster contains GPU resources that need to be billed, run the following command:
cp /etc/lico/gres.csv.example /etc/lico/gres.csv
• If other generic resources are configured for Slurm and need to be billed, run the following command:
vi /etc/lico/gres.csv
Note: To keep historical billing information accurate, generic resources that have been removed from gres.csv are still retained in the system database.
Generic resource information
The following is an example of the generic resource information table (an example gres.csv follows the field descriptions):

code   display_name   unit
gpu    GPU            card

Enter at least one generic resource entry in the following fields:
• code: this code should match the generic resource type defined in the scheduler. If LiCO has been installed as described in this document, it can be filled in according to the GresTypes setting in slurm.conf.
• display_name: the generic resource name shown in the LiCO system. A meaningful display name is recommended.
• unit: the resource unit.
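For example, a gres.csv describing the GPU resource defined earlier through GresTypes=gpu could contain just the two lines below; the header row mirrors the columns in the table above, and the exact layout of gres.csv.example remains the reference:
code,display_name,unit
gpu,GPU,card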
Cluster service list
Note: In the "Install node" column, M stands for "management node", L for "login node", and C for "compute node".
Table 15. Cluster service list

Software            Component           Service                 Default port                           Install node
lico                lico-core           lico                    18080/tcp                              M
                    lico-confluent-proxy                        18081/tcp                              M
                    lico-vnc-proxy      lico-vnc-proxy          18082/tcp, 18083/tcp                   M
                    lico-vnc-mond       lico-vnc-mond                                                  C
                    lico-sms-agent      lico-sms-agent          18092/tcp                              M
                    lico-wechat-agent   lico-wechat-agent       18090/tcp                              M
                    lico-mail-agent     lico-mail-agent         18091/tcp                              M
                    lico-file-manager   lico-file-manager       18085/tcp                              M
                    lico-async-task     lico-async-task-proxy   18086/tcp                              M, L
                                        lico-async-task         18084/tcp                              M, L
LiCO dependencies   nginx               nginx                   80/tcp, 443/tcp                        L, M
                    rabbitmq            rabbitmq-server         5672/tcp                               M
                    mariadb             mariadb                 3306/tcp                               M
                    confluent           confluent               4005/tcp, 13001/tcp                    M
                    influxdb            influxdb                8086/tcp, 8088/tcp                     M
                    ldap                slapd                   389/tcp, 636/tcp                       M
                                        nslcd                                                          M, C, L
Cluster             nfs                 nfs                     111/tcp, 111/udp, 2049/tcp, 2049/udp   M
                    chrony              chronyd                                                        M
                    slurm               munge                                                          M, C
                                        slurmctld               6817/tcp                               M
                                        slurmd                  6818/tcp                               C
                    icinga2             icinga2                 5665/tcp, 5665/udp                     M, C, L
                    dns                 named                   53/udp                                 M
                    dhcp                dhcpd                   67/udp                                 M
Configure LiCO components
For more information about configuring LiCO, see:
https://hpc.lenovo.com/lico/downloads/7.0/configuration/host/configuration.html
lico-portal
To prevent conflicts between HTTPS and the NGINX web server, some files on the nodes where the lico-portal module is installed (this module provides different ports for external web services) may need to be modified.
/etc/nginx/nginx.conf
Edit /etc/nginx/nginx.conf and change the port to 8080:
listen 8080 default_server;
listen [::]:8080 default_server;
To hide the server version information, modify /etc/nginx/nginx.conf and turn server_tokens off:
http {
    ......
    sendfile on;
    server_tokens off;
    ......
}
/etc/nginx/conf.d/https.conf
Edit /etc/nginx/conf.d/https.conf to change the default HTTPS port 443 to another port:
listen
Note: Make sure the chosen port is not used by another application and is not blocked by the firewall.
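The truncated "listen" line above stands for the listen directives inside the server block of https.conf. As an illustration, switching HTTPS to port 8443 (an arbitrary example port of my own) would look roughly like this:
# /etc/nginx/conf.d/https.conf, inside the server block (illustrative; 8443 is an example port)
listen 8443 ssl;
listen [::]:8443 ssl;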
/etc/nginx/conf.d/sites-available/lico.conf
Edit /etc/nginx/conf.d/sites-available/lico.conf and replace the first line with the following:
set $lico_host 127.0.0.1;
Note: If lico-portal does not work, change 127.0.0.1 to the IP address of the management node.
/etc/lico/portal.conf
Edit /etc/lico/portal.conf to add custom shortcut links. See /etc/lico/portal.conf.example for the configuration format.
Initialize the system
Initialize LiCO:
lico init
Initialize the cloud tools
Initialize the cloud tools:
lico cloudtool import -n 'CVAT' -c \
cvat -t cvat -p job_queue,cores_per_node,username,password,ram_size,share_dir
lico cloudtool import -n 'Jupyter Notebook' -c jupyter -t jupyter -p \
image_path,jupyter_cmd,password,job_queue,cores_per_node,gpu_per_node,check_timeout,run_time
lico cloudtool import -n 'RStudio Server' -c \
rstudio -t rstudio -p job_queue,cores_per_node,gpu_per_node,password,run_time
lico cloudtool import -n 'TigerVNC' -c \
tigervnc -t tigervnc -p job_queue,cores_per_node,gpu_per_node,runtime_id,password,run_time
Initialize users
Complete the following steps to initialize LiCO users (an illustrative command sequence follows step 3):
Step 1. (Optional) To manage users with LDAP, find the following setting in the LiCO configuration file /etc/lico/lico.ini.d/user.ini and change its value to "true":
USE_LIBUSER = false
Step 2. (Optional) Add a new user with administrator privileges to LDAP:
luseradd
nodeshell all "su -
Step 3. Import the user into LiCO:
lico import_user -u
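As a sketch of what the truncated commands above might expand to, the sequence below adds a hypothetical administrator named hpcadmin with luseradd, logs in as that user once on every node (my reading of the truncated "su -" line, typically done so the home directory is created everywhere), and then imports the user into LiCO. The username is a placeholder and only the options shown in the original lines are assumed to exist:
# Illustrative only; hpcadmin is a placeholder administrator name
luseradd hpcadmin
nodeshell all "su - hpcadmin -c exit"
lico import_user -u hpcadmin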
Chapter 6. Start and log in to LiCO
Start LiCO
Step 1. Start the LiCO-related services on the login nodes:
nodeshell login lico-service-tool enable
nodeshell login lico-service-tool start
Step 2. Start the LiCO-related services on the management node:
lico-service-tool enable
lico-service-tool start