Recommendation Algorithm Configuration Guide
When you create an application chart for recommend, the main task is to configure four files in the templates/ folder: embedding.yaml, prerank.yaml, rank.yaml, and train.yaml.
embedding.yaml
embedding.yaml example
Yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: user-embedding-r4sport
  namespace: {{ .Release.Namespace }}
spec:
  schedule: '0 */1 * * *'
  startingDeadlineSeconds: 0
  concurrencyPolicy: Replace
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  suspend: false
  ttlStrategy:
    secondsAfterSuccess: 3600
    secondsAfterCompletion: 3600
    secondsAfterFailure: 3600
  workflowSpec:
    entrypoint: userEmbeddingFlow
    volumes:
      - name: huggingface
        hostPath:
          type: DirectoryOrCreate
          path: >-
            {{ .Values.userspace.appData }}/rss/model/huggingface
    templates:
      - name: userEmbeddingFlow
        steps:
          - - name: user-embedding
              template: user-embedding-template
      - name: user-embedding-template
        container:
          image: 'beclab/r4userembedding'
          imagePullPolicy: Always
          env:
            - name: KNOWLEDGE_BASE_API_URL
              value: {{ .Values.apiUrl }}
            - name: TERMINUS_RECOMMEND_SOURCE_NAME
              value: r4sport
          volumeMounts:
            - mountPath: /root/.cache/huggingface
              name: huggingface
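Field descriptions
Option | Description |
---|---|
apiVersion | API version of the resource. |
kind | Resource type; here, a CronWorkflow object. |
metadata.name | Name of the CronWorkflow. |
metadata.namespace | Namespace the CronWorkflow belongs to. |
spec.schedule | Cron expression that defines when the CronWorkflow is scheduled. |
spec.startingDeadlineSeconds | Starting deadline of the CronWorkflow: the maximum delay, in seconds, allowed after the scheduled time. |
spec.concurrencyPolicy | Concurrency policy: how to handle a job that is still running when the next scheduled time arrives. |
spec.successfulJobsHistoryLimit | Maximum number of successful jobs kept in history. |
spec.failedJobsHistoryLimit | Maximum number of failed jobs kept in history. |
spec.suspend | Whether the CronWorkflow is suspended. |
spec.ttlStrategy.secondsAfterSuccess | Time to live, in seconds, after a job completes successfully. |
spec.ttlStrategy.secondsAfterCompletion | Time to live, in seconds, after a job completes. |
spec.ttlStrategy.secondsAfterFailure | Time to live, in seconds, after a job fails. |
spec.workflowSpec.entrypoint | Entry point of the Workflow: the name of the template that runs first. |
spec.workflowSpec.volumes[0].name | Name of the volume; here, huggingface. |
spec.workflowSpec.volumes[0].hostPath.type | Host path type; DirectoryOrCreate uses the directory and creates it if it does not exist. |
spec.workflowSpec.volumes[0].hostPath.path | Path on the host. |
spec.workflowSpec.templates[0].name | Name of the Workflow template. |
spec.workflowSpec.templates[0].steps[0][0].name | Name of the step. |
spec.workflowSpec.templates[0].steps[0][0].template | Name of the template the step references. |
spec.workflowSpec.templates[1].name | Name of the template. |
spec.workflowSpec.templates[1].container.image | Container image name. |
spec.workflowSpec.templates[1].container.imagePullPolicy | Image pull policy. |
spec.workflowSpec.templates[1].container.env[0].name | Name of the first environment variable. |
spec.workflowSpec.templates[1].container.env[0].value | Value of the first environment variable. |
spec.workflowSpec.templates[1].container.env[1].name | Name of the second environment variable. |
spec.workflowSpec.templates[1].container.env[1].value | Value of the second environment variable. |
spec.workflowSpec.templates[1].container.volumeMounts[0].mountPath | Path inside the container where the volume is mounted. |
spec.workflowSpec.templates[1].container.volumeMounts[0].name | Name of the volume to mount. |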
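The embedding.yaml template reads two chart values, `.Values.apiUrl` and `.Values.userspace.appData`. As a rough sketch, a values file that satisfies both references could look like the fragment below. Both values are hypothetical placeholders chosen for illustration, not defaults taken from the recommend chart, and on a real deployment the platform may inject them rather than have you set them by hand.

```yaml
# values.yaml (illustrative sketch; both values are hypothetical placeholders)

# Rendered into the KNOWLEDGE_BASE_API_URL environment variable.
apiUrl: http://knowledge-base.example.internal

userspace:
  # Host directory; embedding.yaml appends /rss/model/huggingface to it
  # to build the hostPath of the huggingface volume.
  appData: /path/to/appdata
```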
prerank.yaml
prerank.yaml example
Yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: prerank-r4sport
  namespace: {{ .Release.Namespace }}
spec:
  schedule: '*/5 * * * *'
  startingDeadlineSeconds: 0
  concurrencyPolicy: Replace
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  suspend: false
  ttlStrategy:
    secondsAfterSuccess: 3600
    secondsAfterCompletion: 3600
    secondsAfterFailure: 3600
  workflowSpec:
    entrypoint: algorithm
    volumes:
      - name: nfs
        hostPath:
          type: DirectoryOrCreate
          path: >-
            {{ .Values.userspace.appData }}/rss/data
      - name: juicefs
        hostPath:
          type: DirectoryOrCreate
          path: >-
            {{ .Values.userspace.appData }}/rss/data
    templates:
      - name: algorithm
        steps:
          - - name: recall
              template: recall-template
          - - name: prerank
              template: prerank-template
      - name: recall-template
        container:
          image: 'beclab/r4recall:v0.0.5'
          imagePullPolicy: Always
          env:
            - name: KNOWLEDGE_BASE_API_URL
              value: {{ .Values.apiUrl }}
            - name: NFS_ROOT_DIRECTORY
              value: /nfs
            - name: JUICEFS_ROOT_DIRECTORY
              value: /juicefs
            - name: ALGORITHM_FILE_CONFIG_PATH
              value: /usr/config/
            - name: TERMINUS_RECOMMEND_SOURCE_NAME
              value: r4sport
            - name: SUPPORT_LANGUAGE
              value: en
            - name: SUPPORT_TIMELINESS
              value: '0'
            - name: SYNC_PROVIDER
              value: bytetrade
            - name: SYNC_FEED_NAME
              value: sport
            - name: SYNC_MODEL_NAME
              value: bert_v2
          volumeMounts:
            - mountPath: /nfs
              name: nfs
            - mountPath: /juicefs
              name: juicefs
      - name: prerank-template
        container:
          image: 'beclab/r4prerank:v0.0.5'
          imagePullPolicy: Always
          env:
            - name: KNOWLEDGE_BASE_API_URL
              value: {{ .Values.apiUrl }}
            - name: NFS_ROOT_DIRECTORY
              value: /nfs
            - name: JUICEFS_ROOT_DIRECTORY
              value: /juicefs
            - name: ALGORITHM_FILE_CONFIG_PATH
              value: /usr/config/
            - name: TERMINUS_RECOMMEND_SOURCE_NAME
              value: r4sport
            - name: SUPPORT_LANGUAGE
              value: en
            - name: SUPPORT_TIMELINESS
              value: '0'
          volumeMounts:
            - mountPath: /nfs
              name: nfs
            - mountPath: /juicefs
              name: juicefs
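The `steps` field in the prerank.yaml example is a list of lists. In Argo Workflows, entries of the outer list run one after another, while entries that share an inner list run in parallel. The fragment below restates the sequential layout from the example and, in comments, sketches a parallel variant for comparison. The parallel variant is hypothetical and is not something the recommend chart uses.

```yaml
# Sequential, as in prerank.yaml: prerank starts only after recall has finished.
steps:
  - - name: recall
      template: recall-template
  - - name: prerank
      template: prerank-template

# Parallel variant (hypothetical, shown only for contrast): both steps sit in
# the same inner list, so Argo starts them at the same time.
# steps:
#   - - name: recall
#       template: recall-template
#     - name: prerank
#       template: prerank-template
```

The sequential form fits a pipeline in which the second step consumes what the first step produces.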
rank.yaml
rank.yaml example
Yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: rank-r4sport
  namespace: {{ .Release.Namespace }}
spec:
  schedule: '*/5 * * * *'
  startingDeadlineSeconds: 0
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  suspend: false
  ttlStrategy:
    secondsAfterSuccess: 3600
    secondsAfterCompletion: 3600
    secondsAfterFailure: 3600
  workflowSpec:
    entrypoint: rankFlow
    volumes:
      - name: model
        hostPath:
          type: DirectoryOrCreate
          path: >-
            {{ .Values.userspace.appData }}/rss/model
    templates:
      - name: rankFlow
        steps:
          - - name: extractor
              template: extractor-template
          - - name: rank
              template: rank-template
      - name: extractor-template
        container:
          image: 'beclab/r4extractor:v0.0.5'
          imagePullPolicy: Always
          env:
            - name: KNOWLEDGE_BASE_API_URL
              value: {{ .Values.apiUrl }}
            - name: TERMINUS_RECOMMEND_SOURCE_NAME
              value: r4sport
          volumeMounts:
            - mountPath: /opt/rank_model
              name: model
      - name: rank-template
        container:
          image: 'beclab/r4rank'
          imagePullPolicy: Always
          env:
            - name: KNOWLEDGE_BASE_API_URL
              value: {{ .Values.apiUrl }}
            - name: TERMINUS_RECOMMEND_SOURCE_NAME
              value: r4sport
          volumeMounts:
            - mountPath: /opt/rank_model
              name: model
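The examples in this guide differ mainly in `schedule` and `concurrencyPolicy`: embedding.yaml and prerank.yaml use `Replace`, while rank.yaml uses `Forbid`. The annotated fragment below is a small sketch of how these two fields are read, based on the standard five-field cron syntax and the three policy values that Argo CronWorkflows accept. It is not an additional file in the chart.

```yaml
spec:
  # Cron fields, left to right: minute, hour, day of month, month, day of week.
  # '*/5 * * * *'  runs every five minutes (prerank.yaml, rank.yaml).
  # '0 */1 * * *'  runs at minute 0 of every hour (embedding.yaml).
  schedule: '*/5 * * * *'

  # Forbid:  skip the new run while a previous run is still in progress.
  # Replace: stop the run in progress and start the new one.
  # Allow:   let runs overlap (the default when the field is omitted).
  concurrencyPolicy: Forbid
```

With a five-minute schedule, `Forbid` lets a slow run finish instead of being cut off by the next trigger, whereas `Replace` favors starting again from the newest data.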
train.yaml
train.yaml example
Yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: rank-r4sport
  namespace: {{ .Release.Namespace }}
spec:
  schedule: '*/5 * * * *'
  startingDeadlineSeconds: 0
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  suspend: false
  ttlStrategy:
    secondsAfterSuccess: 3600
    secondsAfterCompletion: 3600
    secondsAfterFailure: 3600
  workflowSpec:
    entrypoint: rankFlow
    volumes:
      - name: model
        hostPath:
          type: DirectoryOrCreate
          path: >-
            {{ .Values.userspace.appData }}/rss/model
    templates:
      - name: rankFlow
        steps:
          - - name: extractor
              template: extractor-template
          - - name: rank
              template: rank-template
      - name: extractor-template
        container:
          image: 'beclab/r4extractor:v0.0.5'
          imagePullPolicy: Always
          env:
            - name: KNOWLEDGE_BASE_API_URL
              value: {{ .Values.apiUrl }}
            - name: TERMINUS_RECOMMEND_SOURCE_NAME
              value: r4sport
          volumeMounts:
            - mountPath: /opt/rank_model
              name: model
      - name: rank-template
        container:
          image: 'beclab/r4rank'
          imagePullPolicy: Always
          env:
            - name: KNOWLEDGE_BASE_API_URL
              value: {{ .Values.apiUrl }}
            - name: TERMINUS_RECOMMEND_SOURCE_NAME
              value: r4sport
          volumeMounts:
            - mountPath: /opt/rank_model
              name: model
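All four examples hard-code the source name `r4sport`, both in `metadata.name` and in the `TERMINUS_RECOMMEND_SOURCE_NAME` environment variable. When adapting the files for another source, one possible approach is to read the name from a chart value so it is set in a single place. The fragment below is only a sketch of that idea: `sourceName` is a hypothetical value that the recommend chart does not define, and the surrounding fields are abbreviated.

```yaml
# Template fragment (sketch). Assumes a hypothetical chart value such as:
#   sourceName: r4sport
metadata:
  # Renders to rank-r4sport when sourceName is r4sport.
  name: rank-{{ .Values.sourceName }}
  namespace: {{ .Release.Namespace }}

# Inside each container template, the same value feeds the environment variable.
env:
  - name: TERMINUS_RECOMMEND_SOURCE_NAME
    # quote is a standard Helm template function; it keeps the rendered
    # value a YAML string.
    value: {{ .Values.sourceName | quote }}
```

Keeping the resource name and the environment variable derived from the same value avoids the two drifting apart when a file is copied for a new source.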