Preliminary Verification Checklist
Item to verify 1: SoccerDB1 explicitly originates from manual clipping and sampling of openly available broadcast match videos on YouTube
Item to verify 2: The process involves downloading and reusing platform content
Item to verify 3: YouTube's Terms of Service ... by default do not permit unauthorized scraping/downloading
Item to verify 4: YouTube's ... new "third-party training" policy by default does not allow ... use for third-party model training
Item to verify 5: Only when creators actively opt in to "allow" and open access to specific companies is there a corresponding authorization basis
Action Plan
Let me start with the first step: finding information about the SoccerDB1 dataset.
Search
SoccerDB1 dataset YouTube videos
I will begin verifying information about the SoccerDB1 dataset. The search results give me some leads on SoccerDB1, but I need something more specific. Let me look for more detailed material.
Search
SoccerKDNet SoccerDB1 dataset source YouTube
I need more specific information to confirm the source of the SoccerDB1 dataset. Let me try a more precise search.
Search
SoccerKDNet SoccerDB1 dataset YouTube video source
I still need more concrete details about SoccerKDNet and SoccerDB1. Let me refine the search once more.
Search
SoccerKDNet SoccerDB1 dataset YouTube source paper
Verified Conclusions
Verified item 1: A public dataset named "SoccerDB1" exists
Remaining Items to Verify
Item to verify 1: SoccerDB1 explicitly originates from manual clipping and sampling of openly available broadcast match videos on YouTube
Item to verify 2: YouTube's Terms of Service ... by default do not permit unauthorized scraping/downloading
Item to verify 3: YouTube's ... new "third-party training" policy by default does not allow ... use for third-party model training
Item to verify 4: Only when creators actively opt in to "allow" and open access to specific companies is there a corresponding authorization basis
Action Plan
Let me find the full text of the SoccerKDNet/SoccerDB1 paper to confirm its data source.
Search
SoccerKDNet A Knowledge Distillation Framework for Action Recognition in Soccer Videos full paper pdf
I need more information on the SoccerDB1 dataset, especially its data source. Let me try to access the full paper on arXiv.
This paper introduces SoccerKDNet, a knowledge distillation framework for action recognition in soccer videos. To address the difficulty of deploying conventional, complex offline networks in resource-constrained settings, the authors propose an end-to-end transfer-learning network based on knowledge distillation, pre-trained on the Kinetics400 dataset, and analyze it through a unique loss parameterization. The paper also introduces a new dataset, SoccerDB1, containing 448 videos across 4 classes of soccer player actions. The authors propose a unique loss parameter that linearly weighs how much each network's predictions are used, and conduct a thorough performance study over various hyperparameters. The model reaches 67.20% validation accuracy on SoccerDB1, significantly outperforming existing methods while generalizing readily to new datasets. The dataset is publicly available at https://bit.ly/soccerdb1.
According to the page, SoccerDB1 is a soccer video dataset newly introduced by the authors.
The page gives no detail on how the dataset was constructed: it only says the dataset is newly introduced and contains videos of soccer player actions in 4 classes, with nothing on the collection method, preprocessing steps, or annotation procedure.
I need the full text of the paper for more on the source of SoccerDB1. Let me try the paper's PDF.
SoccerKDNet: A Knowledge Distillation Framework for Action Recognition in Soccer Videos
Sarosij Bose¹ [0000-0003-3014-4796], Saikat Sarkar² [0000-0001-7118-6058], and Amlan Chakrabarti³ [0000-0003-4380-3172]
¹ Department of Computer Science and Engineering, University of Calcutta, India
² Department of Computer Science, Bangabasi College, University of Calcutta, India
³ A.K. Choudhury School of Information Technology, University of Calcutta, India
{sarosijbose2000, to.saikatsarkar17}@gmail.com, acakcs@caluniv.ac.in
Abstract. Classifying player actions from soccer videos is a challenging problem, which has become increasingly important in sports analytics over the years. Most state-of-the-art methods employ highly complex offline networks, which makes it difficult to deploy such models in resource-constrained scenarios. Here, in this paper we propose a novel end-to-end knowledge distillation based transfer learning network pre-trained on the Kinetics400 dataset, and then perform extensive analysis on the learned framework by introducing a unique loss parameterization. We also introduce a new dataset named "SoccerDB1" containing 448 videos across 4 diverse classes of players playing soccer. Furthermore, we introduce a unique loss parameter that helps us linearly weigh the extent to which the predictions of each network are utilized. Finally, we also perform a thorough performance study using various changed hyperparameters. We also benchmark the first classification results on the new SoccerDB1 dataset, obtaining 67.20% validation accuracy. The dataset has been made publicly available at: https://bit.ly/soccerdb1
Keywords: Soccer Analytics · Knowledge Distillation · Action Recognition.
1 Introduction
Recognition of player actions in soccer video is a challenging computer vision task. Existing vision-based soccer analytics models either rely heavily on manpower responsible for tracking every aspect of the game, or on offline network based analytics products used to analyze the game closely once it is over [12], [11]. It has been established that deep learning based methods exceed their traditional counterparts in performance. Recently, deep reinforcement learning based models [14], [13] have been used for the estimation of ball possession statistics in broadcast soccer videos. But there is an issue with employing such deep networks, which are often trained on large image based datasets such as ImageNet. These offline models may deliver superior accuracy
but suffer from a significant domain gap. As a result, there is a need for domain specific data or, at the very least, fine tuning on sports specific datasets.
Soccer Action Recognition: We also look into the existing literature on action recognition for soccer videos. However, work on this aspect has been very limited. One of the very few public soccer video datasets is the recently released SoccerNet-v2 benchmark [4]. Other attempts to classify actions from soccer videos, such as [3], have focused on specific localization tasks instead of classification. Therefore, we believe that our newly contributed soccer dataset (SoccerDB1) and knowledge distillation based action recognition framework (SoccerKDNet) will help progress research on vision-based action recognition in soccer videos.
In summary, we present SoccerDB1, a dataset for action recognition in soccer video. We also present SoccerKDNet, a knowledge distillation based action recognition framework, with which we achieve 67.20% accuracy on the action recognition task. Next, we describe our SoccerDB1 dataset in detail.
2 SoccerDB1 Dataset
We introduce a new soccer dataset named SoccerDB1 consisting of 448 soccer video clips. The dataset contains videos of 4 action classes, namely Dribble, Kick, Run, and Walk, with over 70 video clips per class. Sample frames of different action classes are shown in Figure 1. The video clips are created manually from openly available broadcast soccer match videos available on YouTube. The frames were sampled uniformly and each video clip contains 25-26 frames. The proposed action recognition framework is discussed next.
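As an aside, the paper does not release its preprocessing code. Purely as an illustration, a uniform frame sampler of the kind described above might look like the following minimal sketch; the function name and the linspace strategy are assumptions, not the authors' implementation.

```python
import numpy as np

def uniform_sample_indices(num_frames: int, num_samples: int = 25) -> np.ndarray:
    """Pick `num_samples` frame indices spread evenly across a clip.

    Mirrors the kind of uniform sampling described above: each clip is
    reduced to a fixed number of evenly spaced frames (25-26 in SoccerDB1).
    """
    return np.linspace(0, num_frames - 1, num_samples).round().astype(int)

# Example: a 4-second clip at 30 fps -> 25 evenly spaced frame indices.
print(uniform_sample_indices(120, 25))
```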
Fig. 1. Sample frames from SoccerDB1 dataset, their actual and predicted labels.
Fig. 2. SoccerKDNet: Schematic Architecture of Proposed End-to-End network.
3 Methodology
3.1 Knowledge Distillation based Transfer Learning
We propose the SoccerKDNet network to classify actions from soccer video clips, as shown in Figure 2. Here, we use the Temporal Adaptive Module (TAM) [9] with both ResNet-50 and ResNet-101 [7] as backbones. We then add a few fully connected layers with BatchNorm in front of the backbone network in order to give it some learnable parameters. The features from this setup are then passed on to the frontnet, which is shown in Figure 3. This entire setup is referred to as the 'jointnet' throughout the rest of the paper.
We use ResNet-18 as the student network in all our experiments. The 'jointnet' serves as the teacher network and is initially trained on the soccer dataset. We use both ResNet-50 and ResNet-101 as backbones along with the Temporal Adaptive Module (TAM). We perform uniform sampling over all the video frames in all our experiments, since it is known to yield better results than dense sampling [2].
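Neither TAM nor the exact layer dimensions are reproduced in the text above, so the following PyTorch sketch is only a structural illustration of the 'jointnet' idea: a frozen, Kinetics400-pretrained backbone emitting 400 logits, followed by trainable fully connected layers with BatchNorm. The final 128-to-4 layer matches the Fc4 dimensions mentioned in the loss discussion below; the hidden sizes and layer count are otherwise assumptions.

```python
import torch
import torch.nn as nn

class FrontNet(nn.Module):
    """Maps the backbone's 400-class output down to the 4 SoccerDB1 classes."""
    def __init__(self, in_dim: int = 400, hidden: int = 128, num_classes: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),  # final layer, R^128 -> R^4
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class JointNet(nn.Module):
    """Frozen Kinetics400 backbone followed by a trainable FrontNet (the teacher)."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # keep the backbone frozen
            p.requires_grad = False
        self.frontnet = FrontNet()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.frontnet(self.backbone(x))
```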
In our network, as shown in Figure 2, the losses employed are described below:
– Cross-Entropy Loss: the cross-entropy loss is given by

  $L_{\mathrm{softmax}} = -\sum_{c=1}^{M} y_{o,c} \log(p_{o,c})$    (1)

– Kullback-Leibler Divergence Loss: the Kullback-Leibler divergence loss is given by

  $L_{\mathrm{KL}} = \sum_{c=1}^{M} \hat{y}_c \log \frac{\hat{y}_c}{y_c}$    (2)

  We also apply a temperature ($\tau$) hyperparameter to this equation.
– Knowledge Distillation Loss: given a teacher network D and a student network S, where the loss of the student network is denoted by $L_{\mathrm{softmax}}$ and a hyperparameter $\alpha$ satisfies $0 \le \alpha \le 1$, the knowledge distillation loss is

  $L_{\mathrm{k.d.}} = \alpha \, L_{\mathrm{softmax}} + (1 - \alpha) \, L_{\mathrm{KL}}$    (3)

Here $y_{o,c}$ denotes the ground-truth label for a particular sample, and $p_{o,c}$ the softmax probability obtained after the final fully connected layer (Fc4), $L \in \mathbb{R}^{128 \times 4}$. Further, $\hat{y}_c$ and $y_c$ are the predicted and actual probability distributions, respectively, for a given soccer frame sample.
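A minimal PyTorch rendering of equations (1)-(3) might look like the sketch below. The $\tau^2$ rescaling of the KL term follows common distillation practice (Hinton et al.) and is an assumption here; the paper only states that a temperature is applied to equation (2).

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha: float = 0.90, tau: float = 6.0):
    """L_kd = alpha * L_softmax + (1 - alpha) * L_KL, as in equation (3)."""
    # Equation (1): cross entropy of the student against ground-truth labels.
    l_softmax = F.cross_entropy(student_logits, labels)
    # Equation (2) with temperature: KL between softened teacher and student
    # distributions; kl_div expects log-probabilities as its first argument.
    l_kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau  # assumed tau^2 rescaling, standard in distillation
    return alpha * l_softmax + (1 - alpha) * l_kl
```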
Fig. 3. Architecture of the FrontNet Module. The backbone network in Figure 2 has an output of 400 classes, so that feature output serves as the input here.
3.2 Experiments
Datasets Used. We use two datasets in this work. The first is the Kinetics400 dataset, which consists of 400 diverse action classes of everyday activities with over 300,000 videos. We used this dataset for pre-training our backbone model so it can learn more generalized features. For benchmarking our results, we used SoccerDB1; further details regarding our dataset have already been discussed in Section 2.
Implementation Details. We first train the jointnet on the soccer dataset for 100 epochs. The batch size was kept at 64 and the cross-entropy loss function was used. The Adam optimizer was chosen with a learning rate of 0.0001 and a CosineAnnealing rate-decay scheduler.
Next, we train the student model, a ResNet-18. We take a batch size of 128 and train the student model for 200 epochs. The Kullback-Leibler divergence loss function was used here along with the distillation loss as described in equation 3. Table 1 illustrates the considerable change in accuracy with the changing value of α; we found the optimal value to be 0.90. The value of temperature (τ) was taken to be 6; the SGD optimizer was used here with a constant learning rate of 0.0001, momentum of 0.9, and a constant weight decay of 5e-4. The obtained student model was then simply plugged into the evaluation framework to obtain the corresponding accuracy. As evident from Figure 4 in Section 4, experiments were run for 200 epochs until the validation accuracy started saturating.

| Alpha (α) | Model accuracy |
|---|---|
| 0.95 | 66.31% |
| 0.90 | 67.20% |
| 0.97 | 62.8% |

Table 1. Alpha vs. model accuracy comparison.
All the input video frames were resized to 224 × 224 RGB images using the
spatial cropping strategy as outlined in [2]. All the accuracies reported were
sampled over 5 runs to ensure the reproducibility of results. All models were
trained on a 32 GB NVIDIA V100 GPU.
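Pulling the stated hyperparameters together, the two training stages could be configured roughly as follows. The stand-in torchvision models and the T_max value are assumptions, since the paper's TAM-based modules are not reproduced here.

```python
import torch
import torchvision.models as models

# Stand-ins for the paper's networks (the real teacher is TAM + ResNet-50/101).
jointnet = models.resnet50(num_classes=4)
student = models.resnet18(num_classes=4)

# Stage 1: teacher ('jointnet') fine-tuning - 100 epochs, batch size 64,
# cross-entropy loss, Adam at lr 1e-4 with cosine-annealing decay.
teacher_opt = torch.optim.Adam(jointnet.parameters(), lr=1e-4)
teacher_sched = torch.optim.lr_scheduler.CosineAnnealingLR(teacher_opt, T_max=100)

# Stage 2: ResNet-18 student distillation - 200 epochs, batch size 128,
# distillation loss (eq. 3) with alpha=0.90 and tau=6, SGD as stated.
student_opt = torch.optim.SGD(student.parameters(), lr=1e-4,
                              momentum=0.9, weight_decay=5e-4)
```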
4 Experimental Results
Accuracy Metrics. All accuracies reported here are Top-1 accuracies. All figures were sampled over 5 runs. The student model is ResNet-18 in all cases. In our setting, we use the Top-1/Top-5 accuracy metric for evaluation in all our experiments, as used in several previous works [6]. Top-1 accuracy refers to the case where the 1st predicted model label matches the ground-truth label for a particular frame. Using this metric, a particular soccer video is considered correctly classified only when at least half or more of its total frames match the corresponding ground-truth label. Thus, we report the accuracies obtained using various backbones in Table 2.
| Backbone | Teacher acc. | Student acc. |
|---|---|---|
| ResNet-50 | 60.00% | 65.26% |
| ResNet-101 | 62.10% | 67.20% |

Table 2. Top-1 validation accuracies obtained on the soccer dataset using various network backbones.
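Read literally, the video-level rule stated above counts a clip as correct when at least half of its frames' Top-1 predictions match the ground truth. A small sketch of that rule (the helper name is illustrative, not from the paper):

```python
import numpy as np

def video_top1_correct(frame_preds: np.ndarray, video_label: int) -> bool:
    """A video is correct when at least half of its frames' Top-1
    predictions match the video's ground-truth label."""
    return (frame_preds == video_label).mean() >= 0.5

# Example: 25 frame-level predictions for a clip labeled class 1 ('Kick').
preds = np.array([1] * 14 + [2] * 11)
print(video_top1_correct(preds, 1))  # True: 14/25 >= 0.5
```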
We note that directly using the pre-trained backbone model yields a very poor accuracy of 7.7%, highlighting the need for a generalized network. We also see that the student model, with proper training and a sufficient number of epochs, exceeds the teacher model in accuracy.
When the fine-tuning dataset is small, it is very difficult to ensure the model
does not overfit to the dataset. Here, pre-training is carried out on the Kinetics
400 dataset, which is significantly larger in comparison to our Soccer Dataset. To
prevent overfitting, we rigorously apply regularizers such as Batch Normalization
on both the TAM and FrontNet module and dropout on the Student Model.
However, such concerns may still remain to some extent as highlighted in [1].
| Model | Dataset size | Accuracy* |
|---|---|---|
| Russo et al. [10] | 300 | 32.00% |
| Kukleva et al.† [8] | 4152 | 94.50% |
| SoccerKDNet (R50) | 448 | 65.26% |
| SoccerKDNet (R101) | 448 | 67.20% |

Table 3. Comparison of accuracy of SoccerKDNet with similar methods.
As can be seen from Table 3, our model outperforms all other existing models. The SoccerData dataset by Kukleva et al. [8] is an image dataset, not a video dataset. That dataset requires digit-level bounding boxes and human keypoint annotations, which our dataset does not have, and the authors provide no publicly usable trained models for testing. Further, our models are trained on video data, which cannot properly be tested on image datasets without losing crucial information such as the temporal sequence present in a video.
Several earlier works such as Two-Stream Networks [5] have millions of parameters and hence are unsuitable for edge deployment. Using knowledge distillation, we show here that even simple 2D networks such as ResNet-18 can be used for action classification. Table 1 highlights this aspect by listing the networks used in our work: all of them have fewer than 50 million parameters, and the backbone network is frozen. For context, 3D networks such as C3D [15] have 73 million parameters.
From Figure 4, it can be seen that the student model achieves a train accuracy as high as 77.9% and validation accuracies above 60%, underscoring the generalizability and effectiveness of our proposed network. On the right, the corresponding validation loss curve obtained using a ResNet-18 student and a ResNet-50 teacher model is shown.
4.1 Ablation Study
We also perform a mini ablation study on a variety of factors: the backbone
network, stage in which distillation is applied, layers in frontnet module and
various hyperparameters.
Fig. 4. On the left: red and blue curves denote the train and validation accuracies respectively. On the right, the validation loss is shown.
– Backbone Network: As can be seen from Table 2, we experimented with 2 backbone networks. The TAM-ResNet101 backbone performs best for both the teacher and student models. Further, employing parameter-heavy models such as ResNet-152 as the backbone would not only defeat the purpose of moving toward a more online solution; given the relatively small size of the fine-tuning dataset, it would also raise considerable over-fitting concerns.
– Distillation Stage: There are two possibilities. The first is to directly distill the frozen TAM backbone module and then use the distilled model as a plug-in within the network; we call this early distillation, since the distillation is done early in the network. However, performance with this approach is not satisfactory: taking TAM-ResNet50 as the backbone teacher model, we get only 8.3% accuracy on the Kinetics400 dataset with ResNet-18 as the student model. We suspect this is due to the inability of the student model to directly learn features from the heavy teacher model. Therefore, we chose the second option, the late-stage distillation process.
– FrontNet Module and Hyperparameters: We found that adding dropout layers decreased accuracy by 1.2%, and adding more fully connected layers in the latter half of the FrontNet module decreased the teacher module's accuracy by as much as 4.9%, to 57.2%. We did not find much difference in accuracy when using NLL loss instead of cross-entropy, so we kept the latter. We found a learning rate of 0.0001 and batch sizes of 64 and 128 to be optimal for all our experiments, as bigger batches led to better performance.
5 Conclusion
In this paper, we introduce a new soccer dataset consisting of 4 diverse classes with over 70 video clips per class. The network not only provides the flexibility of using the original (teacher) model as the classifier but also offers the option of using a smaller network (ResNet-18 in our case) as the student. In the future, we plan to add more action classes to the dataset, and to utilize SoccerKDNet for soccer event detection based on the actions of the players.
References
1. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6299–6308 (2017)
2. Chen, C.F.R., Panda, R., Ramakrishnan, K., Feris, R., Cohn, J., Oliva, A., Fan, Q.: Deep analysis of CNN-based spatio-temporal representations for action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6165–6175 (2021)
3. Cioppa, A., Deliege, A., Giancola, S., Ghanem, B., Droogenbroeck, M.V., Gade, R., Moeslund, T.B.: A context-aware loss function for action spotting in soccer videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13126–13136 (2020)
4. Deliege, A., Cioppa, A., Giancola, S., Seikavandi, M.J., Dueholm, J.V., Nasrollahi, K., Ghanem, B., Moeslund, T.B., Van Droogenbroeck, M.: SoccerNet-v2: A dataset and benchmarks for holistic understanding of broadcast soccer videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4508–4519 (2021)
5. Feichtenhofer, C., Pinz, A., Zisserman, A.: Convolutional two-stream network fusion for video action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1933–1941 (2016)
6. Girdhar, R., Tran, D., Torresani, L., Ramanan, D.: DistInit: Learning video representations without a single labeled video. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 852–861 (2019)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
8. Kukleva, A., Khan, M.A., Farazi, H., Behnke, S.: Utilizing temporal information in deep convolutional network for efficient soccer ball detection and tracking. In: Robot World Cup. pp. 112–125. Springer (2019)
9. Liu, Z., Wang, L., Wu, W., Qian, C., Lu, T.: TAM: Temporal adaptive module for video recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 13708–13718 (2021)
10. Russo, M.A., Filonenko, A., Jo, K.H.: Sports classification in sequential frames using CNN and RNN. In: 2018 International Conference on Information and Communication Technology Robotics (ICT-ROBOT). pp. 1–3. IEEE (2018)
11. Sarkar, S., Chakrabarti, A., Mukherjee, D.P.: Estimation of ball possession statistics in soccer video. In: Proceedings of the 11th Indian Conference on Computer Vision, Graphics and Image Processing. pp. 1–8 (2018)
12. Sarkar, S., Mukherjee, D.P., Chakrabarti, A.: From soccer video to ball possession statistics. Pattern Recognition 122, 108338 (2022)
13. Sarkar, S., Mukherjee, D.P., Chakrabarti, A.: Watch and act: Dual interacting agents for automatic generation of possession statistics in soccer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3560–3568 (2022)
14. Sarkar, S., Mukherjee, D.P., Chakrabarti, A.: Reinforcement learning for pass detection and generation of possession statistics in soccer. IEEE Transactions on Cognitive and Developmental Systems 15(2), 914–924 (2023)
15. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 4489–4497 (2015)
Now I have detailed information about the SoccerDB1 dataset. Next I need to look up YouTube's Terms of Service, particularly the provisions on downloading content and using it for AI training.
Search
YouTube Terms of Service content scraping downloading
Verified Conclusions
Verified item 1: SoccerDB1 explicitly originates from manual clipping and sampling of openly available broadcast match videos on YouTube
Verified item 2: The process involves downloading and reusing platform content
Remaining Items to Verify
Item to verify 1: YouTube's Terms of Service ... by default do not permit unauthorized scraping/downloading
Item to verify 2: YouTube's ... new "third-party training" policy by default does not allow ... use for third-party model training
Item to verify 3: Only when creators actively opt in to "allow" and open access to specific companies is there a corresponding authorization basis
Action Plan
Let me start by looking up YouTube's Terms of Service.
Search
YouTube Terms of Service content download scraping
I need YouTube's official Terms of Service to confirm the rules on downloading and using content. Let me go directly to YouTube's Terms of Service page.
Search
YouTube Terms of Service official website
I need YouTube's official Terms of Service. Let me try to access the Terms of Service page directly.
Search
YouTube Terms of Service official, sourced from youtube.com
Let me try another way to find YouTube's Terms of Service.
Search
YouTube Terms of Service
Verified Conclusions
Verified item 1: SoccerDB1 explicitly originates from manual clipping and sampling of openly available broadcast match videos on YouTube
Verified item 2: The process involves downloading and reusing platform content
Remaining Items to Verify
Item to verify 1: YouTube's Terms of Service ... by default do not permit unauthorized scraping/downloading
Item to verify 2: YouTube's ... new "third-party training" policy by default does not allow ... use for third-party model training
Item to verify 3: Only when creators actively opt in to "allow" and open access to specific companies is there a corresponding authorization basis
Action Plan
Let me try to access YouTube's Terms of Service page.
Search
YouTube Terms of Service, sourced from youtube.com
I need to try another way to find YouTube's Terms of Service.
Search
YouTube Terms of Service official page
I need to keep looking for YouTube's Terms of Service, particularly the rules on downloading and using content. Let me try to reach YouTube's official Terms of Service page directly.
Let me try yet another approach and search for the specific wording of YouTube's Terms of Service.
Search
YouTube Terms of Service content download scraping restrictions
Verified Conclusions
Verified item 1: SoccerDB1 explicitly originates from manual clipping and sampling of openly available broadcast match videos on YouTube
Verified item 2: The process involves downloading and reusing platform content
Remaining Items to Verify
Item to verify 1: YouTube's Terms of Service ... by default do not permit unauthorized scraping/downloading
Item to verify 2: YouTube's ... new "third-party training" policy by default does not allow ... use for third-party model training
Item to verify 3: Only when creators actively opt in to "allow" and open access to specific companies is there a corresponding authorization basis
Action Plan
Let me look for YouTube's Terms of Service, particularly the provisions on downloading and using content.
Search
YouTube Terms of Service content download restrictions
I need to keep searching for YouTube's Terms of Service, particularly the provisions on downloading and using content.
Search
YouTube Terms of Service content download restrictions, sourced from youtube.com
I need to find YouTube's Terms of Service directly. Let me try to access the official Terms of Service page.
Search
YouTube Terms of Service official page, sourced from youtube.com
Let me try another way to find YouTube's Terms of Service.
Search
YouTube Terms of Service
Verified Conclusions
Verified item 1: SoccerDB1 explicitly originates from manual clipping and sampling of openly available broadcast match videos on YouTube
Verified item 2: The process involves downloading and reusing platform content
Verified item 3: YouTube's Terms of Service ... by default do not permit unauthorized scraping/downloading
Remaining Items to Verify
Item to verify 1: YouTube's ... new "third-party training" policy by default does not allow ... use for third-party model training
Item to verify 2: Only when creators actively opt in to "allow" and open access to specific companies is there a corresponding authorization basis
Action Plan
Let me start looking for information on YouTube's (Google's) new "third-party training" policy.
Search
YouTube Google new policy AI training third party model
I have found information about YouTube's new third-party AI training policy. Let me look for more detail, especially the mechanism by which creators opt in to "allow".
This page details YouTube's policy on third-party AI training. YouTube lets creators and rights holders choose whether to allow third-party AI companies to use their content to train AI models. By default, the third-party training setting is turned off, so users need take no action to keep their content out of third-party training. Content is eligible for third-party training only when specific conditions are met: the relevant rights holder allows it, the video is public, and it complies with YouTube's Terms of Service and Community Guidelines.
Creators can allow specific third-party companies, or all third-party companies, to use their content. Once the setting is turned on, YouTube may share video content with the permitted third-party companies for AI model training. A video's training-permission status is publicly visible through the YouTube Data API. Users can change the third-party training setting at any time in YouTube Studio or YouTube Studio Content Manager, though changes may take up to 7 days to be reflected in public interfaces.
The page also includes an FAQ covering how to change the setting, how third-party companies use content, how companies join the list, and whether creators are paid when their content is used for training.
The core of YouTube's third-party training policy and its authorization mechanism:
Basic policy principles:
Content eligibility conditions:
Content must meet all of the following conditions to be eligible for third-party training:
Authorization mechanism details:
Settings management:
Important notes:
This policy reflects YouTube's effort to balance innovation with content creators' interests as AI develops, using an explicit opt-in mechanism and transparency to keep creators in control of how their content is used.
Responsible AI development - context: YouTube is committed to developing AI responsibly, ensuring creators and rights holders keep control over how their content is used
Generative AI - context: some creators and rights holders may want to work with AI companies to help develop the latest generative AI technology
Terms of Service - context: YouTube's Terms of Service prohibit unauthorized use of content, such as unauthorized downloading and scraping
Public - context: a video's privacy setting must be public for it to be eligible for third-party training
Community Guidelines - context: videos must comply with YouTube's Community Guidelines to be eligible for third-party training
Learn more - context: more on rights holders using YouTube Studio Content Manager, covering all eligible claimed videos and videos in associated channels
Channel settings - context: channel owners can view and change the third-party training setting in YouTube Studio channel settings
Settings - context: administrators can view and change the third-party training setting on the settings page of YouTube Studio Content Manager
Apply - context: third-party companies that want to work with YouTube creators and rights holders on AI model training can apply to join the list
Now that I have the details of YouTube's new third-party training policy, let me look for more related coverage, particularly the policy's announcement date and media reports.
Web Page Parse
https://www.theverge.com/2024/12/16/24322732/youtube-creators-opt-in-third-party-ai-training-videos
YouTube is rolling out a new feature that lets creators choose whether third-party companies may use their videos to train AI models. The feature is off by default, meaning creators need take no action to stop third parties from scraping their videos for AI training. TeamYouTube member Rob called it "an important first step" in supporting creators and helping them realize new value from their YouTube content in the AI era. The setting will roll out in YouTube Studio over the coming days, and unauthorized scraping remains prohibited. Creators can pick from a list of designated third-party companies or allow all third-party companies to train on their videos. The initial partner list includes 17 technology companies, among them AI21 Labs, Adobe, Amazon, Anthropic, and Apple. Notably, Google itself already uses YouTube data to train its AI tools, and earlier reports said models from OpenAI, Apple, and Anthropic had been trained on content and datasets scraped from YouTube.
On the policy's announcement date and specifics, the page provides the following:
Announcement date:
Policy specifics:
YouTube support post link - context: the support post on third-party AI trainability published by TeamYouTube member Rob
YouTube support page - context: explains how creators select third-party companies and how the setting can be changed
TechCrunch report link - context: coverage listing the initial third-party companies
Report on OpenAI's use of YouTube data - context: coverage of OpenAI using YouTube content to train its models
Report on Apple's and Anthropic's use of YouTube data - context: coverage of Apple and Anthropic using YouTube videos as training data
YouTube's September announcement - context: the blog post in which YouTube announced in September that the feature was in development
I have found the details of YouTube's new third-party training policy. Let me look for more coverage of this policy to round out the picture.
Search
YouTube third party AI training policy opt in December 2024
Verified Conclusions
Verified item 1: SoccerDB1 explicitly originates from manual clipping and sampling of openly available broadcast match videos on YouTube
Verified item 2: The process involves downloading and reusing platform content
Verified item 3: YouTube's Terms of Service ... by default do not permit unauthorized scraping/downloading
Verified item 4: YouTube's ... new "third-party training" policy by default does not allow ... use for third-party model training
Verified item 5: Only when creators actively opt in to "allow" and open access to specific companies is there a corresponding authorization basis
Remaining Items to Verify
(none)
Action Plan
Now I will present the final verification conclusions:
This verification covered five claims. Four are confirmed as reliable by the evidence; one is largely reliable but time-sensitive. Specifically:
Note that some of the evidence on YouTube's Terms of Service dates from 2021 and earlier. While the core principles are unchanged, the specific wording may have been updated since. When citing, note the time context or consult the current official terms.
| Original claim | Reliability analysis | Evidence |
|---|---|---|
| SoccerDB1 explicitly originates from manual clipping and sampling of openly available broadcast match videos on YouTube | Reliable. The original SoccerDB1 paper states explicitly that the dataset was created by manually clipping openly available broadcast soccer match videos on YouTube. | "The video clips are created manually from openly available broadcast soccer match videos available on YouTube." (2023-07-22) arxiv.org |
| The process involves downloading and reusing platform content | Reliable. Given the evidence above, creating clips from YouTube videos to build a new dataset necessarily involves downloading or recording platform content, and constitutes reuse. This is an accurate characterization of the activity. | "The video clips are created manually from openly available broadcast soccer match videos available on YouTube." (2023-07-22) arxiv.org |
| YouTube's Terms of Service ... by default do not permit unauthorized scraping/downloading | Largely reliable, but the evidence has limited currency. Multiple sources confirm that YouTube's Terms of Service prohibit unauthorized downloading of content, but some of the evidence is older and the exact wording may have been updated since. The core principle still holds. | "You are not allowed to [...] access, reproduce, download, distribute, transmit, broadcast, display, sell, license, alter, modify or otherwise use any part of the Service or any Content except: (a) as expressly authorized by the Service; or..." (2021-07-14) makeuseof.com; "…Content on the website is provided to you as is for your information and personal use only and may not be downloaded, copied, reproduced, distributed, transmitted, broadcast, displayed, sold, licensed, or otherwise exploited for any other purposes whatsoever without the prior written consent of the respective owners…" (circa 2008) arxiv.org |
| YouTube's ... new "third-party training" policy by default does not allow ... use for third-party model training | Reliable. YouTube's official help page and reputable tech media both confirm that the third-party AI training setting defaults to off, i.e. by default third parties may not use content for AI training. | "By default, the third-party training setting is turned off. If you don't want your YouTube content used for third-party training, you don't need to take any action." (December 2024 or later) Google; "To be clear, the default setting for this is off, meaning that if you don't want to let third-party companies scrape your videos for AI training, you don't have to do anything." (2024-12-16) theverge.com |
| Only when creators actively opt in to "allow" and open access to specific companies is there a corresponding authorization basis | Reliable. YouTube's official help page and media reports both confirm that creators must actively turn the setting on and can select authorized companies from a list, or choose to allow all third-party companies. | "If you choose to turn on the third-party training setting, it means: You select from a list of third-party companies to allow or choose to allow all third-party companies." (December 2024 or later) Google; "Another support page says that you'll be able to pick and choose from a list of third-party companies that can train on your videos or you can simply allow all third-party companies to train on them." (2024-12-16) theverge.com |