First of all thanks for taking the time to write a comprehensive explanation of the situation. Unfortunately there is much confusion around this (see #7455 (comment), #7455 (comment), etc.), so hopefully your detailed analysis will provide proof that indeed KLDivLoss can support label smoothing.

For those who would like more info about the relationship between Label Smoothing and Kullback-Leibler divergence, here are some references:

- Rethinking the Inception Architecture for Computer Vision - section 7
- Regularizing Neural Networks by Penalizing Confident Output Distributions - section 3.2
- Generalized Entropy Regularization or There's Nothing Special about Label Smoothing - section 2.1

Having said that, I wonder if the fact that there is so much confusion in the community hints that we have a UX problem. There are numerous tickets with several followers, forum posts and discussions around this. Most solutions require quite a significant amount of boilerplate code and careful implementation. I wonder if that justifies providing a more user-friendly wrapper that simplifies the code and gives the requested functionality.

As you highlighted in the past, there are various potential implementations for this, each offering different degrees of flexibility/performance. You can have a simple wrapper such as the following, which accepts a target value and reuses standard building blocks from PyTorch. It's a middle-ground solution with reasonable performance, since it does not have to convert the target value to a one-hot encoded vector and at the same time does not introduce more complex parameters in the low-level C++ code:

```python
from torch import Tensor
from torch.nn.modules.loss import _Loss


class LabelSmoothingCrossEntropy(_Loss):

    def __init__(self, eps: float = 0.1, size_average=None, reduce=None, reduction: str = 'mean'):
        super().__init__(size_average, reduce, reduction)
        self.eps = eps

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        # accepts integer class indices; avoids materialising a one-hot target
        ...
```

The above covers a vast majority of applications, but unfortunately it won't do for Computer Vision applications that use Data Augmentation techniques such as mixup and cutmix. These are SOTA primitives that we would like to add to TorchVision (see pytorch/vision#3911). In order to achieve this we will need a modified loss which accepts an already smoothed target. The smoothed target can be the result of mixup/cutmix OR can be the result of a default smooth_labels method similar to what you described here:

```python
import torch
import torch.nn.functional as F
from torch import Tensor
from torch.nn.modules.loss import _Loss


def smooth_labels(target: Tensor, num_classes: int, eps: float = 0.1):
    device = target.device
    N = target.size(0)
    # (1 - eps) * one_hot(target) plus eps / num_classes on every class
    v = torch.full(size=(N, 1), fill_value=1 - eps, device=device)
    return torch.full(size=(N, num_classes), fill_value=eps / num_classes, device=device) \
        .scatter_add_(1, target.unsqueeze(1), v)


class LabelSmoothingCrossEntropy2(_Loss):
    # variant whose forward accepts an already smoothed (soft) target,
    # e.g. produced by mixup/cutmix or by smooth_labels above
    ...
```

I am slightly inclined towards aligning with KL here because of how it is used in ML.

I agree that KL would be preferred in practice here - just wanted to bring up a possible incongruity between the definition and naming. As the new class is a combination of LogSoftmax + KLDivLoss, I guess it's not too surprising that KL-div is what is being computed, the only difference being that we expect logit inputs for the new loss vs. the log-probabilities that KLDivLoss expects.
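To make the cross-entropy/KL relationship above concrete, here is a small self-contained sketch. The helper names `smooth_one_hot` and `soft_target_cross_entropy` are illustrative assumptions, not part of any API discussed above. It builds a smoothed target in the same spirit as `smooth_labels`, then checks numerically that cross-entropy against that soft target equals the target's entropy plus the KL divergence computed by `F.kl_div`, i.e. the two losses differ only by a term that does not depend on the logits.

```python
import torch
import torch.nn.functional as F


def smooth_one_hot(target: torch.Tensor, num_classes: int, eps: float = 0.1) -> torch.Tensor:
    """Hard class indices -> smoothed probability vectors:
    (1 - eps) on the true class plus eps spread uniformly over all classes."""
    n = target.size(0)
    soft = torch.full((n, num_classes), eps / num_classes, device=target.device)
    return soft.scatter_add_(1, target.unsqueeze(1),
                             torch.full((n, 1), 1.0 - eps, device=target.device))


def soft_target_cross_entropy(logits: torch.Tensor, soft_target: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    return torch.mean(torch.sum(-soft_target * F.log_softmax(logits, dim=-1), dim=-1))


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(4, 5)
    target = torch.tensor([0, 2, 1, 4])
    soft = smooth_one_hot(target, num_classes=5, eps=0.1)

    ce = soft_target_cross_entropy(logits, soft)
    # KLDivLoss / F.kl_div expects log-probabilities as input and probabilities as target.
    kl = F.kl_div(F.log_softmax(logits, dim=-1), soft, reduction="batchmean")
    # Entropy of the smoothed target, constant with respect to the logits.
    entropy = torch.mean(torch.sum(-soft * torch.log(soft), dim=-1))

    # cross-entropy = entropy(target) + KL(target || softmax(logits))
    print(torch.allclose(ce, entropy + kl))  # True (up to float tolerance)
```

Since the entropy term has zero gradient with respect to the logits, optimizing either criterion moves the model in the same direction, which is the sense in which the "KL" and "cross-entropy" views coincide here.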