ValueError: Attempting to unscale FP16 gradients.
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval. I tried adding a Dense layer with uniform activation, changing the loss function and the optimizer, and getting the gradients as follows:
Hello! I want to use 16 * A10 to run inference on Llama2-70B (fp16 ...