Nsight Compute

Although the screenshots are from an older version, the matrix multiplication example and the tutorial in this article are a good way to start learning Nsight Compute:

https://developer.nvidia.com/blog/using-nsight-compute-to-inspect-your-kernels/

For more material, see this three-part tutorial:

  1. https://developer.nvidia.com/blog/analysis-driven-optimization-preparing-for-analysis-with-nvidia-nsight-compute-part-1/
  2. https://developer.nvidia.com/blog/analysis-driven-optimization-analyzing-and-improving-performance-with-nvidia-nsight-compute-part-2
  3. https://developer.nvidia.com/blog/analysis-driven-optimization-finishing-the-analysis-with-nvidia-nsight-compute-part-3

If you prefer visual material, the following article contains some demo videos of Nsight Compute:

https://developer.nvidia.com/blog/sc20-demos-new-nsight-systems-and-nsight-compute-demos/