Nsight Compute
Although the screenshots are from an older version, the matrix multiplication example and the tutorial in this article are a good starting point for learning Nsight Compute:
https://developer.nvidia.com/blog/using-nsight-compute-to-inspect-your-kernels/
For more material, see this three-part tutorial:
- https://developer.nvidia.com/blog/analysis-driven-optimization-preparing-for-analysis-with-nvidia-nsight-compute-part-1/
- https://developer.nvidia.com/blog/analysis-driven-optimization-analyzing-and-improving-performance-with-nvidia-nsight-compute-part-2
- https://developer.nvidia.com/blog/analysis-driven-optimization-finishing-the-analysis-with-nvidia-nsight-compute-part-3
If you prefer a visual introduction, the following article contains demo videos of Nsight Compute:
https://developer.nvidia.com/blog/sc20-demos-new-nsight-systems-and-nsight-compute-demos/