Merge pull request #690 from w23/stream-E342

Stuff done during stream E342
- [x] revert back indirect specular kernel size
2023-12-05 09:59:57 -08:00 · 2023-12-05 09:59:57 -08:00 · 80b524d643
parent dc0b968028 6cb45dda77
commit 80b524d643
3 changed files with 145 additions and 26 deletions
--- a/ref/vk/NOTES.md
+++ b/ref/vk/NOTES.md
@@ -935,3 +935,79 @@ Observations:
 	  second, looking for a hole, if no compat allocation found.
 	  Whether that's too slow will be visible when we embark on changelevel optimization journey. And if it is, we
 	  could replace it with proper freelist thing from allocator.
+
+# 2023-12-05 E342
+## Shader profiling
+### Data sources
+- VK_KHR_performance_query
+    - Available only on AMD ≤ RX 6900 on Linux and Intel cards
+    - Example list of metrics available on my AMD card:
+        - Got 17 counters:
+            - 0: command GRBM/GPU active cycles, C (cycles the GPU is active processing a command buffer.)
+            - 1: command Shaders/Waves, generic (Number of waves executed)
+            - 2: command Shaders/Instructions, generic (Number of Instructions executed)
+            - 3: command Shaders/VALU Instructions, generic (Number of VALU Instructions executed)
+            - 4: command Shaders/SALU Instructions, generic (Number of SALU Instructions executed)
+            - 5: command Shaders/VMEM Load Instructions, generic (Number of VMEM load instructions executed)
+            - 6: command Shaders/SMEM Load Instructions, generic (Number of SMEM load instructions executed)
+            - 7: command Shaders/VMEM Store Instructions, generic (Number of VMEM store instructions executed)
+            - 8: command Shaders/LDS Instructions, generic (Number of LDS Instructions executed)
+            - 9: command Shaders/GDS Instructions, generic (Number of GDS Instructions executed)
+            - 10: command Shader Utilization/VALU Busy, % (Percentage of time the VALU units are busy)
+            - 11: command Shader Utilization/SALU Busy, % (Percentage of time the SALU units are busy)
+            - 12: command Memory/VRAM read size, b (Number of bytes read from VRAM)
+            - 13: command Memory/VRAM write size, b (Number of bytes written to VRAM)
+            - 14: command Memory/L0 cache hit ratio, b (Hit ratio of L0 cache)
+            - 15: command Memory/L1 cache hit ratio, b (Hit ratio of L1 cache)
+            - 16: command Memory/L2 cache hit ratio, b (Hit ratio of L2 cache)
+- VK_KHR_shader_clock
+    - Available almost everywhere
+    - Enables:
+        - https://registry.khronos.org/OpenGL/extensions/ARB/ARB_shader_clock.txt
+            - `uint64_t clockARB()` || `uvec2 clock2x32ARB()`
+            - gives subgroup-local monotonic time in unspecified units
+        - https://github.com/KhronosGroup/GLSL/blob/master/extensions/ext/GL_EXT_shader_realtime_clock.txt
+            - `clockRealtimeEXT` || `clockRealtime2x32EXT`
+            - gpu-global time in unspecified units
+
+### Data collection
+#### Shader clocks
+Need to get per-pixel values out of shader.
+##### Simplest method: thermal map
+The simplest method is to have yet another texture that we can write into, and have a rt_debug_display_only output.
+- Texture format? E.g.:
+    - rgba16f -- 4 channels for 4 delta values scaled by something from UBO
+    - rgba16f -- 4 channels for 4 absolute values scaled by UBO const
+
+This doesn't need anything special. Can use basically the same machinery we have now.
+
+##### On-GPU profile analysis
+1. Shader specifies arbitrary buffer+struct
+2. sebastian.py reads the struct size (w/o parsing fields) and records it in the meat file
+3. Native code just creates the buffer for GPU only, w/o being aware of what's inside.
+4. Have a dedicated compute pass reading these buffers and summarizing them into a texture or a buffer defined at compile time.
+5. Read this texture/buffer after frame.
+
+This would need:
+- sebastian parsing buffer array item size
+- meat buffer item size metadata
+- native buffer creation
+- native buffer r/w state/barrier tracking
+
+If extracting data from buffer:
+- passing vk buffer data back to cpu land w/ synchronization
+
+##### Universal method would be
+1. Allow specifying arbitrary structures/buffers in shaders.
+2. Teach sebastian.py to parse these structures and encode their layout in meat files
+3. Native code would create such structures with specified size/resolution (how to know expected size?)
+4. Native would then copy them over to CPU land and parse based on meat metadata.
+
+This would need the same as above, plus:
+- sebastian parsing struct fields
+- meatpipe reading fields
+
+- Q: how to analyze this volume of data on CPU?
+- A: probably should still do it on GPU lol
+
+This would also allow passing arbitrary per-pixel data from shaders, which would make shader debugging much much easier.
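
The CPU-side half of the "universal method" above (decode a buffer using layout metadata, then reduce it) can be sketched roughly as follows. This is a minimal illustration only: `PROFILE_LAYOUT`, the field names, the little-endian `uint32` packing, and the 32-bit wraparound handling are all assumptions, not the actual meat file format or anything sebastian.py emits today.

```python
import struct

# Hypothetical layout metadata, in the spirit of what sebastian.py could
# encode in a meat file. Field names and formats are assumptions.
PROFILE_LAYOUT = [("clock_begin", "I"), ("clock_end", "I")]


def make_record_struct(layout):
    """Build a struct.Struct for one buffer item from (name, format) pairs."""
    return struct.Struct("<" + "".join(fmt for _, fmt in layout))


def decode_records(layout, data):
    """Decode a raw buffer copied back from the GPU into a list of dicts."""
    record = make_record_struct(layout)
    if len(data) % record.size != 0:
        raise ValueError("buffer size is not a multiple of the item size")
    names = [name for name, _ in layout]
    return [dict(zip(names, values)) for values in record.iter_unpack(data)]


def clock_deltas(records):
    """Per-item elapsed ticks, assuming a 32-bit counter that may wrap once."""
    return [(r["clock_end"] - r["clock_begin"]) & 0xFFFFFFFF for r in records]
```

Only `make_record_struct(...).size` is needed for the "On-GPU profile analysis" variant, where native code allocates the buffer without knowing the fields. The field-level decoding is the extra work the universal method asks for.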
--- a/ref/vk/TODO.md
+++ b/ref/vk/TODO.md
@@ -1,16 +1,50 @@
+# 2023-12-05 E342
+- [x] tone down the specular indirect blur
+- [-] try func_wall static light opt, #687
+	→ decided to postpone, a lot more logic changes are needed
+- [x] increase rendertest wait by 1 -- increased scroll speed instead
+- [x] update rendertest images
+- [x] Discuss shader profiling
+- [-] Discuss Env-based verbose log control
+
+Longer-term agenda for current season:
+- [ ] Tools:
+	- [ ] Shader profiling. Measure impact of changes. Regressions.
+- [ ] Better PBR math, e.g.:
+	- [ ] fix black dielectrics, #666
+- [ ] Transparency:
+	- [ ] Figure out why additive transparency differs visibly from raster
+	- [ ] Extract and specialize effects, e.g.
+		- [ ] Rays -> volumetrics
+		- [ ] Glow -> bloom
+		- [ ] Smoke -> volumetrics
+		- [ ] Sprites/portals -> emissive volumetrics
+		- [ ] Holo models -> emissive additive
+		- [ ] Some additive -> translucent
+		- [ ] what else
+	- [ ] Proper material mode for translucency, with reflections, refraction (index), fresnel, etc.
+- [ ] Lighting
+	- [ ] Point spheres sampling
+	- [ ] Increase limits
+	- [ ] s/poly/triangle/ -- simpler sampling, universal
+	- [ ] Better and dynamically sized clusters
+	- [ ] Cache rays -- do not cast shadow rays for everything, do a separate ray-only pass for visibility caching
+- [ ] Bounces
+	- [ ] Moar bounces
+	- [ ] MIS
+	- [ ] Cache directions for strong indirect light
+
 # 2023-12-04 E341
- [ ] investigate envlight missing #680
+- [-] investigate envlight missing #680
+	- couldn't reproduce more than once
 - [x] add more logs for the above
 - [x] double switchable lights, #679
- [ ] tone down the specular indirect blur
- [ ] increase rendertest wait by 1
- [ ] update rendertest images
+
+-- season cut --

 # 2023-12-01 E340
 - [x] Better resolution changes:
    - [x] Dynamic max resolution (start with current one, then grow by some growth factor)
- [ ] Discuss Per-pixel profiling
- [ ] Discuss Env-based verbose log control

 # 2023-11-30 E339
 - [x] rendermode patch
--- a/ref/vk/shaders/denoiser.comp
+++ b/ref/vk/shaders/denoiser.comp
@@ -95,16 +95,19 @@ Components blurSamples(const ivec2 res, const ivec2 pix) {

 	const int DIRECT_DIFFUSE_KERNEL = 3;
 	const int INDIRECT_DIFFUSE_KERNEL = 5;
-	const int SPECULAR_KERNEL = 2;
-	const int KERNEL_SIZE = max(max(DIRECT_DIFFUSE_KERNEL, INDIRECT_DIFFUSE_KERNEL), SPECULAR_KERNEL);
+	const int DIRECT_SPECULAR_KERNEL = 2;
+	const int INDIRECT_SPECULAR_KERNEL = 2;
+	const int KERNEL_SIZE = max(max(max(DIRECT_DIFFUSE_KERNEL, INDIRECT_DIFFUSE_KERNEL), DIRECT_SPECULAR_KERNEL), INDIRECT_SPECULAR_KERNEL);

 	const float direct_diffuse_sigma = DIRECT_DIFFUSE_KERNEL / 2.;
 	const float indirect_diffuse_sigma = INDIRECT_DIFFUSE_KERNEL / 2.;
-	const float specular_sigma = SPECULAR_KERNEL / 2.;
+	const float direct_specular_sigma = DIRECT_SPECULAR_KERNEL / 2.;
+	const float indirect_specular_sigma = INDIRECT_SPECULAR_KERNEL / 2.;

 	float direct_diffuse_total = 0.;
 	float indirect_diffuse_total = 0.;
-	float specular_total = 0.;
+	float direct_specular_total = 0.;
+	float indirect_specular_total = 0.;

 	const ivec2 res_scaled = res / INDIRECT_SCALE;
 	for (int x = -KERNEL_SIZE; x <= KERNEL_SIZE; ++x)
@@ -140,29 +143,34 @@ Components blurSamples(const ivec2 res, const ivec2 pix) {
 				c.direct_diffuse += imageLoad(light_poly_diffuse, p).rgb * direct_diffuse_scale;
 			}

-			if (all(lessThan(abs(ivec2(x, y)), ivec2(INDIRECT_DIFFUSE_KERNEL))))
+			if (all(lessThan(abs(ivec2(x, y)), ivec2(INDIRECT_DIFFUSE_KERNEL))) && do_indirect)
 			{
 				// TODO indirect operates at different scale, do a separate pass
-				if (do_indirect) {
-					const float indirect_diffuse_scale = scale
-						* normpdf(x, indirect_diffuse_sigma)
-						* normpdf(y, indirect_diffuse_sigma);
+				const float indirect_diffuse_scale = scale
+					* normpdf(x, indirect_diffuse_sigma)
+					* normpdf(y, indirect_diffuse_sigma);

-					indirect_diffuse_total += indirect_diffuse_scale;
-					c.indirect_diffuse += imageLoad(indirect_diffuse, p_indirect).rgb * indirect_diffuse_scale;
-				}
+				indirect_diffuse_total += indirect_diffuse_scale;
+				c.indirect_diffuse += imageLoad(indirect_diffuse, p_indirect).rgb * indirect_diffuse_scale;
 			}

-			if (all(lessThan(abs(ivec2(x, y)), ivec2(SPECULAR_KERNEL))))
+			if (all(lessThan(abs(ivec2(x, y)), ivec2(DIRECT_SPECULAR_KERNEL))))
 			{
-				const float specular_scale = scale * normpdf(x, specular_sigma) * normpdf(y, specular_sigma);
-				specular_total += specular_scale;
+				const float specular_scale = scale * normpdf(x, direct_specular_sigma) * normpdf(y, direct_specular_sigma);
+				direct_specular_total += specular_scale;

 				c.direct_specular += imageLoad(light_poly_specular, p).rgb * specular_scale;
 				c.direct_specular += imageLoad(light_point_specular, p).rgb * specular_scale;
+			}
+
+			if (all(lessThan(abs(ivec2(x, y)), ivec2(INDIRECT_SPECULAR_KERNEL)))) {
+				const ivec2 p_indirect = (pix + ivec2(x, y)) / INDIRECT_SCALE;// + ivec2(x, y);
+				const bool do_indirect = all(lessThan(p_indirect, res_scaled)) && all(greaterThanEqual(p_indirect, ivec2(0)));

-				// TODO indirect operates at different scale, do a separate pass
 				if (do_indirect) {
+					// TODO indirect operates at different scale, do a separate pass
+					const float specular_scale = scale * normpdf(x, indirect_specular_sigma) * normpdf(y, indirect_specular_sigma);
+					indirect_specular_total += specular_scale;
 					c.indirect_specular += imageLoad(indirect_specular, p_indirect).rgb * specular_scale;
 				}
 			}
@@ -174,10 +182,11 @@ Components blurSamples(const ivec2 res, const ivec2 pix) {
 	if (indirect_diffuse_total > 0.)
 		c.indirect_diffuse *= indirect_diffuse_total;

-	if (specular_total > 0.) {
-		c.direct_specular *= specular_total;
-		c.indirect_specular *= specular_total;
-	}
+	if (direct_specular_total > 0.)
+		c.direct_specular *= direct_specular_total;
+
+	if (indirect_specular_total > 0.)
+		c.indirect_specular *= indirect_specular_total;

 	return c;
 }
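
For reference, the per-component accumulation that this change splits out (one kernel size, sigma and weight total per component) can be modelled on the CPU roughly as below. This is a sketch under assumptions: `normpdf` is taken to be the usual Gaussian density, the image is a plain 2D list of floats, and the final step divides by the weight total, which is the conventional normalization and not a copy of the shader's own final scaling step.

```python
import math


def normpdf(x, sigma):
    """Gaussian density; assumed to match the shader helper of the same name."""
    return 0.39894 * math.exp(-0.5 * x * x / (sigma * sigma)) / sigma


def taps(kernel):
    """Offsets accepted by the strict `abs(offset) < kernel` test in the shader."""
    span = range(-kernel, kernel + 1)
    return [(x, y) for y in span for x in span if abs(x) < kernel and abs(y) < kernel]


def blur_component(image, px, py, kernel):
    """Blur one component with its own kernel, sigma and weight total."""
    sigma = kernel / 2.0
    height, width = len(image), len(image[0])
    total = 0.0
    acc = 0.0
    for x, y in taps(kernel):
        sx, sy = px + x, py + y
        if not (0 <= sx < width and 0 <= sy < height):
            continue  # out-of-bounds samples contribute neither value nor weight
        weight = normpdf(x, sigma) * normpdf(y, sigma)
        total += weight
        acc += image[sy][sx] * weight
    # Conventional normalization: weighted sum divided by the sum of weights.
    return acc / total if total > 0.0 else acc
```

Because the test is strict, a kernel constant of 2 covers a 3x3 neighbourhood and 5 covers 9x9. Keeping a separate total per component means that samples skipped for one component, such as out-of-bounds indirect taps, do not skew the normalization of another.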