Merge pull request #690 from w23/stream-E342
Stuff done during stream E342
- [x] revert back indirect specular kernel size
commit 80b524d643
@ -935,3 +935,79 @@ Observations:
second, looking for a hole, if no compat allocation found.
Whether that's too slow will be visible when we embark on changelevel optimization journey. And if it is, we
could replace it with proper freelist thing from allocator.

# 2023-12-05 E342
## Shader profiling
### Data sources
- VK_KHR_performance_query
  - Available only on AMD ≤ RX 6900 on Linux and Intel cards
  - Example list of metrics available on my AMD card:
    - Got 17 counters:
      - 0: command GRBM/GPU active cycles, C (cycles the GPU is active processing a command buffer.)
      - 1: command Shaders/Waves, generic (Number of waves executed)
      - 2: command Shaders/Instructions, generic (Number of Instructions executed)
      - 3: command Shaders/VALU Instructions, generic (Number of VALU Instructions executed)
      - 4: command Shaders/SALU Instructions, generic (Number of SALU Instructions executed)
      - 5: command Shaders/VMEM Load Instructions, generic (Number of VMEM load instructions executed)
      - 6: command Shaders/SMEM Load Instructions, generic (Number of SMEM load instructions executed)
      - 7: command Shaders/VMEM Store Instructions, generic (Number of VMEM store instructions executed)
      - 8: command Shaders/LDS Instructions, generic (Number of LDS Instructions executed)
      - 9: command Shaders/GDS Instructions, generic (Number of GDS Instructions executed)
      - 10: command Shader Utilization/VALU Busy, % (Percentage of time the VALU units are busy)
      - 11: command Shader Utilization/SALU Busy, % (Percentage of time the SALU units are busy)
      - 12: command Memory/VRAM read size, b (Number of bytes read from VRAM)
      - 13: command Memory/VRAM write size, b (Number of bytes written to VRAM)
      - 14: command Memory/L0 cache hit ratio, b (Hit ratio of L0 cache)
      - 15: command Memory/L1 cache hit ratio, b (Hit ratio of L1 cache)
      - 16: command Memory/L2 cache hit ratio, b (Hit ratio of L2 cache)
- VK_KHR_shader_clock
  - Available almost everywhere
  - Enables:
    - https://registry.khronos.org/OpenGL/extensions/ARB/ARB_shader_clock.txt
      - `uint64_t clockARB()` || `uvec2 clock2x32ARB()`
      - gives subgroup-local monotonic time in unspecified units
    - https://github.com/KhronosGroup/GLSL/blob/master/extensions/ext/GL_EXT_shader_realtime_clock.txt
      - `clockRealtimeEXT` || `clockRealtime2x32EXT`
      - gpu-global time in unspecified units

### Data collection
#### Shader clocks
Need to get per-pixel values out of shader.
##### Simplest method: thermal map
The simplest method is to have yet another texture that we can write into, and have a rt_debug_display_only output.
- Texture format? E.g.:
  - rgba16f -- 4 channels for 4 delta values scaled by something from UBO
  - rgba16f -- 4 channels for 4 absolute values scaled by UBO const

This doesn't need anything special. Can use basically the same machinery we have now.
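
A minimal CPU-side sketch of the thermal map idea, written in C purely for illustration (the real thing would live in a shader). It assumes the two timestamps come from `clock2x32ARB()`, whose first component holds the low 32 bits and whose second holds the high 32 bits. The names `ticks_join`, `heat_value`, `heat_color` and the parameter `ticks_to_unit` (standing in for the UBO scale constant) are hypothetical, not existing code.

```c
#include <assert.h>
#include <stdint.h>

/* Join the two halves of a clock2x32ARB()-style value: lo = .x, hi = .y. */
static uint64_t ticks_join(uint32_t lo, uint32_t hi) {
	return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Map a tick delta to [0, 1]. ticks_to_unit plays the role of the UBO scale (assumption). */
static float heat_value(uint64_t begin, uint64_t end, float ticks_to_unit) {
	assert(ticks_to_unit >= 0.f);
	/* Unsigned subtraction stays correct even if the counter wrapped once. */
	const uint64_t delta = end - begin;
	const float v = (float)delta * ticks_to_unit;
	return v > 1.f ? 1.f : v;
}

typedef struct { float r, g, b; } heat_rgb_t;

/* Simple blue -> green -> red ramp for a debug display output. */
static heat_rgb_t heat_color(float v) {
	heat_rgb_t c;
	c.r = v > .5f ? (v - .5f) * 2.f : 0.f;
	c.g = v > .5f ? (1.f - v) * 2.f : v * 2.f;
	c.b = v > .5f ? 0.f : 1.f - v * 2.f;
	return c;
}
```

Since the clock units are unspecified, the scale would likely have to be tuned by hand or derived from the frame's maximum delta.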

##### On-GPU profile analysis
1. Shader specifies arbitrary buffer+struct
2. Sebastian reads struct size (w/o parsing fields) and notes that in meat file
3. Native code just creates the buffer for GPU only, w/o being aware of what's inside.
4. Have a dedicated compute pass reading these buffers and summarizing them into a texture or a buffer defined at compile time.
5. Read this texture/buffer after frame.

This would need:
- sebastian parsing buffer array item size
- meat buffer item size metadata
- native buffer creation
- native buffer r/w state/barrier tracking

If extracting data from buffer:
- passing vk buffer data back to cpu land w/ synchronization
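
The summarizing pass from step 4 can be modeled on the CPU to pin down what a compile-time-defined summary might contain. This is only a sketch in C under assumed names (`profile_summary_t`, `profile_summarize`, `PROFILE_BUCKETS`); an actual compute pass would presumably do the same reduction with atomics or a subgroup reduction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROFILE_BUCKETS 8 /* assumed summary size, fixed at compile time */

typedef struct {
	uint64_t min, max, sum;
	uint32_t count;
	uint32_t histogram[PROFILE_BUCKETS];
} profile_summary_t;

/* Summarize per-pixel tick deltas. bucket_width is in ticks and must be > 0. */
static profile_summary_t profile_summarize(const uint64_t *ticks, size_t count, uint64_t bucket_width) {
	assert(bucket_width > 0);
	profile_summary_t s = { .min = UINT64_MAX };
	for (size_t i = 0; i < count; ++i) {
		const uint64_t t = ticks[i];
		if (t < s.min) s.min = t;
		if (t > s.max) s.max = t;
		s.sum += t;
		/* Everything past the last bucket is clamped into it. */
		uint64_t bucket = t / bucket_width;
		if (bucket >= PROFILE_BUCKETS) bucket = PROFILE_BUCKETS - 1;
		s.histogram[bucket]++;
	}
	s.count = (uint32_t)count;
	return s;
}
```

A summary this small is cheap to read back after the frame, which is the main appeal over copying the whole per-pixel buffer.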

##### Universal method would be
1. Allow specifying arbitrary structures/buffers in shaders.
2. Teach sebastian.py to parse these structures and encode their layout in meat files
3. Native code would create such structures with specified size/resolution (how to know expected size?)
4. Native would then copy them over to CPU land and parse based on meat metadata.

This would need the same as above, plus:
- sebastian parsing struct fields
- meatpipe reading fields

- Q: how to analyze this volume of data on CPU?
- A: probably should still do it on GPU lol

This would also allow passing arbitrary per-pixel data from shaders, which would make shader debugging much much easier.
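
Step 4 of the universal method, parsing copied-back data from layout metadata, could look roughly like the C sketch below. The record layout (`field_desc_t`, `struct_desc_t`) and the helpers `field_find` and `field_read` are hypothetical; the actual meat file format for struct fields is not defined anywhere in these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { FIELD_U32, FIELD_F32 } field_type_t;

/* Hypothetical per-field record that sebastian.py could emit into a meat file. */
typedef struct {
	const char *name;
	field_type_t type;
	uint32_t offset; /* bytes from the start of one array item */
} field_desc_t;

typedef struct {
	uint32_t item_size; /* bytes per array item */
	uint32_t fields_count;
	const field_desc_t *fields;
} struct_desc_t;

static const field_desc_t *field_find(const struct_desc_t *desc, const char *name) {
	for (uint32_t i = 0; i < desc->fields_count; ++i)
		if (strcmp(desc->fields[i].name, name) == 0)
			return desc->fields + i;
	return NULL;
}

/* Read one field of item `index` as a double. Returns 0 if the field is unknown. */
static int field_read(const struct_desc_t *desc, const void *data, size_t index, const char *name, double *out) {
	const field_desc_t *f = field_find(desc, name);
	if (!f) return 0;
	const uint8_t *p = (const uint8_t *)data + index * desc->item_size + f->offset;
	if (f->type == FIELD_U32) {
		uint32_t v; memcpy(&v, p, sizeof v); *out = v;
	} else {
		float v; memcpy(&v, p, sizeof v); *out = v;
	}
	return 1;
}
```

Native code stays unaware of the struct's meaning here: it only needs the item size to allocate and the field table to decode, which matches the split between the two methods described above.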
@ -1,16 +1,50 @@
# 2023-12-05 E342
- [x] tone down the specular indirect blur
- [-] try func_wall static light opt, #687
  → decided to postpone, a lot more logic changes are needed
- [x] increase rendertest wait by 1 -- increased scroll speed instead
- [x] update rendertest images
- [x] Discuss shader profiling
- [-] Discuss Env-based verbose log control

Longer-term agenda for current season:
- [ ] Tools:
  - [ ] Shader profiling. Measure impact of changes. Regressions.
- [ ] Better PBR math, e.g.:
  - [ ] fix black dielectrics, #666
- [ ] Transparency:
  - [ ] Figure out why additive transparency differs visibly from raster
  - [ ] Extract and specialize effects, e.g.
    - [ ] Rays -> volumetrics
    - [ ] Glow -> bloom
    - [ ] Smoke -> volumetrics
    - [ ] Sprites/portals -> emissive volumetrics
    - [ ] Holo models -> emissive additive
    - [ ] Some additive -> translucent
    - [ ] what else
  - [ ] Proper material mode for translucency, with reflections, refraction (index), fresnel, etc.
- [ ] Lighting
  - [ ] Point spheres sampling
  - [ ] Increase limits
    - [ ] s/poly/triangle/ -- simpler sampling, universal
    - [ ] Better and dynamically sized clusters
    - [ ] Cache rays -- do not cast shadow rays for everything, do a separate ray-only pass for visibility caching
  - [ ] Bounces
    - [ ] Moar bounces
    - [ ] MIS
    - [ ] Cache directions for strong indirect light

# 2023-12-04 E341
- [ ] investigate envlight missing #680
- [-] investigate envlight missing #680
  - couldn't reproduce more than once
  - [x] add more logs for the above
- [x] double switchable lights, #679
- [ ] tone down the specular indirect blur
- [ ] increase rendertest wait by 1
- [ ] update rendertest images

-- season cut --

# 2023-12-01 E340
- [x] Better resolution changes:
  - [x] Dynamic max resolution (start with current one, then grow by some growth factor)
- [ ] Discuss Per-pixel profiling
- [ ] Discuss Env-based verbose log control

# 2023-11-30 E339
- [x] rendermode patch

@ -95,16 +95,19 @@ Components blurSamples(const ivec2 res, const ivec2 pix) {
const int DIRECT_DIFFUSE_KERNEL = 3;
const int INDIRECT_DIFFUSE_KERNEL = 5;
const int SPECULAR_KERNEL = 2;
const int KERNEL_SIZE = max(max(DIRECT_DIFFUSE_KERNEL, INDIRECT_DIFFUSE_KERNEL), SPECULAR_KERNEL);
const int DIRECT_SPECULAR_KERNEL = 2;
const int INDIRECT_SPECULAR_KERNEL = 2;
const int KERNEL_SIZE = max(max(max(DIRECT_DIFFUSE_KERNEL, INDIRECT_DIFFUSE_KERNEL), DIRECT_SPECULAR_KERNEL), INDIRECT_SPECULAR_KERNEL);

const float direct_diffuse_sigma = DIRECT_DIFFUSE_KERNEL / 2.;
const float indirect_diffuse_sigma = INDIRECT_DIFFUSE_KERNEL / 2.;
const float specular_sigma = SPECULAR_KERNEL / 2.;
const float direct_specular_sigma = DIRECT_SPECULAR_KERNEL / 2.;
const float indirect_specular_sigma = INDIRECT_SPECULAR_KERNEL / 2.;

float direct_diffuse_total = 0.;
float indirect_diffuse_total = 0.;
float specular_total = 0.;
float direct_specular_total = 0.;
float indirect_specular_total = 0.;

const ivec2 res_scaled = res / INDIRECT_SCALE;
for (int x = -KERNEL_SIZE; x <= KERNEL_SIZE; ++x)
@ -140,29 +143,34 @@ Components blurSamples(const ivec2 res, const ivec2 pix) {
c.direct_diffuse += imageLoad(light_poly_diffuse, p).rgb * direct_diffuse_scale;
}

if (all(lessThan(abs(ivec2(x, y)), ivec2(INDIRECT_DIFFUSE_KERNEL))))
if (all(lessThan(abs(ivec2(x, y)), ivec2(INDIRECT_DIFFUSE_KERNEL))) && do_indirect)
{
// TODO indirect operates at different scale, do a separate pass
if (do_indirect) {
const float indirect_diffuse_scale = scale
* normpdf(x, indirect_diffuse_sigma)
* normpdf(y, indirect_diffuse_sigma);
const float indirect_diffuse_scale = scale
* normpdf(x, indirect_diffuse_sigma)
* normpdf(y, indirect_diffuse_sigma);

indirect_diffuse_total += indirect_diffuse_scale;
c.indirect_diffuse += imageLoad(indirect_diffuse, p_indirect).rgb * indirect_diffuse_scale;
}
indirect_diffuse_total += indirect_diffuse_scale;
c.indirect_diffuse += imageLoad(indirect_diffuse, p_indirect).rgb * indirect_diffuse_scale;
}

if (all(lessThan(abs(ivec2(x, y)), ivec2(SPECULAR_KERNEL))))
if (all(lessThan(abs(ivec2(x, y)), ivec2(DIRECT_SPECULAR_KERNEL))))
{
const float specular_scale = scale * normpdf(x, specular_sigma) * normpdf(y, specular_sigma);
specular_total += specular_scale;
const float specular_scale = scale * normpdf(x, direct_specular_sigma) * normpdf(y, direct_specular_sigma);
direct_specular_total += specular_scale;

c.direct_specular += imageLoad(light_poly_specular, p).rgb * specular_scale;
c.direct_specular += imageLoad(light_point_specular, p).rgb * specular_scale;
}

if (all(lessThan(abs(ivec2(x, y)), ivec2(INDIRECT_SPECULAR_KERNEL)))) {
const ivec2 p_indirect = (pix + ivec2(x, y)) / INDIRECT_SCALE;// + ivec2(x, y);
const bool do_indirect = all(lessThan(p_indirect, res_scaled)) && all(greaterThanEqual(p_indirect, ivec2(0)));

// TODO indirect operates at different scale, do a separate pass
if (do_indirect) {
// TODO indirect operates at different scale, do a separate pass
const float specular_scale = scale * normpdf(x, indirect_specular_sigma) * normpdf(y, indirect_specular_sigma);
indirect_specular_total += specular_scale;
c.indirect_specular += imageLoad(indirect_specular, p_indirect).rgb * specular_scale;
}
}
@ -174,10 +182,11 @@ Components blurSamples(const ivec2 res, const ivec2 pix) {
if (indirect_diffuse_total > 0.)
c.indirect_diffuse *= indirect_diffuse_total;

if (specular_total > 0.) {
c.direct_specular *= specular_total;
c.indirect_specular *= specular_total;
}
if (direct_specular_total > 0.)
c.direct_specular *= direct_specular_total;

if (indirect_specular_total > 0.)
c.indirect_specular *= indirect_specular_total;

return c;
}