4b81d19a1a
Shader_IR: Implement ICMP.
2019-09-19 20:56:29 -04:00
7606da5611
VideoCore: Corrections to the MME Inliner and removal of hacky instance management.
2019-09-19 11:41:29 -04:00
b31880dc5e
Merge pull request #2784 from ReinUsesLisp/smem
...
shader_ir: Implement shared memory
2019-09-18 16:26:05 -04:00
0526bf1895
shader_ir/warp: Implement SHFL
2019-09-17 17:44:07 -03:00
36abf67e79
shader/image: Implement SUATOM and fix SUST
2019-09-10 20:22:31 -03:00
34b2c60f95
Merge pull request #2823 from ReinUsesLisp/shr-clamp
...
shader/shift: Implement SHR wrapped and clamped variants
2019-09-10 11:56:17 -04:00
1f43e5296f
gl_shader_decompiler: Keep track of written images and mark them as modified
2019-09-05 23:26:05 -03:00
3a450c1395
kepler_compute: Implement texture queries
2019-09-05 20:35:51 -03:00
4de04eba39
shader_ir: Implement LD_S
...
Loads from shared memory.
2019-09-05 01:38:37 -03:00
f17415d431
shader_ir: Implement ST_S
...
This instruction writes to a memory buffer shared with threads within
the same work group. It is known as "shared" memory in GLSL.
2019-09-05 01:38:37 -03:00
77ef4fa907
shader/shift: Implement SHR wrapped and clamped variants
...
Nvidia defaults to wrapped shifts, but this is undefined behaviour on
OpenGL's spec. Explicitly mask/clamp according to what the guest shader
requires.
2019-09-04 01:55:24 -03:00
dfae2d141a
half_set_predicate: Fix predicate assignments
2019-09-04 01:54:23 -03:00
81fbc5370d
Merge pull request #2812 from ReinUsesLisp/f2i-selector
...
shader_ir/conversion: Implement F2I and F2F F16 selector
2019-09-03 22:35:33 -04:00
d4f33b822b
Merge pull request #2811 from ReinUsesLisp/fsetp-fix
...
float_set_predicate: Add missing negation bit for the second operand
2019-09-03 22:34:34 -04:00
4d4f9cc104
video_core: Silent miscellaneous warnings ( #2820 )
...
* texture_cache/surface_params: Remove unused local variable
* rasterizer_interface: Add missing documentation commentary
* maxwell_dma: Remove unused rasterizer reference
* video_core/gpu: Sort member declaration order to silent -Wreorder warning
* fermi_2d: Remove unused MemoryManager reference
* video_core: Silent unused variable warnings
* buffer_cache: Silent -Wreorder warnings
* kepler_memory: Remove unused MemoryManager reference
* gl_texture_cache: Add missing override
* buffer_cache: Add missing include
* shader/decode: Remove unused variables
2019-08-30 14:08:00 -04:00
f8cc5668f8
Merge pull request #2758 from ReinUsesLisp/packed-tid
...
shader/decode: Implement S2R Tic
2019-08-29 12:58:43 -04:00
e3534700d7
shader_ir/conversion: Split int and float selector and implement F2F H1
2019-08-28 16:09:33 -03:00
b13fbc25b8
shader_ir/conversion: Implement F2I F16 Ra.H1
2019-08-27 23:40:40 -03:00
6207751b00
float_set_predicate: Add missing negation bit for the second operand
2019-08-27 21:57:43 -03:00
4e35177e23
shader_ir: Implement VOTE
...
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
2019-08-21 14:50:38 -03:00
dfdd20142e
Merge pull request #2777 from ReinUsesLisp/hsetp2-fe3h-fix
...
half_set_predicate: Fix HSETP2_C constant buffer offset
2019-08-21 10:29:17 -04:00
cedc1aab4a
Merge pull request #2753 from FernandoS27/float-convert
...
Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
2019-08-21 10:27:57 -04:00
ca61e298b3
Merge pull request #2778 from ReinUsesLisp/nop
...
shader_ir: Implement NOP
2019-08-18 08:51:34 -04:00
2ff8044806
shader_ir: Implement NOP
2019-08-04 03:02:55 -03:00
ec0da3ef64
half_set_predicate: Fix HSETP2_C constant buffer offset
2019-08-04 02:50:55 -03:00
77f1a676a1
decode/half_set_predicate: Fix predicates
2019-07-26 00:12:38 -03:00
b0ff3179ef
Merge pull request #2739 from lioncash/cflow
...
video_core/control_flow: Minor changes/warning cleanup
2019-07-25 13:04:56 -04:00
4d26550f5f
Merge pull request #2737 from FernandoS27/track-fix
...
Shader_Ir: Correct tracking to track from right to left
2019-07-25 12:41:52 -04:00
31e8a61527
Merge pull request #2743 from FernandoS27/surpress-assert
...
Downgrade and suppress a series of GPU asserts and debug messages.
2019-07-25 12:34:36 -04:00
104641db07
shader/decode: Implement S2R Tic
2019-07-22 16:16:10 -03:00
11f4e739bd
Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
...
This commit takes care of implementing the F16 Variants of the
conversion instructions and makes sure conversions are done.
2019-07-20 17:38:25 -04:00
1158777737
Shader_Ir: Change Debug Asserts for Log Warnings
2019-07-19 22:15:34 -04:00
45c162444d
shader/half_set_predicate: Fix HSETP2 implementation
2019-07-19 22:21:22 -03:00
6c4985edc9
shader/half_set_predicate: Implement missing HSETP2 variants
2019-07-19 22:20:47 -03:00
c1c89411da
video_core/control_flow: Provide operator!= for types with operator==
...
Provides operational symmetry for the respective structures.
2019-07-18 21:03:31 -04:00
1780e0e3d0
video_core/control_flow: Prevent sign conversion in TryGetBlock()
...
The return value is a u32, not an s32, so this would result in an
implicit signedness conversion.
2019-07-18 21:03:31 -04:00
a162a844d2
video_core/control_flow: Remove unnecessary BlockStack copy constructor
...
This is the default behavior of the copy constructor, so it doesn't need
to be specified.
While we're at it we can make the other non-default constructor
explicit.
2019-07-18 21:03:30 -04:00
56bc11d952
video_core/control_flow: Use std::move where applicable
...
Results in less work being done where avoidable.
2019-07-18 21:03:30 -04:00
e7b39f47f8
video_core/control_flow: Use the prefix variant of operator++ for iterators
...
Same thing, but potentially allows a standard library implementation to
pick a more efficient codepath.
2019-07-18 21:03:30 -04:00
6885e7e7ec
video_core/control_flow: Use empty() member function for checking emptiness
...
It's what it's there for.
2019-07-18 21:03:30 -04:00
45fa12a05c
video_core: Resolve -Wreorder warnings
...
Ensures that the constructor members are always initialized in the order
that they're declared in.
2019-07-18 21:03:30 -04:00
47df844338
video_core/control_flow: Make program_size for ScanFlow() a std::size_t
...
Prevents a truncation warning from occurring with MSVC. Also the
internal data structures already treat it as a size_t, so this is just a
discrepancy in the interface.
2019-07-18 21:03:29 -04:00
3df9558593
video_core/control_flow: Place all internally linked types/functions within an anonymous namespace
...
Previously, quite a few functions were being linked with external
linkage.
2019-07-18 21:03:29 -04:00
1109db86b7
video_core/shader/decode: Prevent sign-conversion warnings
...
Makes it explicit that the conversions here are intentional.
2019-07-18 21:03:29 -04:00
63bda67a34
Merge pull request #2738 from lioncash/shader-ir
...
shader-ir: Minor cleanup-related changes
2019-07-18 13:52:01 -04:00
5a06e33859
Shader_Ir: correct clang format
2019-07-18 10:09:26 -04:00
0b65e9335e
Shader_Ir: Downgrade precision and rounding asserts to debug asserts.
...
This commit reduces the sevirity of asserts for FP precision and
rounding as this are well known and have little to no consequences in
gpu's accuracy.
2019-07-18 08:17:19 -04:00
223a535f3f
Merge pull request #2740 from lioncash/bra
...
shader/decode/other: Correct branch indirect argument within BRA handling
2019-07-17 14:25:08 -04:00
bebbdc2067
shader_ir: std::move Node instance where applicable
...
These are std::shared_ptr instances underneath the hood, which means
copying them isn't as cheap as a regular pointer. Particularly so on
weakly-ordered systems.
This avoids atomic reference count increments and decrements where they
aren't necessary for the core set of operations.
2019-07-16 19:49:23 -04:00
60926ac16b
shader_ir: Rename Get/SetTemporal to Get/SetTemporary
...
This is more accurate in terms of describing what the functions are
actually doing. Temporal relates to time, not the setting of a temporary
itself.
2019-07-16 19:47:43 -04:00