Commit Graph

273 Commits

Author SHA1 Message Date
017474c3f8 shader/shift: Implement SHF_LEFT_{IMM,R}
Shifts a pair of registers to the left and returns the high register.
2020-02-01 21:19:44 -03:00
d95d4ac843 shader/memory: Implement ATOM.ADD
ATOM operates atomically on global memory. For now only add ATOM.ADD
since that's what was found in commercial games.

This asserts for ATOM.ADD.S32 (handling the others as unimplemented),
although ATOM.ADD.U32 shouldn't be any different.

This change forces us to change the default type on SPIR-V storage
buffers from float to uint. We could also alias the buffers, but it's
simpler for now to just use uint. While we are at it, abstract the code
to avoid repetition.
2020-01-26 01:54:24 -03:00
63ba41a26d shader/memory: Implement ATOMS.ADD.U32 2020-01-16 17:30:55 -03:00
028b2718ed Merge pull request #3239 from ReinUsesLisp/p2r
shader/p2r: Implement P2R Pr
2019-12-31 20:37:16 -05:00
8a76f816a4 Merge pull request #3228 from ReinUsesLisp/ptp
shader/texture: Implement AOFFI and PTP for TLD4 and TLD4S
2019-12-26 21:43:44 -05:00
cf27b59493 shader/r2p: Refactor P2R to support P2R 2019-12-20 17:55:42 -03:00
8b26b4228b shader_bytecode: Fix TLD4S encoding 2019-12-17 23:32:10 -03:00
e09c1fbc1f shader/texture: Implement TLD4.PTP 2019-12-16 04:09:24 -03:00
af89723fa3 Shader_Ir: Correct TLD4S encoding and implement f16 flag. 2019-12-11 19:53:17 -04:00
425a254fa2 shader: Implement MEMBAR.GL
Implement using memoryBarrier in GLSL and OpMemoryBarrier on SPIR-V.
2019-12-10 16:45:03 -03:00
6233b1db08 shader_ir/memory: Implement patch stores 2019-12-09 23:25:21 -03:00
74f515e8b6 shader_bytecode: Remove corrupted character 2019-12-06 20:31:56 -03:00
cd0f5dfc17 Shader_IR: Implement TXD instruction. 2019-11-14 11:15:27 -04:00
f3d1b370aa Shader_IR: Implement FLO instruction. 2019-11-14 11:15:27 -04:00
95137a04e1 Shader_Bytecode: Add encodings for FLO, SHF and TXD 2019-11-14 11:15:26 -04:00
b6f6733131 Merge pull request #3081 from ReinUsesLisp/fswzadd-shuffles
shader: Implement FSWZADD and reimplement SHFL
2019-11-14 10:27:27 -04:00
096f339a2a video_core: Silence implicit conversion warnings 2019-11-08 22:48:50 +00:00
56e237d1f9 shader_ir/warp: Implement FSWZADD 2019-11-07 20:08:41 -03:00
9293c3a0f2 Shader_IR: Fix TLD4 and add Bindless Variant.
This commit fixes an issue where not all 4 results of tld4 were being
written, the color component was defaulted to red, among other things.
It also implements the bindless variant.
2019-10-30 12:02:03 -04:00
7fdf991097 shader_bytecode: Make Matcher constexpr capable
Greatly shrinks the amount of generated code for GetDecodeTable().

Collapses an assembly output of 9000+ lines down to ~3621 with Clang,
and 6513 down to ~2616 with GCC, given it's now allowed to construct all
the entries as a sequence of constant data.
2019-10-24 01:10:10 -04:00
376f1a4432 Merge pull request #2869 from ReinUsesLisp/suld
shader/image: Implement SULD and fix SUATOM
2019-09-23 21:47:03 -04:00
9286976948 Merge pull request #2878 from FernandoS27/icmp
shader_ir: Implement ICMP
2019-09-21 18:06:07 -03:00
44000971e2 gl_shader_decompiler: Use uint for images and fix SUATOM
In the process remove implementation of SUATOM.MIN and SUATOM.MAX as
these require a distinction between U32 and S32. These have to be
implemented with imageCompSwap loop.
2019-09-21 17:33:52 -03:00
675f23aedc shader/image: Implement SULD and remove irrelevant code
* Implement SULD as float.
* Remove conditional declaration of GL_ARB_shader_viewport_layer_array.
2019-09-21 17:32:48 -03:00
4de0f1e1c8 shader_bytecode: Add SULD encoding 2019-09-21 17:31:46 -03:00
527b841c15 Shader_IR: ICMP corrections and fixes 2019-09-21 14:28:03 -04:00
4b81d19a1a Shader_IR: Implement ICMP. 2019-09-19 20:56:29 -04:00
0526bf1895 shader_ir/warp: Implement SHFL 2019-09-17 17:44:07 -03:00
36abf67e79 shader/image: Implement SUATOM and fix SUST 2019-09-10 20:22:31 -03:00
77ef4fa907 shader/shift: Implement SHR wrapped and clamped variants
Nvidia defaults to wrapped shifts, but this is undefined behaviour on
OpenGL's spec. Explicitly mask/clamp according to what the guest shader
requires.
2019-09-04 01:55:24 -03:00
81fbc5370d Merge pull request #2812 from ReinUsesLisp/f2i-selector
shader_ir/conversion: Implement F2I and F2F F16 selector
2019-09-03 22:35:33 -04:00
d4f33b822b Merge pull request #2811 from ReinUsesLisp/fsetp-fix
float_set_predicate: Add missing negation bit for the second operand
2019-09-03 22:34:34 -04:00
e3534700d7 shader_ir/conversion: Split int and float selector and implement F2F H1 2019-08-28 16:09:33 -03:00
b13fbc25b8 shader_ir/conversion: Implement F2I F16 Ra.H1 2019-08-27 23:40:40 -03:00
6207751b00 float_set_predicate: Add missing negation bit for the second operand 2019-08-27 21:57:43 -03:00
4e35177e23 shader_ir: Implement VOTE
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics

Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.

To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:

* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true

ballotARB, also known as "uint64_t(activeThreadsNV())", emits

VOTE.ANY Rd, PT, PT;

on nouveau's compiler. This doesn't match exactly to Nvidia's code

VOTE.ALL Rd, PT, PT;

Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
2019-08-21 14:50:38 -03:00
cedc1aab4a Merge pull request #2753 from FernandoS27/float-convert
Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
2019-08-21 10:27:57 -04:00
2ff8044806 shader_ir: Implement NOP 2019-08-04 03:02:55 -03:00
11f4e739bd Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
This commit takes care of implementing the F16 Variants of the 
conversion instructions and makes sure conversions are done.
2019-07-20 17:38:25 -04:00
6c4985edc9 shader/half_set_predicate: Implement missing HSETP2 variants 2019-07-19 22:20:47 -03:00
1bdb59fc6e Merge pull request #2695 from ReinUsesLisp/layer-viewport
gl_shader_decompiler: Implement gl_ViewportIndex and gl_Layer in vertex shaders
2019-07-15 16:28:07 -04:00
0ec9da2f9f Merge pull request #2692 from ReinUsesLisp/tlds-f16
shader/texture: Add F16 support for TLDS
2019-07-14 08:44:38 -04:00
8a6fc529a9 shader_ir: Implement BRX & BRA.CC 2019-07-09 08:14:37 -04:00
c9d886c84e gl_shader_decompiler: Implement gl_ViewportIndex and gl_Layer in vertex shaders
This commit implements gl_ViewportIndex and gl_Layer in vertex and
geometry shaders. In the case it's used in a vertex shader, it requires
ARB_shader_viewport_layer_array. This extension is available on AMD and
Nvidia devices (mesa and proprietary drivers), but not available on
Intel on any platform. At the moment of writing this description I don't
know if this is a hardware limitation or a driver limitation.

In the case that ARB_shader_viewport_layer_array is not available,
writes to these registers on a vertex shader are ignored, with the
appropriate logging.
2019-07-07 20:42:55 -03:00
d0966b9f7c shader/texture: Add F16 support for TLDS 2019-07-07 16:05:56 -03:00
4d63f97945 shader_bytecode: Include missing <array> 2019-06-24 01:51:02 -03:00
06c4ce8645 shader: Decode SUST and implement backing image functionality 2019-06-20 21:38:33 -03:00
4e81fc8296 shader: Implement texture buffers 2019-06-20 21:36:12 -03:00
a32c52b1d8 shader_bytecode: Mark EXIT as flow instruction 2019-06-04 12:18:35 -04:00
75e7b45d69 shader/memory: Implement ST (generic memory) 2019-05-20 22:41:53 -03:00