Commit Graph

79 Commits

Author SHA1 Message Date
Wunk e13735b624
video_core: Implement an arm64 shader-jit backend (#7002)
* externals: Add oaksim submodule

Used for emitting ARM64 assembly

* common: Implement aarch64 ABI

Utilize oaknut to implement a stack frame.

* tests: Allow shader-jit tests for x64 and a64

Run the shader-jit tests for both x86_64 and arm64 targets

* video_core: Initialize arm64 shader-jit backend

Passes all current unit tests!

* shader_jit_a64: protect/unprotect memory when jit-ing

Required on MacOS. Memory needs to be fully unprotected and then
re-protected when writing or there will be memory access errors on
MacOS.

* shader_jit_a64: Fix ARM64-Imm overflow

These conditionals were throwing exceptions since the immediate values
were overflowing the available space in the `EOR` instructions. Instead
they are generated from `MOV` and then `EOR`-ed after.

* shader_jit_a64: Fix Geometry shader conditional

* shader_jit_a64: Replace `ADRL` with `MOVP2R`

Fixes some immediate-generation exceptions.

* common/aarch64: Fix CallFarFunction

* shader_jit_a64: Optimize `SantitizedMul`

Co-authored-by: merryhime <merryhime@users.noreply.github.com>

* shader_jit_a64: Fix address register offset behavior

Based on https://github.com/citra-emu/citra/pull/6942
Passes unit tests.

* shader_jit_a64: Fix `RET` address offset

A64 stack is 16-byte aligned rather than 8. So a direct port of the x64
code won't work. Fixes weird branches into invalid memory for any
shaders with subroutines.

* shader_jit_a64: Increase max program size

Tuned for A64 program size.

* shader_jit_a64: Use `UBFX` for extracting loop-state

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `SUB+CMP` to `SUBS`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `CMP+B` to `CBNZ`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Use `FMOV` for `ONE` vector

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Remove x86-specific documentation

* shader_jit_a64: Use `UBFX` to extract exponent

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Remove redundant MIN/MAX `SRC2`-NaN check

Special handling only needs to check SRC1 for NaN, not SRC2.
It would work as follows in the four possible cases:

No NaN: No special handling needed.
Only SRC1 is NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Only SRC2 is NaN: FMAX automatically picks SRC2 because it always picks the NaN if there is one.
Both SRC1 and SRC2 are NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit/tests:: Add catch-stringifier for vec2f/vec3f

* shader_jit/tests: Add Dest Mask unit test

* shader_jit_a64: Fix Dest-Mask `BSL` operand order

Passes the dest-mask unit tests now.

* shader_jit_a64: Use `MOVI` for DestEnable mask

Accelerate certain cases of masking with MOVI as well

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit/tests: Add source-swizzle unit test

This is not expansive. Generating all `4^4` cases seems to make Catch2
crash. So I've added some component-masking(non-reordering) tests based
on the Dest-Mask unit-test and some additional ones to test
broadcasts/splats and component re-ordering.

* shader_jit_a64: Fix swizzle index generation

This was still generating `SHUFPS` indices and not the ones that we wanted for the `TBL` instruction. Passes all unit tests now.

* shader_jit/tests: Add `ShaderSetup` constructor to `ShaderTest`

Rather than using the direct output of `CompileShaderSetup` allow a
`ShaderSetup` object to be passed in directly.  This enabled the ability
emit assembly that is not directly supported by nihstro.

* shader_jit/tests: Add `CALL` unit-test

Tests nested `CALL` instructions to eventually reach an `EX2`
instruction.

EX2 is picked in particular since it is implemented as an even deeper
dispatch and ensures subroutines are properly implemented between `CALL`
instructions and implementation-calls.

* shader_jit_a64: Fix nested `BL` subroutines

`lr` was getting writen over by nested calls to `BL`, causing undefined
behavior with mixtures of `CALL`, `EX2`, and `LG2` instructions.

Each usage of `BL` is now protected with a stach push/pop to preserve
and restore teh `lr` register to allow nested subroutines to work
properly.

* shader_jit/tests: Allocate generated tests on heap

Each of these generated shader-test objects were causing the stack to
overflow.  Allocate each of the generated tests on the heap and use
unique_ptr so they only exist within the life-time of the `REQUIRE`
statement.

* shader_jit_a64: Preserve `lr` register from external function calls

`EMIT` makes an external function call, and should be preserving `lr`

* shader_jit/tests: Add `MAD` unit-test

The Inline Asm version requires an upstream fix:
https://github.com/neobrain/nihstro/issues/68

Instead, the program code is manually configured and added.

* shader_jit/tests: Fix uninitialized instructions

These `union`-type instruction-types were uninitialized, causing tests
to indeterminantly fail at times.

* shader_jit_a64: Remove unneeded `MOV`

Residue from the direct-port of x64 code.

* shader_jit_a64: Use `std::array` for `instr_table`

Add some type-safety and const-correctness around this type as well.

* shader_jit_a64: Avoid c-style offset casting

Add some more const-correctness to this function as well.

* video_core: Add arch preprocessor comments

* common/aarch64: Use X16 as the veneer register

https://developer.arm.com/documentation/102374/0101/Procedure-Call-Standard

* shader_jit/tests: Add uniform reading unit-test

Particularly to ensure that addresses are being properly truncated

* common/aarch64: Use `X0` as `ABI_RETURN`

`X8` is used as the indirect return result value in the case that the
result is bigger than 128-bits. Principally `X0` is the general-case
return register though.

* common/aarch64: Add veneer register note

`LR` is generally overwritten by `BLR` anyways, and would also be a safe
veneer to utilize for far-calls.

* shader_jit_a64: Remove unneeded scratch register from `SanitizedMul`

* shader_jit_a64: Fix CALLU condition

Should be `EQ` not `NE`. Fixes the regression on Kid Icarus.
No known regressions anymore!

---------

Co-authored-by: merryhime <merryhime@users.noreply.github.com>
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
2023-11-05 21:40:31 +01:00
GPUCode 542209c993
video_core: Initialize unrefered attributes to 1.f (#6966) 2023-09-16 14:46:44 -07:00
GPUCode 9b82de6b24
Refactor software renderer (#6621) 2023-06-24 00:59:18 +02:00
GPUCode b5d6f645bd
Prepare frontend for multiple graphics APIs (#6347)
* externals: Update dynarmic

* settings: Introduce GraphicsAPI enum

* For now it's OpenGL only but will be expanded upon later

* citra_qt: Introduce backend agnostic context management

* Mostly a direct port from yuzu

* core: Simplify context acquire

* settings: Add option to create debug contexts

* renderer_opengl: Abstract initialization to Driver

* This commit also updates glad and adds some useful extensions which we will use in part 2

* Rasterizer construction is moved to the specific renderer instead of RendererBase.
  Software rendering has been disable to achieve this but will be brought back in the next commit.

* video_core: Remove Init/Shutdown methods from renderer

* The constructor and destructor can do the same job

* In addition move opengl function loading to Qt since SDL already does this. Also remove ErrorVideoCore which is never reached

* citra_qt: Decouple software renderer from opengl part 1

* citra: Decouple software renderer from opengl part 2

* android: Decouple software renderer from opengl part 3

* swrasterizer: Decouple software renderer from opengl part 4

* This commit simply enforces the renderer naming conventions in the software renderer

* video_core: Move RendererBase to VideoCore

* video_core: De-globalize screenshot state

* video_core: Pass system to the renderers

* video_core: Commonize shader uniform data

* video_core: Abstract backend agnostic rasterizer operations

* bootmanager: Remove references to OpenGL for macOS

OpenGL macOS headers definitions clash heavily with each other

* citra_qt: Proper title for api settings

* video_core: Reduce boost usage

* bootmanager: Fix hide mouse option

Remove event handlers from RenderWidget for events that are
already handled by the parent GRenderWindow.
Also enable mouse tracking on the RenderWidget.

* android: Remove software from graphics api list

* code: Address review comments

* citra: Port per-game settings read

* Having to update the default value for all backends is a pain so lets centralize it

* android: Rename to OpenGLES

---------

Co-authored-by: MerryMage <MerryMage@users.noreply.github.com>
Co-authored-by: Vitor Kiguchi <vitor-kiguchi@hotmail.com>
2023-03-27 14:29:17 +03:00
Steveice10 a8848cce43 build: Update to support multi-arch builds. 2023-01-07 01:09:32 -08:00
Lioncash 643472e24a common/vector_math: Move Vec[x] types into the Common namespace
These types are within the common library, so they should be using the
Common namespace.
2019-03-02 15:04:13 +01:00
tgsm d6c530d08c video_core: use nested namespaces 2019-02-19 03:09:57 -05:00
Weiyi Wang 7d8f115185 Prefix all size_t with std::
done automatically by executing regex replace `([^:0-9a-zA-Z_])size_t([^0-9a-zA-Z_])` -> `$1std::size_t$2`
2018-09-06 16:03:28 -04:00
wwylele 0eab948728 reformat all files with clang-format 2018-06-29 16:56:12 +03:00
wwylele 7c5a76e58b log: replace all NGLOG with LOG 2018-06-29 14:18:07 +03:00
NarcolepticK 9ae70e733f video-core: Migrate logging macros (#3878)
* video-core: Migrate logging macros

* video-core: Fixed missed clang format

* video-core: Migrated LOG_GENERIC macro
2018-06-29 00:13:30 +03:00
Daniel Lim Wee Soong 98760336be video_core/shader/shader: Remove include cinttypes 2018-03-28 22:40:16 +08:00
Daniel Lim Wee Soong 968569aa61 Replace format specifiers for all usages of ASSERT_MSG 2018-03-27 23:28:42 +08:00
James Rowe f61141e86a Update the entire application to use the new clang format style 2018-03-09 10:54:43 -07:00
Dwayne Slater 41929371dc Optimize AttributeBuffer to OutputVertex conversion (#3283)
Optimize AttributeBuffer to OutputVertex conversion

First I unrolled the inner loop, then I pushed semantics validation
outside of the hotloop.

I also added overflow slots to avoid conditional branches.

Super Mario 3D Land's intro runs at almost full speed when compiled with
Clang, and theres a noticible speed increase in MSVC. GCC hasn't been
tested but I'm confident in its ability to optimize this code.
2018-01-02 15:32:33 -08:00
Yuri Kunde Schlesner 230a7557f1 Shader: Store AttributeBuffers in GS output buffer
This also does the output masking early at EMIT time, instead of when a
triangle is sent to the vertex handler.
2017-12-09 20:33:59 -08:00
Yuri Kunde Schlesner 0184419814 Shader: Refactor output_mask copy loop to function 2017-12-09 20:31:24 -08:00
Huw Pascoe a234e4c200 Improved performance of FromAttributeBuffer
Ternary operator is optimized by the compiler
whereas std::min() is meant to return a value.

I've noticed a 5%-10% emulation speed increase.
2017-09-17 15:56:36 +01:00
wwylele bb63ae3052 correct constness 2017-08-19 10:13:20 +03:00
wwylele 46c6973d2b pica/shader: extend UnitState for GS
Among four shader units in pica, a special unit can be configured to run both VS and GS program. GSUnitState represents this unit, which extends UnitState (which represents the other three normal units) with extra state for primitive emitting. It uses lots of raw pointers to represent internal structure in order to keep it standard layout type for JIT to access.
This unit doesn't handle triangle winding (inverting) itself; instead, it calls a WindingSetter handler. This will be explained in the following commits
2017-08-19 10:13:20 +03:00
Yuri Kunde Schlesner 443bb3d522 Merge pull request #2550 from yuriks/pica-refactor2
Small VideoCore cleanups
2017-02-12 12:33:26 -08:00
Yuri Kunde Schlesner e2fa1ca5e1 video_core: Fix benign out-of-bounds indexing of array (#2553)
The resulting pointer wasn't written to unless the index was verified as
valid, but that's still UB and triggered debug checks in MSVC.

Reported by garrettboast on IRC
2017-02-10 20:51:09 -08:00
Yuri Kunde Schlesner 60fc0b086f VideoCore: Split regs.h inclusions 2017-02-09 00:04:24 -08:00
Yuri Kunde Schlesner 5759d94b5c VideoCore: Move Regs to its own file 2017-02-04 13:59:12 -08:00
Yuri Kunde Schlesner f7c7f422c6 VideoCore: Split shader regs from Regs struct 2017-02-04 13:59:11 -08:00
Yuri Kunde Schlesner 000e78144c VideoCore: Split rasterizer regs from Regs struct 2017-02-04 13:08:47 -08:00
Yuri Kunde Schlesner dcdffabfe6 VideoCore: Extract swrast-specific data from OutputVertex 2017-01-29 21:31:38 -08:00
Yuri Kunde Schlesner 8ed9f9d49f VideoCore/Shader: Clean up OutputVertex::FromAttributeBuffer
This also fixes a long-standing but neverthless harmless memory
corruption bug, whech the padding of the OutputVertex struct would get
corrupted by unused attributes.
2017-01-29 21:31:38 -08:00
Yuri Kunde Schlesner 92bf5c88e6 VideoCore: Split shader output writing from semantic loading 2017-01-29 21:31:37 -08:00
Yuri Kunde Schlesner 335df895b9 VideoCore: Consistently use shader configuration to load attributes 2017-01-29 21:31:37 -08:00
Yuri Kunde Schlesner ab6954e942 VideoCore: Rename some types to more accurate names 2017-01-29 21:31:36 -08:00
Yuri Kunde Schlesner 6fa3687afc Shader: Remove OutputRegisters struct 2017-01-25 18:53:25 -08:00
Yuri Kunde Schlesner 9ea5eacf91 Shader: Initialize conditional_code in interpreter
This doesn't belong in LoadInputVertex because it also happens for
non-VS invocations. Since it's not used by the JIT it seems adequate to
initialize it in the interpreter which is the only thing that cares
about them.
2017-01-25 18:53:24 -08:00
Yuri Kunde Schlesner 114d6b2f97 VideoCore/Shader: Split interpreter and JIT into separate ShaderEngines 2017-01-25 18:53:24 -08:00
Yuri Kunde Schlesner 8eefc62833 VideoCore/Shader: Rename shader_jit_x64{ => _compiler}.{cpp,h} 2017-01-25 18:53:23 -08:00
Yuri Kunde Schlesner dd4a1672a7 VideoCore/Shader: Split shader uniform state and shader engine
Currently there's only a single dummy implementation, which will be
split in a following commit.
2017-01-25 18:53:23 -08:00
Yuri Kunde Schlesner bd82cffd0b VideoCore/Shader: Add constness to methods 2017-01-25 18:53:23 -08:00
Yuri Kunde Schlesner 1e1f939817 VideoCore/Shader: Use only entry_point as ShaderSetup param
This removes all implicit dependency of ShaderState on global PICA
state.
2017-01-25 18:53:23 -08:00
Yuri Kunde Schlesner e3caf669b0 VideoCore/Shader: Use self instead of g_state.vs in ShaderSetup 2017-01-25 18:53:23 -08:00
Yuri Kunde Schlesner 34d581f2dc VideoCore/Shader: Extract input vertex loading code into function 2017-01-25 18:53:20 -08:00
Kloen 5cc94c17f6 video_core: fix shader.cpp signed / unsigned warning 2017-01-23 16:53:31 +01:00
Yuri Kunde Schlesner c135317de1 VideoCore/Shader: Extract DebugData out from UnitState 2016-12-16 00:16:25 -08:00
Yuri Kunde Schlesner f00ada3363 VideoCore: Eliminate an unnecessary copy in the drawcall loop 2016-12-14 21:00:29 -08:00
Yuri Kunde Schlesner 26b68313b9 VideoCore: Fix out-of-bounds read in ShaderSetup::ProduceDebugInfo
As far as I can tell, memset was replaced by a fill without correcting
the parameter type, causing an out-of-bounds array read in the Vec4
constructor.
2016-09-29 21:11:36 -07:00
Yuri Kunde Schlesner 84fbbe2629 Use negative priorities to avoid special-casing the self-include 2016-09-21 00:15:56 -07:00
Emmanuel Gil Peyrot ebdae19fd2 Remove empty newlines in #include blocks.
This makes clang-format useful on those.

Also add a bunch of forgotten transitive includes, which otherwise
prevented compilation.
2016-09-21 11:15:47 +09:00
Yuri Kunde Schlesner 396a8d91a4 Manually tweak source formatting and then re-run clang-format 2016-09-18 21:14:25 -07:00
Emmanuel Gil Peyrot dc8479928c Sources: Run clang-format on everything. 2016-09-18 09:38:01 +09:00
Jannik Vogel ff0fa86b17 Retrieve shader result from new OutputRegisters-type 2016-05-16 18:55:51 +02:00
Jannik Vogel 1308afe2c2 Use new shader-jit signature for interpreter 2016-05-13 09:41:55 +02:00