History log of /linux-master/drivers/gpu/drm/i915/i915_gem.c
# 86ceaaae 28-Nov-2023 Jonathan Cavitt <jonathan.cavitt@intel.com>

drm/i915/gem: Atomically invalidate userptr on mmu-notifier

Never block for outstanding work on a userptr object upon receipt of an
mmu-notifier. The reason we originally did so was to immediately unbind
the userptr and unpin its pages, but since that was dropped in
commit b4b9731b02c3c ("drm/i915: Simplify userptr locking"), we never
return the pages to the system, i.e. we never drop our page->mapcount,
and so never allow the page and CPU PTE to be revoked. Based on this
history, we know we are safe to drop the wait entirely.

Upon return from the mmu-notifier, we will still have the userptr pages
pinned, preventing the following PTE operation (such as try_to_unmap)
from adjusting the vm_area_struct, so it is safe to keep the pages
around for as long as we still have i/o pending.

We do not currently have any means to asynchronously revalidate the
userptr pages; that is always done prior to the next use.

Signed-off-by: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231128162505.3493942-1-jonathan.cavitt@intel.com


# 607a2c64 02-Nov-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: move display spinlock init to display code

The gem code has no business accessing i915->display directly.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102155223.2298316-2-jani.nikula@intel.com


# 2b562f03 28-Sep-2023 Mathias Krause <minipli@grsecurity.net>

drm/i915: Register engines early to avoid type confusion

Commit 1ec23ed7126e ("drm/i915: Use uabi engines for the default engine
map") switched from using for_each_engine() to for_each_uabi_engine() to
iterate over the user engines. While this seems to be a sensible change,
it's only safe to do when the engines are actually chained using the
rb-tree structure, which is not the case during early driver
initialization, where the engines can be chained either on a lock-less
list or on a regular doubly-linked list.

In fact, the modesetting initialization code may end up calling
default_engines() through the fb helper code while the engines list
is still llist_node-based:

i915_driver_probe() ->
intel_display_driver_probe() ->
intel_fbdev_init() ->
drm_fb_helper_init() ->
drm_client_init() ->
drm_client_open() ->
drm_file_alloc() ->
i915_driver_open() ->
i915_gem_open() ->
i915_gem_context_open() ->
i915_gem_create_context() ->
default_engines()

Using for_each_uabi_engine() in default_engines() is therefore wrong,
as it would try to interpret the llist as an rb-tree and find no engine
at all: the rb_left and rb_right members are still NULL, since they
haven't been initialized yet.

To fix this type confusion, register the engines earlier and at the
same time reduce the amount of code that has to deal with the
intermediate llist state.

Reported-by: sanity checks in grsecurity
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1ec23ed7126e ("drm/i915: Use uabi engines for the default engine map")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230928182019.10256-2-minipli@grsecurity.net
[tursulin: fixed commit tag typo]


# c1464a89 30-Aug-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: add minimal i915_gem_object_frontbuffer.h

Split out frontbuffer related declarations and static inlines from
gem/i915_gem_object.h into new gem/i915_gem_object_frontbuffer.h.

The main goal is to reduce header interdependencies. With
gem/i915_gem_object.h including display/intel_frontbuffer.h,
modification of the latter causes a whopping 300+ objects to be rebuilt,
while many of the source files actually needing it aren't explicitly
including it at all.

After the change, only 21 objects depend on display/intel_frontbuffer.h,
directly or indirectly.

Cc: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230830085127.2416842-1-jani.nikula@intel.com


# 6007265a 28-Sep-2023 Mathias Krause <minipli@grsecurity.net>

drm/i915: Register engines early to avoid type confusion

Commit 1ec23ed7126e ("drm/i915: Use uabi engines for the default engine
map") switched from using for_each_engine() to for_each_uabi_engine() to
iterate over the user engines. While this seems to be a sensible change,
it's only safe to do when the engines are actually chained using the
rb-tree structure, which is not the case during early driver
initialization, where the engines can be chained either on a lock-less
list or on a regular doubly-linked list.

In fact, the modesetting initialization code may end up calling
default_engines() through the fb helper code while the engines list
is still llist_node-based:

i915_driver_probe() ->
intel_display_driver_probe() ->
intel_fbdev_init() ->
drm_fb_helper_init() ->
drm_client_init() ->
drm_client_open() ->
drm_file_alloc() ->
i915_driver_open() ->
i915_gem_open() ->
i915_gem_context_open() ->
i915_gem_create_context() ->
default_engines()

Using for_each_uabi_engine() in default_engines() is therefore wrong,
as it would try to interpret the llist as an rb-tree and find no engine
at all: the rb_left and rb_right members are still NULL, since they
haven't been initialized yet.

To fix this type confusion, register the engines earlier and at the
same time reduce the amount of code that has to deal with the
intermediate llist state.

Reported-by: sanity checks in grsecurity
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1ec23ed7126e ("drm/i915: Use uabi engines for the default engine map")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230928182019.10256-2-minipli@grsecurity.net
[tursulin: fixed commit tag typo]
(cherry picked from commit 2b562f032fc2594fb3fac22b7a2eb3c1969a7ba3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# e894b724 05-Jun-2023 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Use the fdinfo helper

Use the common fdinfo helper for printing the basics. Remove now unused
client id allocation code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Rob Clark <robdclark@chromium.org>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230605123224.373633-1-tvrtko.ursulin@linux.intel.com


# 9275277d 09-May-2023 Fei Yang <fei.yang@intel.com>

drm/i915: use pat_index instead of cache_level

Currently the KMD is using enum i915_cache_level to set the caching
policy for buffer objects. This is flaky because the PAT index, which
really controls the caching behavior in the PTE, has far more levels
than what's defined in the enum. In addition, the PAT index is platform
dependent; having to translate between i915_cache_level and the PAT
index is not reliable and makes the code more complicated.

From the UMD's perspective there is also a need to set the caching
policy for performance fine tuning. It's much easier for the UMD to
directly use the PAT index because the behavior of each PAT index is
clearly defined in the Bspec. Having the abstracted i915_cache_level
sitting in between would only cause more ambiguity. PAT is expected to
work much like MOCS already works today; by design, userspace is
expected to select the index that exactly matches the desired behavior
described in the hardware specification.

For these reasons this patch replaces i915_cache_level with the PAT
index. Also note that cache_level is not completely removed yet,
because the KMD still needs to create buffer objects with simple cache
settings such as cached, uncached, or writethrough. For kernel objects,
cache_level is used for simplicity and backward compatibility. For
pre-gen12 platforms PAT can have a 1:1 mapping to i915_cache_level, so
these two are interchangeable; see the use of LEGACY_CACHELEVEL.

One consequence of this change is that gen8_pte_encode no longer works
for gen12 platforms, because gen12 platforms have different PAT
definitions. In the meantime, mtl_pte_encode, introduced specifically
for MTL, becomes generic for all gen12 platforms. This patch renames
the MTL PTE encode function into gen12_pte_encode and applies it to all
gen12. Even though this change looks unrelated, separating the two
would temporarily break gen12 PTE encoding, so they are squashed into
one patch.

Special note: this patch changes the way caching behavior is
controlled, in the sense that some objects are left to be managed by
userspace. For such objects we need to be careful not to change the
userspace settings. There are kerneldoc and comments added around
obj->cache_coherent, cache_dirty, and how to bypass the checks via
i915_gem_object_has_cache_level. For full understanding, these changes
need to be looked at together with the two follow-up patches: one
disables the {set|get}_caching ioctls and the other adds the set_pat
extension to the GEM_CREATE uAPI.

Bspec: 63019

Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230509165200.1740-3-fei.yang@intel.com
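
As an editor's illustration of the LEGACY_CACHELEVEL idea mentioned
above, here is a minimal sketch of a 1:1 legacy mapping from a simple
cache level to a PAT index. The enum names and PAT values below are
hypothetical and only show the shape of such a table; they are not
taken from the patch.

/* Hedged sketch: hypothetical cache levels and PAT index values. */
enum example_cache_level { EX_CACHE_NONE, EX_CACHE_LLC, EX_CACHE_L3_LLC, EX_CACHE_WT };

static const unsigned int example_legacy_cachelevel_to_pat[] = {
	[EX_CACHE_NONE]   = 0,	/* uncached */
	[EX_CACHE_LLC]    = 1,	/* write-back, LLC */
	[EX_CACHE_L3_LLC] = 2,	/* write-back, L3 + LLC */
	[EX_CACHE_WT]     = 3,	/* write-through, e.g. for scanout */
};

/* On pre-gen12 platforms the mapping is 1:1, so kernel-internal users can
 * keep passing a simple cache level and translate it to a PAT index once. */
static inline unsigned int example_pat_index(enum example_cache_level level)
{
	return example_legacy_cachelevel_to_pat[level];
}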


# d670c78e 03-Apr-2023 Jani Nikula <jani.nikula@intel.com>

drm/i915: rename intel_pm.[ch] to intel_clock_gating.[ch]

Observe that intel_pm.[ch] is now purely about clock gating, so rename
them to intel_clock_gating.[ch]. Rename the functions to
intel_clock_gating_*() to follow coding conventions.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230403122428.3526263-1-jani.nikula@intel.com


# 5d954316 31-Mar-2023 Lee Jones <lee@kernel.org>

drm/i915/i915_gem: Provide function names to complete the expected kerneldoc format

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/i915/i915_gem.c:447: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/i915/i915_gem.c:536: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/i915/i915_gem.c:726: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
drivers/gpu/drm/i915/i915_gem.c:811: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Eric Anholt <eric@anholt.net>
Cc: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
[Jani: fix i915_gem_sw_finish_ioctl while applying]
Link: https://patchwork.freedesktop.org/patch/msgid/20230331092607.700644-15-lee@kernel.org


# badb3027 21-Mar-2023 Andi Shyti <andi.shyti@linux.intel.com>

drm/i915: Use i915 instead of dev_priv insied the file_priv structure

In the process of renaming all instances of 'dev_priv' to 'i915',
start using 'i915' within the 'drm_i915_file_private' structure.

Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230322001611.632321-1-andi.shyti@linux.intel.com


# f47e6306 28-Dec-2022 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Typecheck page lookups

We need to check that we avoid integer overflows when looking up a page,
and so fix all the instances where we have mistakenly used a plain
integer instead of a more suitable long. Be pedantic and add integer
typechecking to the lookup so that we can be sure that we are safe.
It also uses pgoff_t, as our page lookups must remain compatible with
the page cache; pgoff_t is currently exactly unsigned long.

v2: Move added i915_utils's macro into drm_util header (Jani N)
v3: Do not use the same macro name as a function name. (Mauro)
For kernel-doc, macros and functions are handled in the same namespace,
so using the same macro name as a function prevents ever adding
documentation for it.
v4: Add kernel-doc markups to the kAPI functions and macros (Mauro)
v5: Fix an alignment to match open parenthesis
v6: Rebase
v10: Use assert_typable instead of exactly_pgoff_t() macro. (Kees)
v11: Change the use of assert_typable to assert_same_typable (G.G)
v12: Change to use static_assert(__castable_to_type(n ,T)) style since
the assert_same_typable() macro has been dropped. (G.G)
v13: Change the use of __castable_to_type() to castable_to_type()
Remove an unnecessary header include line. (G.G)
v16: Fix "ERROR:SPACING" Checkpatch report (G.G)

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Co-developed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> (v2)
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> (v3)
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> (v5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221228192252.917299-2-gwan-gyeong.mun@intel.com
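
A minimal userspace sketch of the same idea follows; it uses
static_assert() with __builtin_types_compatible_p() as a simplified
stand-in for the kernel's castable_to_type() check, and all names below
are hypothetical.

#include <assert.h>
#include <stdio.h>

typedef unsigned long pgoff_t;	/* as in the kernel, pgoff_t is unsigned long */

struct object { int dummy; };

static void *__lookup_page(struct object *obj, pgoff_t n)
{
	(void)obj;
	printf("looking up page %lu\n", n);
	return NULL;
}

/* Reject callers that pass a page index which is not already pgoff_t,
 * instead of silently truncating a plain int. */
#define lookup_page(obj, n) ({						\
	static_assert(__builtin_types_compatible_p(typeof(n), pgoff_t),\
		      "page index must be pgoff_t");			\
	__lookup_page((obj), (n));					\
})

int main(void)
{
	struct object obj;
	pgoff_t n = 1UL << 40;		/* would wrap in a 32-bit int */

	lookup_page(&obj, n);		/* builds: n already has the right type */
	/* lookup_page(&obj, 5); */	/* would fail to build: 5 is an int */
	return 0;
}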


# 9bff18d1 23-Nov-2022 Christian König <christian.koenig@amd.com>

drm/ttm: use per BO cleanup workers

Instead of a single worker going over the list of deleted BOs at
regular intervals, use a per-BO worker which blocks on the resv object
and the locking of the BO.

This not only simplifies the handling massively, but also results in
much better response time when cleaning up buffers.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221125102137.1801-3-christian.koenig@amd.com
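
A sketch of the per-BO pattern, under assumed workqueue and dma_resv
kernel APIs; the example_* names are hypothetical and this is not the
actual TTM code, only the shape of the change described above.

#include <linux/container_of.h>
#include <linux/dma-resv.h>
#include <linux/sched.h>	/* MAX_SCHEDULE_TIMEOUT */
#include <linux/workqueue.h>

struct example_bo {
	struct dma_resv *resv;
	struct work_struct delayed_delete;
};

static void example_bo_delayed_delete(struct work_struct *work)
{
	struct example_bo *bo =
		container_of(work, struct example_bo, delayed_delete);

	/* Only this BO's worker blocks until the BO is idle. */
	dma_resv_wait_timeout(bo->resv, DMA_RESV_USAGE_BOOKKEEP, false,
			      MAX_SCHEDULE_TIMEOUT);
	/* ... final release of the BO's resources ... */
}

/* On final unref, queue the BO's own work item instead of parking the BO
 * on a global delete list that a single worker scans at regular intervals. */
static void example_bo_release(struct example_bo *bo)
{
	INIT_WORK(&bo->delayed_delete, example_bo_delayed_delete);
	queue_work(system_wq, &bo->delayed_delete);
}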


# 0f857158 21-Nov-2022 Aravind Iddamsetty <aravind.iddamsetty@intel.com>

drm/i915/mtl: Media GT and Render GT share common GGTT

On XE_LPM+ platforms the media engines are carved out into a separate
GT but have a common GGTMMADR address range, which essentially makes
the GGTT address space shared between the media and render GT. As a
result, any update to the GGTT shall invalidate the TLB of all GTs
sharing it, and similarly any operation on the GGTT requiring an action
on a GT will have to involve all GTs sharing it. setup_private_pat was
being done per GGTT; since it doesn't touch any GGTT structures, it has
been moved to be per GT.

BSPEC: 63834

v2:
1. Add details to commit msg
2. includes fix for failure to add item to ggtt->gt_list, as suggested
by Lucas
3. as ggtt_flush() is used only for the ggtt, drop the i915_is_ggtt
check within it.
4. setup_private_pat moved out of intel_gt_tiles_init

v3:
1. Move out for_each_gt from i915_driver.c (Jani Nikula)

v4: drop using RCU primitives on ggtt->gt_list as it is not an RCU list
(Matt Roper)

Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221122070126.4813-1-aravind.iddamsetty@intel.com


# ee71434e 07-Nov-2022 Aravind Iddamsetty <aravind.iddamsetty@intel.com>

drm/i915/mtl: Handle wopcm per-GT and limit calculations.

With MTL standalone media architecture the wopcm layout has changed,
with separate partitioning in WOPCM for the root GT GuC and the media
GT GuC. The size of WOPCM is 4MB with the lower 2MB reserved for the
media GT and the upper 2MB for the root GT.

Given that MTL has GuC deprivilege, the WOPCM registers are pre-locked
by the bios. Therefore, we can skip all the math for the partitioning
and just limit ourselves to sanity-checking the values.

v2: fix makefile file ordering (Jani)
v3: drop XELPM_SAMEDIA_WOPCM_SIZE, check huc instead of VDBOX (John)
v4: further clarify commit message, remove blank line (John)

Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221108020600.3575467-5-daniele.ceraolospurio@intel.com


# 9deca798 07-Nov-2022 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/uc: fetch uc firmwares for each GT

The FW binaries are independently loaded on each GT. On MTL, the memory
is shared so we could potentially re-use a single allocation, but on
discrete multi-gt platforms we are going to need independent copies,
so it is easier to do the same on MTL as well, given that the amount
of duplicated memory is relatively small (~500K).

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221108020600.3575467-2-daniele.ceraolospurio@intel.com


# a10234fd 09-Nov-2022 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Partial abandonment of legacy DRM logging macros

Convert some usages of legacy DRM logging macros into versions which
tell us on which device the events occurred.

v2:
* Don't have struct drm_device as local. (Jani, Ville)

v3:
* Store gt, not i915, in workaround list. (John)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221109104633.2579245-1-tvrtko.ursulin@linux.intel.com


# e66c8dcf 27-Oct-2022 Anshuman Gupta <anshuman.gupta@intel.com>

drm/i915: Encapsulate lmem rpm stuff in intel_runtime_pm

Runtime pm is not really per GT, therefore it makes sense to
move lmem_userfault_list, lmem_userfault_lock and
userfault_wakeref from intel_gt to the intel_runtime_pm structure,
which is embedded in i915.

No functional change.

v2:
- Fixes the code comment nit. [Matt Auld]

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221027092242.1476080-2-anshuman.gupta@intel.com


# da3dbdfe 23-Sep-2022 Nirmoy Das <nirmoy.das@intel.com>

drm/i915: remove excessive i915_gem_drain_freed_objects

i915_gem_drain_workqueue() calls i915_gem_drain_freed_objects(),
so there is no need to call it again.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220923073515.23093-2-nirmoy.das@intel.com


# c50cec9b 23-Sep-2022 Nirmoy Das <nirmoy.das@intel.com>

drm/i915: Fix a potential UAF at device unload

i915_gem_drain_freed_objects() might not be enough to
free all the objects and RCU delayed work might get
scheduled after the i915 device struct gets freed.

Call i915_gem_drain_workqueue() to catch all RCU delayed work.

Suggested-by: Chris Wilson <chris.p.wilson@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220923073515.23093-1-nirmoy.das@intel.com


# 1cec3444 16-Sep-2022 Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915/gem: Flush contexts on driver release

Due to i915_perf assuming that it can use the i915_gem_context reference
to protect its i915->gem.contexts.list iteration, we need to defer removal
of the context from the list until the last reference to the context is
put. However, there is a risk of triggering a kernel warning about the
contexts list not being empty at driver release time if we delegate that
task to a worker via i915_gem_context_release_work(), unless that work is
flushed first. Unfortunately, it is not flushed on driver release. Fix it.

Instead of additionally calling flush_workqueue(), either directly or via
a new dedicated wrapper around it, replace the last call to
i915_gem_drain_freed_objects() with the existing i915_gem_drain_workqueue()
that performs both tasks.

Fixes: 75eefd82581f ("drm/i915: Release i915_gem_context from a worker")
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: stable@kernel.org # v5.16+
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220916092403.201355-2-janusz.krzysztofik@linux.intel.com


# f569ae75 15-Sep-2022 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Handle all GTs on driver (un)load paths

This, along with the changes already landed in commit 1c66a12ab431
("drm/i915: Handle each GT on init/release and suspend/resume") makes
engines from all GTs actually known to the driver.

To accomplish this we need to sprinkle a lot of for_each_gt calls
around, but it is otherwise pretty uneventful.

v2:
- Consolidate adjacent GT loops in a couple places. (Daniele)

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220915232654.3283095-5-matthew.d.roper@intel.com


# ad74457a 13-Sep-2022 Anshuman Gupta <anshuman.gupta@intel.com>

drm/i915/dgfx: Release mmap on rpm suspend

Release all mmap mappings for all lmem objects which are associated
with a userfault, such that, while the PCIe function is in D3hot, any
access to the memory mappings will raise a userfault.

Runtime resume the dGPU (when the gem object lies in lmem).
This will transition the dGPU graphics function to the D0
state if it was in D3, in order to access the mmap memory
mappings.

v2:
- Squashes the patches. [Matt Auld]
- Add adequate locking for lmem_userfault_list addition. [Matt Auld]
- Reused obj->userfault_count to avoid double addition. [Matt Auld]
- Added i915_gem_object_lock to check
i915_gem_object_is_lmem. [Matt Auld]

v3:
- Use i915_ttm_cpu_maps_iomem. [Matt Auld]
- Fix 'ret == 0 to ret == VM_FAULT_NOPAGE'. [Matt Auld]
- Reuse obj->userfault_count as a bool 0 or 1. [Matt Auld]
- Delete the mmaped obj from lmem_userfault_list in obj
destruction path. [Matt Auld]
- Get a wakeref for object destruction patch. [Matt Auld]
- Use intel_wakeref_auto to delay runtime PM. [Matt Auld]

v4:
- Avoid using mmo offset to get the vma_node. [Matt Auld]
- Added comment to use the lmem_userfault_lock. [Matt Auld]
- Get lmem_userfault_lock in i915_gem_object_release_mmap_offset.
[Matt Auld]
- Fixed kernel test robot generated warning.

v5:
- Addressed the cosmetics comments. [Andi]
- Changed i915_gem_runtime_pm_object_release_mmap_offset() name to
i915_gem_object_runtime_pm_release_mmap_offset() to be rhythmic.

PCIe Specs 5.3.1.4.1

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6331
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220913152714.16541-3-anshuman.gupta@intel.com


# f5e92d23 13-Sep-2022 Anshuman Gupta <anshuman.gupta@intel.com>

drm/i915: Refactor userfault_wakeref to re-use

Refactor userfault_wakeref so it can be re-used for discrete lmem mmap
mappings as well, as GTT mmap is not supported on discrete. Move
userfault_wakeref from the ggtt to the gt structure.

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220913152714.16541-2-anshuman.gupta@intel.com


# 3bb6a442 01-Sep-2022 Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

drm/i915: Rename ggtt_view as gtt_view

So far, different views (normal, partial, rotated and remapped)
into the same object are only supported for GGTT mappings.
But with the upcoming VM_BIND feature, PPGTT will also use the
partial view mapping. Hence rename ggtt_view to the more generic
gtt_view.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220901183854.3446-1-niranjana.vishwanathapura@intel.com


# 5aea37bf 05-Sep-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: un-inline i915_gem_drain_freed_objects()

I can't identify a single hot path that would require
i915_gem_drain_freed_objects() to be inline. Un-inline it.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6c289c55afee0d9a3067122db63277b8d60cf74f.1662390010.git.jani.nikula@intel.com


# 230bb131 05-Sep-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: un-inline i915_gem_drain_workqueue()

i915_gem_drain_workqueue() is not used on any hot paths. Un-inline it.

Replace the do-while with a for loop while at it.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/2c89e7e0a3528caf7ba9ffa29b2bb9f13f2357d1.1662390010.git.jani.nikula@intel.com


# 5da6d6c7 29-Aug-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: move fb_tracking under display sub-struct

Move display frontbuffer tracking related members under drm_i915_private
display sub-struct.

Rename struct i915_frontbuffer_tracking to intel_frontbuffer_tracking
while at it.

FIXME: fb_tracking.lock mutex init should be moved away from
i915_gem_init_early().

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/a5444d0a373afca46da9a2f6e4db442af21b429b.1661779055.git.jani.nikula@intel.com


# 95086cb9 24-Aug-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: split gem quirks from display quirks

The lone gem quirk is an outlier, not even handled by the common quirk
code. Split it to a separate gem_quirks member.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fe9c0cb1e49da0ddc31d24c996af5fd09bce3042.1661346845.git.jani.nikula@intel.com


# 9d0bad17 19-Aug-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: move page_sizes to runtime info

If it's modified at runtime, it's runtime info.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f6825dd97d2ba63aa395c30131c4b9e6ef32b0c8.1660910433.git.jani.nikula@intel.com


# 5ce8f744 16-Sep-2022 Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915/gem: Flush contexts on driver release

Due to i915_perf assuming that it can use the i915_gem_context reference
to protect its i915->gem.contexts.list iteration, we need to defer removal
of the context from the list until the last reference to the context is
put. However, there is a risk of triggering a kernel warning about the
contexts list not being empty at driver release time if we delegate that
task to a worker via i915_gem_context_release_work(), unless that work is
flushed first. Unfortunately, it is not flushed on driver release. Fix it.

Instead of additionally calling flush_workqueue(), either directly or via
a new dedicated wrapper around it, replace the last call to
i915_gem_drain_freed_objects() with the existing i915_gem_drain_workqueue()
that performs both tasks.

Fixes: 75eefd82581f ("drm/i915: Release i915_gem_context from a worker")
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: stable@kernel.org # v5.16+
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220916092403.201355-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 1cec34442408a77ba5396b19725fed2c398005c3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 5f0d4d14 01-Apr-2022 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Explicitly track DRM clients

Tracking DRM clients more explicitly will allow later patches to
accumulate past and current GPU usage in a centralised place and also
consolidate access to owning task pid/name.

Unique client id is also assigned for the purpose of distinguishing/
consolidating between multiple file descriptors owned by the same process.

v2:
Chris Wilson:
* Enclose new members into dedicated structs.
* Protect against failed sysfs registration.

v3:
* sysfs_attr_init.

v4:
* Fix for internal clients.

v5:
* Use cyclic ida for client id. (Chris)
* Do not leak pid reference. (Chris)
* Tidy code with some locals.

v6:
* Use xa_alloc_cyclic to simplify locking. (Chris)
* No need to unregister individual sysfs files. (Chris)
* Rebase on top of fpriv kref.
* Track client closed status and reflect in sysfs.

v7:
* Make drm_client more standalone concept.

v8:
* Simplify sysfs show. (Chris)
* Always track name and pid.

v9:
* Fix cyclic id assignment.

v10:
* No need for a mutex around xa_alloc_cyclic.
* Refactor sysfs into own function.
* Unregister sysfs before freeing pid and name.
* Move clients setup into own function.

v11:
* Call clients init directly from driver init. (Chris)

v12:
* Do not fail client add on id wrap. (Maciej)

v13 (Lucas): Rebase.

v14:
* Dropped sysfs bits.

v15:
* Dropped tracking of pid/ and name.
* Dropped RCU freeing of the client object.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v11
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> # v11
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220401142205.3123159-2-tvrtko.ursulin@linux.intel.com


# 230523ba 20-Mar-2022 Vivek Kasireddy <vivek.kasireddy@intel.com>

drm/i915/gem: Don't evict unmappable VMAs when pinning with PIN_MAPPABLE (v2)

On platforms capable of allowing 8K (7680 x 4320) modes, pinning 2 or
more framebuffers/scanout buffers results in only one that is mappable/
fenceable. Therefore, pageflipping between these 2 FBs where only one
is mappable/fenceable creates latencies large enough to miss alternate
vblanks, thereby producing a less optimal framerate.

This mainly happens because when i915_gem_object_pin_to_display_plane()
is called to pin one of the FB objs, the associated vma is identified
as misplaced -- because there is no space for it in the aperture --
and therefore i915_vma_unbind() is called which unbinds and evicts it.
This misplaced vma gets subsequently pinned only when
i915_gem_object_ggtt_pin_ww() is called without PIN_MAPPABLE. This whole
thing results in a latency of ~10ms and happens every other repaint cycle.
Therefore, to fix this issue, we just ensure that the misplaced VMA
does not get evicted when we try to pin it with PIN_MAPPABLE -- by
returning early if the mappable/fenceable flag is not set.

Testcase:
Running Weston and weston-simple-egl on an Alderlake_S (ADLS) platform
with a 8K@60 mode results in only ~40 FPS (compared to ~59 FPS with
this patch). Since upstream Weston submits a frame ~7ms before the
next vblank, the latencies seen between atomic commit and flip event
are 7, 24 (7 + 16.66), 7, 24..... suggesting that it misses the
vblank every other frame.

Here is the ftrace snippet that shows the source of the ~10ms latency:
i915_gem_object_pin_to_display_plane() {
0.102 us | i915_gem_object_set_cache_level();
i915_gem_object_ggtt_pin_ww() {
0.390 us | i915_vma_instance();
0.178 us | i915_vma_misplaced();
i915_vma_unbind() {
__i915_active_wait() {
0.082 us | i915_active_acquire_if_busy();
0.475 us | }
intel_runtime_pm_get() {
0.087 us | intel_runtime_pm_acquire();
0.259 us | }
__i915_active_wait() {
0.085 us | i915_active_acquire_if_busy();
0.240 us | }
__i915_vma_evict() {
ggtt_unbind_vma() {
gen8_ggtt_clear_range() {
10507.255 us | }
10507.689 us | }
10508.516 us | }

v2:
- Expand the code comments to describe the ping-pong issue.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220321005431.1113890-1-vivek.kasireddy@intel.com


# eb950819 04-Mar-2022 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915/gem: Remove some unnecessary code

The test for vma should always return true, and when assigning -EBUSY
to ret, the variable should already have that value.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220304082641.308069-4-thomas.hellstrom@linux.intel.com


# d9393973 04-Mar-2022 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Remove the vma refcount

Now that i915_vma_parked() is taking the object lock on vma destruction,
and the only user of the vma refcount, i915_gem_object_unbind()
also takes the object lock, remove the vma refcount.

v3: Documentation update.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220304082641.308069-3-thomas.hellstrom@linux.intel.com


# e1a7ab4f 04-Mar-2022 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Remove the vm open count

vms are not getting properly closed. Rather than fixing that,
remove the vm open count and instead rely on the vm refcount.

The vm open count existed solely to break the strong references the
vmas had on the vms. Now instead make those references weak and
ensure vmas are destroyed when the vm is destroyed.

Unfortunately, if the vm destructor and the object destructor both
want to destroy a vma, that may lead to a race in that the vm
destructor just unbinds the vma and leaves the actual vma destruction
to the object destructor. However in order for the object destructor
to ensure the vma is unbound it needs to grab the vm mutex. In order
to keep the vm mutex alive until the object destructor is done with
it, somewhat hackishly grab a vm_resv refcount that is released late
in the vma destruction process, when the vm mutex is no longer needed.

v2: Address review-comments from Niranjana
- Clarify that the struct i915_address_space::skip_pte_rewrite is a hack
and should ideally be replaced in an upcoming patch.
- Remove an unneeded continue in clear_vm_list and update comment.

v3:
- Documentation update
- Commit message formatting

Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220304082641.308069-2-thomas.hellstrom@linux.intel.com


# 5f2ec909 10-Feb-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: don't include drm_cache.h in i915_drv.h

Include it only in files that use it.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/14edab4a193ea3f73f387a88e3836c8555401871.1644507885.git.jani.nikula@intel.com


# 5472b3f2 10-Feb-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: split out i915_file_private.h from i915_drv.h

Limit the scope of struct drm_i915_file_private to the files that
actually need it.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e375859dc1729a1b988036e4103e5b1bd48caa00.1644507885.git.jani.nikula@intel.com


# f3392b85 10-Feb-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: remove leftover i915_gem_pm.h declarations from i915_drv.h

Remove the duplicates.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/48f5ace6393533372da5d13df3de0203c8db11b3.1644507885.git.jani.nikula@intel.com


# 0f341974 14-Jan-2022 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Add i915_vma_unbind_unlocked, and take obj lock for i915_vma_unbind, v2.

We want to remove more members of i915_vma, which requires the locking to
be held more often.

Start requiring gem object lock for i915_vma_unbind, as it's one of the
callers that may unpin pages.

Some special care is needed when evicting, because the last reference to
the object may be held by the VMA, so after __i915_vma_unbind, vma may be
garbage, and we need to cache vma->obj before unlocking.

Changes since v1:
- Make trylock failing a WARN. (Matt)
- Remove double i915_vma_wait_for_bind() (Matt)
- Move atomic_set to right before mutex_unlock(), to make it more clear
they belong together. (Matt)

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220114132320.109030-5-maarten.lankhorst@linux.intel.com
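
A minimal sketch of the eviction-path rule described above, assuming
the i915 helpers it names (i915_gem_object_lock/unlock and
__i915_vma_unbind); vm->mutex handling is omitted and this is only an
illustration, not the actual diff.

/* Requires the driver-internal i915_vma.h and gem/i915_gem_object.h. */
static int example_evict_vma(struct i915_vma *vma)
{
	struct drm_i915_gem_object *obj = vma->obj;	/* cache before unbinding */
	int err;

	err = i915_gem_object_lock(obj, NULL);
	if (err)
		return err;

	err = __i915_vma_unbind(vma);
	/* vma (and therefore vma->obj) may be garbage from this point on, ... */

	i915_gem_object_unlock(obj);	/* ... so unlock through the cached obj. */
	return err;
}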


# 2f6b90da 10-Jan-2022 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Use vma resources for async unbinding

Implement async (non-blocking) unbinding by not syncing the vma before
calling unbind on the vma_resource.
Add the resulting unbind fence to the object's dma_resv from where it is
picked up by the ttm migration code.
Ideally these unbind fences should be coalesced with the migration blit
fence to avoid stalling the migration blit waiting for unbind, as they
can certainly go on in parallel, but since we don't yet have a
reasonable data structure to use to coalesce fences and attach the
resulting fence to a timeline, we defer that for now.

Note that with async unbinding, even while the unbind waits for the
preceding bind to complete before unbinding, the vma itself might have been
destroyed in the process, clearing the vma pages. Therefore we can
only allow async unbinding if we have a refcounted sg-list and keep a
refcount on that for the vma resource pages to stay intact until
binding occurs. If this condition is not met, a request for an async
unbind is diverted to a sync unbind.

v2:
- Use a separate kmem_cache for vma resources for now to isolate their
memory allocation and aid debugging.
- Move the check for vm closed to the actual unbinding thread. Regardless
of whether the vm is closed, we need the unbind fence to properly wait
for capture.
- Clear vma_res::vm on unbind and update its documentation.
v4:
- Take cache coloring into account when searching for vma resources
pending unbind. (Matthew Auld)
v5:
- Fix timeout and error check in i915_vma_resource_bind_dep_await().
- Avoid taking a reference on the object for async binding if
async unbind capable.
- Fix braces around a single-line if statement.
v6:
- Fix up the cache coloring adjustment. (Kernel test robot <lkp@intel.com>)
- Don't allow async unbinding if the vma_res pages are not the same as
the object pages. (Matthew Auld)
v7:
- s/unsigned long/u64/ in a number of places (Matthew Auld)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220110172219.107131-5-thomas.hellstrom@linux.intel.com


# db583eea 07-Jan-2022 Jani Nikula <jani.nikula@intel.com>

drm/i915: split out gem/i915_gem_userptr.h from i915_drv.h

We already have the gem/i915_gem_userptr.c file.

Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/c29f66604ebd973b8eff1cce7d7c53615a26480f.1641561552.git.jani.nikula@intel.com


# 204129a2 04-Jan-2022 Michał Winiarski <michal.winiarski@intel.com>

drm/i915: Use to_gt() helper for GGTT accesses

GGTT is currently available both through i915->ggtt and gt->ggtt, and we
eventually want to get rid of the i915->ggtt one.
Use to_gt() for all i915->ggtt accesses to help with the future
refactoring.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220104223550.56135-1-andi.shyti@linux.intel.com
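
The shape of the conversion, shown as an illustrative before/after; the
exact member layout (an embedded ggtt with a vm sub-struct on the i915
side, a ggtt pointer on the GT side) is assumed here for the example.

/* before: reach the GGTT directly through struct drm_i915_private */
ggtt_total = i915->ggtt.vm.total;

/* after: go through the (root) GT returned by to_gt() */
ggtt_total = to_gt(i915)->ggtt->vm.total;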


# 576c4ef5 16-Dec-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Force ww lock for i915_gem_object_ggtt_pin_ww, v2.

We will need the lock to unbind the vma, and wait for bind to complete.
Remove the special casing for the !ww path, and force ww locking for all.

Changes since v1:
- Pass err to for_i915_gem_ww handling for -EDEADLK handling.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216142749.1966107-6-maarten.lankhorst@linux.intel.com


# 2cbc876d 14-Dec-2021 Michał Winiarski <michal.winiarski@intel.com>

drm/i915: Use to_gt() helper

Use to_gt() helper consistently throughout the codebase.
Pure mechanical s/i915->gt/to_gt(i915)/. No functional changes.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211214193346.21231-10-andi.shyti@linux.intel.com


# 775affb0 08-Nov-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915/gem: Fix gem_madvise for ttm+shmem objects

Gem-TTM objects that are backed by shmem might have populated
page-vectors without having the GEM pages set. Those objects
aren't moved to the correct shrinker / purge list by gem_madvise.

For such objects, identified by having the
_SELF_MANAGED_SHRINK_LIST set, make sure they end up on the
correct list.

v2:
- Revert a change that made swapped-out objects inaccessible for
truncating. (Matthew Auld)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211108123637.929617-1-thomas.hellstrom@linux.intel.com


# 2ea6ec76 27-Oct-2021 Matthew Auld <matthew.auld@intel.com>

drm/i915: move cpu_write_needs_clflush

Move it next to its partner in crime, gpu_write_needs_clflush. For
better readability let's keep gpu vs cpu at least in the same file.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027161813.3094681-3-matthew.auld@intel.com


# d0a65249 17-Sep-2021 Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>

drm/i915: Make wa list per-gt

Support for multiple GTs within a single i915 device will be arriving
soon. Since each GT may have its own fusing and require different
workarounds, we need to make the GT workaround functions and multicast
steering setup per-gt.

Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210917170845.836358-1-matthew.d.roper@intel.com


# 5c43ec5d 17-Jun-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Break out dma_resv ww locking utilities to separate files

As we're about to add more ww-related functionality,
break out the dma_resv ww locking utilities into their own files.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210617063018.92802-3-thomas.hellstrom@linux.intel.com


# 1c4dbe05 17-Jun-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915: Reference objects on the ww object list

Since the ww transaction endpoint easily ends up far out of scope of
the objects on the ww object list, particularly for contending lock
objects, make sure we reference objects on the list so they don't
disappear under us.

This comes with a performance penalty so it's been debated whether this
is really needed. But I think this is motivated by the fact that locking
is typically difficult to get right, and whatever we can do to make it
simpler for developers moving forward should be done, unless the
performance impact is far too high.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210617063018.92802-2-thomas.hellstrom@linux.intel.com


# dc2408d8 16-Jun-2021 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915/gem: Remove duplicated call to ops->pread

Between

commit ae30af84edb5b7cc95485922e43afd909a892e1b
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date: Tue Mar 23 16:50:00 2021 +0100

drm/i915: Disable userptr pread/pwrite support.

and

commit 0049b688459b846f819b6e51c24cd0781fcfde41
Author: Matthew Auld <matthew.auld@intel.com>
Date: Thu Nov 5 15:49:33 2020 +0000

drm/i915/gem: Allow backends to override pread implementation

this accidentally landed twice.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210616090350.828696-1-daniel.vetter@ffwll.ch


# 213d5092 10-Jun-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915/ttm: Introduce a TTM i915 gem object backend

The most logical place to introduce TTM buffer objects is as an i915
gem object backend. We need to add some ops to account for added
functionality like delayed delete and LRU list manipulation.

Initially we support only LMEM and SYSTEM memory, but SYSTEM
(which in this case means evicted LMEM objects) is not
visible to i915 GEM yet. The plan is to move the i915 gem system region
over to the TTM system memory type in upcoming patches.

We set up GPU bindings directly both from LMEM and from the system region,
as there is no need to use the legacy TTM_TT memory type. We reserve
that for future porting of GGTT bindings to TTM.

Remove the old lmem backend.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210610070152.572423-2-thomas.hellstrom@linux.intel.com


# 651e7d48 05-Jun-2021 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: replace IS_GEN and friends with GRAPHICS_VER

This was done by the following semantic patch:

@@ expression i915; @@
- INTEL_GEN(i915)
+ GRAPHICS_VER(i915)

@@ expression i915; expression E; @@
- INTEL_GEN(i915) >= E
+ GRAPHICS_VER(i915) >= E

@@ expression dev_priv; expression E; @@
- !IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) != E

@@ expression dev_priv; expression E; @@
- IS_GEN(dev_priv, E)
+ GRAPHICS_VER(dev_priv) == E

@@
expression dev_priv;
expression from, until;
@@
- IS_GEN_RANGE(dev_priv, from, until)
+ IS_GRAPHICS_VER(dev_priv, from, until)

@def@
expression E;
identifier id =~ "^gen$";
@@
- id = GRAPHICS_VER(E)
+ ver = GRAPHICS_VER(E)

@@
identifier def.id;
@@
- id
+ ver

It also takes care of renaming the variable we assign to GRAPHICS_VER()
so to use "ver" rather than "gen".

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210606045050.103862-2-lucas.demarchi@intel.com
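
Applied to a typical check, the rules above produce conversions of this
shape (illustrative only; do_something() is a placeholder):

/* before */
int gen = INTEL_GEN(i915);

if (INTEL_GEN(i915) >= 9 && !IS_GEN(i915, 11))
	do_something();

/* after */
int ver = GRAPHICS_VER(i915);

if (GRAPHICS_VER(i915) >= 9 && GRAPHICS_VER(i915) != 11)
	do_something();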


# d1487389 02-Jun-2021 Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/i915/ttm: Initialize the ttm device and memory managers

Temporarily remove the buddy allocator and related selftests
and hook up the TTM range manager for i915 regions.

Also modify the mock region selftests somewhat to account for a
fragmenting manager.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210602083818.241793-2-thomas.hellstrom@linux.intel.com


# 8777d17b 17-May-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Pin the L-shape quirked object as unshrinkable

When instantiating a tiled object on an L-shaped memory machine, we mark
the object as unshrinkable to prevent the shrinker from trying to swap
out the pages. We have to do this as we do not know the swizzling on the
individual pages, and so the data will be scrambled across swap out/in.

Not only do we need to move the object off the shrinker list, we need to
mark the object with shrink_pin so that the counter is consistent across
calls to madvise.

v2: in the madvise ioctl we need to check if the object is currently
shrinkable/purgeable, not if the object type supports shrinking

Fixes: 0175969e489a ("drm/i915/gem: Use shrinkable status for unknown swizzle quirks")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3293
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3450
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v5.12+
Link: https://patchwork.freedesktop.org/patch/msgid/20210517084640.18862-1-matthew.auld@intel.com


# 73552e00 29-Apr-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Remove erroneous i915_is_ggtt check for I915_GEM_OBJECT_UNBIND_VM_TRYLOCK

We changed the locking hierarchy for both ppgtt and ggtt, so both locks
should be trylocked inside i915_gem_object_unbind().

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: bc6f80cce9ae ("drm/i915: Use trylock in shrinker for ggtt on bsw vt-d and bxt, v2.")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210429120158.1105318-1-maarten.lankhorst@linux.intel.com
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #irc


# bc6f80cc 25-Apr-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Use trylock in shrinker for ggtt on bsw vt-d and bxt, v2.

The stop_machine() lock may allocate memory, but is called inside
vm->mutex, which is taken in the shrinker. This will cause a lockdep
splat, as can be seen below:

<4>[ 462.585762] ======================================================
<4>[ 462.585768] WARNING: possible circular locking dependency detected
<4>[ 462.585773] 5.12.0-rc5-CI-Trybot_7644+ #1 Tainted: G U
<4>[ 462.585779] ------------------------------------------------------
<4>[ 462.585783] i915_selftest/5540 is trying to acquire lock:
<4>[ 462.585788] ffffffff826440b0 (cpu_hotplug_lock){++++}-{0:0}, at: stop_machine+0x12/0x30
<4>[ 462.585814]
but task is already holding lock:
<4>[ 462.585818] ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915]
<4>[ 462.586301]
which lock already depends on the new lock.

<4>[ 462.586305]
the existing dependency chain (in reverse order) is:
<4>[ 462.586309]
-> #2 (&vm->mutex/1){+.+.}-{3:3}:
<4>[ 462.586323] i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915]
<4>[ 462.586719] i915_address_space_init+0x12d/0x130 [i915]
<4>[ 462.587092] ppgtt_init+0x4e/0x80 [i915]
<4>[ 462.587467] gen8_ppgtt_create+0x3e/0x5c0 [i915]
<4>[ 462.587828] i915_ppgtt_create+0x28/0xf0 [i915]
<4>[ 462.588203] intel_gt_init+0x123/0x370 [i915]
<4>[ 462.588572] i915_gem_init+0x129/0x1f0 [i915]
<4>[ 462.588971] i915_driver_probe+0x753/0xd80 [i915]
<4>[ 462.589320] i915_pci_probe+0x43/0x1d0 [i915]
<4>[ 462.589671] pci_device_probe+0x9e/0x110
<4>[ 462.589680] really_probe+0xea/0x410
<4>[ 462.589690] driver_probe_device+0xd9/0x140
<4>[ 462.589697] device_driver_attach+0x4a/0x50
<4>[ 462.589704] __driver_attach+0x83/0x140
<4>[ 462.589711] bus_for_each_dev+0x75/0xc0
<4>[ 462.589718] bus_add_driver+0x14b/0x1f0
<4>[ 462.589724] driver_register+0x66/0xb0
<4>[ 462.589731] i915_init+0x70/0x87 [i915]
<4>[ 462.590053] do_one_initcall+0x56/0x2e0
<4>[ 462.590061] do_init_module+0x55/0x200
<4>[ 462.590068] load_module+0x2703/0x2990
<4>[ 462.590074] __do_sys_finit_module+0xad/0x110
<4>[ 462.590080] do_syscall_64+0x33/0x80
<4>[ 462.590089] entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[ 462.590096]
-> #1 (fs_reclaim){+.+.}-{0:0}:
<4>[ 462.590109] fs_reclaim_acquire+0x9f/0xd0
<4>[ 462.590118] kmem_cache_alloc_trace+0x3d/0x430
<4>[ 462.590126] intel_cpuc_prepare+0x3b/0x1b0
<4>[ 462.590133] cpuhp_invoke_callback+0x9e/0x890
<4>[ 462.590141] _cpu_up+0xa4/0x130
<4>[ 462.590147] cpu_up+0x82/0x90
<4>[ 462.590153] bringup_nonboot_cpus+0x4a/0x60
<4>[ 462.590159] smp_init+0x21/0x5c
<4>[ 462.590167] kernel_init_freeable+0x8a/0x1b7
<4>[ 462.590175] kernel_init+0x5/0xff
<4>[ 462.590181] ret_from_fork+0x22/0x30
<4>[ 462.590187]
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
<4>[ 462.590199] __lock_acquire+0x1520/0x2590
<4>[ 462.590207] lock_acquire+0xd1/0x3d0
<4>[ 462.590213] cpus_read_lock+0x39/0xc0
<4>[ 462.590219] stop_machine+0x12/0x30
<4>[ 462.590226] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915]
<4>[ 462.590601] ggtt_bind_vma+0x5d/0x80 [i915]
<4>[ 462.590970] i915_vma_bind+0xdc/0x1c0 [i915]
<4>[ 462.591374] i915_vma_pin_ww+0x435/0xb40 [i915]
<4>[ 462.591779] make_obj_busy+0xcb/0x330 [i915]
<4>[ 462.592170] igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915]
<4>[ 462.592562] __i915_subtests.cold.7+0x42/0x92 [i915]
<4>[ 462.592995] __run_selftests.part.3+0x10d/0x172 [i915]
<4>[ 462.593428] i915_live_selftests.cold.5+0x1f/0x47 [i915]
<4>[ 462.593860] i915_pci_probe+0x93/0x1d0 [i915]
<4>[ 462.594210] pci_device_probe+0x9e/0x110
<4>[ 462.594217] really_probe+0xea/0x410
<4>[ 462.594226] driver_probe_device+0xd9/0x140
<4>[ 462.594233] device_driver_attach+0x4a/0x50
<4>[ 462.594240] __driver_attach+0x83/0x140
<4>[ 462.594247] bus_for_each_dev+0x75/0xc0
<4>[ 462.594254] bus_add_driver+0x14b/0x1f0
<4>[ 462.594260] driver_register+0x66/0xb0
<4>[ 462.594267] i915_init+0x70/0x87 [i915]
<4>[ 462.594586] do_one_initcall+0x56/0x2e0
<4>[ 462.594592] do_init_module+0x55/0x200
<4>[ 462.594599] load_module+0x2703/0x2990
<4>[ 462.594605] __do_sys_finit_module+0xad/0x110
<4>[ 462.594612] do_syscall_64+0x33/0x80
<4>[ 462.594618] entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[ 462.594625]
other info that might help us debug this:

<4>[ 462.594629] Chain exists of:
cpu_hotplug_lock --> fs_reclaim --> &vm->mutex/1

<4>[ 462.594645] Possible unsafe locking scenario:

<4>[ 462.594648] CPU0 CPU1
<4>[ 462.594652] ---- ----
<4>[ 462.594655] lock(&vm->mutex/1);
<4>[ 462.594664] lock(fs_reclaim);
<4>[ 462.594671] lock(&vm->mutex/1);
<4>[ 462.594679] lock(cpu_hotplug_lock);
<4>[ 462.594686]
*** DEADLOCK ***

<4>[ 462.594690] 4 locks held by i915_selftest/5540:
<4>[ 462.594696] #0: ffff888100fbc240 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x18/0x50
<4>[ 462.594715] #1: ffffc900006cb9a0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: make_obj_busy+0x81/0x330 [i915]
<4>[ 462.595118] #2: ffff88812a6081e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: make_obj_busy+0x21f/0x330 [i915]
<4>[ 462.595519] #3: ffff888125369c70 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x38e/0xb40 [i915]
<4>[ 462.595934]
stack backtrace:
<4>[ 462.595939] CPU: 0 PID: 5540 Comm: i915_selftest Tainted: G U 5.12.0-rc5-CI-Trybot_7644+ #1
<4>[ 462.595947] Hardware name: GOOGLE Kefka/Kefka, BIOS MrChromebox 02/04/2018
<4>[ 462.595952] Call Trace:
<4>[ 462.595961] dump_stack+0x7f/0xad
<4>[ 462.595974] check_noncircular+0x12e/0x150
<4>[ 462.595982] ? save_stack.isra.17+0x3f/0x70
<4>[ 462.595991] ? drm_mm_insert_node_in_range+0x34a/0x5b0
<4>[ 462.596000] ? i915_vma_pin_ww+0x9ec/0xb40 [i915]
<4>[ 462.596410] __lock_acquire+0x1520/0x2590
<4>[ 462.596419] ? do_init_module+0x55/0x200
<4>[ 462.596429] lock_acquire+0xd1/0x3d0
<4>[ 462.596435] ? stop_machine+0x12/0x30
<4>[ 462.596445] ? gen8_ggtt_insert_entries+0xf0/0xf0 [i915]
<4>[ 462.596816] cpus_read_lock+0x39/0xc0
<4>[ 462.596824] ? stop_machine+0x12/0x30
<4>[ 462.596831] stop_machine+0x12/0x30
<4>[ 462.596839] bxt_vtd_ggtt_insert_entries__BKL+0x36/0x50 [i915]
<4>[ 462.597210] ggtt_bind_vma+0x5d/0x80 [i915]
<4>[ 462.597580] i915_vma_bind+0xdc/0x1c0 [i915]
<4>[ 462.597986] i915_vma_pin_ww+0x435/0xb40 [i915]
<4>[ 462.598395] ? make_obj_busy+0xcb/0x330 [i915]
<4>[ 462.598786] make_obj_busy+0xcb/0x330 [i915]
<4>[ 462.599180] ? 0xffffffff81000000
<4>[ 462.599187] ? debug_mutex_unlock+0x50/0xa0
<4>[ 462.599198] igt_mmap_offset_exhaustion+0x45f/0x4c0 [i915]
<4>[ 462.599592] __i915_subtests.cold.7+0x42/0x92 [i915]
<4>[ 462.600026] ? i915_perf_selftests+0x20/0x20 [i915]
<4>[ 462.600422] ? __i915_nop_setup+0x10/0x10 [i915]
<4>[ 462.600820] __run_selftests.part.3+0x10d/0x172 [i915]
<4>[ 462.601253] i915_live_selftests.cold.5+0x1f/0x47 [i915]
<4>[ 462.601686] i915_pci_probe+0x93/0x1d0 [i915]
<4>[ 462.602037] ? _raw_spin_unlock_irqrestore+0x3d/0x60
<4>[ 462.602047] pci_device_probe+0x9e/0x110
<4>[ 462.602057] really_probe+0xea/0x410
<4>[ 462.602067] driver_probe_device+0xd9/0x140
<4>[ 462.602075] device_driver_attach+0x4a/0x50
<4>[ 462.602084] __driver_attach+0x83/0x140
<4>[ 462.602091] ? device_driver_attach+0x50/0x50
<4>[ 462.602099] ? device_driver_attach+0x50/0x50
<4>[ 462.602107] bus_for_each_dev+0x75/0xc0
<4>[ 462.602116] bus_add_driver+0x14b/0x1f0
<4>[ 462.602124] driver_register+0x66/0xb0
<4>[ 462.602133] i915_init+0x70/0x87 [i915]
<4>[ 462.602453] ? 0xffffffffa0606000
<4>[ 462.602458] do_one_initcall+0x56/0x2e0
<4>[ 462.602466] ? kmem_cache_alloc_trace+0x374/0x430
<4>[ 462.602476] do_init_module+0x55/0x200
<4>[ 462.602484] load_module+0x2703/0x2990
<4>[ 462.602500] ? __do_sys_finit_module+0xad/0x110
<4>[ 462.602507] __do_sys_finit_module+0xad/0x110
<4>[ 462.602519] do_syscall_64+0x33/0x80
<4>[ 462.602527] entry_SYSCALL_64_after_hwframe+0x44/0xae
<4>[ 462.602535] RIP: 0033:0x7fab69d8d89d

Changes since v1:
- Add lockdep annotations during init, to ensure that lockdep is primed.
This also fixes a false positive when reading /proc/lockdep_stats
during module reload.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210426102351.921874-1-maarten.lankhorst@linux.intel.com
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


# 036867e9 17-May-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Pin the L-shape quirked object as unshrinkable

When instantiating a tiled object on an L-shaped memory machine, we mark
the object as unshrinkable to prevent the shrinker from trying to swap
out the pages. We have to do this as we do not know the swizzling on the
individual pages, and so the data will be scrambled across swap out/in.

Not only do we need to move the object off the shrinker list, we need to
mark the object with shrink_pin so that the counter is consistent across
calls to madvise.

v2: in the madvise ioctl we need to check if the object is currently
shrinkable/purgeable, not if the object type supports shrinking

Fixes: 0175969e489a ("drm/i915/gem: Use shrinkable status for unknown swizzle quirks")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3293
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3450
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v5.12+
Link: https://patchwork.freedesktop.org/patch/msgid/20210517084640.18862-1-matthew.auld@intel.com
(cherry picked from commit 8777d17b68dcfbfbd4d524f444adefae56f41225)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# cf41a8f1 23-Mar-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Finally remove obj->mm.lock.

With all callers and selftests fixed to use ww locking, we can now
finally remove this lock.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-62-maarten.lankhorst@linux.intel.com


# f1ac8a02 28-Jan-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Fix pread/pwrite to work with new locking rules.

We are removing obj->mm.lock, and need to take the reservation lock
before we can pin pages. Move the pinning pages into the helper, and
merge gtt pwrite/pread preparation and cleanup paths.

The fence lock is also removed; it will conflict with fence annotations,
because of memory allocations done when pagefaulting inside copy_*_user.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
[danvet: Pick the older version to avoid the conflicts]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210128162612.927917-31-maarten.lankhorst@linux.intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-31-maarten.lankhorst@linux.intel.com


# 7d1c2618 23-Mar-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Take reservation lock around i915_vma_pin.

We previously complained when ww == NULL.

This function is now only used in selftests to pin an object,
and ww locking is now fixed.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
[danvet: Resolve conflict because we don't have a set-domain refactor,
see
https://lore.kernel.org/intel-gfx/20210203090205.25818-8-chris@chris-wilson.co.uk/

The really worrying thing here is that the above patch had a change in
arguments for i915_gem_object_set_to_gtt_domain(), without any
explanation. I decided to just faithfully apply Maarten's change but
not the argument change which was in Maarten's context diff.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-26-maarten.lankhorst@linux.intel.com


# ed29c269 23-Mar-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7.

Instead of doing what we do currently, which will never work with
PROVE_LOCKING, do the same as AMD does, and something similar to the
relocation slowpath. When all locks are dropped, we acquire the
pages for pinning. When the locks are taken, we transfer those
pages in .get_pages() to the bo. As a final check before installing
the fences, we ensure that the mmu notifier was not called; if it was,
we return -EAGAIN to userspace to signal that it has to start over.
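
A rough sketch of that ordering, using the generic mmu_interval_notifier
API; the i915-side names (userptr->notifier, userptr->ptr, pvec, npages)
are assumptions for illustration rather than the exact patch:

unsigned long seq;
int pinned;

/* 1. With all object locks dropped, sample the notifier and pin the pages. */
seq = mmu_interval_read_begin(&userptr->notifier);
pinned = pin_user_pages_fast(userptr->ptr, npages, FOLL_WRITE, pvec);

/* 2. Take the ww locks and hand the pages to the bo in .get_pages(). */

/* 3. Final check before installing the fences: if the notifier fired in
 *    the meantime, unwind and let userspace start over. */
if (mmu_interval_read_retry(&userptr->notifier, seq))
        return -EAGAIN;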

Changes since v1:
- Unbinding is done in submit_init only. submit_begin() removed.
- MMU_NOTFIER -> MMU_NOTIFIER
Changes since v2:
- Make i915->mm.notifier a spinlock.
Changes since v3:
- Add WARN_ON if there are any page references left, should have been 0.
- Return 0 on success in submit_init(), bug from spinlock conversion.
- Release pvec outside of notifier_lock (Thomas).
Changes since v4:
- Mention why we're clearing eb->[i + 1].vma in the code. (Thomas)
- Actually check all invalidations in eb_move_to_gpu. (Thomas)
- Do not wait when process is exiting to fix gem_ctx_persistence.userptr.
Changes since v5:
- Clarify why check on PF_EXITING is (temporarily) required.
Changes since v6:
- Ensure userptr validity is checked in set_domain through a special path.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
[danvet: s/kfree/kvfree/ in i915_gem_object_userptr_drop_ref in the
previous review round, but which got lost. The other open questions
around page refcount are imo better discussed in a separate series,
with amdgpu folks involved].
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-17-maarten.lankhorst@linux.intel.com


# ae30af84 23-Mar-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Disable userptr pread/pwrite support.

Userptr should not need the kernel for a userspace memcpy; userspace
needs to call memcpy directly.

Specifically, disable i915_gem_pwrite_ioctl() and i915_gem_pread_ioctl().
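
For illustration, a hedged userspace sketch (variable names assumed):
the backing store of a userptr object is the application's own
allocation, so the copy that pwrite used to perform is just a store
into that allocation.

/* ptr is the same CPU allocation that was registered through
 * DRM_IOCTL_I915_GEM_USERPTR to create the object. */
memcpy((char *)ptr + offset, data, length);     /* replaces GEM_PWRITE */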

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-12-maarten.lankhorst@linux.intel.com


# aaee716e 23-Mar-2021 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Add gem object locking to madvise.

Doesn't need the full ww lock, only checking if pages are bound.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #irc
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-7-maarten.lankhorst@linux.intel.com


# f8d1ff10 17-Mar-2021 Ashutosh Dixit <ashutosh.dixit@intel.com>

drm/i915: Disable pread/pwrite ioctl's for future platforms (v3)

The rationale for this change is roughly as follows:

1. The functionality can be done entirely in userspace with a
combination of mmap + memcpy

2. The only reason anyone in userspace is still using it is because
someone implemented bo_subdata that way in libdrm ages ago and
they're all too lazy to write the 5 lines of code to do a map.

3. This falls cleanly into the category of things which will only get
more painful with local memory support.

These ioctls aren't used much anymore by "real" userspace drivers.
Vulkan has never used them and neither has the iris GL driver. The old
i965 GL driver does use PWRITE for glBufferSubData but it only supports
up through Gen11; Gen12 was never enabled in i965. The compute driver
has never used PREAD/PWRITE. The only remaining user is the media
driver which uses it exactly twice and they're easily removed [1] so
expecting them to drop it going forward is reasonable.

IGT changes which handle this kernel change have also been submitted [2].

[1] https://github.com/intel/media-driver/pull/1160
[2] https://patchwork.freedesktop.org/series/81384/
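
To make point 1 of the rationale concrete, a hedged userspace sketch of
the replacement (error handling omitted; fd, handle, data and size are
assumed to exist; the mmap_offset mapping shown is just one of the valid
CPU mappings):

#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/i915_drm.h>

/* Equivalent of PWRITE(handle, offset 0, data, size) */
struct drm_i915_gem_mmap_offset arg = {
        .handle = handle,
        .flags = I915_MMAP_OFFSET_WB,   /* cacheable CPU mapping */
};
void *ptr;

ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &arg);
ptr = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, arg.offset);
memcpy(ptr, data, size);
munmap(ptr, size);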

v2 (Jason Ekstrand):
- Improved commit message with the status of all usermode drivers
- A more future-proof platform check

v3 (Jason Ekstrand):
- Drop the HAS_LMEM checks as they're already covered by the version
checks

Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210317234014.2271006-4-jason@jlekstrand.net


# 29d88083 23-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Move freeze/freeze_late next to suspend/suspend_late

Push the hibernate pm routines next to the suspend pm routines in
gem/i915_gem_pm.c. This has the side-effect of putting the wbinvd()
abusers next to each other.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 30d2bfd09383 ("drm/i915/gem: Almagamate clflushes on freeze")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123145543.10533-1-chris@chris-wilson.co.uk
(cherry picked from commit 6d8f02207420e76db693a00ccb44792474e297fc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 0175969e 19-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Use shrinkable status for unknown swizzle quirks

Give obj->mm.quirked a name much more reflective of its purpose
(i915_gem_object_has_tiling_quirk) and move it from the obj->mm field as
it doesn't denote a quirk of the backing store, but a quirk in the
object in its treatment of the backing pages, similar to tiling modes.

Then, instead of abusing the pinned status of the buffer to protect it
from the shrinker, we can hide the buffer from the shrinker so it is
never considered for being swapped.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210119214336.1463-4-chris@chris-wilson.co.uk


# 30d2bfd0 19-Jan-2021 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Almagamate clflushes on freeze

When flushing objects larger than the CPU cache it is preferable to use
a single wbinvd() rather than overlapping clflush(). At runtime, we
avoid wbinvd() due to its system-wide latencies, but during
single-threaded suspend, no one will observe the imposed latency and we
can opt for the faster wbinvd() to clear all objects in a single hit.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210119214336.1463-2-chris@chris-wilson.co.uk


# dcaccaf0 14-Jan-2021 Matthew Auld <matthew.auld@intel.com>

drm/i915/gem: split gem_create into own file

In preparation for gem_create_ext, break out the gem_create uAPI so that
we don't clutter i915_gem.c once we start adding various extensions.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210114182402.840247-1-matthew.auld@intel.com


# f8246cf4 15-Dec-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Drop free_work for GEM contexts

The free_list and worker was introduced in commit 5f09a9c8ab6b ("drm/i915:
Allow contexts to be unreferenced locklessly"), but subsequently made
redundant by the removal of the last sleeping lock in commit 2935ed5339c4
("drm/i915: Remove logical HW ID"). As we can now free the GEM context
immediately from any context, remove the deferral of the free_list

v2: Lift removing the context from the global list into close().

Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201215152138.8158-1-chris@chris-wilson.co.uk


# 0eb0feb9 05-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Pull phys pread/pwrite implementations to the backend

Move the specialised interactions with the physical GEM object from the
pread/pwrite ioctl handler into the phys backend.

Currently, if one is able to exhaust the entire aperture and then tries to
pwrite into an object not backed by struct page, we accidentally invoke
the phys pwrite handler on a non-phys object; calamitous.

Fixes: c6790dc22312 ("drm/i915: Wean off drm_pci_alloc/drm_pci_free")
Testcase: igt/gem_pwrite/exhaustion
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20201105154934.16022-2-chris@chris-wilson.co.uk
(cherry picked from commit 852e1b3644817f071427b83859b889c788a0cf69)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 0a1db6f0 05-Nov-2020 Matthew Auld <matthew.auld@intel.com>

drm/i915/gem: Allow backends to override pread implementation

As there are more and more complicated interactions between the different
backing stores and userspace, push the control into the backends rather
than accumulate them all inside the ioctl handlers.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201105154934.16022-1-chris@chris-wilson.co.uk
(cherry picked from commit 0049b688459b846f819b6e51c24cd0781fcfde41)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 852e1b36 05-Nov-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Pull phys pread/pwrite implementations to the backend

Move the specialised interactions with the physical GEM object from the
pread/pwrite ioctl handler into the phys backend.

Currently, if one is able to exhaust the entire aperture and then tries to
pwrite into an object not backed by struct page, we accidentally invoke
the phys pwrite handler on a non-phys object; calamitous.

Fixes: c6790dc22312 ("drm/i915: Wean off drm_pci_alloc/drm_pci_free")
Testcase: igt/gem_pwrite/exhaustion
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20201105154934.16022-2-chris@chris-wilson.co.uk


# 0049b688 05-Nov-2020 Matthew Auld <matthew.auld@intel.com>

drm/i915/gem: Allow backends to override pread implementation

As there are more and more complicated interactions between the different
backing stores and userspace, push the control into the backends rather
than accumulate them all inside the ioctl handlers.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20201105154934.16022-1-chris@chris-wilson.co.uk


# 47b08693 19-Aug-2020 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Make sure execbuffer always passes ww state to i915_vma_pin.

As a preparation step for full object locking and wait/wound handling
during pin and object mapping, ensure that we always pass the ww context
in i915_gem_execbuffer.c to i915_vma_pin, and use lockdep to ensure this
happens.

This also requires changing the order of eb_parse slightly, to ensure
we pass ww at a point where we could still handle -EDEADLK safely.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-15-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# c43ce123 19-Aug-2020 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Use per object locking in execbuf, v12.

Now that we have changed execbuf submission slightly to allow us to do all
pinning in one place, we can simply add ww versions on top of
struct_mutex. All we have to do is add a separate path for -EDEADLK
handling, which needs to unpin all gem bo's before dropping the lock,
then start over.

This finally allows us to do parallel submission, but because not
all of the pinning code uses the ww ctx yet, we cannot completely
drop struct_mutex yet.

Changes since v1:
- Keep struct_mutex for now. :(
Changes since v2:
- Make sure we always lock the ww context in slowpath.
Changes since v3:
- Don't call __eb_unreserve_vma in eb_move_to_gpu now; this can be
done on normal unlock path.
- Unconditionally release vmas and context.
Changes since v4:
- Rebased on top of struct_mutex reduction.
Changes since v5:
- Remove training wheels.
Changes since v6:
- Fix accidentally broken -ENOSPC handling.
Changes since v7:
- Handle gt buffer pool better.
Changes since v8:
- Properly clear variables, to make -EDEADLK handling not BUG.
Change since v9:
- Fix unpinning fence on pnv and below.
Changes since v10:
- Make relocation gpu chaining working again.
Changes since v11:
- Remove relocation chaining, pain to make it work.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-9-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 1af343cd 19-Aug-2020 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Remove locking from i915_gem_object_prepare_read/write

Execbuffer submission will perform its own WW locking, and we
cannot rely on the implicit lock there.

This also makes it clear that the GVT code will get a lockdep splat when
multiple batchbuffer shadows need to be performed in the same instance,
fix that up.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-7-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 80f0b679 19-Aug-2020 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.

i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory
eviction. We don't use it yet, but let's start adding the definition
first.

To use it, we have to pass a non-NULL ww to gem_object_lock and must not
unlock directly; unlocking is done in i915_gem_ww_ctx_fini.
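
A sketch of the intended calling pattern with the helpers introduced
here (abbreviated; obj is assumed, and the work done under the lock is
elided):

struct i915_gem_ww_ctx ww;
int err;

i915_gem_ww_ctx_init(&ww, true);        /* true: interruptible waits */
retry:
err = i915_gem_object_lock(obj, &ww);
if (!err) {
        /* ... pin pages, bind, map, etc., under this ww context ... */
}
if (err == -EDEADLK) {
        err = i915_gem_ww_ctx_backoff(&ww);
        if (!err)
                goto retry;
}
i915_gem_ww_ctx_fini(&ww);      /* drops every lock taken with this ww */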

Changes since v1:
- Change ww_ctx and obj order in locking functions (Jonas Lahtinen)

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200819140904.1708856-6-maarten.lankhorst@linux.intel.com
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 27a5dcfe 28-Jul-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Remove disordered per-file request list for throttling

I915_GEM_THROTTLE dates back to the time before contexts, when there was
just a single engine, and therefore a single timeline and request list
globally. That request list was in execution/retirement order, and so
walking it to find a particular aged request made sense and could be
split per file.

That is no more. We now have many timelines within a file, as many as the
user wants to construct (essentially per-engine, per-context). Each of
those runs independently, making the single list futile. Remove the
disordered list, and iterate over all the timelines to find a request to
wait on in each, satisfying the criterion that the CPU is no more than 20ms
ahead of its oldest request.

It should go without saying that the I915_GEM_THROTTLE ioctl is no
longer used as the primary means of throttling, so it makes sense to push
the complication into the ioctl where it only impacts upon its few
irregular users, rather than the execbuf/retire where everybody has to
pay the cost. Fortunately, the few users do not create a vast number of
contexts, so the loops over contexts/engines should be concise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728152010.30701-1-chris@chris-wilson.co.uk
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 51dc276d 11-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Leave vma intact as they are discarded

If we find ourselves trying to reuse a misplaced but active vma, we
currently try to discard it to avoid having to wait to unbind it
(upsetting the current user of the vma). An alternative to marking it as
a discarded vma and keeping it in both the obj->vma.list and
obj->vma.tree is to simply remove it from the lookup rbtree.

While it remains in the list of vma, it will be unbound under eviction
pressure and freed along with the object. We will never reuse it again
for new instances. As before, with no pruning, the list may continually
grow, but eventually we will have the most constrained version of the
ggtt view that meets all requirements -- so the list of vma should not
grow without bound.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2012
Fixes: 9bdcaa5e3a2f ("drm/i915: Discard a misplaced GGTT vma")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200611180421.23262-1-chris@chris-wilson.co.uk


# 84d24cb5 05-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Correct discard i915_vma_compare assertion

As a last-minute addition, I added an assertion to make sure that the
new i915_vma view would be equal to the discard. However, the positive
encouragement from CI only goes to show that we rarely take this path,
and it wasn't until the post-merge run that we hit the assert -- because
it compared the wrong view. Fix up the copy'n'paste error and compare
against both the old view and the expected new view.

Fixes: 9bdcaa5e3a2f ("drm/i915: Discard a misplaced GGTT vma")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200605184844.24644-1-chris@chris-wilson.co.uk


# 9bdcaa5e 05-Jun-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Discard a misplaced GGTT vma

Across the many users of the GGTT vma (internal objects, mmappings,
display etc.), we may end up with conflicting requirements for the
placement. Currently, we try to resolve the conflict by unbinding the
vma and rebinding it to match the new constraints; over time we will end
up with a GGTT that matches the most strict constraints over all
concurrent users. However, this causes a problem if the vma is currently
in use as we must wait until it is idle before moving it. But there is
no restriction on the number of views we may use (apart from the limited
size of the GGTT itself), and so if the active vma does not meet our
requirements, try and build a new one!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200605165258.1483-1-chris@chris-wilson.co.uk


# 9da0ea09 01-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Drop cached obj->bind_count

We cached the number of vma bound to the object in order to speed up
shrinker decisions. This has been superseded by being more proactive in
removing objects we cannot shrink from the shrinker lists, and so we can
drop the clumsy attempt at atomically counting the bind count and
comparing it to the number of pinned mappings of the object. This will
only get clumsier with asynchronous binding and unbinding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200401223924.16667-1-chris@chris-wilson.co.uk


# 0d86ee35 01-Apr-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Make fence revocation unequivocal

If we must revoke the fence because the VMA is no longer present, or
because the fence no longer applies, ensure that we do and convert it
into an error if we try but cannot.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200401210104.15907-3-chris@chris-wilson.co.uk


# dec9cf9e 16-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Pull restoration of GGTT fences underneath the GT

Make the GT responsible for restoring its fence when it wakes up from
suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200316113846.4974-2-chris@chris-wilson.co.uk


# f899f786 16-Mar-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GGTT fence registers under gt/

Since the fence registers control HW detiling through the GGTT
aperture, make them a part of the intel_ggtt under gt/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200316113846.4974-1-chris@chris-wilson.co.uk


# 83d2bdb6 25-Feb-2020 Jani Nikula <jani.nikula@intel.com>

drm/i915: significantly reduce the use of <drm/i915_drm.h>

The #include has been splattered all over the place, but there are
precious few places, all .c files, that actually need it.

v2: remove leftover double newlines

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200225133131.3301-1-jani.nikula@intel.com


# aa314619 02-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wean off drm_pci_alloc/drm_pci_free

drm_pci_alloc and drm_pci_free are just very thin wrappers around
dma_alloc_coherent, with a note that we should be removing them.
Furthermore since

commit de09d31dd38a50fdce106c15abd68432eebbd014
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date: Fri Jan 15 16:51:42 2016 -0800

page-flags: define PG_reserved behavior on compound pages

As far as I can see there's no users of PG_reserved on compound pages.
Let's use PF_NO_COMPOUND here.

drm_pci_alloc has been declared broken since it mixes GFP_COMP and
SetPageReserved. Avoid this conflict by weaning ourselves off using the
abstraction and using the dma functions directly.

Reported-by: Taketo Kabe
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1027
Fixes: de09d31dd38a ("page-flags: define PG_reserved behavior on compound pages")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v4.5+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200202153934.3899472-1-chris@chris-wilson.co.uk
(cherry picked from commit c6790dc22312f592c1434577258b31c48c72d52a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 051c89cf 22-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Detect overflow in calculating dumb buffer size

To multiply 2 u32 numbers to generate a u64 in C requires a bit of
forewarning for the compiler.
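
For illustration (plain C, not the driver's exact check; kernel types
and the <linux/math64.h> helper assumed): without widening one operand
first, the multiplication is performed in 32 bits and silently wraps
before the result is stored into the u64.

u32 width = 1 << 16, cpp = 1 << 16;

u64 wrong = width * cpp;                /* 32-bit multiply: wraps to 0 */
u64 right = (u64)width * cpp;           /* widened first: 1ULL << 32 */
u64 also  = mul_u32_u32(width, cpp);    /* <linux/math64.h> helper */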

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123125934.1401755-1-chris@chris-wilson.co.uk
(cherry picked from commit 0f8f8a64300092852b9361cd835395ee71e6a7d6)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# f6c26b55 04-Feb-2020 Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915: Never allow userptr into the new mapping types

Commit 4f2a572eda67 ("drm/i915/userptr: Never allow userptr into the
mappable GGTT") made I915_GEM_MMAP_GTT IOCTLs to fail when attempted
on a userptr object in order to protect from a lockdep splat. Later
on, new mapping types were introduced by commit cc662126b413
("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET"). Those new mapping
types suffer from the same lockdep splat issue but they now succeed
when tried on top of a userptr object. Fix it.

v2: Don't play with the -ENODEV driver response (Chris)

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200204162302.1299516-1-chris@chris-wilson.co.uk


# c6790dc2 02-Feb-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wean off drm_pci_alloc/drm_pci_free

drm_pci_alloc and drm_pci_free are just very thin wrappers around
dma_alloc_coherent, with a note that we should be removing them.
Furthermore since

commit de09d31dd38a50fdce106c15abd68432eebbd014
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date: Fri Jan 15 16:51:42 2016 -0800

page-flags: define PG_reserved behavior on compound pages

As far as I can see there's no users of PG_reserved on compound pages.
Let's use PF_NO_COMPOUND here.

drm_pci_alloc has been declared broken since it mixes GFP_COMP and
SetPageReserved. Avoid this conflict by weaning ourselves off using the
abstraction and using the dma functions directly.

Reported-by: Taketo Kabe
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1027
Fixes: de09d31dd38a ("page-flags: define PG_reserved behavior on compound pages")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org> # v4.5+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200202153934.3899472-1-chris@chris-wilson.co.uk


# e3793468 30-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use the async worker to avoid reclaim tainting the ggtt->mutex

On Braswell and Broxton (also known as Valleyview and Apollolake), we
need to serialise updates of the GGTT using the big stop_machine()
hammer. This has the side effect of appearing to lockdep as a possible
reclaim (since it uses the cpuhp mutex and that is tainted by per-cpu
allocations). However, we want to use vm->mutex (including ggtt->mutex)
from within the shrinker and so must avoid such possible taints. For this
purpose, we introduced the asynchronous vma binding and we can apply it
to PIN_GLOBAL so long as we take care to add the necessary waits for
the worker afterwards.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/211
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-3-chris@chris-wilson.co.uk


# e986209c 30-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Rename i915_gem_restore_ggtt_mappings() for its new placement

The i915_ggtt now sits beneath gt/ outside of the auspices of gem/ and
should be given a fresh name to reflect that. We also want to give it a
name that reflects its role in the system suspend/resume, with the
intention of pulling together all the GGTT operations (e.g. restoring
the fence registers once they are pulled under gt/intel_ggtt_detiler.c)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-2-chris@chris-wilson.co.uk


# 0f8f8a64 22-Jan-2020 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Detect overflow in calculating dumb buffer size

To multiply 2 u32 numbers to generate a u64 in C requires a bit of
forewarning for the compiler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200123125934.1401755-1-chris@chris-wilson.co.uk


# 48a1b8d4 14-Jan-2020 Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>

drm/i915: Make WARN* drm specific where drm_priv ptr is available

drm specific WARN* calls include device information in the
backtrace, so we know what device the warnings originate from.

Convert all the calls of WARN* to the device-specific drm_WARN*
variants in functions where the drm_i915_private struct pointer is
readily available.

The conversion was done automatically with the coccinelle semantic
patch below. checkpatch errors/warnings were fixed manually.

@rule1@
identifier func, T;
@@
func(...) {
...
struct drm_i915_private *T = ...;
<+...
(
-WARN(
+drm_WARN(&T->drm,
...)
|
-WARN_ON(
+drm_WARN_ON(&T->drm,
...)
|
-WARN_ONCE(
+drm_WARN_ONCE(&T->drm,
...)
|
-WARN_ON_ONCE(
+drm_WARN_ON_ONCE(&T->drm,
...)
)
...+>
}

@rule2@
identifier func, T;
@@
func(struct drm_i915_private *T,...) {
<+...
(
-WARN(
+drm_WARN(&T->drm,
...)
|
-WARN_ON(
+drm_WARN_ON(&T->drm,
...)
|
-WARN_ONCE(
+drm_WARN_ONCE(&T->drm,
...)
|
-WARN_ON_ONCE(
+drm_WARN_ON_ONCE(&T->drm,
...)
)
...+>
}

command: ls drivers/gpu/drm/i915/*.c | xargs spatch --sp-file \
<script> --linux-spacing --in-place

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200115034455.17658-10-pankaj.laxminarayan.bharadiya@intel.com


# 05e8a5f5 04-Jan-2020 Ramalingam C <ramalingam.c@intel.com>

drm/i915: Create dumb buffer from LMEM

When LMEM is supported, dumb buffers are preferably created from LMEM.

v2:
Parameters are reshuffled. [Chris]
v3:
s/region_id/mem_type
v4:
use the i915_gem_object_create_region [chris]

Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200104191043.2207314-2-chris@chris-wilson.co.uk


# 4b0dd4a2 30-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/selftests: Flush the context worker

When cleaning up the mock device, remember to flush the context worker
to free the residual GEM contexts before shutting down the device.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/802
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191230165821.3840449-1-chris@chris-wilson.co.uk


# 76f9764c 22-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce a vma.kref

Start introducing a kref on i915_vma in order to protect the vma unbind
(i915_gem_object_unbind) from a parallel destruction (i915_vma_parked).
Later, we will use the refcount to manage all access and turn i915_vma
into a first class container.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Acked-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191222210256.2066451-2-chris@chris-wilson.co.uk


# f5af1659 22-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add a simple is-bound check before unbinding

Only acquire the various atomic references required to unbind the vma if
we do need to unbind the vma.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191222210256.2066451-1-chris@chris-wilson.co.uk


# e85ade1f 18-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold reference to intel_frontbuffer as we track activity

Since obj->frontbuffer is no longer protected by the struct_mutex, it
may be removed while we are processing the execbuf. Mark the
intel_frontbuffer as rcu protected, and so acquire a reference to
the struct as we track activity upon it.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/827
Fixes: 8e7cb1799b4f ("drm/i915: Extract intel_frontbuffer active tracking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191218104043.3539458-1-chris@chris-wilson.co.uk
(cherry picked from commit da42104f589d979bbe402703fd836cec60befae1)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# e26b6d43 21-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Pull GT initialisation under intel_gt_init()

Begin pulling the GT setup underneath a single GT umbrella; let intel_gt
take ownership of its engines! As hinted, the complication is the
lifetime of the probed engine versus the active lifetime of the GT
backends. We need to detect the engine layout early and keep it until
the end so that we can sanitize state on takeover and release.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191222120752.1368352-1-chris@chris-wilson.co.uk


# 78be2c30 21-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move i915_gem_init_contexts() earlier

As the GEM global context setup is now independent of the GT state
(although GT does currently still depend upon the global
i915->kernel_context), we can move its init earlier, leaving the gt init
ready to be extracted.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191221200109.1202310-1-chris@chris-wilson.co.uk


# e6ba7648 21-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove i915->kernel_context

Allocate only an internal intel_context for the kernel_context, forgoing
a global GEM context for internal use as we only require a separate
address space (for our own protection).

Now having weaned GT from requiring ce->gem_context, we can stop
referencing it entirely. This also means we no longer have to create random
and unnecessary GEM contexts for internal use.

GEM contexts are now entirely for tracking GEM clients, and intel_context
the execution environment on the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Acked-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk


# 9f3ccd40 20-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop GEM context as a direct link from i915_request

Keep the intel_context as being the primary state for i915_request, with
the GEM context a backpointer from the low level state for the rarer
cases we need client information. Our goal is to remove such references
to clients from the backend, and leave the HW submission agnostic to
client interfaces and self-contained.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk


# da42104f 18-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold reference to intel_frontbuffer as we track activity

Since obj->frontbuffer is no longer protected by the struct_mutex, it
may be removed while we are processing the execbuf. Mark the
intel_frontbuffer as rcu protected, and so acquire a reference to
the struct as we track activity upon it.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/827
Fixes: 8e7cb1799b4f ("drm/i915: Extract intel_frontbuffer active tracking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191218104043.3539458-1-chris@chris-wilson.co.uk


# 750bde2f 21-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Serialise with remote retirement

Since retirement may be running in a worker on another CPU, it may be
skipped in the local intel_gt_wait_for_idle(). To ensure the state is
consistent for our sanity checks upon load, serialise with the remote
retirer by waiting on the timeline->mutex.

Outside of this use case, e.g. on suspend or module unload, we expect the
slack to be picked up by intel_gt_pm_wait_for_idle() and so prefer to
put the special case serialisation with retirement in its single user,
for now at least.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-2-chris@chris-wilson.co.uk
(cherry picked from commit 2d0fb251360ab7eccbffd99f6933a2a4de678d52)
Fixes: 093b92287363 ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/754
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 16c46fd5 08-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Avoid rcu_barrier() from shrinker paths

As i915_gem_object_unbind() waits on an rcu_barrier() to flush vm
releases (and destruction of their bound vma), we have to be careful not
to invoke that barrier from beneath the shrinker:

<4> [430.222671] WARNING: possible circular locking dependency detected
<4> [430.222673] 5.4.0-rc8-CI-CI_DRM_7508+ #1 Tainted: G U
<4> [430.222675] ------------------------------------------------------
<4> [430.222677] gem_pwrite/2317 is trying to acquire lock:
<4> [430.222678] ffffffff82248218 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [430.222685]
but task is already holding lock:
<4> [430.222687] ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
<4> [430.222691]
which lock already depends on the new lock.

<4> [430.222693]
the existing dependency chain (in reverse order) is:
<4> [430.222695]
-> #2 (fs_reclaim){+.+.}:
<4> [430.222698] fs_reclaim_acquire.part.117+0x24/0x30
<4> [430.222702] kmem_cache_alloc_trace+0x2a/0x2c0
<4> [430.222705] intel_cpuc_prepare+0x37/0x1a0
<4> [430.222709] cpuhp_invoke_callback+0x9b/0x9d0
<4> [430.222712] _cpu_up+0xa2/0x140
<4> [430.222714] do_cpu_up+0x61/0xa0
<4> [430.222718] smp_init+0x57/0x96
<4> [430.222722] kernel_init_freeable+0xac/0x1c7
<4> [430.222725] kernel_init+0x5/0x100
<4> [430.222728] ret_from_fork+0x24/0x50
<4> [430.222729]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [430.222733] cpus_read_lock+0x34/0xd0
<4> [430.222734] rcu_barrier+0xaa/0x190
<4> [430.222736] kernel_init+0x21/0x100
<4> [430.222737] ret_from_fork+0x24/0x50
<4> [430.222739]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [430.222742] __lock_acquire+0x1328/0x15d0
<4> [430.222743] lock_acquire+0xa7/0x1c0
<4> [430.222746] __mutex_lock+0x9a/0x9d0
<4> [430.222747] rcu_barrier+0x23/0x190
<4> [430.222850] i915_gem_object_unbind+0x264/0x3d0 [i915]
<4> [430.222882] i915_gem_shrink+0x297/0x5f0 [i915]
<4> [430.222912] i915_gem_shrink_all+0x38/0x60 [i915]
<4> [430.222934] i915_drop_caches_set+0x1f0/0x240 [i915]
<4> [430.222938] simple_attr_write+0xb0/0xd0
<4> [430.222941] full_proxy_write+0x51/0x80
<4> [430.222943] vfs_write+0xb9/0x1d0
<4> [430.222944] ksys_write+0x9f/0xe0
<4> [430.222946] do_syscall_64+0x4f/0x210
<4> [430.222948] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [430.222950]
other info that might help us debug this:

<4> [430.222952] Chain exists of:
rcu_state.barrier_mutex --> cpu_hotplug_lock.rw_sem --> fs_reclaim

<4> [430.222955] Possible unsafe locking scenario:

<4> [430.222957] CPU0 CPU1
<4> [430.222958] ---- ----
<4> [430.222960] lock(fs_reclaim);
<4> [430.222961] lock(cpu_hotplug_lock.rw_sem);
<4> [430.222963] lock(fs_reclaim);
<4> [430.222964] lock(rcu_state.barrier_mutex);
<4> [430.222966]
*** DEADLOCK ***

<4> [430.222968] 3 locks held by gem_pwrite/2317:
<4> [430.222969] #0: ffff88849e2d9408 (sb_writers#14){.+.+}, at: vfs_write+0x1a4/0x1d0
<4> [430.222973] #1: ffff888496976db0 (&attr->mutex){+.+.}, at: simple_attr_write+0x36/0xd0
<4> [430.222976] #2: ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
<4> [430.222980]
stack backtrace:
<4> [430.222982] CPU: 1 PID: 2317 Comm: gem_pwrite Tainted: G U 5.4.0-rc8-CI-CI_DRM_7508+ #1
<4> [430.222985] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2321.A08.1909162051 09/16/2019
<4> [430.222989] Call Trace:
<4> [430.222992] dump_stack+0x71/0x9b
<4> [430.222995] check_noncircular+0x19b/0x1c0
<4> [430.222998] ? __lock_acquire+0x1328/0x15d0
<4> [430.222999] __lock_acquire+0x1328/0x15d0
<4> [430.223001] ? mark_held_locks+0x49/0x70
<4> [430.223003] lock_acquire+0xa7/0x1c0
<4> [430.223005] ? rcu_barrier+0x23/0x190
<4> [430.223008] __mutex_lock+0x9a/0x9d0
<4> [430.223009] ? rcu_barrier+0x23/0x190
<4> [430.223011] ? rcu_barrier+0x23/0x190
<4> [430.223013] ? find_held_lock+0x2d/0x90
<4> [430.223045] ? i915_gem_object_unbind+0x24a/0x3d0 [i915]
<4> [430.223048] ? rcu_barrier+0x23/0x190
<4> [430.223049] rcu_barrier+0x23/0x190
<4> [430.223081] i915_gem_object_unbind+0x264/0x3d0 [i915]
<4> [430.223119] i915_gem_shrink+0x297/0x5f0 [i915]

Closes: https://gitlab.freedesktop.org/drm/intel/issues/743
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191208161252.3015727-1-chris@chris-wilson.co.uk


# 1a74934b 06-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Flush the pwrite through the chipset before signaling

Before we signal the fence to indicate completion, ensure the pwrite
through the indirect GGTT is coherent (as best as we know) in memory.
Any listeners to the fence may start immediately and sample from the
backing store prior to the writes being posted, thus seeing stale data.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191206105527.1130413-1-chris@chris-wilson.co.uk


# 5c4fe63a 05-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Reinitialise the local list before repeating

As we may start the loop again, we require our local list of i915_vma
we've processed to be reinitialised.

Fixes: aa5e4453dc05 ("drm/i915/gem: Try to flush pending unbind events")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/731
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191205132912.606868-1-chris@chris-wilson.co.uk


# aa5e4453 03-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Try to flush pending unbind events

If we cannot handle a vma within the unbind loop, try to flush the
pending events (i915_vma_parked, i915_vm_release) and try again. This
avoids a round trip to userspace that is not guaranteed to make forward
progress, as the events we wait upon require being idle.

References: cb6c3d45f948 ("drm/i915/gem: Avoid parking the vma as we unbind")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191204123556.3740002-1-chris@chris-wilson.co.uk


# cc662126 03-Dec-2019 Abdiel Janulgue <abdiel.janulgue@linux.intel.com>

drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET

This is really just an alias of mmap_gtt. The 'mmap offset' nomenclature
comes from the value returned by this ioctl, which is the offset into the
device fd that userspace uses with mmap(2).

mmap_gtt was our initial mmap_offset implementation; this extends
our CPU mmap support to allow additional fault handlers that depend on
the object's backing pages.

Note that we multiplex mmap_gtt and mmap_offset through the same ioctl,
and use the zero extending behaviour of drm to differentiate between
them, when we inspect the flags.

To support multiple mmap types on an object we need to support multiple
mmap_offsets for an object (each offset in the global device address
space corresponding to a unique instance of the object for a file + mmap
type). As we drop the simplified drm core idea of a single mmap_offset,
we need to provide replacement hooks for the dumb mmap interface as
well.

Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1675
Testcase: igt/gem_mmap_offset
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191204120032.3682839-1-chris@chris-wilson.co.uk


# cb6c3d45 03-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Avoid parking the vma as we unbind

In order to avoid keeping a reference on the i915_vma (which is long
overdue!) we have to coordinate all the possible lifetimes and only use
the vma while we know it is alive. In this episode, we are reminded that
while idle, the closed vma are destroyed. So if the GT idles while we are
working with the vma, the vma itself becomes invalid.

First class i915_vma here we come, but in the meantime keep piling on
the straw.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203155032.3137263-1-chris@chris-wilson.co.uk


# 42d10511 02-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lift i915_vma_pin() out of intel_renderstate_emit()

Once inside a request, inside the timeline->mutex, pinning is verboten.

<4> [896.032829] ======================================================
<4> [896.032831] WARNING: possible circular locking dependency detected
<4> [896.032835] 5.4.0-rc8-CI-Patchwork_15533+ #1 Tainted: G U
<4> [896.032838] ------------------------------------------------------
<4> [896.032841] gem_exec_parall/3720 is trying to acquire lock:
<4> [896.032844] ffff888401863270 (&kernel#2){+.+.}, at: i915_request_create+0x16/0x1c0 [i915]
<4> [896.032915]
but task is already holding lock:
<4> [896.032917] ffff8883ec1c93c0 (&vm->mutex){+.+.}, at: i915_vma_pin+0xf3/0x11c0 [i915]
<4> [896.032952]
which lock already depends on the new lock.

<4> [896.032954]
the existing dependency chain (in reverse order) is:
<4> [896.032956]
-> #1 (&vm->mutex){+.+.}:
<4> [896.032961] __mutex_lock+0x9a/0x9d0
<4> [896.032995] i915_vma_pin+0xf3/0x11c0 [i915]
<4> [896.033033] intel_renderstate_emit+0xb9/0x9e0 [i915]
<4> [896.033081] i915_gem_init+0x5a9/0xa50 [i915]
<4> [896.033112] i915_driver_probe+0xb00/0x15f0 [i915]
<4> [896.033144] i915_pci_probe+0x43/0x1c0 [i915]
<4> [896.033149] pci_device_probe+0x9e/0x120
<4> [896.033154] really_probe+0xea/0x420
<4> [896.033158] driver_probe_device+0x10b/0x120
<4> [896.033161] device_driver_attach+0x4a/0x50
<4> [896.033164] __driver_attach+0x97/0x130
<4> [896.033168] bus_for_each_dev+0x74/0xc0
<4> [896.033171] bus_add_driver+0x142/0x220
<4> [896.033174] driver_register+0x56/0xf0
<4> [896.033178] do_one_initcall+0x58/0x2ff
<4> [896.033183] do_init_module+0x56/0x1f8
<4> [896.033187] load_module+0x243e/0x29f0
<4> [896.033190] __do_sys_finit_module+0xe9/0x110
<4> [896.033194] do_syscall_64+0x4f/0x210
<4> [896.033197] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [896.033200]
-> #0 (&kernel#2){+.+.}:
<4> [896.033206] __lock_acquire+0x1328/0x15d0
<4> [896.033209] lock_acquire+0xa7/0x1c0
<4> [896.033213] __mutex_lock+0x9a/0x9d0
<4> [896.033255] i915_request_create+0x16/0x1c0 [i915]
<4> [896.033287] intel_engine_flush_barriers+0x4c/0x100 [i915]
<4> [896.033327] ggtt_flush+0x37/0x60 [i915]
<4> [896.033366] i915_gem_evict_something+0x46b/0x5a0 [i915]
<4> [896.033407] i915_gem_gtt_insert+0x21d/0x6a0 [i915]
<4> [896.033449] i915_vma_pin+0xb36/0x11c0 [i915]
<4> [896.033488] gen6_ppgtt_pin+0xd5/0x170 [i915]
<4> [896.033523] ring_context_pin+0x2e/0xc0 [i915]
<4> [896.033554] __intel_context_do_pin+0x6b/0x190 [i915]
<4> [896.033591] i915_gem_do_execbuffer+0x1814/0x26c0 [i915]
<4> [896.033627] i915_gem_execbuffer2_ioctl+0x11b/0x460 [i915]
<4> [896.033632] drm_ioctl_kernel+0xa7/0xf0
<4> [896.033635] drm_ioctl+0x2e1/0x390
<4> [896.033638] do_vfs_ioctl+0xa0/0x6f0
<4> [896.033641] ksys_ioctl+0x35/0x60
<4> [896.033644] __x64_sys_ioctl+0x11/0x20
<4> [896.033647] do_syscall_64+0x4f/0x210
<4> [896.033650] entry_SYSCALL_64_after_hwframe+0x49/0xbe

Lift the object allocation and pin prior to the request construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191202204316.2665847-1-chris@chris-wilson.co.uk
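
A hedged sketch of the resulting ordering (illustrative, not the exact renderstate code): acquire vm->mutex via pinning first, and only then take the timeline mutex by constructing the request.

    err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH); /* vm->mutex */
    if (err)
            return err;

    rq = i915_request_create(ce);                          /* timeline->mutex */
    if (IS_ERR(rq)) {
            i915_vma_unpin(vma);
            return PTR_ERR(rq);
    }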


# 3e817471 03-Dec-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Take runtime-pm wakeref prior to unbinding

Some machines require ACPI for runtime resume, and ACPI is quite kmalloc
happy. We cannot handle kmalloc from inside the vm->mutex, as that mutex
is used by the shrinker, and so we must ensure the global runtime-pm is
awake prior to unbinding to avoid the potential inversion.

<4> [57.121748] ======================================================
<4> [57.121750] WARNING: possible circular locking dependency detected
<4> [57.121753] 5.4.0-rc8-CI-CI_DRM_7466+ #1 Tainted: G U
<4> [57.121754] ------------------------------------------------------
<4> [57.121756] i915_pm_rpm/1105 is trying to acquire lock:
<4> [57.121758] ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
<4> [57.121766]
but task is already holding lock:
<4> [57.121768] ffff888475a593c0 (&vm->mutex){+.+.}, at: i915_vma_unbind+0x21/0x50 [i915]
<4> [57.121868]
which lock already depends on the new lock.

<4> [57.121869]
the existing dependency chain (in reverse order) is:
<4> [57.121871]
-> #1 (&vm->mutex){+.+.}:
<4> [57.121951] i915_gem_shrinker_taints_mutex+0xa2/0xd0 [i915]
<4> [57.122028] i915_address_space_init+0xa9/0x170 [i915]
<4> [57.122104] i915_ggtt_init_hw+0x47/0x130 [i915]
<4> [57.122150] i915_driver_probe+0xbb4/0x15f0 [i915]
<4> [57.122197] i915_pci_probe+0x43/0x1c0 [i915]
<4> [57.122202] pci_device_probe+0x9e/0x120
<4> [57.122206] really_probe+0xea/0x420
<4> [57.122209] driver_probe_device+0x10b/0x120
<4> [57.122212] device_driver_attach+0x4a/0x50
<4> [57.122214] __driver_attach+0x97/0x130
<4> [57.122217] bus_for_each_dev+0x74/0xc0
<4> [57.122220] bus_add_driver+0x142/0x220
<4> [57.122222] driver_register+0x56/0xf0
<4> [57.122226] do_one_initcall+0x58/0x2ff
<4> [57.122230] do_init_module+0x56/0x1f8
<4> [57.122233] load_module+0x243e/0x29f0
<4> [57.122236] __do_sys_finit_module+0xe9/0x110
<4> [57.122239] do_syscall_64+0x4f/0x210
<4> [57.122242] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [57.122244]
-> #0 (fs_reclaim){+.+.}:
<4> [57.122249] __lock_acquire+0x1328/0x15d0
<4> [57.122251] lock_acquire+0xa7/0x1c0
<4> [57.122254] fs_reclaim_acquire.part.117+0x24/0x30
<4> [57.122257] __kmalloc+0x48/0x320
<4> [57.122261] acpi_ns_internalize_name+0x44/0x9b
<4> [57.122264] acpi_ns_get_node_unlocked+0x6b/0xd3
<4> [57.122267] acpi_ns_get_node+0x3b/0x50
<4> [57.122271] acpi_get_handle+0x8a/0xb4
<4> [57.122274] acpi_has_method+0x1c/0x40
<4> [57.122278] acpi_pci_set_power_state+0x40/0xe0
<4> [57.122281] pci_platform_power_transition+0x3e/0x90
<4> [57.122284] pci_set_power_state+0x83/0xf0
<4> [57.122287] pci_restore_standard_config+0x22/0x40
<4> [57.122289] pci_pm_runtime_resume+0x23/0xc0
<4> [57.122293] __rpm_callback+0xb1/0x110
<4> [57.122296] rpm_callback+0x1a/0x70
<4> [57.122299] rpm_resume+0x50e/0x790
<4> [57.122302] __pm_runtime_resume+0x42/0x80
<4> [57.122357] __intel_runtime_pm_get+0x15/0x60 [i915]
<4> [57.122435] ggtt_unbind_vma+0x24/0x60 [i915]
<4> [57.122514] __i915_vma_unbind.part.39+0xb5/0x500 [i915]
<4> [57.122593] i915_vma_unbind+0x2d/0x50 [i915]
<4> [57.122668] i915_gem_object_unbind+0x11c/0x260 [i915]
<4> [57.122740] i915_gem_object_set_cache_level+0x32/0x90 [i915]
<4> [57.122810] i915_gem_set_caching_ioctl+0x1f7/0x2f0 [i915]
<4> [57.122815] drm_ioctl_kernel+0xa7/0xf0
<4> [57.122818] drm_ioctl+0x2e1/0x390
<4> [57.122822] do_vfs_ioctl+0xa0/0x6f0
<4> [57.122825] ksys_ioctl+0x35/0x60
<4> [57.122828] __x64_sys_ioctl+0x11/0x20
<4> [57.122830] do_syscall_64+0x4f/0x210
<4> [57.122833] entry_SYSCALL_64_after_hwframe+0x49/0xbe

Closes: https://gitlab.freedesktop.org/drm/intel/issues/711
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191203101347.2836057-1-chris@chris-wilson.co.uk
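
A hedged sketch of the fix: take the wakeref before entering the unbind path so any ACPI-backed runtime resume (and its allocations) happens outside vm->mutex.

    intel_wakeref_t wakeref;
    int err;

    wakeref = intel_runtime_pm_get(&i915->runtime_pm); /* may resume, may kmalloc */
    err = i915_gem_object_unbind(obj, flags);          /* takes vm->mutex */
    intel_runtime_pm_put(&i915->runtime_pm, wakeref);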


# 2d0fb251 21-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Serialise with remote retirement

Since retirement may be running in a worker on another CPU, it may be
skipped in the local intel_gt_wait_for_idle(). To ensure the state is
consistent for our sanity checks upon load, serialise with the remote
retirer by waiting on the timeline->mutex.

Outside of this use case, e.g. on suspend or module unload, we expect the
slack to be picked up by intel_gt_pm_wait_for_idle() and so prefer to
put the special case serialisation with retirement in its single user,
for now at least.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-2-chris@chris-wilson.co.uk
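
A hedged illustration of the serialisation: briefly taking and releasing the kernel timeline's mutex is enough to wait out a retirement worker running on another CPU (the real code walks the engines' kernel contexts).

    mutex_lock(&timeline->mutex);   /* blocks until the remote retirer is done */
    mutex_unlock(&timeline->mutex);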


# b291ce0a 15-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Purge the sudden reappearance of i915_gem_object_pin()

This died many years ago as we now use i915_vma first and foremost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191115170835.1367869-1-chris@chris-wilson.co.uk


# 4f7af194 22-May-2018 Jon Bloomfield <jon.bloomfield@intel.com>

drm/i915: Support ro ppgtt mapped cmdparser shadow buffers

For Gen7, the original cmdparser motive was to permit limited
use of register read/write instructions in unprivileged BB's.
This worked by copying the user supplied bb to a kmd owned
bb, and running it in secure mode, from the ggtt, only if
the scanner finds no unsafe commands or registers.

For Gen8+ we can't use this same technique because running bb's
from the ggtt also disables access to ppgtt space. But we also
do not actually require 'secure' execution since we are only
trying to reduce the available command/register set. Instead we
will copy the user buffer to a kmd owned read-only bb in ppgtt,
and run in the usual non-secure mode.

Note that ro pages are only supported by ppgtt (not ggtt), but
luckily that's exactly what we need.

Add the required paths to map the shadow buffer to ppgtt ro for Gen8+

v2: IS_GEN7/IS_GEN (Mika)
v3: rebase
v4: rebase
v5: rebase

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris.p.wilson@intel.com>


# fd6fe087 01-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Call intel_gt_sanitize() directly

Assume all responsibility for operating on the HW to sanitize the GT
state upon load/resume in intel_gt_sanitize() itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101141009.15581-1-chris@chris-wilson.co.uk
(cherry picked from commit 797a615357ac0feb79c9ce41f5eaac3eb738a51f)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 797a6153 01-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Call intel_gt_sanitize() directly

Assume all responsibility for operating on the HW to sanitize the GT
state upon load/resume in intel_gt_sanitize() itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101141009.15581-1-chris@chris-wilson.co.uk


# 4605bb73 01-Nov-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Pull timeline initialise to intel_gt_init_early

Our timelines are currently contained within an intel_gt, and we only
need to perform list/spinlock initialisation, so we can pull the
intel_timelines_init() into our intel_gt_init_early().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191101130406.4142-1-chris@chris-wilson.co.uk


# dd6e38df 29-Oct-2019 Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915: Fix i915_inject_load_error() name to read *_probe_*

Commit 50d84418f586 ("drm/i915: Add i915 to i915_inject_probe_failure")
introduced new functions unfortunately named incompatibly with rules
established by commit f2db53f14d3d ("drm/i915: Replace "_load" with
"_probe" consequently"). Fix it for consistency.

Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Michał Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191029102036.6326-2-janusz.krzysztofik@linux.intel.com


# 3e7abf81 24-Oct-2019 Andi Shyti <andi@etezian.org>

drm/i915: Extract GT render power state management

i915_irq.c is large. One reason for this is that it has a large chunk of
the GT render power management stashed away in it. Extract that logic
out of i915_irq.c and intel_pm.c and put it under one roof.

Based on a patch by Chris Wilson.

Signed-off-by: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024211642.7688-1-chris@chris-wilson.co.uk


# ae2e28b0 22-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Teach record_defaults to operate on the intel_gt

Again we wish to operate on the engines, which are owned by the
intel_gt. As such it is easier, and much more consistent, to pass the
intel_gt parameter.

v2: Unexport i915_gem_load_power_context()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191022141935.15733-1-chris@chris-wilson.co.uk


# 7f63aa23 22-Oct-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Pass intel_gt to intel_engines_verify_workarounds

Engines belong to the GT so make it indicative in the API.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191022094726.3001-7-tvrtko.ursulin@linux.intel.com


# 7841fcbd 22-Oct-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Pass intel_gt to intel_engines_init

Engines belong to the GT so make it indicative in the API.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191022094726.3001-6-tvrtko.ursulin@linux.intel.com


# 78f60603 22-Oct-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Pass intel_gt to intel_engines_setup

Engines belong to the GT so make it indicative in the API.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191022094726.3001-5-tvrtko.ursulin@linux.intel.com


# b0258bf2 22-Oct-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Pass intel_gt to intel_engines_cleanup

Engines belong to the GT so make it indicative in the API.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191022094726.3001-4-tvrtko.ursulin@linux.intel.com


# 18f3b272 21-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove pm park/unpark notifications

With the last user, i915_vma_parked(), retired, there are no more users
of the per-gt pm notifications and we can remove the unused
infrastructure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191021183236.21790-2-chris@chris-wilson.co.uk


# da1184cd 18-Oct-2019 Matthew Auld <matthew.auld@intel.com>

drm/i915: treat shmem as a region

Convert shmem to an intel_memory_region.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191018090751.28295-2-matthew.auld@intel.com


# e9d4c924 16-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store i915_ggtt as the backpointer on fence registers

Now that i915_ggtt knows everything about its own paths to perform mmio,
we can use that as our primary backpointer for individual fence
registers. This reduces the amount of pointer dancing we have to perform
on the common paths, but more importantly finishes our fence register
encapsulation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191016143234.4075-1-chris@chris-wilson.co.uk
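
After this change the fence register roughly carries the ggtt backpointer directly; a sketch of the idea, not the exact structure definition:

    struct i915_fence_reg {
            struct i915_ggtt *ggtt; /* owns the mmio path for fence writes */
            struct i915_vma *vma;   /* current user of this fence, if any */
            ...
    };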


# eca0b720 16-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do initial mocs configuration directly

Now that we record the default "goldenstate" context, we do not need to
emit the mocs registers at the start of each context and can simply do
mmio before the first context and capture the registers as part of its
default image. As a consequence, this means that we repeat the mmio
after each engine reset, fixing up any platform and registers that were
zapped by the reset (for those platforms with global not context-saved
settings).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111723
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111645
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Reviewed-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191016090749.7092-1-chris@chris-wilson.co.uk


# 4f2a572e 28-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/userptr: Never allow userptr into the mappable GGTT

Daniel Vetter uncovered a nasty cycle in using the mmu-notifiers to
invalidate userptr objects which also happen to be pulled into GGTT
mmaps. That is, when we unbind the userptr object (on mmu invalidation),
we revoke all CPU mmaps, which may then recurse into mmu invalidation.

We looked for ways of breaking the cycle, but the revocation on
invalidation is required and cannot be avoided. The only solution we
could see was to not allow such GGTT bindings of userptr objects in the
first place. In practice, no one really wants to use a GGTT mmapping of
a CPU pointer...

Just before Daniel's explosive lockdep patches land in v5.4-rc1, we got
a genuine blip from CI:

<4>[ 246.793958] ======================================================
<4>[ 246.793972] WARNING: possible circular locking dependency detected
<4>[ 246.793989] 5.3.0-gbd6c56f50d15-drmtip_372+ #1 Tainted: G U
<4>[ 246.794003] ------------------------------------------------------
<4>[ 246.794017] kswapd0/145 is trying to acquire lock:
<4>[ 246.794030] 000000003f565be6 (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[ 246.794250]
but task is already holding lock:
<4>[ 246.794263] 000000001799cef9 (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0
<4>[ 246.794291]
which lock already depends on the new lock.

<4>[ 246.794307]
the existing dependency chain (in reverse order) is:
<4>[ 246.794322]
-> #3 (&anon_vma->rwsem){++++}:
<4>[ 246.794344] down_write+0x33/0x70
<4>[ 246.794357] __vma_adjust+0x3d9/0x7b0
<4>[ 246.794370] __split_vma+0x16a/0x180
<4>[ 246.794385] mprotect_fixup+0x2a5/0x320
<4>[ 246.794399] do_mprotect_pkey+0x208/0x2e0
<4>[ 246.794413] __x64_sys_mprotect+0x16/0x20
<4>[ 246.794429] do_syscall_64+0x55/0x1c0
<4>[ 246.794443] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 246.794456]
-> #2 (&mapping->i_mmap_rwsem){++++}:
<4>[ 246.794478] down_write+0x33/0x70
<4>[ 246.794493] unmap_mapping_pages+0x48/0x130
<4>[ 246.794519] i915_vma_revoke_mmap+0x81/0x1b0 [i915]
<4>[ 246.794519] i915_vma_unbind+0x11d/0x4a0 [i915]
<4>[ 246.794519] i915_vma_destroy+0x31/0x300 [i915]
<4>[ 246.794519] __i915_gem_free_objects+0xb8/0x4b0 [i915]
<4>[ 246.794519] drm_file_free.part.0+0x1e6/0x290
<4>[ 246.794519] drm_release+0xa6/0xe0
<4>[ 246.794519] __fput+0xc2/0x250
<4>[ 246.794519] task_work_run+0x82/0xb0
<4>[ 246.794519] do_exit+0x35b/0xdb0
<4>[ 246.794519] do_group_exit+0x34/0xb0
<4>[ 246.794519] __x64_sys_exit_group+0xf/0x10
<4>[ 246.794519] do_syscall_64+0x55/0x1c0
<4>[ 246.794519] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 246.794519]
-> #1 (&vm->mutex){+.+.}:
<4>[ 246.794519] i915_gem_shrinker_taints_mutex+0x6d/0xe0 [i915]
<4>[ 246.794519] i915_address_space_init+0x9f/0x160 [i915]
<4>[ 246.794519] i915_ggtt_init_hw+0x55/0x170 [i915]
<4>[ 246.794519] i915_driver_probe+0xc9f/0x1620 [i915]
<4>[ 246.794519] i915_pci_probe+0x43/0x1b0 [i915]
<4>[ 246.794519] pci_device_probe+0x9e/0x120
<4>[ 246.794519] really_probe+0xea/0x3d0
<4>[ 246.794519] driver_probe_device+0x10b/0x120
<4>[ 246.794519] device_driver_attach+0x4a/0x50
<4>[ 246.794519] __driver_attach+0x97/0x130
<4>[ 246.794519] bus_for_each_dev+0x74/0xc0
<4>[ 246.794519] bus_add_driver+0x13f/0x210
<4>[ 246.794519] driver_register+0x56/0xe0
<4>[ 246.794519] do_one_initcall+0x58/0x300
<4>[ 246.794519] do_init_module+0x56/0x1f6
<4>[ 246.794519] load_module+0x25bd/0x2a40
<4>[ 246.794519] __se_sys_finit_module+0xd3/0xf0
<4>[ 246.794519] do_syscall_64+0x55/0x1c0
<4>[ 246.794519] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 246.794519]
-> #0 (&dev->struct_mutex/1){+.+.}:
<4>[ 246.794519] __lock_acquire+0x15d8/0x1e90
<4>[ 246.794519] lock_acquire+0xa6/0x1c0
<4>[ 246.794519] __mutex_lock+0x9d/0x9b0
<4>[ 246.794519] userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[ 246.794519] __mmu_notifier_invalidate_range_start+0x85/0x110
<4>[ 246.794519] try_to_unmap_one+0x76b/0x860
<4>[ 246.794519] rmap_walk_anon+0x104/0x280
<4>[ 246.794519] try_to_unmap+0xc0/0xf0
<4>[ 246.794519] shrink_page_list+0x561/0xc10
<4>[ 246.794519] shrink_inactive_list+0x220/0x440
<4>[ 246.794519] shrink_node_memcg+0x36e/0x740
<4>[ 246.794519] shrink_node+0xcb/0x490
<4>[ 246.794519] balance_pgdat+0x241/0x580
<4>[ 246.794519] kswapd+0x16c/0x530
<4>[ 246.794519] kthread+0x119/0x130
<4>[ 246.794519] ret_from_fork+0x24/0x50
<4>[ 246.794519]
other info that might help us debug this:

<4>[ 246.794519] Chain exists of:
&dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem

<4>[ 246.794519] Possible unsafe locking scenario:

<4>[ 246.794519] CPU0 CPU1
<4>[ 246.794519] ---- ----
<4>[ 246.794519] lock(&anon_vma->rwsem);
<4>[ 246.794519] lock(&mapping->i_mmap_rwsem);
<4>[ 246.794519] lock(&anon_vma->rwsem);
<4>[ 246.794519] lock(&dev->struct_mutex/1);
<4>[ 246.794519]
*** DEADLOCK ***

v2: Say no to mmap_ioctl

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111744
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111870
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190928082546.3473-1-chris@chris-wilson.co.uk
(cherry picked from commit a4311745bba9763e3c965643d4531bd5765b0513)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 78427933 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop struct_mutex from around GEM initialisation

We no longer need to placate lockdep by holding struct_mutex for our
initialisation, so don't.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-21-chris@chris-wilson.co.uk


# a4e7ccda 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move context management under GEM

Keep track of the GEM contexts underneath i915->gem.contexts and assign
them their own lock for the purposes of list management.

v2: Focus on lock tracking; ctx->vm is protected by ctx->mutex
v3: Correct split with removal of logical HW ID

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-15-chris@chris-wilson.co.uk


# 66101975 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move request runtime management onto gt

Requests are run from the gt and are tied into the gt runtime power
management, so pull the runtime request management under gt/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-12-chris@chris-wilson.co.uk


# f33a8a51 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Merge wait_for_timelines with retire_request

wait_for_timelines is essentially the same loop as retiring requests
(with an extra timeout), so merge the two into one routine.

v2: i915_retire_requests_timeout and keep VT'd w/a as !interruptible

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-10-chris@chris-wilson.co.uk


# 7e805762 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop struct_mutex from around i915_retire_requests()

We don't need to hold struct_mutex now for retiring requests, so drop it
from i915_retire_requests() and i915_gem_wait_for_idle(), finally
removing I915_WAIT_LOCKED for good.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-8-chris@chris-wilson.co.uk


# b1e3177b 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Coordinate i915_active with its own mutex

Forgo the struct_mutex serialisation for i915_active, and interpose its
own mutex handling for active/retire.

This is a multi-layered sleight-of-hand. First, we had to ensure that no
active/retire callbacks accidentally inverted the mutex ordering rules,
nor assumed that they were themselves serialised by struct_mutex. More
challenging though, is the rule over updating elements of the active
rbtree. Instead of the whole i915_active now being serialised by
struct_mutex, allocations/rotations of the tree are serialised by the
i915_active.mutex and individual nodes are serialised by the caller
using the i915_timeline.mutex (we need to use nested spinlocks to
interact with the dma_fence callback lists).

The pain point here is that instead of a single mutex around execbuf, we
now have to take a mutex for active tracker (one for each vma, context,
etc) and a couple of spinlocks for each fence update. The improvement in
fine grained locking allowing for multiple concurrent clients
(eventually!) should be worth it in typical loads.

v2: Add some comments that barely elucidate anything :(

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-6-chris@chris-wilson.co.uk
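
Roughly, the locking layers described above look like this (a sketch, not the exact struct definition):

    struct i915_active {
            struct mutex mutex;  /* serialises allocation/rotation of the tree */
            struct rb_root tree; /* per-timeline nodes; each node is guarded by
                                  * its owning i915_timeline.mutex */
            /* dma_fence callback lists are updated under nested spinlocks */
            ...
    };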


# 2850748e 04-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull i915_vma_pin under the vm->mutex

Replace the struct_mutex requirement for pinning the i915_vma with the
local vm->mutex instead. Note that the vm->mutex is tainted by the
shrinker (we require unbinding from inside fs-reclaim) and so we cannot
allocate while holding that mutex. Instead we have to preallocate
workers to allocate and apply the PTE updates after we have
reserved their slot in the drm_mm (using fences to order the PTE writes
with the GPU work and with later unbind).

In adding the asynchronous vma binding, one subtle requirement is to
avoid coupling the binding fence into the backing object->resv. That is
the asynchronous binding only applies to the vma timeline itself and not
to the pages as that is a more global timeline (the binding of one vma
does not need to be ordered with another vma, nor does the implicit GEM
fencing depend on a vma, only on writes to the backing store). Keeping
the vma binding distinct from the backing store timelines is verified by
a number of async gem_exec_fence and gem_exec_schedule tests. The way we
do this is quite simple, we keep the fence for the vma binding separate
and only wait on it as required, and never add it to the obj->resv
itself.

Another consequence in reducing the locking around the vma is the
destruction of the vma is no longer globally serialised by struct_mutex.
A natural solution would be to add a kref to i915_vma, but that requires
decoupling the reference cycles, possibly by introducing a new
i915_mm_pages object that is owned by both obj->mm and vma->pages.
However, we have not taken that route due to the overshadowing lmem/ttm
discussions, and instead play a series of complicated games with
trylocks to (hopefully) ensure that only one destruction path is called!

v2: Add some commentary, and some helpers to reduce patch churn.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
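
A hedged sketch of the split described above (helper names are illustrative): the drm_mm slot is reserved under vm->mutex, while the PTE writes are handed to a preallocated worker whose completion fence lives on the vma alone and is never added to obj->resv.

    err = mutex_lock_interruptible(&vma->vm->mutex);
    if (err)
            return err;
    err = reserve_drm_mm_slot(vma, size, alignment, flags); /* illustrative name */
    mutex_unlock(&vma->vm->mutex);
    if (err)
            return err;

    queue_async_bind(vma); /* worker applies the PTEs and signals the vma's fence */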


# b290a78b 03-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use helpers for drm_mm_node booleans

A subset of 71724f708997 ("drm/mm: Use helpers for drm_mm_node booleans")
in order to prepare drm-intel-next-queued for subsequent patches before
we can backmerge 71724f708997 itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004142226.13711-1-chris@chris-wilson.co.uk


# 4ee92c71 03-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/mm: Convert drm_mm_node booleans to bitops

A straightforward conversion of assignment and checking of the boolean
state flags (allocated, scanned) into non-atomic bitops. The caller
remains responsible for all locking around the drm_mm and its nodes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-4-chris@chris-wilson.co.uk


# 71724f70 03-Oct-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/mm: Use helpers for drm_mm_node booleans

In preparation for rearranging the booleans into a flags field, ensure
all the current users are using the inline helpers and not directly
accessing the members.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191003210100.22250-3-chris@chris-wilson.co.uk
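
The conversion is mechanical; for example:

    /* before: poking at the member directly */
    if (node->allocated)
            drm_mm_remove_node(node);

    /* after: the inline helper, ready for the flags-field rework */
    if (drm_mm_node_allocated(node))
            drm_mm_remove_node(node);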


# a4311745 28-Sep-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/userptr: Never allow userptr into the mappable GGTT

Daniel Vetter uncovered a nasty cycle in using the mmu-notifiers to
invalidate userptr objects which also happen to be pulled into GGTT
mmaps. That is, when we unbind the userptr object (on mmu invalidation),
we revoke all CPU mmaps, which may then recurse into mmu invalidation.

We looked for ways of breaking the cycle, but the revocation on
invalidation is required and cannot be avoided. The only solution we
could see was to not allow such GGTT bindings of userptr objects in the
first place. In practice, no one really wants to use a GGTT mmapping of
a CPU pointer...

Just before Daniel's explosive lockdep patches land in v5.4-rc1, we got
a genuine blip from CI:

<4>[ 246.793958] ======================================================
<4>[ 246.793972] WARNING: possible circular locking dependency detected
<4>[ 246.793989] 5.3.0-gbd6c56f50d15-drmtip_372+ #1 Tainted: G U
<4>[ 246.794003] ------------------------------------------------------
<4>[ 246.794017] kswapd0/145 is trying to acquire lock:
<4>[ 246.794030] 000000003f565be6 (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[ 246.794250]
but task is already holding lock:
<4>[ 246.794263] 000000001799cef9 (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0
<4>[ 246.794291]
which lock already depends on the new lock.

<4>[ 246.794307]
the existing dependency chain (in reverse order) is:
<4>[ 246.794322]
-> #3 (&anon_vma->rwsem){++++}:
<4>[ 246.794344] down_write+0x33/0x70
<4>[ 246.794357] __vma_adjust+0x3d9/0x7b0
<4>[ 246.794370] __split_vma+0x16a/0x180
<4>[ 246.794385] mprotect_fixup+0x2a5/0x320
<4>[ 246.794399] do_mprotect_pkey+0x208/0x2e0
<4>[ 246.794413] __x64_sys_mprotect+0x16/0x20
<4>[ 246.794429] do_syscall_64+0x55/0x1c0
<4>[ 246.794443] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 246.794456]
-> #2 (&mapping->i_mmap_rwsem){++++}:
<4>[ 246.794478] down_write+0x33/0x70
<4>[ 246.794493] unmap_mapping_pages+0x48/0x130
<4>[ 246.794519] i915_vma_revoke_mmap+0x81/0x1b0 [i915]
<4>[ 246.794519] i915_vma_unbind+0x11d/0x4a0 [i915]
<4>[ 246.794519] i915_vma_destroy+0x31/0x300 [i915]
<4>[ 246.794519] __i915_gem_free_objects+0xb8/0x4b0 [i915]
<4>[ 246.794519] drm_file_free.part.0+0x1e6/0x290
<4>[ 246.794519] drm_release+0xa6/0xe0
<4>[ 246.794519] __fput+0xc2/0x250
<4>[ 246.794519] task_work_run+0x82/0xb0
<4>[ 246.794519] do_exit+0x35b/0xdb0
<4>[ 246.794519] do_group_exit+0x34/0xb0
<4>[ 246.794519] __x64_sys_exit_group+0xf/0x10
<4>[ 246.794519] do_syscall_64+0x55/0x1c0
<4>[ 246.794519] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 246.794519]
-> #1 (&vm->mutex){+.+.}:
<4>[ 246.794519] i915_gem_shrinker_taints_mutex+0x6d/0xe0 [i915]
<4>[ 246.794519] i915_address_space_init+0x9f/0x160 [i915]
<4>[ 246.794519] i915_ggtt_init_hw+0x55/0x170 [i915]
<4>[ 246.794519] i915_driver_probe+0xc9f/0x1620 [i915]
<4>[ 246.794519] i915_pci_probe+0x43/0x1b0 [i915]
<4>[ 246.794519] pci_device_probe+0x9e/0x120
<4>[ 246.794519] really_probe+0xea/0x3d0
<4>[ 246.794519] driver_probe_device+0x10b/0x120
<4>[ 246.794519] device_driver_attach+0x4a/0x50
<4>[ 246.794519] __driver_attach+0x97/0x130
<4>[ 246.794519] bus_for_each_dev+0x74/0xc0
<4>[ 246.794519] bus_add_driver+0x13f/0x210
<4>[ 246.794519] driver_register+0x56/0xe0
<4>[ 246.794519] do_one_initcall+0x58/0x300
<4>[ 246.794519] do_init_module+0x56/0x1f6
<4>[ 246.794519] load_module+0x25bd/0x2a40
<4>[ 246.794519] __se_sys_finit_module+0xd3/0xf0
<4>[ 246.794519] do_syscall_64+0x55/0x1c0
<4>[ 246.794519] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 246.794519]
-> #0 (&dev->struct_mutex/1){+.+.}:
<4>[ 246.794519] __lock_acquire+0x15d8/0x1e90
<4>[ 246.794519] lock_acquire+0xa6/0x1c0
<4>[ 246.794519] __mutex_lock+0x9d/0x9b0
<4>[ 246.794519] userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[ 246.794519] __mmu_notifier_invalidate_range_start+0x85/0x110
<4>[ 246.794519] try_to_unmap_one+0x76b/0x860
<4>[ 246.794519] rmap_walk_anon+0x104/0x280
<4>[ 246.794519] try_to_unmap+0xc0/0xf0
<4>[ 246.794519] shrink_page_list+0x561/0xc10
<4>[ 246.794519] shrink_inactive_list+0x220/0x440
<4>[ 246.794519] shrink_node_memcg+0x36e/0x740
<4>[ 246.794519] shrink_node+0xcb/0x490
<4>[ 246.794519] balance_pgdat+0x241/0x580
<4>[ 246.794519] kswapd+0x16c/0x530
<4>[ 246.794519] kthread+0x119/0x130
<4>[ 246.794519] ret_from_fork+0x24/0x50
<4>[ 246.794519]
other info that might help us debug this:

<4>[ 246.794519] Chain exists of:
&dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem

<4>[ 246.794519] Possible unsafe locking scenario:

<4>[ 246.794519] CPU0 CPU1
<4>[ 246.794519] ---- ----
<4>[ 246.794519] lock(&anon_vma->rwsem);
<4>[ 246.794519] lock(&mapping->i_mmap_rwsem);
<4>[ 246.794519] lock(&anon_vma->rwsem);
<4>[ 246.794519] lock(&dev->struct_mutex/1);
<4>[ 246.794519]
*** DEADLOCK ***

v2: Say no to mmap_ioctl

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111744
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111870
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190928082546.3473-1-chris@chris-wilson.co.uk


# a3f356b2 27-Sep-2019 Matthew Auld <matthew.auld@intel.com>

drm/i915: simplify i915_gem_init_early

i915_gem_init_early doesn't need to return anything.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190927173409.31175-3-matthew.auld@intel.com


# 5311f517 26-Sep-2019 Michał Winiarski <michal.winiarski@intel.com>

drm/i915: Define explicit wedged on init reset state

We're currently using scratch presence as a way of identifying that we
entered wedged state at driver initialization time.
Let's use a separate flag rather than rely on scratch.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190926133142.2838-1-chris@chris-wilson.co.uk
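
A hedged sketch of the idea, assuming a dedicated reset flag along the lines of what landed (the exact name may differ):

    /* mark that we wedged during driver load ... */
    set_bit(I915_WEDGED_ON_INIT, &gt->reset.flags);

    /* ... and later refuse to unwedge, instead of inferring it from scratch */
    if (test_bit(I915_WEDGED_ON_INIT, &gt->reset.flags))
            return false;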


# dab3588a 10-Sep-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make wait_for_timelines take struct intel_gt

Timelines live in struct intel_gt so make wait_for_timelines take
exactly what it needs.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190910143823.10686-3-tvrtko.ursulin@linux.intel.com


# 61fa60ff 10-Sep-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Move GT init to intel_gt.c

Code in i915_gem_init_hw is all about GT init so move it to intel_gt.c
renaming to intel_gt_init_hw.

Existing intel_gt_init_hw is renamed to intel_gt_init_hw_early since it
is currently called from driver probe.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190910143823.10686-2-tvrtko.ursulin@linux.intel.com


# 42014f69 05-Sep-2019 Andi Shyti <andi.shyti@intel.com>

drm/i915: Hook up GT power management

Refactor the GT power management interface to work through the GT now
that it is under the control of gt/

Based on a patch by Chris Wilson.

Signed-off-by: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190905111403.10071-1-andi.shyti@intel.com


# 29326a16 23-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush the existing fence before GGTT read/write

Our fence management is lazy, very lazy. If the user marks an object as
untiled, we do not immediately flush the fence but merely mark it as
dirty. On the next use we have to remember to check and remove the fence,
by which time we hope it is idle and we do not have to wait.

v2: Throw away the old fence on the next ggtt_pin.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111468
Fixes: 1f7fd484fff1 ("drm/i915: Replace i915_vma_put_fence()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823153944.20630-1-chris@chris-wilson.co.uk
(cherry picked from commit 636e83f2f208555c3d19d8b454ebdd8d8f4652cc)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# ff36c5c4 23-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold irq-off for the entire fake lock period

Sadly lockdep records when the irqs are re-enabled and then marks up the
fake lock as being irq-unsafe. Our hand is forced and so we must mark up
the entire fake lock critical section as irq-off.

Hopefully this is the last tweak required!

v2: Not quite, we need to mark the timeline spinlock as irqsafe. That
was a genuine bug being hidden by the earlier lockdep splat.

Fixes: d67739268cf0 ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk
(cherry picked from commit 6dcb85a0ad990455ae7c596e3fc966ad9c1ba9c5)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 636e83f2 23-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush the existing fence before GGTT read/write

Our fence management is lazy, very lazy. If the user marks an object as
untiled, we do not immediately flush the fence but merely mark it as
dirty. On the next use we have to remember to check and remove the fence,
by which time we hope it is idle and we do not have to wait.

v2: Throw away the old fence on the next ggtt_pin.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111468
Fixes: 1f7fd484fff1 ("drm/i915: Replace i915_vma_put_fence()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823153944.20630-1-chris@chris-wilson.co.uk


# 6dcb85a0 23-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold irq-off for the entire fake lock period

Sadly lockdep records when the irqs are re-enabled and then marks up the
fake lock as being irq-unsafe. Our hand is forced and so we must mark up
the entire fake lock critical section as irq-off.

Hopefully this is the last tweak required!

v2: Not quite, we need to mark the timeline spinlock as irqsafe. That
was a genuine bug being hidden by the earlier lockdep splat.

Fixes: d67739268cf0 ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk


# 1f7fd484 22-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace i915_vma_put_fence()

Avoid calling i915_vma_put_fence() by using our alternate paths that
bind a secondary vma avoiding the original fenced vma. For the few
instances where we need to release the fence (i.e. on binding when the
GGTT range becomes invalid), replace the put_fence with a revoke_fence.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190822061557.18402-1-chris@chris-wilson.co.uk


# 6846895f 21-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace PIN_NONFAULT with calls to PIN_NOEVICT

When under severe stress for GTT mappable space, the LRU eviction model
falls off a cliff. We spend all our time scanning the much larger
non-mappable area searching for something within the mappable zone we can
evict. Turn this on its head by only using the full vma for the object if
it is already pinned in the mappable zone or there is sufficient *free*
space to accommodate it (prioritizing speedy reuse). If there is not,
immediately fall back to using small chunks (tilerow for GTT mmap, single
pages for pwrite/relocation) and using random eviction before doing a full
search.

Testcase: igt/gem_concurrent_blt
References: https://bugs.freedesktop.org/show_bug.cgi?id=110848
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190821123234.19194-1-chris@chris-wilson.co.uk
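
A hedged sketch of the fault-handler fallback described above, assuming a partial_view computed beforehand: attempt the whole object only if it can be placed without eviction, otherwise drop straight to a small chunk.

    vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
                                   PIN_MAPPABLE | PIN_NONBLOCK | PIN_NOEVICT);
    if (IS_ERR(vma)) {
            /* fall back to a partial chunk (e.g. a tilerow) and random eviction */
            vma = i915_gem_object_ggtt_pin(obj, &partial_view, 0, 0, PIN_MAPPABLE);
    }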


# cc337560 19-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use 0 for the unordered context

Since commit 078dec3326e2 ("dma-buf: add dma_fence_get_stub") the 0
fence context became an impossible match as it is used for an always
signaled fence. We can simplify our timeline tracking by knowing that 0
always means no match.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819184404.24200-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20190819175109.5241-1-chris@chris-wilson.co.uk


# 0075a20a 17-Aug-2019 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915/uc: Never fail on uC preparation step

Let's defer the decision about the importance of a uC failure until the
hardware initialization step.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817131144.26884-4-michal.wajdeczko@intel.com


# 8e7cb179 16-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Extract intel_frontbuffer active tracking

Move the active tracking for the frontbuffer operations out of the
i915_gem_object and into its own first class (refcounted) object. In the
process of detangling, we switch from low level request tracking to the
easier i915_active -- with the plan that this avoids any potential
atomic callbacks as the frontbuffer tracking wishes to sleep as it
flushes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816074635.26062-1-chris@chris-wilson.co.uk


# 338aade9 15-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Convert timeline tracking to spinlock

Convert the active_list manipulation of timelines to use spinlocks so
that we can perform the updates from underneath a quick interrupt
callback, if need be.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-2-chris@chris-wilson.co.uk


# 52791eee 11-Aug-2019 Christian König <christian.koenig@amd.com>

dma-buf: rename reservation_object to dma_resv

Be more consistent with the naming of the other DMA-buf objects.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/323401/


# a5f978c3 11-Aug-2019 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915/uc: Use -EIO code for GuC initialization failures

Since commit 6ca9a2beb54a ("drm/i915: Unwind i915_gem_init() failure")
we believed that we correctly handle all errors encountered during
GuC initialization, including the special one that indicates a request to
run the driver with GPU submission disabled (-EIO).

Unfortunately since commit 121981fafe69 ("drm/i915/guc: Combine
enable_guc_loading|submission modparams") we stopped using that
error code to avoid unwanted fallback to execlist submission mode.

As a result, any GuC initialization failure was treated as a non-recoverable
error leading to a driver load abort, so we could not even read the related
GuC error log to investigate the cause of the problem.

For now always return -EIO on any uC hardware related failure.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190811195132.9660-5-michal.wajdeczko@intel.com


# 6b86f900 09-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace global bsd_dispatch_index with random seed

We keep a global seed for the legacy BSD round-robin selector, but in
our testing of multiple simultaneous client workloads, a random seed
spreads the load more evenly. (As even as an initial round-robin selector
can be!) Removing the global is one less variable we have to find a home
for!

We can simulate multi-client (both same and mixed workloads) using
igt/gem_wsim to work out optimal strategies and then compare our
simulation with the actual transcoder on multi-engine machines. This
fixed round-robin turns out to be one of the worst methods.

No user is advised to use this method; the current suggestion is to use
a virtual engine for agnostic batches, randomised submission or using
the busyness tracking to select the most idle engine at the time of
dispatch. At the present time, intel-media is explicit, but libva still
seems to use it, with the exception of batches that must execute on vcs0.
Oh well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809091010.23281-2-chris@chris-wilson.co.uk


# c7302f20 08-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defer final intel_wakeref_put to process context

As we need to acquire a mutex to serialise the final
intel_wakeref_put, we need to ensure that we are in process context at
that time. However, we want to allow operation on the intel_wakeref from
inside timer and other hardirq context, which means that need to defer
that final put to a workqueue.

Inside the final wakeref puts, we are safe to operate in any context, as
we are simply marking up the HW and state tracking for the potential
sleep. It's only the serialisation with a potentially sleeping get
that requires careful wait avoidance. This allows us to retain the
immediate processing as before (we only need to sleep over the same
races as the current mutex_lock).

v2: Add a selftest to ensure we exercise the code while lockdep watches.
v3: That test was extremely loud and complained about many things!
v4: Not a whale!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111295
References: https://bugs.freedesktop.org/show_bug.cgi?id=111245
References: https://bugs.freedesktop.org/show_bug.cgi?id=111256
Fixes: 18398904ca9e ("drm/i915: Only recover active engines")
Fixes: 51fbd8de87dc ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190808202758.10453-1-chris@chris-wilson.co.uk
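
A hedged sketch of the deferral (field names follow the intel_wakeref layout of this era, but treat them as illustrative): the final put from irq context only queues work, and the mutex-serialised teardown runs later in process context.

    if (atomic_dec_and_test(&wf->count))
            schedule_work(&wf->work); /* worker takes wf->mutex and parks the HW */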


# 38775829 07-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allocate kernel_contexts directly

Ignore the central i915->kernel_context for allocating an engine, as
that GEM context is being phased out. For internal clients, we just need
the per-engine logical state, so allocate it at the point of use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190808110612.23539-1-chris@chris-wilson.co.uk


# 6da4a2c4 06-Aug-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: remove unnecessary includes of intel_display_types.h header

With its original name intel_drv.h the intel_display_types.h header was
superfluously cargo-cult included all over the place, while it's really
mostly about display internals. Remove the unnecessary includes.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/e3d737f0ab87c55969e62c1e077e15c04c238297.1565085692.git.jani.nikula@intel.com


# 1d455f8d 06-Aug-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: rename intel_drv.h to display/intel_display_types.h

Everything about the file is about display, and mostly about types
related to display. Move under display/ as intel_display_types.h to
reflect the facts.

There's still plenty to clean up, but start off with moving the file
where it logically belongs and naming according to contents.

v2: fix the include guard name in the renamed file

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190806113933.11799-1-jani.nikula@intel.com


# 750e76b4 06-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Move the [class][inst] lookup for engines onto the GT

To maintain a fast lookup from a GT centric irq handler, we want the
engine lookup tables on the intel_gt. To avoid having multiple copies of
the same multi-dimension lookup table, move the generic user engine
lookup into an rbtree (for fast and flexible indexing).

v2: Split uabi_instance cf uabi_class
v3: Set uabi_class/uabi_instance after collating all engines to provide a
stable uabi across parallel unordered construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-2-chris@chris-wilson.co.uk
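
With the engines collated into an rbtree, the uabi lookup reduces to something like the following (a sketch, assuming the helper name intel_engine_lookup_user()):

    engine = intel_engine_lookup_user(i915, class, instance);
    if (!engine)
            return -EINVAL; /* no such uabi class/instance pair */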


# c29579d2 06-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Make caps.scheduler static

We do not notify userspace when the scheduler capabilities are changed
(due to wedging the driver) and as such userspace will expect the caps
to be static and unchanging. Make it so, and so we only need to compute
our caps once during driver registration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-1-chris@chris-wilson.co.uk


# 515b8b7e 02-Aug-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush the freed object list on file close

As we increase the number of RCU objects, it becomes easier for us to
have several hundred thousand objects in the deferred RCU free queues.
An example is gem_ctx_create/files which continually creates active
contexts, which are not immediately freed upon close as they are kept
alive by outstanding requests. This lack of backpressure allows the
context objects to persist until they overwhelm and starve the system.
We can increase our backpressure by flushing the freed object queue upon
closing the device fd which should then not impact other clients.

Testcase: igt/gem_ctx_create/*files
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802212137.22207-2-chris@chris-wilson.co.uk
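
A hedged sketch of the backpressure point: drain the deferred-free queue from the file-close path (the exact flush helper is assumed here).

    static void i915_gem_release(struct drm_device *dev, struct drm_file *file)
    {
            struct drm_i915_private *i915 = to_i915(dev);

            /* ... existing per-file teardown ... */
            i915_gem_flush_free_objects(i915); /* assumed flush helper */
    }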


# 6bd0fbe1 02-Aug-2019 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915/wopcm: Don't fail on WOPCM partitioning failure

We don't have to fail immediately on WOPCM partitioning; we can wait
until we start programming the WOPCM registers. This should give us
more options if we decide to restore a fallback in case of GuC failures.

v3: rebased

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802184055.31988-7-michal.wajdeczko@intel.com


# 5d1ef2b4 02-Aug-2019 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915/uc: Inject probe errors into intel_uc_init_hw

Inject probe errors into intel_uc_init_hw to make sure we
correctly handle any uC initialization failure.

To avoid complaints from CI about injected errors, use
i915_probe_error to lower the message level.

v4: rebased after moving hot fixes moved to separate patches

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802184055.31988-6-michal.wajdeczko@intel.com


# 50d84418 02-Aug-2019 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915: Add i915 to i915_inject_probe_failure

With i915 added to i915_inject_probe_failure we can use dedicated
printk when injecting artificial load failure.

Also make this function look like other i915 functions that return an
error code, and make it more flexible by returning any provided error
code instead of the previously assumed -ENODEV.
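
As a hedged sketch of that pattern (plain C with invented names; not the
actual i915_inject_probe_failure implementation), an injection helper that
hands back whatever error code the caller supplies could look like:

    #include <errno.h>
    #include <stdio.h>

    /* Illustrative stand-in for a module parameter selecting which probe
     * step should fail (0 = no injection). Not the real i915 interface. */
    static int fail_at_step;
    static int current_step;

    /* The helper takes the error code to return, rather than always
     * assuming -ENODEV. */
    static int inject_probe_failure(int err)
    {
        if (++current_step != fail_at_step)
            return 0;

        fprintf(stderr, "injecting failure %d at probe step %d\n",
                err, current_step);
        return err;
    }

    static int probe(void)
    {
        int ret;

        ret = inject_probe_failure(-ENOMEM);    /* step 1 */
        if (ret)
            return ret;

        ret = inject_probe_failure(-EIO);       /* step 2 */
        if (ret)
            return ret;

        return 0;
    }

    int main(void)
    {
        fail_at_step = 2;
        printf("probe() = %d\n", probe());
        return 0;
    }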

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190802184055.31988-2-michal.wajdeczko@intel.com


# 6cf72db6 31-Jul-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/gt: Move gt_cleanup_early out of gem_cleanup_early

We don't call the init_early function from within the gem code, so we
shouldn't do it for the cleanup either.

v2: while at it, s/gt_cleanup_early/gt_late_release (Chris)
v3: s/late_release/driver_late_release/ (Chris)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190801005709.34092-1-daniele.ceraolospurio@intel.com


# 0de50e40 26-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lift intel_engines_resume() to callers

Since the reset path wants to recover the engines itself, it only wants
to reinitialise the hardware using i915_gem_init_hw(). Pull the call to
intel_engines_resume() to the module init/resume path so we can avoid it
during reset.

Fixes: 79ffac8599c4 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626154549.10066-3-chris@chris-wilson.co.uk
(cherry picked from commit 092be382a2602067766f190a113514d469162456)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 1b6c3c6d 30-Jul-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Move MOCS setup to intel_mocs.c

Hide the details of MOCS setup from i915_gem by moving both current calls
into one in intel_mocs_init.

Cc: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190713010940.17711-21-lucas.demarchi@intel.com
Link: https://patchwork.freedesktop.org/patch/msgid/20190730180407.5993-6-lucas.demarchi@intel.com


# a7a7a0e6 30-Jul-2019 Michel Thierry <michel.thierry@intel.com>

drm/i915/tgl: Tigerlake only has global MOCS registers

Until Icelake, each engine had its own set of 64 MOCS registers. In
order to simplify, Tigerlake moves to only 64 Global MOCS registers,
which are no longer part of the engine context. Since these registers
are now global, they also only need to be initialized once.

From Gen12 onwards, MOCS must specify the target cache (3:2) and LRU
management (5:4) fields and cannot be programmed to 'use the value from
Private PAT', because these fields are no longer part of the PPAT. The
cacheability control field (1:0) has also changed: 00 no longer means
'use controls from page table', but uncacheable (UC).
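
A small illustrative encoding of the field layout described above (bit
positions taken from the text; the macro names and entry values are made up
and are not the driver's MOCS tables):

    #include <stdint.h>
    #include <stdio.h>

    /* Field layout per the description above; names/values illustrative. */
    #define MOCS_CACHEABILITY(x)  (((uint32_t)(x) & 0x3) << 0)  /* 1:0 */
    #define MOCS_TARGET_CACHE(x)  (((uint32_t)(x) & 0x3) << 2)  /* 3:2 */
    #define MOCS_LRU(x)           (((uint32_t)(x) & 0x3) << 4)  /* 5:4 */

    int main(void)
    {
        /* 0 in bits 1:0 now means uncacheable (UC) rather than
         * "use controls from page table". */
        uint32_t uc_entry = MOCS_CACHEABILITY(0) |
                            MOCS_TARGET_CACHE(1) |
                            MOCS_LRU(3);

        printf("example MOCS entry: 0x%02x\n", (unsigned int)uc_entry);
        return 0;
    }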

v2 (Lucas):
- Move the changes to the fault registers to a separate commit - the
old ones overlap with the range used by the new global MOCS
(requested by Daniele)
v3 (Lucas):
- Clarify comment about setting the unused entries to the same value
of index 0, that is the invalid entry (requested by Daniele)
- Move changes to DONE_REG and ERROR_GEN6 to a separate commit
(requested by Daniele)

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730180407.5993-5-lucas.demarchi@intel.com


# 63064d82 30-Jul-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/uc: Move uC WOPCM setup in uc_init_hw

The registers we write are not WOPCM regs but uC ones related to how
GuC and HuC are going to use the WOPCM, so it makes logical sense
for them to be programmed as part of uc_init_hw. The WOPCM map, on the
other hand, is not uC-specific (although that is our main use-case), so
keep that separate.

v2: move write_and_verify to uncore, fix log, re-use err_out tag,
add intel_wopcm_guc_base, fix log

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190730230743.19542-2-daniele.ceraolospurio@intel.com


# a5627721 28-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Inline engine->init_context into its caller

We only use the init_context vfunc once while recording the default
context state, and we use the same sequence in each backend (eliding
steps that do not apply). Remove the vfunc for simplicity and
de-duplication.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190729113720.24830-1-chris@chris-wilson.co.uk


# 14f8a0eb 23-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Squelch nop wait-for-idle trace

If the system is already idle, omit the GEM_TRACE saying we are about to
wait for idle. It looks confusing in the logs to see a continual stream
of wait-for-idle, as one immediately assumes it is stuck in a loop.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190723091218.5886-1-chris@chris-wilson.co.uk


# bdae33b8 18-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use maximum write flush for pwrite_gtt

As recently discovered by forcing big-core (!llc) machines to use the GTT
paths, we need our full GTT write flush before manipulating the GTT PTE
or else the writes may be directed to the wrong page.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190718145407.21352-2-chris@chris-wilson.co.uk


# d45a4dd5 18-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop wmb() inside pread_gtt

Inside pread, we only ever read from the GTT so the serialising wmb()
instructions around the GGTT PTE updates are pointless.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190718145407.21352-1-chris@chris-wilson.co.uk


# ca7b2c1b 13-Jul-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/uc: Move intel functions to intel_uc

All the intel_uc_* can now be moved to work on the intel_uc structure
for better encapsulation of uc-related actions.

Note: I've introduced uc_to_gt instead of uc_to_i915 because the aim is
to move everything to be gt-focused in the medium term, so we would've
had to replace it soon anyway.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Acked-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190713100016.8026-8-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# e3f503f1 13-Jul-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915/uc: replace uc init/fini misc

The "misc" terminology doesn't clearly explain what we intend to cover
in this phase. The only thing we used to do in there apart from FW fetch
was initializing the log workqueue, with the latter being required only in
the very rare case where we enable the log relay. As we no longer create
our own workqueue, piggybacking on the system_highpri_wq instead, we can
rename the functions to clarify that they only fetch/release the blobs.

v2: only create log wq when needed (Michal), reword commit msg
accordingly
v3: after rebase the wq is gone, reword commit msg accordingly

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190713100016.8026-2-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# cb823ed9 12-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Use intel_gt as the primary object for handling resets

Having taken the first step in encapsulating the functionality by moving
the related files under gt/, the next step is to start encapsulating by
passing around the relevant structs rather than the global
drm_i915_private. In this step, we pass intel_gt to intel_reset.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712192953.9187-1-chris@chris-wilson.co.uk


# 78dae1ac 12-Jul-2019 Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915: Propagate "_remove" function name suffix down

Similar to the "_release" case, consistently replace mixed
"_cleanup"/"_fini"/"_fini_hw" components found in names of functions
called from i915_driver_remove() with "_remove" or "_driver_remove"
suffixes for better code readability.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712112429.740-6-janusz.krzysztofik@linux.intel.com


# 3b58a945 12-Jul-2019 Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915: Propagate "_release" function name suffix down

Replace mixed "_fini"/"_cleanup"/"_cleanup_hw" suffixes found in names
of functions called from i915_driver_release() with "_release" suffix
consistently. This provides better code readability, especially
helpful when trying to work out which phase the code is in.

Function names starting with "i915_driver_", i.e., those defined in
drivers/gpu/drm/i915/i915_drv.c, just have the "cleanup" or "fini"
parts of their names replaced with the "_release" suffix, while names
of functions coming from other source files have been suffixed with
"_driver_release" to avoid ambiguity with other possible .release entry
points.

v2: early_probe pairs better with late_release (Chris)
v3: fix typo in commit message (Joonas)

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712112429.740-5-janusz.krzysztofik@linux.intel.com


# f2db53f1 12-Jul-2019 Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915: Replace "_load" with "_probe" consistently

Use the "_probe" nomenclature not only in i915_driver_probe() helper
name but also in other related function / variable names for
consistency. Only the userspace exposed name of a related module
parameter is left untouched.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712112429.740-4-janusz.krzysztofik@linux.intel.com


# c03467ba 03-Jul-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Free pages before rcu-freeing the object

As we have dropped the final reference to the object, we do not need to
wait until after the rcu grace period to drop its pages. We still require
struct_mutex to completely unbind the object to release the pages, so we
still need a free-worker to manage that from process context. Scheduling
the release of pages before waiting for the rcu grace period should mean
that we are not trapping those pages beyond the reach of the
shrinker.

v2: Pass the request to skip busy vmas along to the underlying unbind
routine, to avoid checking the reservation underneath the
i915->mm.obj_lock, which may be used from inside irq context.

v3: Flip the bit for unbinding while active, for later convenience.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111035
Fixes: a93615f900bd ("drm/i915: Throw away the active object retirement complexity")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-6-chris@chris-wilson.co.uk


# 092be382 26-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lift intel_engines_resume() to callers

Since the reset path wants to recover the engines itself, it only wants
to reinitialise the hardware using i915_gem_init_hw(). Pull the call to
intel_engines_resume() to the module init/resume path so we can avoid it
during reset.

Fixes: 79ffac8599c4 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626154549.10066-3-chris@chris-wilson.co.uk


# 0c91621c 25-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Pass intel_gt to pm routines

Switch from passing the i915 container to newly named struct intel_gt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190625130128.11009-2-chris@chris-wilson.co.uk


# c6fe28b0 21-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gt: Rename i915_gt_timelines

Since the anonymous i915_gt became struct intel_gt and encloses
struct i915_gt_timelines, rename i915_gt_timelines to intel_gt_timelines
to match its parentage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621131640.28864-1-chris@chris-wilson.co.uk


# db56f974 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Eliminate dual personality of i915_scratch_offset

Scratch vma lives under gt but the API used to work on i915. Make this
consistent by renaming the function to intel_gt_scratch_offset and making
it take struct intel_gt.

v2:
* Move to intel_gt. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-33-tvrtko.ursulin@linux.intel.com


# f0c02c1b 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Rename i915_timeline to intel_timeline and move under gt

Move all timeline code under gt and rename to intel_gt prefix.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-32-tvrtko.ursulin@linux.intel.com


# 1d66377a 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Compartmentalize i915_gem_init_ggtt

Continuing on the theme of better logical organization of our code, make
the first step towards making the ggtt code better isolated from wider
struct drm_i915_private.

v2:
* Bring the ickle onion unwind back. (Chris)
* Rename to i915_init_ggtt. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-27-tvrtko.ursulin@linux.intel.com


# baea429d 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Move i915_gem_chipset_flush to intel_gt

This aligns better with the rest of the restructuring.

v2:
* Move call out of line. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-24-tvrtko.ursulin@linux.intel.com


# a1c8a09e 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Convert i915_gem_flush_ggtt_writes to intel_gt

Having introduced struct intel_gt (named the anonymous structure in i915)
we can start using it to compartmentalize our code better. It makes more
sense logically to have the code internally like this and it will also
help with the future split between gt and display in i915.

v2:
* Keep ggtt flush before fb obj flush. (Chris)

v3:
* Fix refactoring fail.
* Always flush ggtt writes. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-23-tvrtko.ursulin@linux.intel.com


# 6b0a8dfd 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Stop using I915_READ/WRITE in intel_wopcm_init_hw

More legacy mmio accessor removal. We pass in intel_gt explicitly, allowing
code to use the new intel_uncore_read/write helpers.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-17-tvrtko.ursulin@linux.intel.com


# 8649187a 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Move intel_engines_resume into common init

Since this part still operates on i915 and not intel_gt, move it to the
common (top-level) function.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-16-tvrtko.ursulin@linux.intel.com


# abc584f9 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Convert i915_gem_init_hw to intel_gt

More removal of implicit dev_priv from using old mmio accessors.

Actually the top-level function remains, but it is split into a part which
writes to i915 and a part which operates on intel_gt in order to initialize
the hardware.

GuC and engines are the only odd ones out remaining.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-15-tvrtko.ursulin@linux.intel.com


# acb56d97 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Convert i915_ppgtt_init_hw to intel_gt

More removal of implicit dev_priv from using old mmio accessors.

v2:
* Rebase for uncore_to_i915 removal.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-13-tvrtko.ursulin@linux.intel.com


# 20a7f2fc 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Convert intel_mocs_init_l3cc_table to intel_gt

More removal of implicit dev_priv from using old mmio accessors.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-12-tvrtko.ursulin@linux.intel.com


# d10cfee4 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Convert gt workarounds to intel_gt

More conversion of i915_gem_init_hw to uncore.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-10-tvrtko.ursulin@linux.intel.com


# cf6844b2 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Convert init_unused_rings to intel_gt

More removal of implicit dev_priv from using old mmio accessors.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-9-tvrtko.ursulin@linux.intel.com


# 500bfa38 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Convert i915_gem_init_swizzling to intel_gt

Start using the newly introduced struct intel_gt to fuse together correct
logical init flow with uncore for more removal of implicit dev_priv in
mmio access.

v2:
* Move code to i915_gem_fence_reg. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-7-tvrtko.ursulin@linux.intel.com


# 99f2eb96 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Move intel_gt_pm_init under intel_gt_init_early

And also rename to intel_gt_pm_init_early and make it operate on gt.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-5-tvrtko.ursulin@linux.intel.com


# 24635c51 21-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Move intel_gt initialization to a separate file

As it will grow in a following patch make a new home for it.

v2:
* Convert mock_gem_device as well. (Chris)

v3:
* Rename to intel_gt_init_early and move call site to i915_drv.c. (Chris)

v4:
* Adjust SPDX tags.
* No need to gt/ path when including intel_gt_types.h. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-3-tvrtko.ursulin@linux.intel.com


# df0566a6 13-Jun-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: move modesetting core code under display/

Now that we have a new subdirectory for display code, continue by moving
modesetting core code.

display/intel_frontbuffer.h sticks out like a sore thumb, otherwise this
is, again, a surprisingly clean operation.

v2:
- don't move intel_sideband.[ch] (Ville)
- use tabs for Makefile file lists and sort them

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613084416.6794-3-jani.nikula@intel.com


# ce476c80 14-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep contexts pinned until after the next kernel context switch

We need to keep the context image pinned in memory until after the GPU
has finished writing into it. Since it continues to write as we signal
the final breadcrumb, we need to keep it pinned until the request after
it is complete. Currently we know the order in which requests execute on
each engine, and so to remove that presumption we need to identify a
request/context-switch we know must occur after our completion. Any
request queued after the signal must imply a context switch, for
simplicity we use a fresh request from the kernel context.

The sequence of operations for keeping the context pinned until saved is:

- On context activation, we preallocate a node for each physical engine
the context may operate on. This is to avoid allocations during
unpinning, which may be from inside FS_RECLAIM context (aka the
shrinker)

- On context deactivation on retirement of the last active request (which
is before we know the context has been saved), we add the
preallocated node onto a barrier list on each engine

- On engine idling, we emit a switch to kernel context. When this
switch completes, we know that all previous contexts must have been
saved, and so on retiring this request we can finally unpin all the
contexts that were marked as deactivated prior to the switch.

We can enhance this in future by flushing all the idle contexts on a
regular heartbeat pulse of a switch to kernel context, which will also
be used to check for hung engines.
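
A toy model of that sequence (plain C, invented names; the real code relies
on the driver's active-tracking infrastructure): nodes preallocated at
activation are parked on a per-engine barrier list at deactivation, and are
drained only once a later switch to the kernel context is known to have
completed:

    #include <stdio.h>
    #include <stdlib.h>

    struct barrier_node {
        struct barrier_node *next;
        int context_id;                 /* context waiting to be unpinned */
    };

    struct engine {
        struct barrier_node *barriers;  /* deactivated, not yet unpinnable */
    };

    /* Preallocated at context activation (so no allocation happens in the
     * shrinker path); parked here when the last active request retires. */
    static void park_barrier(struct engine *e, struct barrier_node *node)
    {
        node->next = e->barriers;
        e->barriers = node;
    }

    /* Called when the switch-to-kernel-context request retires: everything
     * deactivated before that switch must have been saved, so unpin it all. */
    static void kernel_context_switch_retired(struct engine *e)
    {
        while (e->barriers) {
            struct barrier_node *node = e->barriers;

            e->barriers = node->next;
            printf("unpinning context %d\n", node->context_id);
            free(node);
        }
    }

    int main(void)
    {
        struct engine rcs = { 0 };

        for (int id = 1; id <= 3; id++) {
            struct barrier_node *n = malloc(sizeof(*n));

            n->context_id = id;
            park_barrier(&rcs, n);      /* context deactivated */
        }

        kernel_context_switch_retired(&rcs);
        return 0;
    }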

v2: intel_context_active_acquire/_release

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk


# c447ff7d 13-Jun-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: update with_intel_runtime_pm to use the rpm structure

Matching the underlying get/put functions.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613232156.34940-8-daniele.ceraolospurio@intel.com


# d858d569 13-Jun-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: update rpm_get/put to use the rpm structure

The functions were internally already only using the structure, so we
just need to flip the interface.

v2: rebase

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613232156.34940-7-daniele.ceraolospurio@intel.com


# 84383d2e 14-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refine i915_reset.lock_map

We already use a mutex to serialise i915_reset() and wedging, so all we
need is to link that into i915_request_wait() and we have our lock-cycle
detection.

v2.5: Take error mutex for selftests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190614071023.17929-3-chris@chris-wilson.co.uk


# 0cf289bd 13-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move fence register tracking from i915->mm to ggtt

As the fence registers only apply to regions inside the GGTT, it makes
more sense that we track these as part of the i915_ggtt and not the
general mm. In the next patch, we will then pull the register locking
underneath the i915_ggtt.mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190613073254.24048-1-chris@chris-wilson.co.uk


# e33a4be8 11-Jun-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Remove I915_POSTING_READ_FW

Only a few call sites remain which have been converted to uncore mmio
accessors and so the macro can be removed.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190611104548.30545-2-tvrtko.ursulin@linux.intel.com


# ecab9be1 12-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Combine unbound/bound list tracking for objects

With async binding, we don't want to manage a bound/unbound list as we
may end up running before we even acquire the pages. All that is
required is keeping track of shrinkable objects, so reduce it to the
minimum list.

Fixes: 6951e5893b48 ("drm/i915: Move GEM object domain management from struct_mutex to local")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190612105720.30310-1-chris@chris-wilson.co.uk


# 33df8a76 12-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent lock-cycles between GPU waits and GPU resets

We cannot allow ourselves to wait on the GPU while holding any lock as we
may need to reset the GPU. While there is not an explicit lock between
the two operations, lockdep cannot detect the dependency. So let's tell
lockdep about the wait/reset dependency with an explicit lockmap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190612085246.16374-1-chris@chris-wilson.co.uk


# a8cff4c8 10-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Promote i915->mm.obj_lock to be irqsafe

The intent is to be able to update the mm.lists from inside an irqsoff
section (e.g. from a softirq rcu workqueue), ergo we need to make the
i915->mm.obj_lock irqsafe.

v2: can_discard_pages() ensures we are shrinkable
v3: Beware shadowing of 'flags'

Fixes: 3b4fa9640ccd ("drm/i915: Track the purgeable objects on a separate eviction list")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110869
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190610145430.17717-1-chris@chris-wilson.co.uk


# 155ab883 05-Jun-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move object close under its own lock

Use i915_gem_object_lock() to guard the LUT and active reference to
allow us to break free of struct_mutex for handling GEM_CLOSE.

Testcase: igt/gem_close_race
Testcase: igt/gem_exec_parallel
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190606112320.9704-1-chris@chris-wilson.co.uk


# d82b4b26 30-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Report all objects with allocated pages to the shrinker

Currently, we try to report to the shrinker the precise number of
objects (pages) that are available to be reaped at this moment. This
requires searching all objects with allocated pages to see if they
fulfill the search criteria, and this count is performed quite
frequently. (The shrinker tries to free ~128 pages on each invocation,
before which we count all the objects; counting takes longer than
unbinding the objects!) If we take the pragmatic view that with
sufficient desire, all objects are eventually reapable (they become
inactive, or no longer used as framebuffer etc), we can simply return
the count of pinned pages maintained during get_pages/put_pages rather
than walk the lists every time.

The downside is that we may (slightly) over-report the number of
objects/pages we could shrink and so penalize ourselves by shrinking
more than required. This is mitigated by keeping the order in which we
shrink objects such that we avoid penalizing active and frequently used
objects, and if memory is so tight that we need to free them we would
need to anyway.
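
The trade-off reduces to maintaining a running counter at get_pages/put_pages
time and returning it from the shrinker's count step rather than walking
every object. A hedged sketch (plain C, invented names; not the real i915
shrinker callbacks):

    #include <stdio.h>

    /* Running total of shrinkable pages, updated as objects acquire and
     * release their backing pages (illustrative, not the real i915 state). */
    static unsigned long shrinkable_pages;

    static void obj_get_pages(unsigned long npages)
    {
        shrinkable_pages += npages;
    }

    static void obj_put_pages(unsigned long npages)
    {
        shrinkable_pages -= npages;
    }

    /* The "count" step becomes O(1), possibly over-reporting slightly,
     * instead of walking all objects on every invocation. */
    static unsigned long shrinker_count(void)
    {
        return shrinkable_pages;
    }

    int main(void)
    {
        obj_get_pages(64);
        obj_get_pages(256);
        obj_put_pages(64);
        printf("reported to shrinker: %lu pages\n", shrinker_count());
        return 0;
    }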

v2: Only expose shrinkable objects to the shrinker; a small reduction in
not considering stolen and foreign objects.
v3: Restore the tracking from a "backup" copy from before the gem/ split

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190530203500.26272-2-chris@chris-wilson.co.uk


# 3b4fa964 30-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track the purgeable objects on a separate eviction list

Currently the purgeable objects, I915_MADV_DONTNEED, are mixed in the
normal bound/unbound lists. Every shrinker pass starts with an attempt
to purge from this set of unneeded objects, which entails us doing a
walk over both lists looking for any candidates. If there are none (and
since we are shrinking we can reasonably assume that the lists are
full!), this becomes a very slow, futile walk.

If we separate out the purgeable objects into their own list, this search then
becomes its own phase that is preferentially handled during shrinking.
Instead the cost becomes that we then need to filter the purgeable list
if we want to distinguish between bound and unbound objects.
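
A toy illustration of the split (plain C, invented names; not the driver's
object lists): marking an object purgeable moves it onto a dedicated list,
so the purge phase walks only genuine candidates:

    #include <stdio.h>
    #include <stdlib.h>

    struct obj {
        struct obj *next;
        int id;
    };

    static struct obj *bound_list;      /* everything else                */
    static struct obj *purge_list;      /* I915_MADV_DONTNEED equivalents */

    static void list_push(struct obj **list, struct obj *o)
    {
        o->next = *list;
        *list = o;
    }

    /* Userspace marks the object purgeable: move it to the dedicated list. */
    static void madv_dontneed(struct obj **from, struct obj *o)
    {
        for (struct obj **p = from; *p; p = &(*p)->next) {
            if (*p == o) {
                *p = o->next;       /* unlink (linear walk for brevity) */
                break;
            }
        }
        list_push(&purge_list, o);
    }

    /* Shrinker phase one: purge only from the purge list, no full walk. */
    static void shrink_purge(void)
    {
        while (purge_list) {
            struct obj *o = purge_list;

            purge_list = o->next;
            printf("purging object %d\n", o->id);
            free(o);
        }
    }

    int main(void)
    {
        for (int i = 0; i < 3; i++) {
            struct obj *o = malloc(sizeof(*o));

            o->id = i;
            o->next = NULL;
            list_push(&bound_list, o);
        }
        madv_dontneed(&bound_list, bound_list);  /* mark the newest object */
        shrink_purge();
        return 0;
    }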

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190530203500.26272-1-chris@chris-wilson.co.uk


# 47bc28d7 30-May-2019 Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

drm/i915: Split off pci_driver.remove() tail to drm_driver.release()

In order to support driver hot unbind, some cleanup operations, now
performed on PCI driver remove, must be called later, after all device
file descriptors are closed.

Split out those operations from the tail of pci_driver.remove()
callback and put them into drm_driver.release() which is called as soon
as all references to the driver are put. As a result, those cleanups
will be now run on last drm_dev_put(), either still called from
pci_driver.remove() if all device file descriptors are already closed,
or on last drm_release() file operation.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190530133105.30467-1-janusz.krzysztofik@linux.intel.com


# 446e2d16 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GEM client throttling to its own file

Continuing the decluttering of i915_gem.c by moving the client self
throttling into its own file.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-13-chris@chris-wilson.co.uk


# 3f43c876 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GEM object busy checking to its own file

Continuing the decluttering of i915_gem.c by moving the object busy
checking into its own file.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-12-chris@chris-wilson.co.uk


# d45a1a53 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GEM object waiting to its own file

Continuing the decluttering of i915_gem.c by moving the object wait
decomposition into its own file.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-11-chris@chris-wilson.co.uk


# 6951e589 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GEM object domain management from struct_mutex to local

Use the per-object local lock to control the cache domain of the
individual GEM objects, not struct_mutex. This is a huge leap forward
for us in terms of object-level synchronisation; execbuffers are
coordinated using the ww_mutex and pread/pwrite is finally fully
serialised again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-10-chris@chris-wilson.co.uk


# 37d63f8f 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull scatterlist utils out of i915_gem.h

Our scatterlist utility routines can be pulled out of i915_gem.h for a
bit more decluttering.

v2: Push I915_GTT_PAGE_SIZE out of i915_scatterlist itself and into the
caller.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-9-chris@chris-wilson.co.uk


# 10be98a7 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move more GEM objects under gem/

Continuing the theme of separating out the GEM clutter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk


# f0e4a063 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GEM domain management to its own file

Continuing the decluttering of i915_gem.c, that of the read/write
domains, perhaps the biggest of GEM's follies?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-7-chris@chris-wilson.co.uk


# b414fcd5 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move mmap and friends to its own file

Continuing the decluttering of i915_gem.c, now the turn of do_mmap and
the fault handlers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-6-chris@chris-wilson.co.uk


# f033428d 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move phys objects to its own file

Continuing the decluttering of i915_gem.c, this time the legacy physical
object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-5-chris@chris-wilson.co.uk


# 8475355f 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move shmem object setup to its own file

Split the plain old shmem object into its own file to start decluttering
i915_gem.c

v2: Lose the confusing, hysterical-raisins suffix of _gtt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-4-chris@chris-wilson.co.uk


# afa13085 28-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull GEM ioctls interface to its own file

Declutter i915_drv/gem.h by moving the ioctl API into its own header.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-2-chris@chris-wilson.co.uk


# b27e35ae 26-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep user GGTT alive for a minimum of 250ms

Do not allow runtime pm autosuspend to remove userspace GGTT mmaps too
quickly. For example, igt sets the autosuspend delay to 0, and so we
immediately attempt to perform runtime suspend upon releasing the
wakeref. Unfortunately, that involves tearing down GGTT mmaps as they
require an active device.

Override the autosuspend for GGTT mmaps, by keeping the wakeref around
for 250ms after populating the PTE for a fresh mmap.
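
The mechanism amounts to a grace period keyed off the last fault. A sketch
(plain C; only the 250ms figure comes from the text above, everything else
is invented for illustration):

    #include <stdbool.h>
    #include <stdio.h>

    #define MMAP_WAKEREF_HOLD_MS 250    /* grace period from the text above */

    struct ggtt_mmap_state {
        unsigned long last_fault_ms;    /* when we last populated a PTE */
        bool wakeref_held;
    };

    static void fault_populate_pte(struct ggtt_mmap_state *s,
                                   unsigned long now_ms)
    {
        s->last_fault_ms = now_ms;
        s->wakeref_held = true;         /* take/extend the wakeref */
    }

    /* Autosuspend check: even with a 0 autosuspend delay, the mmap wakeref
     * is only dropped once the grace period has expired. */
    static bool may_runtime_suspend(struct ggtt_mmap_state *s,
                                    unsigned long now_ms)
    {
        if (s->wakeref_held &&
            now_ms - s->last_fault_ms < MMAP_WAKEREF_HOLD_MS)
            return false;

        s->wakeref_held = false;
        return true;
    }

    int main(void)
    {
        struct ggtt_mmap_state s = { 0 };

        fault_populate_pte(&s, 1000);
        printf("suspend at t=1100ms? %d\n", may_runtime_suspend(&s, 1100));
        printf("suspend at t=1300ms? %d\n", may_runtime_suspend(&s, 1300));
        return 0;
    }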

v2: Prefer refcount_t for its under/overflow error detection
v3: Flush the user runtime autosuspend prior to system suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190527115114.13448-1-chris@chris-wilson.co.uk


# aa5ca8b7 09-May-2019 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Align dumb buffer stride to 4k to allow for gtt remapping

Align dumb buffer stride to 4k if the fb will be big enough to
require gtt remapping.

v2: Leave the stride alone for buffers that look to be for the cursor
v3: Make it not a hack (Daniel)

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-7-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>


# 1a74fc0b 09-May-2019 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Add a new "remapped" gtt_view

To overcome display engine stride limits we'll want to remap the
pages in the GTT. To that end we need a new gtt_view type which
is just like the "rotated" type except not rotated.
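
Conceptually the remapped view presents the same pages under a different
(padded) stride. A small illustrative calculation (plain C with made-up
numbers; not the driver's sg-building code):

    #include <stdio.h>

    int main(void)
    {
        unsigned int obj_stride = 100;   /* pages per row in the object      */
        unsigned int view_stride = 128;  /* padded stride exposed to display */
        unsigned int rows = 2, cols = 3; /* small corner of the plane        */

        /* Each page of the remapped view at (row, col) is backed by the
         * object page at the same (row, col) under the object's own stride;
         * columns beyond obj_stride would be padding. */
        for (unsigned int row = 0; row < rows; row++) {
            for (unsigned int col = 0; col < cols; col++) {
                unsigned int view_page = row * view_stride + col;
                unsigned int obj_page = row * obj_stride + col;

                printf("view page %3u -> object page %3u\n",
                       view_page, obj_page);
            }
        }
        return 0;
    }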

v2: Use intel_remapped_plane_info base type
s/unused/unused_mbz/ (Chris)
Separate BUILD_BUG_ON()s (Chris)
Use I915_GTT_PAGE_SIZE (Chris)
v3: Use i915_gem_object_get_dma_address() (Chris)
Trim the sg (Tvrtko)
v4: Actually trim this time. Limit the max length
to one row of pages to keep things simple

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-2-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>


# 79ea35bc 12-Apr-2019 Al Viro <viro@zeniv.linux.org.uk>

don't open-code file_count()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 45b9c968 01-May-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the engine->destroy() vfunc onto the engine

Make the engine responsible for cleaning itself up!

This removes the i915->gt.cleanup vfunc that has been annoying the
casual reader and myself for the last several years, and helps keep a
future patch to add more cleanup tidy.

v2: Assert that engine->destroy is set after the backend starts
allocating its own state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501103204.18632-1-chris@chris-wilson.co.uk


# 5e2a0419 26-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Switch back to an array of logical per-engine HW contexts

We switched to a tree of per-engine HW contexts to accommodate the
introduction of virtual engines. However, we plan to also support
multiple instances of the same engine within the GEM context, defeating
our use of the engine as a key to looking up the HW context. Just
allocate a logical per-engine instance and always use an index into the
ctx->engines[]. Later on, this ctx->engines[] may be replaced by a user
specified map.
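
The data-structure change boils down to replacing an engine-keyed tree with
a plain array indexed by a stable slot, which keeps working once the same
engine may appear more than once. A hedged sketch with invented names (not
the real i915 structures):

    #include <stdio.h>

    struct hw_ctx {
        int engine_class;   /* which HW engine this instance drives */
    };

    struct gem_ctx {
        struct hw_ctx **engines;    /* indexed 0..num_engines-1 */
        unsigned int num_engines;
    };

    /* Lookup is now an index into ctx->engines[], so two slots may legally
     * point at instances of the same physical engine. */
    static struct hw_ctx *lookup(struct gem_ctx *ctx, unsigned int idx)
    {
        if (idx >= ctx->num_engines)
            return NULL;
        return ctx->engines[idx];
    }

    int main(void)
    {
        struct hw_ctx a = { .engine_class = 0 };
        struct hw_ctx b = { .engine_class = 0 };    /* same class twice */
        struct hw_ctx *slots[] = { &a, &b };
        struct gem_ctx ctx = { .engines = slots, .num_engines = 2 };

        for (unsigned int i = 0; i < ctx.num_engines; i++)
            printf("slot %u -> engine class %d\n",
                   i, lookup(&ctx, i)->engine_class);
        return 0;
    }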

v2: Add for_each_gem_engine() helper to iterator within the engines lock
v3: intel_context_create_request() helper
v4: s/unsigned long/unsigned int/ 4 billion engines is quite enough.
v5: Push iterator locking to caller

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-7-chris@chris-wilson.co.uk


# 11334c6a 26-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split engine setup/init into two phases

In the next patch, we require the engine vfuncs setup prior to
initialising the pinned kernel contexts, so split the vfunc setup from
the engine initialisation and call it earlier.

v2: s/setup_xcs/setup_common/ for intel_ring_submission_setup()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-6-chris@chris-wilson.co.uk


# 79ffac85 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Invert the GEM wakeref hierarchy

In the current scheme, on submitting a request we take a single global
GEM wakeref, which trickles down to wake up all GT power domains. This
is undesirable as we would like to be able to localise our power
management to the available power domains and to remove the global GEM
operations from the heart of the driver. (The intent there is to push
global GEM decisions to the boundary as used by the GEM user interface.)

Now during request construction, each request is responsible via its
logical context to acquire a wakeref on each power domain it intends to
utilize. Currently, each request takes a wakeref on the engine(s) and
the engines themselves take a chipset wakeref. This gives us a
transition on each engine which we can extend if we want to insert more
power management control (such as soft rc6). The global GEM operations
that currently require a struct_mutex are reduced to listening to pm
events from the chipset GT wakeref. As we reduce the struct_mutex
requirement, these listeners should evaporate.

Perhaps the biggest immediate change is that this removes the
struct_mutex requirement around GT power management, allowing us greater
flexibility in request construction. Another important knock-on effect,
is that by tracking engine usage, we can insert a switch back to the
kernel context on that engine immediately, avoiding any extra delay or
inserting global synchronisation barriers. This makes tracking when an
engine and its associated contexts are idle much easier -- important for
when we forgo our assumed execution ordering and need idle barriers to
unpin used contexts. In the process, it means we remove a large chunk of
code whose only purpose was to switch back to the kernel context.
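
The inverted hierarchy can be pictured as nested reference counts: each
request pins its engine, and the first engine user pins the GT. A toy model
(plain C, invented names; the real code uses the driver's wakeref tracking):

    #include <stdio.h>

    struct gt { unsigned int wakeref; };
    struct engine { struct gt *gt; unsigned int wakeref; };

    static void gt_get(struct gt *gt)
    {
        if (gt->wakeref++ == 0)
            printf("GT: powered up\n");
    }

    static void gt_put(struct gt *gt)
    {
        if (--gt->wakeref == 0)
            printf("GT: may power down\n");
    }

    /* A request takes an engine wakeref; the first user of an engine takes
     * the chipset (GT) wakeref, localising power management per domain. */
    static void engine_get(struct engine *e)
    {
        if (e->wakeref++ == 0)
            gt_get(e->gt);
    }

    static void engine_put(struct engine *e)
    {
        if (--e->wakeref == 0) {
            printf("engine idle: can switch back to kernel context\n");
            gt_put(e->gt);
        }
    }

    int main(void)
    {
        struct gt gt = { 0 };
        struct engine rcs = { &gt, 0 };

        engine_get(&rcs);   /* request submitted */
        engine_get(&rcs);   /* second request    */
        engine_put(&rcs);
        engine_put(&rcs);   /* engine idles, GT may power down */
        return 0;
    }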

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk


# 23c3c3d0 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull the GEM powermanagement coupling into its own file

Split out the powermanagement portion (GT wakeref, suspend/resume) of
GEM from i915_gem.c into its own file.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-2-chris@chris-wilson.co.uk


# 112ed2d3 24-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GraphicsTechnology files under gt/

Start partitioning off the code that talks to the hardware (GT) from the
uapi layers and move the device facing code under gt/

One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to
subdivide that header and body further (and split out the submission
code from the ringbuffer and logical context handling). This patch aims
to be simple motion so git can fixup inflight patches with little mess.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk


# 929eec99 17-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid use-after-free in reporting create.size

We have to avoid chasing after a userspace race!

<3>[ 473.114328] BUG: KASAN: use-after-free in i915_gem_create+0x1d2/0x1f0 [i915]
<3>[ 473.114389] Read of size 8 at addr ffff88815bf1d840 by task gem_flink_race/1541

<4>[ 473.114464] CPU: 1 PID: 1541 Comm: gem_flink_race Tainted: G U 5.1.0-rc4-g7d07e025e786-kasan_88+ #1
<4>[ 473.114469] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4205-ITX, BIOS P1.10 09/29/2016
<4>[ 473.114474] Call Trace:
<4>[ 473.114488] dump_stack+0x7c/0xbb
<4>[ 473.114612] ? i915_gem_create+0x1d2/0x1f0 [i915]
<4>[ 473.114621] print_address_description+0x65/0x270
<4>[ 473.114728] ? i915_gem_create+0x1d2/0x1f0 [i915]
<4>[ 473.114839] ? i915_gem_create+0x1d2/0x1f0 [i915]
<4>[ 473.114848] kasan_report+0x149/0x18d
<4>[ 473.114962] ? i915_gem_create+0x1d2/0x1f0 [i915]
<4>[ 473.115069] i915_gem_create+0x1d2/0x1f0 [i915]
<4>[ 473.115176] ? i915_gem_object_create.part.28+0x4b0/0x4b0 [i915]
<4>[ 473.115289] ? i915_gem_dumb_create+0x1a0/0x1a0 [i915]
<4>[ 473.115297] drm_ioctl_kernel+0x192/0x260
<4>[ 473.115306] ? drm_ioctl_permit+0x280/0x280
<4>[ 473.115326] drm_ioctl+0x67c/0x960
<4>[ 473.115438] ? i915_gem_dumb_create+0x1a0/0x1a0 [i915]
<4>[ 473.115448] ? drm_getstats+0x20/0x20
<4>[ 473.115459] ? __lock_acquire+0xa66/0x3fe0
<4>[ 473.115474] ? _raw_spin_unlock_irqrestore+0x39/0x60
<4>[ 473.115485] ? debug_object_active_state+0x2ea/0x4e0
<4>[ 473.115496] ? debug_show_all_locks+0x2d0/0x2d0
<4>[ 473.115513] do_vfs_ioctl+0x18d/0xfa0
<4>[ 473.115522] ? check_flags.part.27+0x440/0x440
<4>[ 473.115532] ? ioctl_preallocate+0x1a0/0x1a0
<4>[ 473.115547] ? __fget+0x2ac/0x410
<4>[ 473.115561] ? __ia32_sys_dup3+0xb0/0xb0
<4>[ 473.115569] ? rwlock_bug.part.0+0x90/0x90
<4>[ 473.115590] ksys_ioctl+0x35/0x70
<4>[ 473.115597] ? lockdep_hardirqs_off+0x1cb/0x2b0
<4>[ 473.115608] __x64_sys_ioctl+0x6a/0xb0
<4>[ 473.115614] ? lockdep_hardirqs_on+0x342/0x590
<4>[ 473.115623] do_syscall_64+0x97/0x400
<4>[ 473.115633] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 473.115641] RIP: 0033:0x7fce590d55d7
<4>[ 473.115649] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
<4>[ 473.115655] RSP: 002b:00007fce4d525ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[ 473.115662] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fce590d55d7
<4>[ 473.115667] RDX: 00007fce4d525c10 RSI: 00000000c010645b RDI: 0000000000000007
<4>[ 473.115672] RBP: 00007fce4d525c10 R08: 00007fce4d526700 R09: 00007fce4d526700
<4>[ 473.115677] R10: 0000000000000054 R11: 0000000000000246 R12: 00000000c010645b
<4>[ 473.115682] R13: 0000000000000007 R14: 0000000000000000 R15: 00007ffe0e4a7450

<3>[ 473.115731] Allocated by task 1541:
<4>[ 473.115766] kmem_cache_alloc+0xce/0x290
<4>[ 473.115895] i915_gem_object_create.part.28+0x1c/0x4b0 [i915]
<4>[ 473.116000] i915_gem_create+0xe3/0x1f0 [i915]
<4>[ 473.116008] drm_ioctl_kernel+0x192/0x260
<4>[ 473.116013] drm_ioctl+0x67c/0x960
<4>[ 473.116020] do_vfs_ioctl+0x18d/0xfa0
<4>[ 473.116026] ksys_ioctl+0x35/0x70
<4>[ 473.116032] __x64_sys_ioctl+0x6a/0xb0
<4>[ 473.116038] do_syscall_64+0x97/0x400
<4>[ 473.116044] entry_SYSCALL_64_after_hwframe+0x49/0xbe

<3>[ 473.116071] Freed by task 1542:
<4>[ 473.116101] kmem_cache_free+0xb7/0x2f0
<4>[ 473.116205] __i915_gem_free_objects+0x7d4/0xe10 [i915]
<4>[ 473.116311] i915_gem_create_ioctl+0xaa/0xd0 [i915]
<4>[ 473.116318] drm_ioctl_kernel+0x192/0x260
<4>[ 473.116323] drm_ioctl+0x67c/0x960
<4>[ 473.116330] do_vfs_ioctl+0x18d/0xfa0
<4>[ 473.116335] ksys_ioctl+0x35/0x70
<4>[ 473.116341] __x64_sys_ioctl+0x6a/0xb0
<4>[ 473.116347] do_syscall_64+0x97/0x400
<4>[ 473.116354] entry_SYSCALL_64_after_hwframe+0x49/0xbe

Testcase: igt/gem_flink_race/flink_close
Fixes: e163484afa8d ("drm/i915: Update size upon return from GEM_CREATE")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190417132507.27133-1-chris@chris-wilson.co.uk
(cherry picked from commit 99534023490686ce4453c45e5cb813535b9bff95)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 2d6692e6 19-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Start writeback from the shrinker

When we are called to relieve mempressure via the shrinker, the only way
we can make progress is either by discarding unwanted pages (those
objects that userspace has marked MADV_DONTNEED) or by reclaiming the
dirty objects via swap. As we know that is the only way to make further
progress, we can initiate the writeback as we invalidate the objects.
This means the objects we put onto the inactive anon lru list are
already marked for reclaim+writeback and so will trigger a wait upon the
writeback inside direct reclaim, greatly improving the success rate of
direct reclaim on i915 objects.

The corollary is that we may start a slow swap on opportunistic
mempressure from the likes of the compaction + migration kthreads. This
is limited by those threads only being allowed to shrink idle pages, but
also that if we reactivate the page before it is swapped out by gpu
activity, we only pay the cost of repinning the page. The cost is most
felt when an object is reused after mempressure, which hopefully
excludes the latency sensitive tasks (as we are just extending the
impact of swap thrashing to them).

Apparently this is not the first time we've had this idea. Back in
commit 5537252b6b6d ("drm/i915: Invalidate our pages under memory
pressure") we wanted to start writeback but settled on invalidate after
Hugh Dickins warned us about a possibility of a deadlock within shmemfs
if we started writeback from shrink_slab. Looking at the callchain,
using writeback from i915_gem_shrink should be equivalent to the pageout
also employed by shrink_slab, i.e. it should not be any riskier afaict.

v2: Leave mmappings intact. At this point, the only mmappings of our
objects will be via CPU mmaps on the shmemfs filp, which are
out-of-scope for our LRU tracking. Instead leave those pages to the
inactive anon LRU page list for aging and pageout as normal.

v3: Be selective on which paths trigger writeback, in particular
excluding paths shrinking just to reclaim vm space (e.g. mmap, vmap
reapers) and avoid starting writeback on the entire process space from
within the pm freezer.

References: https://bugs.freedesktop.org/show_bug.cgi?id=108686
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190420115539.29081-1-chris@chris-wilson.co.uk


# 99534023 17-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid use-after-free in reporting create.size

We have to avoid chasing after a userspace race!

<3>[ 473.114328] BUG: KASAN: use-after-free in i915_gem_create+0x1d2/0x1f0 [i915]
<3>[ 473.114389] Read of size 8 at addr ffff88815bf1d840 by task gem_flink_race/1541

<4>[ 473.114464] CPU: 1 PID: 1541 Comm: gem_flink_race Tainted: G U 5.1.0-rc4-g7d07e025e786-kasan_88+ #1
<4>[ 473.114469] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4205-ITX, BIOS P1.10 09/29/2016
<4>[ 473.114474] Call Trace:
<4>[ 473.114488] dump_stack+0x7c/0xbb
<4>[ 473.114612] ? i915_gem_create+0x1d2/0x1f0 [i915]
<4>[ 473.114621] print_address_description+0x65/0x270
<4>[ 473.114728] ? i915_gem_create+0x1d2/0x1f0 [i915]
<4>[ 473.114839] ? i915_gem_create+0x1d2/0x1f0 [i915]
<4>[ 473.114848] kasan_report+0x149/0x18d
<4>[ 473.114962] ? i915_gem_create+0x1d2/0x1f0 [i915]
<4>[ 473.115069] i915_gem_create+0x1d2/0x1f0 [i915]
<4>[ 473.115176] ? i915_gem_object_create.part.28+0x4b0/0x4b0 [i915]
<4>[ 473.115289] ? i915_gem_dumb_create+0x1a0/0x1a0 [i915]
<4>[ 473.115297] drm_ioctl_kernel+0x192/0x260
<4>[ 473.115306] ? drm_ioctl_permit+0x280/0x280
<4>[ 473.115326] drm_ioctl+0x67c/0x960
<4>[ 473.115438] ? i915_gem_dumb_create+0x1a0/0x1a0 [i915]
<4>[ 473.115448] ? drm_getstats+0x20/0x20
<4>[ 473.115459] ? __lock_acquire+0xa66/0x3fe0
<4>[ 473.115474] ? _raw_spin_unlock_irqrestore+0x39/0x60
<4>[ 473.115485] ? debug_object_active_state+0x2ea/0x4e0
<4>[ 473.115496] ? debug_show_all_locks+0x2d0/0x2d0
<4>[ 473.115513] do_vfs_ioctl+0x18d/0xfa0
<4>[ 473.115522] ? check_flags.part.27+0x440/0x440
<4>[ 473.115532] ? ioctl_preallocate+0x1a0/0x1a0
<4>[ 473.115547] ? __fget+0x2ac/0x410
<4>[ 473.115561] ? __ia32_sys_dup3+0xb0/0xb0
<4>[ 473.115569] ? rwlock_bug.part.0+0x90/0x90
<4>[ 473.115590] ksys_ioctl+0x35/0x70
<4>[ 473.115597] ? lockdep_hardirqs_off+0x1cb/0x2b0
<4>[ 473.115608] __x64_sys_ioctl+0x6a/0xb0
<4>[ 473.115614] ? lockdep_hardirqs_on+0x342/0x590
<4>[ 473.115623] do_syscall_64+0x97/0x400
<4>[ 473.115633] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 473.115641] RIP: 0033:0x7fce590d55d7
<4>[ 473.115649] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
<4>[ 473.115655] RSP: 002b:00007fce4d525ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[ 473.115662] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fce590d55d7
<4>[ 473.115667] RDX: 00007fce4d525c10 RSI: 00000000c010645b RDI: 0000000000000007
<4>[ 473.115672] RBP: 00007fce4d525c10 R08: 00007fce4d526700 R09: 00007fce4d526700
<4>[ 473.115677] R10: 0000000000000054 R11: 0000000000000246 R12: 00000000c010645b
<4>[ 473.115682] R13: 0000000000000007 R14: 0000000000000000 R15: 00007ffe0e4a7450

<3>[ 473.115731] Allocated by task 1541:
<4>[ 473.115766] kmem_cache_alloc+0xce/0x290
<4>[ 473.115895] i915_gem_object_create.part.28+0x1c/0x4b0 [i915]
<4>[ 473.116000] i915_gem_create+0xe3/0x1f0 [i915]
<4>[ 473.116008] drm_ioctl_kernel+0x192/0x260
<4>[ 473.116013] drm_ioctl+0x67c/0x960
<4>[ 473.116020] do_vfs_ioctl+0x18d/0xfa0
<4>[ 473.116026] ksys_ioctl+0x35/0x70
<4>[ 473.116032] __x64_sys_ioctl+0x6a/0xb0
<4>[ 473.116038] do_syscall_64+0x97/0x400
<4>[ 473.116044] entry_SYSCALL_64_after_hwframe+0x49/0xbe

<3>[ 473.116071] Freed by task 1542:
<4>[ 473.116101] kmem_cache_free+0xb7/0x2f0
<4>[ 473.116205] __i915_gem_free_objects+0x7d4/0xe10 [i915]
<4>[ 473.116311] i915_gem_create_ioctl+0xaa/0xd0 [i915]
<4>[ 473.116318] drm_ioctl_kernel+0x192/0x260
<4>[ 473.116323] drm_ioctl+0x67c/0x960
<4>[ 473.116330] do_vfs_ioctl+0x18d/0xfa0
<4>[ 473.116335] ksys_ioctl+0x35/0x70
<4>[ 473.116341] __x64_sys_ioctl+0x6a/0xb0
<4>[ 473.116347] do_syscall_64+0x97/0x400
<4>[ 473.116354] entry_SYSCALL_64_after_hwframe+0x49/0xbe

Testcase: igt/gem_flink_race/flink_close
Fixes: e163484afa8d ("drm/i915: Update size upon return from GEM_CREATE")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190417132507.27133-1-chris@chris-wilson.co.uk
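
A hypothetical sketch of the fix's shape: sample the size while we still own
a reference, before the handle is published and another thread can flink,
close and free the object behind our back. Names are illustrative, not the
driver's exact code.

static int example_gem_create(struct drm_file *file,
                              struct drm_i915_gem_object *obj,
                              u64 *size_out, u32 *handle_out)
{
        u64 size = obj->base.size;      /* snapshot before publishing the handle */
        int ret;

        ret = drm_gem_handle_create(file, &obj->base, handle_out);
        /* Drop our reference; userspace may already be racing to close it. */
        i915_gem_object_put(obj);
        if (ret)
                return ret;

        *size_out = size;               /* report the snapshot, not obj->base.size */
        return 0;
}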


# 254e1186 17-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Verify the engine workarounds stick on application

Read the engine workarounds back using the GPU after loading the initial
context state to verify that we are setting them correctly, and bail if
it fails.

v2: Break out the verification into its own loop

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190417075657.19456-3-chris@chris-wilson.co.uk


# 9726920b 10-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only reset the pinned kernel contexts on resume

On resume, we know that the only pinned contexts in danger of seeing
corruption are the kernel context, and so we do not need to walk the
list of all GEM contexts as we tracked them on each engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190410190120.830-1-chris@chris-wilson.co.uk


# 696173b0 05-Apr-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: extract intel_pm.h from intel_drv.h

It used to be handy that we only had a couple of headers, but over time
intel_drv.h has become unwieldy. Extract declarations to a separate
header file corresponding to the implementation module, clarifying the
modularity of the driver.

Ensure the new header is self-contained, and do so with minimal further
includes, using forward declarations as needed. Include the new header
only where needed, and sort the modified include directives while at it
and as needed.

No functional changes.

v2: gen6_rps_reset_ei() is in i915_irq.c not intel_pm.c.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/adc6463b95eef3440fba9826793f7d1c5f3b0b4a.1554461791.git.jani.nikula@intel.com


# 6960d9cf 04-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Be precise in types for i915_gem_busy

Mixing u8 and -1u together leads to zero-extended integer expansion, and
comparing 0x000000ff against 0xffffffff, causing us to report a mixed
uabi-class request as not busy.

The input flag is a u8, and we want to generate a u32 uABI response, so
mark our functions accordingly.

Fixes: c8b502422bfe ("drm/i915: Remove last traces of exec-id (GEM_BUSY)")
Testcase: igt/gem_exec_balance/busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190404101914.7231-1-chris@chris-wilson.co.uk
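
The pitfall is ordinary C integer promotion; a small standalone illustration
(not driver code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint8_t uabi_class = 0xff;      /* "mixed class" marker stored in a u8 */

        /*
         * uabi_class is zero-extended to 0x000000ff before the comparison,
         * so it never compares equal to -1u (0xffffffff) and the special
         * case is silently missed.
         */
        printf("%d\n", uabi_class == -1u);              /* prints 0 */
        printf("%d\n", (uint8_t)-1u == uabi_class);     /* prints 1: u8 width */
        return 0;
}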


# b01720bf 01-Apr-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prefault before locking pages in shmem_pwrite

If the user passes in a pointer to a GGTT mmapping of the same buffer
being written to, we can hit a deadlock in acquiring the shmemfs page
(once as the write destination and then as the read source).

[<0>] io_schedule+0xd/0x30
[<0>] __lock_page+0x105/0x1b0
[<0>] find_lock_entry+0x55/0x90
[<0>] shmem_getpage_gfp+0xbb/0x800
[<0>] shmem_read_mapping_page_gfp+0x2d/0x50
[<0>] shmem_get_pages+0x158/0x5d0 [i915]
[<0>] ____i915_gem_object_get_pages+0x17/0x90 [i915]
[<0>] __i915_gem_object_get_pages+0x57/0x70 [i915]
[<0>] i915_gem_fault+0x1b4/0x5c0 [i915]
[<0>] __do_fault+0x2d/0x80
[<0>] __handle_mm_fault+0xad4/0xfb0
[<0>] handle_mm_fault+0xe6/0x1f0
[<0>] __do_page_fault+0x18f/0x3f0
[<0>] page_fault+0x1b/0x20
[<0>] copy_user_enhanced_fast_string+0x7/0x10
[<0>] _copy_from_user+0x37/0x60
[<0>] shmem_pwrite+0xf0/0x160 [i915]
[<0>] i915_gem_pwrite_ioctl+0x14e/0x520 [i915]
[<0>] drm_ioctl_kernel+0x81/0xd0
[<0>] drm_ioctl+0x1a7/0x310
[<0>] do_vfs_ioctl+0x88/0x5d0
[<0>] ksys_ioctl+0x35/0x70
[<0>] __x64_sys_ioctl+0x11/0x20
[<0>] do_syscall_64+0x39/0xe0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

We can reduce (but not eliminate!) the chance of this happening by
faulting the user_data before we take the page lock in
pagecache_write_begin(). One way to eliminate the potential recursion
here is by disabling pagefaults for the copy, and handling the fallback
to use an alternative method -- so convert to use kmap_atomic (which
should disable preemption and pagefaulting for the copy) and report
ENODEV instead of EFAULT so that our caller tries again with a different
copy mechanism -- we already check that the page should have been
faultable so a false negative should be rare.

Testcase: igt/gem_pwrite/self
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401133909.31203-1-chris@chris-wilson.co.uk
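
A minimal sketch of the copy step described above, assuming the caller has
already prefaulted the user buffer (e.g. via fault_in_pages_readable() on
kernels of this era); the example_* name is hypothetical.

static int example_pwrite_page(struct page *page, unsigned int offset,
                               const char __user *user_data,
                               unsigned int len)
{
        char *vaddr;
        unsigned long unwritten;

        vaddr = kmap_atomic(page);      /* pagefaults disabled for the copy */
        unwritten = __copy_from_user_inatomic(vaddr + offset, user_data, len);
        kunmap_atomic(vaddr);

        /*
         * Report -ENODEV rather than -EFAULT so the caller retries with a
         * different copy mechanism instead of failing the ioctl outright.
         */
        return unwritten ? -ENODEV : 0;
}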


# ee8efa80 31-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Check domains for userptr on release

When we return pages to the system, we release control over them and
should defensively return them to the CPU write domain so that we catch
any external writes on reacquiring them (e.g. to transparently
swapout/swapin). While we did this defensive clflushing for ordinary
shmem pages, it was forgotten for userptr. Fortunately, userptr objects
are normally cache coherent and so oblivious to the forgotten domain
tracking.

References: a679f58d0510 ("drm/i915: Flush pages on acquisition")
References: 754a25442705 ("drm/i915: Skip object locking around a no-op set-domain ioctl")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190331094620.15185-1-chris@chris-wilson.co.uk


# e163484a 26-Mar-2019 Michał Winiarski <michal.winiarski@intel.com>

drm/i915: Update size upon return from GEM_CREATE

Since GEM_CREATE is trying to outsmart the user by rounding up unaligned
objects, we used to update the size returned to userspace.
This update seems to have been lost somewhere along the way.

v2: Use round_up(), reorder locals (Chris)

References: ff72145badb8 ("drm: dumb scanout create/mmap for intel/radeon (v3)")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Janusz Krzysztofik <janusz.krzysztofik@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190326170218.13255-1-michal.winiarski@intel.com


# dd19f6bf 23-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove defunct intel_suspend_gt_powersave()

Since commit b7137e0cf1e5 ("drm/i915: Defer enabling rc6 til after we
submit the first batch/context"), intel_suspend_gt_powersave() has been
a no-op. As we still do not need to do anything explicitly on suspend
(we do everything required on idling), remove the defunct function.

References: b7137e0cf1e5 ("drm/i915: Defer enabling rc6 til after we submit the first batch/context")
Suggested-by: "Hiatt, Don" <don.hiatt@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190323214009.23294-1-chris@chris-wilson.co.uk


# b9d52d38 21-Mar-2019 Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>

drm/i915/guc: GuC suspend path cleanup

Adding a call to intel_uc_suspend in i915_gem_suspend, which
is a common point for the suspend/resume and hibernate paths.
This fixes an unbalanced call that causes issues with the CTB
register/deregister.

v2: Making the call unconditional (Daniele)
Moving the call to after the GEM_BUG_ON (Chris)

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321203804.6845-1-sujaritha.sundaresan@intel.com


# 754a2544 21-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip object locking around a no-op set-domain ioctl

If we are already in the desired write domain of a set-domain ioctl,
then there is nothing for us to do and we can quickly return back to
userspace, avoiding any lock contention. By recognising that the
write_domain is always a subset of the read_domains, and excluding the
no-op case of requiring 0 read_domains in the ioctl, we can infer that if the
current write_domain matches the target read_domains, there is nothing
for us to do.

Secondary aspect of this is that we undo the arbitrary fetching and
potential flushing of all pages for a set-domain(.write=CPU) call on a
fresh object -- which was introduced simply because we do the get-pages
before taking the struct_mutex.

References: 40e62d5d6be8 ("drm/i915: Acquire the backing storage outside of struct_mutex in set-domain")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-2-chris@chris-wilson.co.uk
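
A hypothetical sketch of the early-out, relying on the stated invariant that
write_domain is always a subset of read_domains; not the driver's exact code.

static bool example_set_domain_is_noop(struct drm_i915_gem_object *obj,
                                       u32 read_domains)
{
        /* Requiring no read domains is excluded from the no-op shortcut. */
        if (!read_domains)
                return false;

        /*
         * write_domain is a subset of read_domains, so if our current write
         * domain already matches the requested read domains there is nothing
         * to flush or wait for.
         */
        return READ_ONCE(obj->write_domain) == read_domains;
}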


# a679f58d 21-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush pages on acquisition

When we return pages to the system, we ensure that they are marked as
being in the CPU domain since any external access is uncontrolled and we
must assume the worst. This means that we need to always flush the pages
on acquisition if we need to use them on the GPU, and from the beginning
have used set-domain. Set-domain is overkill for the purpose as it is a
general synchronisation barrier, but our intent is to only flush the
pages being swapped in. If we move that flush into the pages acquisition
phase, we know then that when we have obj->mm.pages, they are coherent
with the GPU and need only maintain that status without resorting to
heavy handed use of set-domain.

The principal knock-on effect for userspace is through mmap-gtt
pagefaulting. Our uAPI has always implied that the GTT mmap was async
(especially as when any pagefault occurs is unpredictable to userspace)
and so userspace had to apply explicit domain control itself
(set-domain). However, swapping is transparent to the kernel, and so on
first fault we need to acquire the pages and make them coherent for
access through the GTT. Our use of set-domain here leaks into the uABI
that the first pagefault was synchronous. This is unintentional and,
barring a few igt, should go unnoticed; nevertheless we bump the uABI
version for mmap-gtt to reflect the change in behaviour.

Another implication of the change is that gem_create() is presumed to
create an object that is coherent with the CPU and is in the CPU write
domain, so a set-domain(CPU) following a gem_create() would be a minor
operation that merely checked whether we could allocate all pages for
the object. On applying this change, a set-domain(CPU) causes a clflush
as we acquire the pages. This will have a small impact on mesa as we move
the clflush here on !llc from execbuf time to create, but that should
have minimal performance impact as the same clflush exists but is now
done early and because of the clflush issue, userspace recycles bo and
so should resist allocating fresh objects.

Internally, the presumption that objects are created in the CPU
write-domain and remain so through writes to obj->mm.mapping is more
prevalent than I expected; but easy enough to catch and apply a manual
flush.

For the future, we should push the page flush from the central
set_pages() into the callers so that we can more finely control when it
is applied, but for now doing it in one location is easier to validate, at
the cost of sometimes flushing when there is no need.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk


# 3ceea6a1 19-Mar-2019 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: use intel_uncore for all forcewake get/put

Now that the internal code all works on intel_uncore, flip the
external-facing interface.

v2: fix GVT.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-4-daniele.ceraolospurio@intel.com


# 6e514e37 04-Mar-2019 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

drm/i915: Switch to bitmap_zalloc()

Switch to bitmap_zalloc() to show clearly what we are allocating.
Besides that it returns pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190304092908.57382-2-andriy.shevchenko@linux.intel.com
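
An illustrative before/after for the conversion (not the driver's exact call
site):

#include <linux/bitmap.h>
#include <linux/slab.h>

static unsigned long *example_alloc_mask(unsigned int nbits)
{
        /* Before: opaque void * and manual sizing. */
        /* mask = kcalloc(BITS_TO_LONGS(nbits), sizeof(unsigned long), GFP_KERNEL); */

        /* After: typed and self-describing; pair with bitmap_free(). */
        return bitmap_zalloc(nbits, GFP_KERNEL);
}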


# 000c4f90 14-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Sanity check mmap length against object size

We assumed that vm_mmap() would reject an attempt to mmap past the end of
the filp (our object), but we were wrong.

Applications that tried to use the mmap beyond the end of the object
would be greeted by a SIGBUS. After this patch, those applications will
be told about the error on creating the mmap, rather than at a random
moment on later access.

Reported-by: Antonio Argenziano <antonio.argenziano@intel.com>
Testcase: igt/gem_mmap/bad-size
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190314075829.16838-1-chris@chris-wilson.co.uk
(cherry picked from commit 794a11cb67201ad1bb61af510bb8460280feb3f3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 794a11cb 14-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Sanity check mmap length against object size

We assumed that vm_mmap() would reject an attempt to mmap past the end of
the filp (our object), but we were wrong.

Applications that tried to use the mmap beyond the end of the object
would be greeted by a SIGBUS. After this patch, those applications will
be told about the error on creating the mmap, rather than at a random
moment on later access.

Reported-by: Antonio Argenziano <antonio.argenziano@intel.com>
Testcase: igt/gem_mmap/bad-size
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190314075829.16838-1-chris@chris-wilson.co.uk
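
A hypothetical sketch of the check: validate the requested range against the
object size when the mmap is created, so the error is reported up front
instead of surfacing later as a SIGBUS.

static int example_check_mmap_range(const struct drm_gem_object *obj,
                                    u64 offset, u64 size)
{
        /* Overflow-safe form of: offset + size > obj->size. */
        if (size == 0 || offset > obj->size || size > obj->size - offset)
                return -EINVAL;

        return 0;
}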


# 831ebf18 08-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Suppress the "Failed to idle" warning for gem_eio

It is debatable whether having an error message on suspend for forcibly
cancelling outstanding work is worthwhile. We want to know if it occurs
in the wild (as we will then have to reconsider the approach!), but
equally it is not fatal across suspend, as upon resume we automatically
clear the wedged status.

However, CI does trigger this scenario with gem_eio/suspend; as there we
are intentionally wedging the device upon suspend. The dilemma is how
not to trigger a failure report for the dmesg spam, for which the
quickest response is to suppress the warning in the kernel. I'd rather
mark it as accepted in gem_eio, but for now detecting when gem_eio is
playing games and cancelling the warning for that case seems a barely
acceptable hack.

Testcase: igt/gem_eio/suspend
Reference: 5861b013e2c7 ("drm/i915: Do a synchronous switch-to-kernel-context on idling")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308134512.19115-1-chris@chris-wilson.co.uk


# ca22f32a 05-Mar-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Relax mmap VMA check

Legacy behaviour was to allow non-page-aligned mmap requests, as does the
linux mmap(2) implementation by virtue of automatically rounding up for
the caller.

To avoid breaking legacy userspace relax the newly introduced fix.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 5c4604e757ba ("drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Adam Zabrocki <adamza@microsoft.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.0+
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305110409.28633-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit a90e1948efb648f567444f87f3c19b2a0787affd)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 08819549 08-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce intel_context.pin_mutex for pin management

Introduce a mutex to start locking the HW contexts independently of
struct_mutex, with a view to reducing the coarse struct_mutex. The
intel_context.pin_mutex is used to guard the transition to and from being
pinned on the gpu, and so is required before starting to build any
request. The intel_context will then remain pinned until the request
completes, but the mutex can be released immediately upon completion of
pinning the context.

A slight variant of the above is used by per-context sseu that wants to
inspect the pinned status of the context, and requires that it remains
stable (either !pinned or pinned) across its operation. By using the
pin_mutex to serialise operations while pin_count==0, we can take that
pin_mutex to stabilise the boolean pin status.

v2: for Tvrtko!
* Improved commit message.
* Dropped _gpu suffix from gen8_modify_rpcs_gpu.
v3: Repair the locking for sseu selftests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-7-chris@chris-wilson.co.uk


# c4d52feb 08-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move over to intel_context_lookup()

In preparation for an ever growing number of engines and so ever
increasing static array of HW contexts within the GEM context, move the
array over to an rbtree, allocated upon first use.

Unfortunately, this imposes an rbtree lookup at a few frequent callsites,
but we should be able to mitigate those by moving over to using the HW
context as our primary type and so only incur the lookup on the boundary
with the user GEM context and engines.

v2: Check for no HW context in guc_stage_desc_init

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk


# 7d6ce558 08-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove has-kernel-context

We can no longer assume execution ordering, and in particular we cannot
assume which context will execute last. One side-effect of this is that
we cannot determine if the kernel-context is resident on the GPU, so
remove the routines that claimed to do so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-4-chris@chris-wilson.co.uk


# c6eeb479 08-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce presumption of request ordering for barriers

Currently we assume that we know the order in which requests run and so
can determine if we need to reissue a switch-to-kernel-context prior to
idling. That assumption does not hold for the future, so instead of
tracking which barriers have been used, simply determine if we have ever
switched away from the kernel context by using the engine and before
idling ensure that all engines that have been used since the last idle
are synchronously switched back to the kernel context for safety (and
for ease of shrinking memory while idle).

v2: Use intel_engine_mask_t and ALL_ENGINES

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-3-chris@chris-wilson.co.uk


# 604c37d7 08-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor common code to load initial power context

We load a context (the kernel context) on both module load and resume in
order to initialise some logical state onto the GPU. We can use the same
routine for both operations, which will become more useful as we
refactor rc6/rps enabling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-2-chris@chris-wilson.co.uk


# 5861b013 08-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do a synchronous switch-to-kernel-context on idling

When the system idles, we switch to the kernel context as a defensive
measure (no users are harmed if the kernel context is lost). Currently,
we issue a switch to kernel context and then come back later to see if
the kernel context is still current and the system is idle. However,
if we are no longer privy to the runqueue ordering, then we have to
relax our assumptions about the logical state of the GPU and the only
way to ensure that the kernel context is currently loaded is by issuing
a request to run after all others, and wait for it to complete all while
preventing anyone else from issuing their own requests.

v2: Pull wedging into switch_to_kernel_context_sync() but only after
waiting (though only for the same short delay) for the active context to
finish.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-1-chris@chris-wilson.co.uk


# 50b022af 07-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Force GPU idle on suspend

To facilitate the next patch to allow preemptible kernels not to incur
the wrath of hangcheck, we need to ensure that we can still suspend and
shutdown. That is we will not be able to rely on hangcheck to terminate
a blocking kernel and instead must manually do so ourselves. The
advantage is that we can apply more pressure!

As we now perform a GPU reset to clean up any residual kernels, we leave
the GPU in an unknown state and in particular can not talk to the GuC
before we reinitialise it following resume. For example, we no longer
need to tell the GuC to suspend itself, as it is already reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190307104530.21745-2-chris@chris-wilson.co.uk


# 3d606249 07-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make I915_GEM_IDLE_TIMEOUT into a macro

Currently we use HZ/5 for detecting a dead gpu on startup, and we will
wish to reuse this value for detecting a dead gpu on suspend, so convert
it into a macro for later convenience.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190307104530.21745-1-chris@chris-wilson.co.uk
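
The conversion itself amounts to naming the existing constant, along these
lines (the exact definition site in the driver headers may differ):

/* Time allowed for the GPU to quiesce before we declare it dead. */
#define I915_GEM_IDLE_TIMEOUT (HZ / 5)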


# a90e1948 05-Mar-2019 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Relax mmap VMA check

Legacy behaviour was to allow non-page-aligned mmap requests, as does the
linux mmap(2) implementation by virtue of automatically rounding up for
the caller.

To avoid breaking legacy userspace relax the newly introduced fix.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 5c4604e757ba ("drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Adam Zabrocki <adamza@microsoft.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.0+
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305110409.28633-1-tvrtko.ursulin@linux.intel.com


# cf4331dd 05-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move find_active_request() to the engine

To find the active request, we need only search along the individual
engine for the right request. This does not require touching any global
GEM state, so move it into the engine compartment.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-3-chris@chris-wilson.co.uk


# c8b50242 05-Mar-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove last traces of exec-id (GEM_BUSY)

As we allow per-context engines, the legacy concept of
I915_EXEC_RING no longer applies universally. We are still exposing the
unrelated exec-id in GEM_BUSY, so transition this ioctl (once more
slightly changing its ABI, but no one cares) over to only reporting the
uabi-class (not instance as we can not foreseeably fit those into the
small bitmask).

The only user of the extended ring information from GEM_BUSY is ddx/sna,
which tries to use the non-rcs busyness information to guide which
engine to use for subsequent operations on foreign bo. All that matters
for it is the decision between rcs and !rcs, so it is unaffected by the
change in higher bits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305162643.20243-1-chris@chris-wilson.co.uk


# d9948a10 28-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove second level open-coded rcu work

We currently use a worker queued from an rcu callback to determine when
a grace period has elapsed while we remained idle. We use this idle
delay to infer that we will be idle for a while and this is a suitable
point at which we can trim our global memory caches.

Since we wrote that, this mechanism now exists as rcu_work, and having
converted the idle shrinkers over to using that, we can remove our own
variant.

v2: Say goodbye to gt.epoch as well.
v3: Remove the misplaced and redundant comment before parking globals

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-3-chris@chris-wilson.co.uk
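
A minimal sketch of the rcu_work mechanism being adopted here: work that only
runs after an RCU grace period has elapsed, replacing the open-coded
call_rcu()-then-queue-a-worker dance. The example_* names are hypothetical.

#include <linux/workqueue.h>

static void example_idle_worker(struct work_struct *wrk)
{
        /* trim caches, park hardware, etc. */
}

static struct rcu_work example_idle_work;

static void example_init(void)
{
        INIT_RCU_WORK(&example_idle_work, example_idle_worker);
}

static void example_on_idle(void)
{
        /* Runs example_idle_worker() on system_wq after a grace period. */
        queue_rcu_work(system_wq, &example_idle_work);
}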


# 13f1bfd3 28-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make object/vma allocation caches global

As our allocations are not device specific, we can move our slab caches
to a global scope.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-2-chris@chris-wilson.co.uk


# 32eb6bcf 28-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make request allocation caches global

As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potential has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.

The benefit for us is much reduced pointer dancing along the frequent
allocation paths.

v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
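
An illustrative shape for a module-scope slab cache shared by all devices,
with teardown deferred past a grace period as the changelog describes; the
example_* names are hypothetical.

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct example_request {
        u64 fence_seqno;
        /* ... */
};

static struct kmem_cache *example_requests;     /* one cache for all devices */

static int example_caches_init(void)
{
        example_requests = KMEM_CACHE(example_request, SLAB_HWCACHE_ALIGN);
        return example_requests ? 0 : -ENOMEM;
}

static void example_caches_fini(void)
{
        /* Let any RCU-deferred frees drain before destroying the cache. */
        rcu_barrier();
        kmem_cache_destroy(example_requests);
}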


# 2d5eaad0 26-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Compute the global scheduler caps

Do a pass over all the engines upon starting to determine the global
scheduler capability flags (those that are agreed upon by all).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226102404.29153-7-chris@chris-wilson.co.uk


# 772b5408 20-Feb-2019 Chengguang Xu <cgxu519@gmx.com>

drm/i915: remove redundant likely/unlikely annotation

unlikely() is already included in IS_ERR(), so just
remove the redundant likely/unlikely annotation.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190221020819.21832-1-cgxu519@gmx.com


# 43a8f684 21-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reorder struct_mutex-vs-reset_lock in i915_gem_fault()

Annoyingly, struct_mutex was not entirely eliminated from the reset
pathway; for reasons of its own, intel_display_resume() requires
struct_mutex to prepare the planes it already captured. To avoid the
immediate problem of a deadlock between the struct_mutex and the reset
srcu, we have to acquire the reset_lock before struct_mutex in
i915_gem_fault(). Now any wait underneath struct_mutex will result in us
having to forcibly reset all inflight rendering, less than ideal, but
better than a deadlock (and will do for the short term).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190221102924.13442-1-chris@chris-wilson.co.uk


# c41166f9 20-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Beware temporary wedging when determining -EIO

At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously (where once before they were both
serialised by the struct_mutex), we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset
(caused by either us timing out in our reset handler,
i915_wedge_on_timeout or with malice aforethought in intel_reset_prepare
for a stuck modeset). If we suspect this is the case, that is we see a
wedged driver *and* reset in progress, then wait until the reset is
resolved before reporting upon the wedged status.

v2: might_sleep() (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190220145637.23503-1-chris@chris-wilson.co.uk


# 62eb3c24 13-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()

As time goes by, usage of generic ioctls such as drm_syncobj and
sync_file are on the increase bypassing i915-specific ioctls like
GEM_WAIT. Currently, we only apply waitboosting to our driver ioctls as
we track the file/client and account the waitboosting to them. However,
since commit 7b92c1bd0540 ("drm/i915: Avoid keeping waitboost active for
signaling threads"), we no longer have been applying the client
ratelimiting on waitboosts and so that information has only been used
for debug tracking.

Push the application of waitboosting down to the common
i915_request_wait, and apply it to all foreign fence waits as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190213092504.25709-1-chris@chris-wilson.co.uk


# aeaaa55c 12-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Recursive i915_reset_trylock() verboten

We cannot nest i915_reset_trylock() as the inner may wait for the
I915_RESET_BACKOFF which in turn is waiting upon sync_srcu, which is
waiting for our outermost lock. As we take the reset srcu around the
fence update, we have to defer taking it in i915_gem_fault() until after
we acquire the pin on the fence to avoid nesting. This is a little ugly,
but still works. If a reset occurs between i915_vma_pin_fence() and the
second reset lock, the reset will restore the fence register back to the
pinned value before the reset lock allows us to proceed (our mmap won't
be revoked as we haven't yet marked it as being a userfault as that
requires us to hold the reset lock), so the pagefault is still
serialised with the revocation in reset.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109605
Fixes: 2caffbf11762 ("drm/i915: Revoke mmaps and prevent access to fence registers across reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190212130831.14425-1-chris@chris-wilson.co.uk


# 2e7bd10e 07-Feb-2019 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set

Make sure the underlying VMA in the process address space is the
same as it was during vm_mmap to avoid applying WC to wrong VMA.

A more long-term solution would be to have vm_mmap_locked variant
in linux/mmap.h for when caller wants to hold mmap_sem for an
extended duration.

v2:
- Refactor the compare function

Fixes: 1816f9236303 ("drm/i915: Support creation of unbound wc user mappings for objects")
Reported-by: Adam Zabrocki <adamza@microsoft.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.0+
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Adam Zabrocki <adamza@microsoft.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190207085454.10598-1-joonas.lahtinen@linux.intel.com
(cherry picked from commit 5c4604e757ba9b193b09768d75a7d2105a5b883f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 2caffbf1 08-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Revoke mmaps and prevent access to fence registers across reset

Previously, we were able to rely on the recursive properties of
struct_mutex to allow us to serialise revoking mmaps and reacquiring the
FENCE registers with them being clobbered over a global device reset.
I then proceeded to throw out the baby with the bath water in order to
pursue a struct_mutex-less reset.

Perusing LWN for alternative strategies, the dilemma on how to serialise
access to a global resource on one side was answered by
https://lwn.net/Articles/202847/ -- Sleepable RCU:

int readside(void) {
        int idx;
        rcu_read_lock();
        if (nomoresrcu) {
                rcu_read_unlock();
                return -EINVAL;
        }
        idx = srcu_read_lock(&ss);
        rcu_read_unlock();
        /* SRCU read-side critical section. */
        srcu_read_unlock(&ss, idx);
        return 0;
}

void cleanup(void)
{
        nomoresrcu = 1;
        synchronize_rcu();
        synchronize_srcu(&ss);
        cleanup_srcu_struct(&ss);
}

No more worrying about stop_machine, just an uber-complex mutex,
optimised for reads, with the overhead pushed to the rare reset path.

However, we do run the risk of a deadlock as we allocate underneath the
SRCU read lock, and the allocation may require a GPU reset, causing a
dependency cycle via the in-flight requests. We resolve that by declaring
the driver wedged and cancelling all in-flight rendering.

v2: Use expedited rcu barriers to match our earlier timing
characteristics.
v3: Try to annotate locking contexts for sparse
v4: Reduce selftest lock duration to avoid a reset deadlock with fences
v5: s/srcu/reset_backoff_srcu/
v6: Remove more stale comments

Testcase: igt/gem_mmap_gtt/hang
Fixes: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208153708.20023-2-chris@chris-wilson.co.uk
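
A minimal sketch of how the read and write sides pair up for reset
serialisation, using the expedited grace period mentioned in v2; all names
are illustrative only, not the driver's API.

#include <linux/srcu.h>

DEFINE_SRCU(example_reset_srcu);

static int example_reset_lock(void)
{
        /* Readers (pagefaults, fence updates) enter the SRCU read side. */
        return srcu_read_lock(&example_reset_srcu);     /* returns a cookie */
}

static void example_reset_unlock(int cookie)
{
        srcu_read_unlock(&example_reset_srcu, cookie);
}

static void example_reset_wait_for_readers(void)
{
        /* Expedited to keep reset latency close to the old behaviour. */
        synchronize_srcu_expedited(&example_reset_srcu);
}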


# ebfb6977 07-Feb-2019 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Handle vm_mmap error during I915_GEM_MMAP ioctl with WC set

Add err goto label and use it when VMA can't be established or changes
underneath.

v2:
- Dropping Fixes: as it's indeed impossible to race an object to the
error address. (Chris)
v3:
- Use IS_ERR_VALUE (Chris)

Reported-by: Adam Zabrocki <adamza@microsoft.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Adam Zabrocki <adamza@microsoft.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v2
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190207085454.10598-2-joonas.lahtinen@linux.intel.com


# 5c4604e7 07-Feb-2019 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Prevent a race during I915_GEM_MMAP ioctl with WC set

Make sure the underlying VMA in the process address space is the
same as it was during vm_mmap to avoid applying WC to wrong VMA.

A more long-term solution would be to have vm_mmap_locked variant
in linux/mmap.h for when caller wants to hold mmap_sem for an
extended duration.

v2:
- Refactor the compare function

Fixes: 1816f9236303 ("drm/i915: Support creation of unbound wc user mappings for objects")
Reported-by: Adam Zabrocki <adamza@microsoft.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.0+
Cc: Akash Goel <akash.goel@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Adam Zabrocki <adamza@microsoft.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190207085454.10598-1-joonas.lahtinen@linux.intel.com
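
A hypothetical sketch of the revalidation described above, assuming a kernel
of this era where the semaphore is still named mmap_sem: after vm_mmap()
drops the lock, retake it and confirm the VMA at 'addr' is still the one we
created before applying the write-combining pgprot.

static int example_apply_wc(struct mm_struct *mm, struct file *filp,
                            unsigned long addr, unsigned long size)
{
        struct vm_area_struct *vma;
        int ret = 0;

        if (down_write_killable(&mm->mmap_sem))
                return -EINTR;

        vma = find_vma(mm, addr);
        if (vma && vma->vm_file == filp && vma->vm_start == addr &&
            vma->vm_end - vma->vm_start == size)
                vma->vm_page_prot =
                        pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
        else
                ret = -EINVAL;  /* the mapping changed underneath us */

        up_write(&mm->mmap_sem);
        return ret;
}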


# 21950ee7 05-Feb-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull i915_gem_active into the i915_active family

Looking forward, we need to break the struct_mutex dependency on
i915_gem_active. In the meantime, external use of i915_gem_active is
quite beguiling, little do new users suspect that it implies a barrier
as each request it tracks must be ordered wrt the previous one. As one
of many, it can be used to track activity across multiple timelines, a
shared fence, which fits our unordered request submission much better. We
need to steer external users away from the singular, exclusive fence
imposed by i915_gem_active to i915_active instead. As part of that
process, we move i915_gem_active out of i915_request.c into
i915_active.c to start separating the two concepts, and rename it to
i915_active_request (both to tie it to the concept of tracking just one
request, and to give it a longer, less appealing name).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-5-chris@chris-wilson.co.uk


# 85474441 29-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Identify active requests

To allow requests to forgo a common execution timeline, one question we
need to be able to answer is "is this request running?". To track
whether a request has started on HW, we can emit a breadcrumb at the
beginning of the request and check its timeline's HWSP to see if the
breadcrumb has advanced past the start of this request. (This is in
contrast to the global timeline where we need only ask if we are on the
global timeline and if the timeline has advanced past the end of the
previous request.)

There is still confusion from a preempted request, which has already
started but relinquished the HW to a high priority request. For the
common case, this discrepancy should be negligible. However, for
identification of hung requests, knowing which one was running at the
time of the hang will be much more important.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
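
A hypothetical sketch of the "has this request started?" test described
above, using the usual wrap-safe seqno comparison; the example_* names are
illustrative.

static inline bool example_seqno_passed(u32 seq1, u32 seq2)
{
        /* Wrap-safe: true if seq1 is at or ahead of seq2. */
        return (s32)(seq1 - seq2) >= 0;
}

static bool example_request_started(u32 hwsp_seqno, u32 start_seqno)
{
        /* The breadcrumb emitted at the head of the request has been reached. */
        return example_seqno_passed(hwsp_seqno, start_seqno);
}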


# 9407d3bd 28-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track active timelines

Now that we pin timelines around use, we have a clearly defined lifetime
and convenient points at which we can track only the active timelines.
This allows us to reduce the list iteration to only consider those
active timelines and not all.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-6-chris@chris-wilson.co.uk


# 5013eb8c 28-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track the context's seqno in its own timeline HWSP

Now that we have allocated ourselves a cacheline to store a breadcrumb,
we can emit a write from the GPU into the timeline's HWSP of the
per-context seqno as we complete each request. This drops the mirroring
of the per-engine HWSP and allows each context to operate independently.
We do not need to unwind the per-context timeline, and so requests are
always consistent with the timeline breadcrumb, greatly simplifying the
completion checks as we no longer need to be concerned about the
global_seqno changing mid check.

One complication though is that we have to be wary that the request may
outlive the HWSP and so avoid touching the potentially dangling pointer
after we have retired the fence. We also have to guard our access of the
HWSP with RCU; the release of the obj->mm.pages should already be RCU-safe.

At this point, we are emitting both per-context and global seqno and
still using the single per-engine execution timeline for resolving
interrupts.

v2: s/fake_complete/mark_complete/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-5-chris@chris-wilson.co.uk


# 1e345568 28-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move list of timelines under its own lock

Currently, the list of timelines is serialised by the struct_mutex, but
to alleviate difficulties with using that mutex in future, move the
list management under its own dedicated mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-5-chris@chris-wilson.co.uk


# 528cbd17 28-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move vma lookup to its own lock

Remove the struct_mutex requirement for looking up the vma for an
object.

v2: Highlight how the race for duplicate vma creation is resolved on
reacquiring the lock with a short comment.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-3-chris@chris-wilson.co.uk


# 09d7e46b 28-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull VM lists under the VM mutex.

A starting point to counter the pervasive struct_mutex. For the goal of
avoiding (or at least blocking under them!) global locks during user
request submission, a simple but important step is being able to manage
each client's GTT separately. For which, we want to replace using the
struct_mutex as the guard for all things GTT/VM and switch instead to a
specific mutex inside i915_address_space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-2-chris@chris-wilson.co.uk


# 499197dc 28-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop tracking MRU activity on VMA

Our goal is to remove struct_mutex and replace it with fine grained
locking. One of the thorny issues is our eviction logic for reclaiming
space for an execbuffer (or GTT mmapping, among a few other examples).
While eviction itself is easy to move under a per-VM mutex, performing
the activity tracking is less agreeable. One solution is not to do any
MRU tracking and do a simple coarse evaluation during eviction of
active/inactive, with a loose temporal ordering of last
insertion/evaluation. That keeps all the locking constrained to when we
are manipulating the VM itself, neatly avoiding the tricky handling of
possible recursive locking during execbuf and elsewhere.

Note that discarding the MRU (currently implemented as a pair of lists,
to avoid scanning the active list for a NONBLOCKING search) is unlikely
to impact upon our efficiency to reclaim VM space (where we think a LRU
model is best) as our current strategy is to use random idle replacement
first before doing a search, and over time the use of softpinned 48b
per-ppGTT is growing (thereby eliminating any need to perform any eviction
searches, in theory at least) with the remaining users being found on
much older devices (gen2-gen6).

v2: Changelog and commentary rewritten to elaborate on the duality of a
single list being both an inactive and active list.
v3: Consolidate bool parameters into a single set of flags; don't
comment on the duality of a single variable being a multiplicity of
bits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-1-chris@chris-wilson.co.uk


# eb8d0f5a 25-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove GPU reset dependence on struct_mutex

Now that the submission backends are controlled via their own spinlocks,
with a wave of a magic wand we can lift the struct_mutex requirement
around GPU reset. That is we allow the submission frontend (userspace)
to keep on submitting while we process the GPU reset as we can suspend
the backend independently.

The major change is around the backoff/handoff strategy for performing
the reset. With no mutex deadlock, we no longer have to coordinate with
any waiter, and just perform the reset immediately.

Testcase: igt/gem_mmap_gtt/hang # regresses
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125132230.22221-3-chris@chris-wilson.co.uk


# fcd70cd3 17-Jan-2019 Daniel Vetter <daniel.vetter@ffwll.ch>

drm: Split out drm_probe_helper.h

Having the probe helper stuff (which pretty much everyone needs) in
the drm_crtc_helper.h file (which atomic drivers should never need) is
confusing. Split them out.

To make sure I actually achieved the goal here I went through all
drivers. And indeed, all atomic drivers are now free of
drm_crtc_helper.h includes.

v2: Make it compile. There was so much compile fail on arm drivers
that I figured I'll better not include any of the acks on v1.

v3: Massive rebase because i915 has lost a lot of drmP.h includes, but
not all: Through drm_crtc_helper.h > drm_modeset_helper.h -> drmP.h
there was still one, which this patch largely removes. Which means
rolling out lots more includes all over.

This will also conflict with ongoing drmP.h cleanup by others I
expect.

v3: Rebase on top of atomic bochs.

v4: Review from Laurent for bridge/rcar/omap/shmob/core bits:
- (re)move some of the added includes, use the better include files in
other places (all suggested from Laurent adopted unchanged).
- sort alphabetically

v5: Actually try to sort them, and while at it, sort all the ones I
touch.

v6: Rebase onto i915 changes.

v7: Rebase once more.

Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: CK Hu <ck.hu@mediatek.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: virtualization@lists.linux-foundation.org
Cc: etnaviv@lists.freedesktop.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-mediatek@lists.infradead.org
Cc: linux-amlogic@lists.infradead.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: spice-devel@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-renesas-soc@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-tegra@vger.kernel.org
Cc: xen-devel@lists.xen.org
Link: https://patchwork.freedesktop.org/patch/msgid/20190117210334.13234-1-daniel.vetter@ffwll.ch


# 739f3abd 16-Jan-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: small isolated c99 types to kernel types switch

Mixed C99 and kernel types use is getting ugly. Prefer kernel types.

sed -i 's/\buint\(8\|16\|32\|64\)_t\b/u\1/g'

Minor checkpatch fixes sprinkled on top of the changed lines.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/14ed72e7f04c9340a057855c5950b54811f8a477.1547629303.git.jani.nikula@intel.com


# 9f58892e 16-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pull all the reset functionality together into i915_reset.c

Currently the code to reset the GPU and our state is spread widely
across a few files. Pull the logic together into a common file.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190116153304.787-1-chris@chris-wilson.co.uk


# 18bb2bcc 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Serialise concurrent calls to i915_gem_set_wedged()

Make i915_gem_set_wedged() and i915_gem_unset_wedged() behaviour more
consistent if called concurrently, and only do the wedging and reporting
once, curtailing any possible race where we start unwedging in the middle
of a wedge.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114210408.4561-2-chris@chris-wilson.co.uk


# 484d9a84 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/userptr: Avoid struct_mutex recursion for mmu_invalidate_range_start

Since commit 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu
notifiers") we have been able to report failure from
mmu_invalidate_range_start which allows us to use a trylock on the
struct_mutex to avoid potential recursion and report -EBUSY instead.
Furthermore, this allows us to pull the work into the main callback and
avoid the sleight-of-hand in using a workqueue to avoid lockdep.

However, not all paths to mmu_invalidate_range_start are prepared to
handle failure, so instead of reporting the recursion, deal with it by
propagating the failure upwards to the callers, who can then decide for
themselves whether to handle it or report it.
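
A rough sketch of the resulting flow (simplified; the blockable flag is
the one introduced by the referenced mmu-notifier commit and the locking
helpers are the stock kernel ones):

	if (!mutex_trylock(&i915->drm.struct_mutex)) {
		if (!blockable)
			return -EBUSY;	/* caller cannot sleep, ask it to retry */

		if (mutex_lock_killable(&i915->drm.struct_mutex))
			return -EINTR;
	}

	/* ... cancel the affected userptr work under the lock ... */

	mutex_unlock(&i915->drm.struct_mutex);
	return 0;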

v2: Mark up the recursive lock behaviour and comment on the various weak
points.

v3: Follow commit 3824e41975ae ("drm/i915: Use mutex_lock_killable() from
inside the shrinker") and also use mutex_lock_killable().
v3.1: No leak on EINTR.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108375
References: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190115124442.3500-1-chris@chris-wilson.co.uk


# decd29e6 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only dump GPU state on set-wedged if interesting

As we may frequently mark the device as wedged to flush requests off it
during the normal course of events, quite often we have a large state
dump that is of no interest. Don't bother dumping it all if the engines
are all idle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190115122057.1677-1-chris@chris-wilson.co.uk


# 8d761e77 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Combined gt.awake/gt.power wakerefs

As the GT_IRQ power domain implies a wakeref, we can use it in place of
our existing redundant rpm grab.

v2: Drop papering over forgetting to take the runtime wakeref in
selftests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-20-chris@chris-wilson.co.uk


# 0e6e0be4 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Markup paired operations on display power domains

The majority of runtime-pm operations are bounded and scoped within a
function; for these it is easy to verify that the wakerefs are handled
correctly. We can employ the compiler to help us, and reduce the number
of wakerefs tracked when debugging, by passing around cookies provided
by the various rpm_get functions to their rpm_put counterpart. This
makes the pairing explicit, and given the required wakeref cookie the
compiler can verify that we pass an initialised value to the rpm_put
(quite handy for double checking error paths).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-16-chris@chris-wilson.co.uk


# d4225a53 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Syntactic sugar for using intel_runtime_pm

Frequently, we use intel_runtime_pm_get/_put around a small block.
Formalise that usage by providing a macro to define such a block with an
automatic closure to scope the intel_runtime_pm wakeref to that block,
i.e. macro abuse smelling of python.
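
A hedged illustration of the kind of block the macro enables (the macro
name is the one added by the patch; the body is a placeholder):

	intel_wakeref_t wakeref;

	with_intel_runtime_pm(i915, wakeref) {
		/* the device is held awake only for the span of this block */
		poke_the_hardware(i915);	/* placeholder */
	}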

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-15-chris@chris-wilson.co.uk


# 538ef96b 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gem: Track the rpm wakerefs

Keep track of the temporary rpm wakerefs used for user access to the
device, so that we can cancel them upon release and clearly identify any
leaks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-10-chris@chris-wilson.co.uk


# 506d1f62 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track GT wakeref

Record the wakeref used for keeping the device awake as the GPU is
executing requests and be sure to cancel the tracking upon parking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-3-chris@chris-wilson.co.uk


# 16e4dd03 14-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Markup paired operations on wakerefs

The majority of runtime-pm operations are bounded and scoped within a
function; for these it is easy to verify that the wakerefs are handled
correctly. We can employ the compiler to help us, and reduce the number
of wakerefs tracked when debugging, by passing around cookies provided
by the various rpm_get functions to their rpm_put counterpart. This
makes the pairing explicit, and given the required wakeref cookie the
compiler can verify that we pass an initialised value to the rpm_put
(quite handy for double checking error paths).

For regular builds, the compiler should be able to eliminate the unused
local variables and the program growth should be minimal. Fwiw, it came
out as a net improvement as gcc was able to refactor rpm_get and
rpm_get_if_in_use together,

v2: Just s/rpm_put/rpm_put_unchecked/ everywhere, leaving the manual
mark up for smaller more targeted patches.
v3: Mention the cookie in Returns

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-2-chris@chris-wilson.co.uk


# 2f80d7bd 08-Jan-2019 Jani Nikula <jani.nikula@intel.com>

drm/i915: drop all drmP.h includes

Needs just a few additional includes here and there.

Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190108082709.3748-1-jani.nikula@intel.com


# b9d126e7 04-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove partial attempt to swizzle on pread/pwrite

Our attempt to account for bit17 swizzling of pread/pwrite onto tiled
objects was flawed due to the simple fact that we do not always know the
swizzling for a particular page (due to the swizzling varying based on
location in certain unbalanced configurations). Furthermore, the
pread/pwrite paths are now unbalanced in that we are required to use the
GTT as in some cases we do not have direct CPU access to the backing
physical pages (thus some paths trying to account for the swizzle, but
others neglecting, chaos ensues).

There are no known users who do use pread/pwrite into a tiled object
(you need to manually detile anyway, so why not just use mmap and avoid
the copy?) and no user bug reports to indicate that it is being used in
the wild. As no one is hitting the buggy path, we can just remove the
buggy code.

v2: Just use the fault allowing kmap() + normal copy_(to|from)_user
v3: Avoid int overflow in computing 'length' from 'remain' (Tvrtko)

References: fe115628d567 ("drm/i915: Implement pwrite without struct-mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190105120758.9237-1-chris@chris-wilson.co.uk


# 55c15512 03-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not allow unwedging following a failed driver initialisation

If we declare the driver wedged during early initialisation, we leave
the driver in an undefined state (with respect to GEM execution). As
this leads to unexpected behaviour if we allow the user to unwedge the
device (through debugfs, and performed by igt at test start), do not.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190103213340.1669-1-chris@chris-wilson.co.uk


# 96d4f267 03-Jan-2019 Linus Torvalds <torvalds@linux-foundation.org>

Remove 'type' argument from access_ok() function

Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted in this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
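
The conversion itself is mechanical, e.g. (illustrative):

- if (!access_ok(VERIFY_WRITE, ptr, len))
+ if (!access_ok(ptr, len))
	return -EFAULT;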

There were a couple of notable cases:

- csky still had the old "verify_area()" name as an alias.

- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)

- microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 55277e1f 03-Jan-2019 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always try to reset the GPU on takeover

When we first introduced the reset to sanitize the GPU on taking over
from the BIOS and before returning control to third parties (the BIOS!),
we restricted it to only systems utilizing HW contexts as we were
uncertain of how stable our reset mechanism truly was. We now have
reasonable coverage across all machines that expose a GPU reset method,
and so we should be safe to sanitize the GPU state everywhere.

v2: We _have_ to skip the reset if it would clobber the display.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190103112104.19561-1-chris@chris-wilson.co.uk


# 1216e3c3 28-Dec-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop unused engine->irq_seqno_barrier w/a

Now that we have eliminated the CPU-side irq_seqno_barrier by moving the
delays on the GPU before emitting the MI_USER_INTERRUPT, we can remove
the engine->irq_seqno_barrier infrastructure. Though intentionally
slowing down the GPU is nasty, so is the code we can now remove!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228171641.16531-6-chris@chris-wilson.co.uk


# ca79b0c2 28-Dec-2018 Arun KS <arunks@codeaurora.org>

mm: convert totalram_pages and totalhigh_pages variables to atomic

totalram_pages and totalhigh_pages are made static inline functions.

Main motivation was that managed_page_count_lock handling was complicating
things. It was discussed at length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seems
better to remove the lock and convert the variables to atomic, preventing
potential store-to-read tearing as a bonus.
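
Callers now go through an accessor instead of the raw variable, roughly
(illustrative):

- if (nr_pages > totalram_pages)
+ if (nr_pages > totalram_pages())
	return -ENOMEM;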

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6faf5916 28-Dec-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove HW semaphores for gen7 inter-engine synchronisation

The writing is on the wall for the existence of a single execution queue
along each engine, and as a consequence we will not be able to track
dependencies along the HW queue itself, i.e. we will not be able to use
HW semaphores on gen7 as they use a global set of registers (and unlike
gen8+ we can not effectively target memory to keep per-context seqno and
dependencies).

On the positive side, when we implement request reordering for gen7 we
also can not presume a simple execution queue and would also require
removing the current semaphore generation code. So this brings us another
step closer to request reordering for ringbuffer submission!

The negative side is that using interrupts to drive inter-engine
synchronisation is much slower (4us -> 15us to do a nop on each of the 3
engines on ivb). This is much better than it was at the time of introducing
the HW semaphores, and equally importantly, userspace has weaned itself off
intermixing dependent BLT/RENDER operations (the prime culprit was glyph
rendering in UXA). So while we regress the microbenchmarks, it should not
impact the user.

References: https://bugs.freedesktop.org/show_bug.cgi?id=108888
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-2-chris@chris-wilson.co.uk


# cf819eff 12-Dec-2018 Lucas De Marchi <lucas.demarchi@intel.com>

drm/i915: replace IS_GEN<N> with IS_GEN(..., N)

Define IS_GEN() similarly to our IS_GEN_RANGE(), but use gen instead of
gen_mask to do the comparison. Now callers can pass the gen as a parameter,
so we don't require one macro for each gen.

The following spatch was used to convert the users of these macros:

@@
expression e;
@@
(
- IS_GEN2(e)
+ IS_GEN(e, 2)
|
- IS_GEN3(e)
+ IS_GEN(e, 3)
|
- IS_GEN4(e)
+ IS_GEN(e, 4)
|
- IS_GEN5(e)
+ IS_GEN(e, 5)
|
- IS_GEN6(e)
+ IS_GEN(e, 6)
|
- IS_GEN7(e)
+ IS_GEN(e, 7)
|
- IS_GEN8(e)
+ IS_GEN(e, 8)
|
- IS_GEN9(e)
+ IS_GEN(e, 9)
|
- IS_GEN10(e)
+ IS_GEN(e, 10)
|
- IS_GEN11(e)
+ IS_GEN(e, 11)
)

v2: use IS_GEN rather than GT_GEN and compare to info.gen rather than
using the bitmask

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-2-lucas.demarchi@intel.com


# fe78742d 04-Dec-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allocate a common scratch page

Currently we allocate a scratch page for each engine, but since we only
ever write into it for post-sync operations, it is not exposed to
userspace nor do we care for coherency. As we then do not care about its
contents, we can use one page for all, reducing our allocations and
avoiding complications by not assuming per-engine isolation.

For later use, it simplifies engine initialisation (by removing the
allocation that required struct_mutex!) and means that we can always rely
on there being a scratch page.

v2: Check that we allocated a large enough scratch for I830 w/a

Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181204141522.13640-1-chris@chris-wilson.co.uk
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.18.20+
(cherry picked from commit 5179749925933575a67f9d8f16d0cc204f98a29f)
[Joonas: Use new function in gen9_init_indirectctx_bb too]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# dd847a70 07-Dec-2018 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Compile fix for 64b dma-fence seqno

Many errs of the form:
drivers/gpu/drm/i915/selftests/intel_hangcheck.c: In function ‘__igt_reset_evict_vma’:
./include/linux/kern_levels.h:5:18: error: format ‘%x’ expects argument of type ‘unsigned int’, but argum

Fixes: b312d8ca3a7c ("dma-buf: make fence sequence numbers 64 bit v2")
Cc: Christian König <christian.koenig@amd.com>
Cc: Chunming Zhou <david1.zhou@amd.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181207123428.16257-1-mika.kuoppala@linux.intel.com


# 00936779 05-Dec-2018 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Record GT workarounds in a list

To enable later verification of GT workaround state at various stages of
driver lifetime, we record the applicable ones per platform in a list,
from which they are also applied.

The added data structure is a simple array of register, mask and value
items, which is allocated on demand as workarounds are added to the list.
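
Roughly, the recorded entries have this shape (a sketch of the structure
described above, not the exact declaration):

	struct i915_wa {
		i915_reg_t reg;
		u32 mask;
		u32 val;
	};

	struct i915_wa_list {
		struct i915_wa *list;
		unsigned int count;
	};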

This is a temporary implementation which later in the series gets fused
with the existing per context workaround list handling. It is separated at
this stage since the following patch fixes a bug which needs to be as easy
to backport as possible.

Also, since in the following patch we will be adding a new class of
workarounds (per engine) which can be applied from interrupt context, we
straight away make provision for a safe read-modify-write cycle.

v2:
* Change dev_priv to i915 along the init path. (Chris Wilson)
* API rename. (Chris Wilson)

v3:
* Remove explicit list size tracking in favour of growing the allocation
in power of two chunks. (Chris Wilson)

v4:
Chris Wilson:
* Change wa_list_finish to early return.
* Copy workarounds using the compiler for static checking.
* Do not bother zeroing unused entries.
* Re-order struct i915_wa_list.

v5:
* kmalloc_array.
* Whitespace cleanup.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181203133319.10174-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 25d140faaa25f728159eb8c304eae53d88a7f14e)
Fixes: 59b449d5c82a ("drm/i915: Split out functions for different kinds of workarounds")
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 51797499 04-Dec-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allocate a common scratch page

Currently we allocate a scratch page for each engine, but since we only
ever write into it for post-sync operations, it is not exposed to
userspace nor do we care for coherency. As we then do not care about its
contents, we can use one page for all, reducing our allocations and
avoiding complications by not assuming per-engine isolation.

For later use, it simplifies engine initialisation (by removing the
allocation that required struct_mutex!) and means that we can always rely
on there being a scratch page.

v2: Check that we allocated a large enough scratch for I830 w/a

Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181204141522.13640-1-chris@chris-wilson.co.uk
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.18.20+


# 094304be 02-Dec-2018 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Verify GT workaround state after GPU init

Since we now have all the GT workarounds in a table, by adding a simple
shared helper function we can now verify that their values are still
applied after some interesting events in the lifetime of the driver.

Initially we only do this after GPU initialization.

v2:
Chris Wilson:
* Simplify verification by realizing it's a simple xor and and.
* Remove verification from engine reset path.
* Return bool straight away from the verify API.

v3:
* API rename. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181203125014.3219-4-tvrtko.ursulin@linux.intel.com


# 25d140fa 03-Dec-2018 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Record GT workarounds in a list

To enable later verification of GT workaround state at various stages of
driver lifetime, we record the applicable ones per platform in a list,
from which they are also applied.

The added data structure is a simple array of register, mask and value
items, which is allocated on demand as workarounds are added to the list.

This is a temporary implementation which later in the series gets fused
with the existing per context workaround list handling. It is separated at
this stage since the following patch fixes a bug which needs to be as easy
to backport as possible.

Also, since in the following patch we will be adding a new class of
workarounds (per engine) which can be applied from interrupt context, we
straight away make provision for a safe read-modify-write cycle.

v2:
* Change dev_priv to i915 along the init path. (Chris Wilson)
* API rename. (Chris Wilson)

v3:
* Remove explicit list size tracking in favour of growing the allocation
in power of two chunks. (Chris Wilson)

v4:
Chris Wilson:
* Change wa_list_finish to early return.
* Copy workarounds using the compiler for static checking.
* Do not bother zeroing unused entries.
* Re-order struct i915_wa_list.

v5:
* kmalloc_array.
* Whitespace cleanup.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181203133319.10174-1-tvrtko.ursulin@linux.intel.com


# 3800960a 03-Dec-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Complete the fences as they are cancelled due to wedging

We inspect the requests under the assumption that they will be marked as
completed when they are removed from the queue. Currently however, in the
process of wedging the requests will be removed from the queue before they
are completed, so rearrange the code to complete the fences before the
locks are dropped.

<1>[ 354.473346] BUG: unable to handle kernel NULL pointer dereference at 0000000000000250
<6>[ 354.473363] PGD 0 P4D 0
<4>[ 354.473370] Oops: 0000 [#1] PREEMPT SMP PTI
<4>[ 354.473380] CPU: 0 PID: 4470 Comm: gem_eio Tainted: G U 4.20.0-rc4-CI-CI_DRM_5216+ #1
<4>[ 354.473393] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
<4>[ 354.473480] RIP: 0010:__i915_schedule+0x311/0x5e0 [i915]
<4>[ 354.473490] Code: 49 89 44 24 20 4d 89 4c 24 28 4d 89 29 44 39 b3 a0 04 00 00 7d 3a 41 8b 44 24 78 85 c0 74 13 48 8b 93 78 04 00 00 48 83 e2 fc <39> 82 50 02 00 00 79 1e 44 89 b3 a0 04 00 00 48 8d bb d0 03 00 00
<4>[ 354.473515] RSP: 0018:ffffc900001bba90 EFLAGS: 00010046
<4>[ 354.473524] RAX: 0000000000000003 RBX: ffff8882624c8008 RCX: f34a737800000000
<4>[ 354.473535] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8882624c8048
<4>[ 354.473545] RBP: ffffc900001bbab0 R08: 000000005963f1f1 R09: 0000000000000000
<4>[ 354.473556] R10: ffffc900001bba10 R11: ffff8882624c8060 R12: ffff88824fdd7b98
<4>[ 354.473567] R13: ffff88824fdd7bb8 R14: 0000000000000001 R15: ffff88824fdd7750
<4>[ 354.473578] FS: 00007f44b4b5b980(0000) GS:ffff888277e00000(0000) knlGS:0000000000000000
<4>[ 354.473590] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 354.473599] CR2: 0000000000000250 CR3: 000000026976e000 CR4: 0000000000340ef0
<4>[ 354.473611] Call Trace:
<4>[ 354.473622] ? lock_acquire+0xa6/0x1c0
<4>[ 354.473677] ? i915_schedule_bump_priority+0x57/0xd0 [i915]
<4>[ 354.473736] i915_schedule_bump_priority+0x72/0xd0 [i915]
<4>[ 354.473792] i915_request_wait+0x4db/0x840 [i915]
<4>[ 354.473804] ? get_pwq.isra.4+0x2c/0x50
<4>[ 354.473813] ? ___preempt_schedule+0x16/0x18
<4>[ 354.473824] ? wake_up_q+0x70/0x70
<4>[ 354.473831] ? wake_up_q+0x70/0x70
<4>[ 354.473882] ? gen6_rps_boost+0x118/0x120 [i915]
<4>[ 354.473936] i915_gem_object_wait_fence+0x8a/0x110 [i915]
<4>[ 354.473991] i915_gem_object_wait+0x113/0x500 [i915]
<4>[ 354.474047] i915_gem_wait_ioctl+0x11c/0x2f0 [i915]
<4>[ 354.474101] ? i915_gem_unset_wedged+0x210/0x210 [i915]
<4>[ 354.474113] drm_ioctl_kernel+0x81/0xf0
<4>[ 354.474123] drm_ioctl+0x2de/0x390
<4>[ 354.474175] ? i915_gem_unset_wedged+0x210/0x210 [i915]
<4>[ 354.474187] ? finish_task_switch+0x95/0x260
<4>[ 354.474197] ? lock_acquire+0xa6/0x1c0
<4>[ 354.474207] do_vfs_ioctl+0xa0/0x6e0
<4>[ 354.474217] ? __fget+0xfc/0x1e0
<4>[ 354.474225] ksys_ioctl+0x35/0x60
<4>[ 354.474233] __x64_sys_ioctl+0x11/0x20
<4>[ 354.474241] do_syscall_64+0x55/0x190
<4>[ 354.474251] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 354.474260] RIP: 0033:0x7f44b3de65d7
<4>[ 354.474267] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
<4>[ 354.474293] RSP: 002b:00007fff974948e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[ 354.474305] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f44b3de65d7
<4>[ 354.474316] RDX: 00007fff97494940 RSI: 00000000c010646c RDI: 0000000000000007
<4>[ 354.474327] RBP: 00007fff97494940 R08: 0000000000000000 R09: 00007f44b40bbc40
<4>[ 354.474337] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c010646c
<4>[ 354.474348] R13: 0000000000000007 R14: 0000000000000000 R15: 0000000000000000

v2: Avoid floating requests.
v3: Can't call dma_fence_signal() under the timeline lock!
v4: Can't call dma_fence_signal() from inside another fence either.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181203113701.12106-2-chris@chris-wilson.co.uk


# a1db9c54 08-Nov-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track rcu_head for our idle worker

While our little rcu worker might be able to be replaced now by the
dedicated rcu_work, in the meantime we should mark up the rcu_head for
correct debugobjects tracking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181108092101.27598-2-chris@chris-wilson.co.uk


# 8811d616 09-Nov-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Initialise the obj->rcu head

Make the rcu_head known to the system, in particular for debugobjects.
And having declared it for debugobjects, we need to tidy up afterwards.

v2: mark the obj->rcu as being destroyed when we reuse its location for
the freed list.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108691
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181109090311.15321-1-chris@chris-wilson.co.uk


# 64e3d12f 06-Nov-2018 Kuo-Hsin Yang <vovoy@chromium.org>

mm, drm/i915: mark pinned shmemfs pages as unevictable

The i915 driver uses shmemfs to allocate backing storage for gem
objects. These shmemfs pages can be pinned (increased ref count) by
shmem_read_mapping_page_gfp(). When a lot of pages are pinned, vmscan
wastes a lot of time scanning these pinned pages. In some extreme cases,
all pages in the inactive anon lru are pinned, and only the inactive
anon lru is scanned due to inactive_ratio; the system then cannot swap and
invokes the oom-killer. Mark these pinned pages as unevictable to speed
up vmscan.

Export pagevec API check_move_unevictable_pages().

This patch was inspired by Chris Wilson's change [1].

[1]: https://patchwork.kernel.org/patch/9768741/

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Kuo-Hsin Yang <vovoy@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com> # mm part
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20181106132324.17390-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# e6db7f4d 05-Nov-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Break long iterations for get/put shmemfs pages

As we may have to iterate a few thousand elements to acquire and release
the shmemfs backing storage for a GPU object, we need to break up the
long loop with cond_resched() to retain a modicum of low latency for
other processes.
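
The shape of the fix (a sketch; the real loops walk the object's sg
entries):

	for (i = 0; i < page_count; i++) {
		/* acquire or release one backing page ... */

		cond_resched();	/* keep latency for other tasks low */
	}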

Testcase: igt/benchmarks/gem_syslatency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kuo-Hsin Yang <vovoy@chromium.org>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181105170640.26905-1-chris@chris-wilson.co.uk


# ab0d6a14 12-Oct-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Large page offsets for pread/pwrite

Handle integer overflow when computing the sub-page length for shmem
backed pread/pwrite.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181012140228.29783-1-chris@chris-wilson.co.uk
(cherry picked from commit a5e856a5348f6cd50889d125c40bbeec7328e466)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# a5e856a5 12-Oct-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Large page offsets for pread/pwrite

Handle integer overflow when computing the sub-page length for shmem
backed pread/pwrite.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181012140228.29783-1-chris@chris-wilson.co.uk


# e9eaf82d 01-Oct-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Priority boost for waiting clients

Latency is in the eye of the beholder. In the case where a client stops
and waits for the gpu, give that request chain a small priority boost
(not so that it overtakes higher priority clients, to preserve the
external ordering) so that ideally the wait completes earlier.

v2: Tvrtko recommends to keep the boost-from-user-stall as small as
possible and to allow new client flows to be preferred for interactivity
over stalls.

Testcase: igt/gem_sync/switch-default
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-3-chris@chris-wilson.co.uk


# 3159f943 03-Nov-2017 Matthew Wilcox <willy@infradead.org>

xarray: Replace exceptional entries

Introduce xarray value entries and tagged pointers to replace radix
tree exceptional entries. This is a slight change in encoding to allow
the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
value entry). It is also a change in emphasis; exceptional entries are
intimidating and different. As the comment explains, you can choose
to store values or pointers in the xarray and they are both first-class
citizens.
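
The value-entry helpers work along these lines (illustrative):

	void *entry = xa_mk_value(42);	/* encode the integer 42 as a value entry */
	unsigned long v;

	if (xa_is_value(entry))
		v = xa_to_value(entry);	/* v == 42 again */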

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>


# f8e57863 26-Sep-2018 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Trim partial view sg lists

Partial views are small but there can be many of them, and since the sg
list space for them is allocated pessimistically, we can save some slab by
trimming the unused tail entries.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180926080353.20867-1-tvrtko.ursulin@linux.intel.com


# 6edafc4e 18-Sep-2018 José Roberto de Souza <jose.souza@intel.com>

drm/i915: Unset reset pch handshake when PCH is not present in one place

Right now RESET_PCH_HANDSHAKE_ENABLE is enabled all the time inside
of intel_power_domains_init_hw(), and if the PCH is NOP it is unset in
i915_gem_init_hw().
So make skl_pch_reset_handshake() handle both cases and call
it for the missing gens in intel_power_domains_init_hw().
Ivybridge has a different register and bits but with the same
objective, so move it too.

v2(Rodrigo):
- handling IVYBRIDGE case inside intel_pch_reset_handshake()

v4(Rodrigo and Ville):
- moving the enable/disable decision to callers

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180918204714.27306-2-jose.souza@intel.com


# 74f6e183 26-Sep-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert to BITS_PER_TYPE

In commit 9144d75e22ca ("include/linux/bitops.h: introduce BITS_PER_TYPE"),
we made BITS_PER_TYPE available to all and now we can use the macro to
replace some open-coded computation of sizeof(T) * BITS_PER_BYTE.
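
The macro being adopted is simply (as defined in bitops.h by the
referenced commit):

	#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)

so an open-coded sizeof(u64) * BITS_PER_BYTE becomes BITS_PER_TYPE(u64).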

Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180926104707.17410-1-chris@chris-wilson.co.uk


# 8e1cb32d 20-Sep-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Park the GPU on module load

Once we have flushed the first request through the system to both load a
context and record the default state, tell the GPU to park and idle
itself, putting itself immediately (hopefully at least) into a
powersaving state, and allowing ourselves to start from a known state
after setting up all our bookkeeping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920161343.1117-1-chris@chris-wilson.co.uk


# c6d22ab6 20-Sep-2018 Matthew Auld <matthew.auld@intel.com>

drm/i915: don't assume struct page in i915_sg_trim

If we copy all the contents of the sg across and not just the page link,
we can then also put it to work in fake_get_huge_pages and beyond.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920142707.19659-1-matthew.auld@intel.com


# 8db601f0 14-Sep-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Reset CSB pointers on canceling requests (wedging)

The prior assumption was that we did not need to reset the CSB on
wedging when cancelling the outstanding requests as it would be cleaned
up in the subsequent reset prior to restarting the GPU. However, what
was not accounted for was that in preparing for the reset, we would try
to process the outstanding CSB entries. If the GPU happened to complete
a CS event just as we were performing the cancellation of requests, that
event would be kept in the CSB until the reset -- but our bookkeeping
was cleared, causing confusion when trying to complete the CS event.

v2: Use a sanitize on unwedge to avoid interfering with eio suspend
(where we intentionally disable GPU reset).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107925
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-3-chris@chris-wilson.co.uk


# 666424ab 14-Sep-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Use coherent writes into the context image

That we use a WB mapping for updating the RING_TAIL register inside the
context image even on !llc machines has been a source of consternation
for every reader. It appears to work on bsw+, but it may just have been
that we have been incredibly bad at detecting the errors.

v2: With extra enthusiasm.
v3: Drop force of map type for pinned default_state as by the time we
pin it, the map type is always WB and doesn't conflict with the earlier
use by ce->state.
v4: Transfer engine->default_state from MAP_WC to MAP_WB on creation so
we do not need the MAP_FORCE littered around the backends

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180914123504.2062-3-chris@chris-wilson.co.uk


# 37d7c9cc 14-Sep-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Check engine->default_state mapping on module load

Check we can indeed acquire a WB mapping of the context image on module
load. Later this will give us the opportunity to validate that we can
switch from WC to WB as required.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180914123504.2062-2-chris@chris-wilson.co.uk


# e0ff7a7c 03-Sep-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Early rejection of buffer allocations larger than RAM

We currently try to pin and allocate the whole buffer at a time. If that
object is larger than RAM, we will try to pin the whole of physical
memory, force the machine into oom, and then still fail the allocation.

If the request is obviously too large, error out early. We opt to do
this in the backend to make it easy to use alternate paths that do not
require the entire object pinned, or may easily handle proxy objects
that are larger than physical memory.
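
A minimal sketch of such an early check in the shmem backend (placement
and the exact bound are illustrative):

	const unsigned long page_count = obj->base.size >> PAGE_SHIFT;

	/* Bail rather than drive the machine into oom trying to pin it all. */
	if (page_count > totalram_pages)
		return -ENOMEM;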

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180903083337.13134-4-chris@chris-wilson.co.uk


# aae7c06b 03-Sep-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flag any possible writes for a GTT fault

We do not explicitly mark the PTE for the user's GTT mmap as being
wrprotect, so we don't get a refault when we would need to change a
read-only mmapping into read-write. As such, we must presume that if the
vma has PROT_WRITE it may be written to, although this is supposed to be
indicated by set-domain there are cases (e.g. after swap) where
userspace may not be aware of the implicit domain change.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180903083337.13134-2-chris@chris-wilson.co.uk


# 3f51b7e1 30-Aug-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/selftests: Add a simple exerciser for suspend/hibernate

Although we cannot do a full system-level test of suspend/hibernate from
deep within the kernel selftests, we can exercise the GEM subsystem in
isolation and simulate the external effects (such as losing stolen
contents and trashing the register state).

v2: Don't forget to hold rpm
v3: Suspend the GTT mappings, and more rpm!

References: https://bugs.freedesktop.org/show_bug.cgi?id=96526
References: 5ab57c702069 ("drm/i915: Flush logical context image out to memory upon suspend")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jakub Bartmiński <jakub.bartminski@intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Jakub Bartmiński <jakub.bartminski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180830134806.21939-1-chris@chris-wilson.co.uk


# 30b71084 12-Aug-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Cleanup gt powerstate from gem

Since the gt powerstate is allocated by i915_gem_init, clean it from
i915_gem_fini for symmetry and to correct the imbalance on error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180812223642.24865-1-chris@chris-wilson.co.uk


# c1e63f6d 08-Aug-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Warn if we hit the timeout for wait-for-idle

Hitting the timeout and finding that all engines are actually idle is
indicative of an interrupt delivery problem. This problem is an issue
that we need to fix, so make sure we log it and provide the GEM trace.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180808105101.913-1-chris@chris-wilson.co.uk


# ec5b65a9 26-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't disable the GPU for older gen on wedging

If we issue a device level GPU reset on the older gen, it will disable
key components of the GMCH and the display engine. The purpose of
wedging is to simply prevent further GEM usage without disabling KMS, so
we need to be careful when we do issue the reset on wedging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180726085033.4044-3-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>


# 7ed43df7 26-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restore sane defaults for KMS on GEM error load

If we fail during GEM initialisation, we scrub the HW state by
performing a device level GPU reset. However, we want to leave the
system in a usable state (with functioning KMS but no GEM) so after
scrubbing the HW state, we need to restore some sane defaults and
re-enable the low-level common parts of the GPU (such as the GMCH).

v2: Restore GTT entries.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180726085033.4044-2-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>


# d899aceb 25-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up object tiling-and-stride getters as const

For that little bit of defense against a tired programmer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180725155447.11909-1-chris@chris-wilson.co.uk


# 3970c65c 23-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip repeated calls to i915_gem_set_wedged()

If we already wedged, i915_gem_set_wedged() becomes a complicated no-op.

References: https://bugs.freedesktop.org/show_bug.cgi?id=107343
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180723145335.24579-1-chris@chris-wilson.co.uk


# 900ccf30 20-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only force GGTT coherency w/a on required chipsets

Not all chipsets have an internal buffer delaying writes via the GGTT
from becoming visible to other physical paths, but we use a
very heavy workaround for all. We only need to apply that workaround to
the chipsets we know suffer from the delay and the resulting coherency
issue.

Similarly, the same inconsistent coherency fouls up our ABI promise that
a write into a mmap_gtt is immediately visible to others. Since the HW
has made that a lie, let userspace know when that contract is broken.
(Not that userspace would want to use mmap_gtt on those chipsets for
other performance reasons...)

Testcase: igt/drv_selftest/live_coherency
Testcase: igt/gem_mmap_gtt/coherency
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100587
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180720101910.11153-1-chris@chris-wilson.co.uk


# 01f8f33e 17-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always retire residual requests before suspend

If the driver is wedged, we skip idling the GPU. However, we may still
have a few requests still not retired following the wedging (since they
will be waiting for a background worker trying to acquire struct_mutex).
As we hold the struct_mutex, always do a quick request retirement in
order to flush the wedged path.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107257
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180717084121.28185-1-chris@chris-wilson.co.uk


# a8bd3b88 17-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush chipset caches after GGTT writes

Our I915g (early gen3, the oldest machine we have in the farm) is still
reporting occasional incoherency when performing the following operations:

1) write through GGTT (indirect write into memory)
2) write through either CPU or WC (direct write into memory)
3) read from GGTT (indirect read)

Instead of reporting the value from (2), the read from GGTT reports the
earlier value written via the GGTT. We have made sure that the writes are
flushed from the CPU (commit 3a32497f0dbe ("drm/i915/selftests: Provide
full mb() around clflush") and commit add00e6d896f ("drm/i915: Flush the
WCB following a WC write")), but still see the error, just less
frequently. The only remaining cache that might be affected here is a
chipset cache, so flush that as well.

Testcase: igt/drv_selftest/live_coherency #gdg
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180717092655.28417-1-chris@chris-wilson.co.uk


# f8c1cce3 12-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reject attempted pwrites into a read-only object

If the user created a read-only object, they should not be allowed to
circumvent the write protection using the pwrite ioctl.
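
A minimal sketch of the check in the pwrite ioctl (the helper name follows
the read-only object support added in this series; illustrative):

	/* pwrite must not bypass the object's read-only protection */
	if (i915_gem_object_is_readonly(obj))
		return -EINVAL;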

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712185315.3288-5-chris@chris-wilson.co.uk


# 3e977ac6 12-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent writing into a read-only object via a GGTT mmap

If the user has created a read-only object, they should not be allowed
to circumvent the write protection by using a GGTT mmapping. Deny it.

Also most machines do not support read-only GGTT PTEs, so again we have
to reject attempted writes. Fortunately, this is known a priori, so we
can at least reject in the call to create the mmap (with a sanity check
in the fault handler).

v2: Check the vma->vm_flags during mmap() to allow readonly access.
v3: Remove VM_MAYWRITE to curtail mprotect()

Testcase: igt/gem_userptr_blits/readonly_mmap*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> #v1
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712185315.3288-4-chris@chris-wilson.co.uk


# 60c0a66e 12-Jul-2018 Michał Winiarski <michal.winiarski@intel.com>

drm/i915: Tidy error handling in i915_gem_init_hw

Let's reorder things so that we can do onion teardown rather than double
goto.

References: b96f6ebfd024 ("drm/i915: Correctly handle error path in i915_gem_init_hw")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712124810.25241-1-michal.winiarski@intel.com


# 8bcf9f70 10-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush the residual parking on emergency shutdown

On unwinding following a critical failure inside GEM init, we also need
to be sure to flush the workers before unloading the module.

Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180710094421.16223-1-chris@chris-wilson.co.uk


# bf06112f 09-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Tidy i915_gem_suspend()

In the next patch, we will make a fairly minor change to flush
outstanding resets before suspend. In order to keep churn to a minimum
in that functional patch, we fix up the comments and coding style now.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709130208.11730-7-chris@chris-wilson.co.uk


# 2621cefa 09-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Provide a timeout to i915_gem_wait_for_idle() on setup

With a broken GPU we expect it to fail during the initial
GPU setup where we do a couple of context switches to record the defaults.
This is a task that takes a few milliseconds even on the slowest of
devices, but we may have to wait 60s for hangcheck to give in and
declare the machine inoperable. This is a case where any gpu hang is
unacceptable, both from a timeliness and practical standpoint.

We can therefore set a timeout on our wait-for-idle that is shorter than
the hangcheck (which may be up to 60s before declaring the driver wedged)
and so detect the broken GPU much more quickly during driver load (and
so prevent stalling userspace for ages).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-2-chris@chris-wilson.co.uk


# ec625fb9 09-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Provide a timeout to i915_gem_wait_for_idle()

Usually we have no idea about the upper bound we need to wait to catch
up with userspace when idling the device, but in a few situations we
know the system was idle beforehand and can provide a short timeout in
order to very quickly catch a failure, long before hangcheck kicks in.

In the following patches, we will use the timeout to curtail two overly
long waits, where we know we can expect the GPU to complete within a
reasonable time or declare it broken.

In particular, with a broken GPU we expect it to fail during the initial
GPU setup where we do a couple of context switches to record the defaults.
This is a task that takes a few milliseconds even on the slowest of
devices, but we may have to wait 60s for hangcheck to give in and
declare the machine inoperable. This is a case where any gpu hang is
unacceptable, both from a timeliness and practical standpoint.

The other improvement is that in selftests, we do not need to arm an
independent timer to inject a wedge, as we can just limit the timeout on
the wait directly.
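
After the change the callers simply pass a bound, along these lines
(sketch; the flag and timeout values are illustrative):

	/* as before: wait without bound */
	err = i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED, MAX_SCHEDULE_TIMEOUT);

	/* when the system is known to have been idle: fail fast instead */
	err = i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED, HZ / 5);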

v2: Include the timeout parameter in the trace.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk


# 890fd185 06-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace nested subclassing with explicit subclasses

In the next patch, we will want a third distinct class of timeline that
may overlap with the current pair of client and engine timeline classes.
Rather than use the ad hoc markup of SINGLE_DEPTH_NESTING, initialise
the different timeline classes with an explicit subclass.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706210710.16251-1-chris@chris-wilson.co.uk


# 6dd7526f 06-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Export i915_request_skip()

In the next patch, we will want to start skipping requests on failing to
complete their payloads. So export the utility function currently used to
make requests inoperable following a failed gpu reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-2-chris@chris-wilson.co.uk


# add00e6d 05-Jul-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush the WCB following a WC write

If we have just completed a WC write, we must ensure that the WCB (Write
Combining Buffer) is flushed out to main memory before we can expect to
see the results. This is especially important when mixing WC with GTT as
the physical paths are different and cachelines are not naturally flushed.

Testcase: igt/drv_selftests/live_coherency #gdg
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180706115402.18547-1-chris@chris-wilson.co.uk
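
For illustration, a minimal sketch of the rule (pointer names are
hypothetical): after writing through a write-combining mapping, the WC
buffer must be explicitly drained before the data is read back through
another path such as the GTT.

    memcpy_toio(wc_vaddr, src, len);   /* write via a WC mapping */
    wmb();                             /* drain the write-combining buffer */
    val = readl(gtt_vaddr);            /* now safe to read via another path */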


# f0d759f0 28-Jun-2018 Gustavo A. R. Silva <gustavo@embeddedor.com>

drm/i915: Mark expected switch fall-throughs

In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 141432
Addresses-Coverity-ID: 141433
Addresses-Coverity-ID: 141434
Addresses-Coverity-ID: 141435
Addresses-Coverity-ID: 141436
Addresses-Coverity-ID: 1357360
Addresses-Coverity-ID: 1357403
Addresses-Coverity-ID: 1357433
Addresses-Coverity-ID: 1392622
Addresses-Coverity-ID: 1415273
Addresses-Coverity-ID: 1435752
Addresses-Coverity-ID: 1441500
Addresses-Coverity-ID: 1454596
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628223541.GA17665@embeddedor.com
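
For illustration, the comment form that -Wimplicit-fallthrough accepted
at the time (case labels and helpers here are hypothetical):

    switch (mode) {
    case MODE_A:
            setup_a();
            /* fall through */
    case MODE_B:
            setup_b();
            break;
    default:
            break;
    }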


# d403397c 30-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Try GGTT mmapping whole object as partial

If the whole object is already pinned by HW for use as scanout, we will
fail to move it to the mappable region and so must resort to using a
partial VMA covering the whole object.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104513
Fixes: aa136d9d72c2 ("drm/i915: Convert partial ggtt vma to full ggtt if it spans the entire object")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180630090509.469-1-chris@chris-wilson.co.uk
(cherry picked from commit 7e7367d3bc6cf27dd7e007e7897fcebfeff1ee8b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 7e7367d3 30-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Try GGTT mmapping whole object as partial

If the whole object is already pinned by HW for use as scanout, we will
fail to move it to the mappable region and so must resort to using a
partial VMA covering the whole object.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104513
Fixes: aa136d9d72c2 ("drm/i915: Convert partial ggtt vma to full ggtt if it spans the entire object")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180630090509.469-1-chris@chris-wilson.co.uk


# f7dc0157 28-Jun-2018 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915/uc: Fetch GuC/HuC firmwares from guc/huc specific init

We're fetching GuC/HuC firmwares directly from the uc level during
the init_early stage, but this breaks guc/huc struct isolation and
also the strict SW-only initialization rule for init_early. Move fw
fetching to init phase and do it separately per guc/huc struct.

v2: don't forget to move wopcm_init - Michele
v3: fetch in init_misc phase - Michal

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com> #2
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628141522.62788-2-michal.wajdeczko@intel.com


# a61b47f6 26-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wait for engines to idle before retiring

In the next^W forthcoming patch, we will start to defer retiring the
request from the engine list if it is still active on the submission
backend. To preserve the semantics that after wait-for-idle completes
the system is idle and fully retired, we need to therefore wait for the
backends to idle before calling i915_retire_requests().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180627115334.16282-1-chris@chris-wilson.co.uk


# bc64e054 15-Jun-2018 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Fix context ban and hang accounting for client

If a client is smart or lucky enough to create a new context
after each hang, our context banning mechanism will never
catch up, and as a result it will be spared from client
banning. This can result in a never-ending streak of GPU
hangs caused by a bad or malicious client, preventing access
by other legitimate GPU clients.

Fix this by always incrementing the per-client ban score if
it hangs in short succession, regardless of context ban
scoring. The exception is non-bannable contexts; they remain
detached from the client ban scoring mechanism.

v2: xchg timestamp, tidyup (Chris)
v3: comment, bannable & banned together (Chris)

Fixes: b083a0870c79 ("drm/i915: Add per client max context ban limit")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615104429.31477-1-mika.kuoppala@linux.intel.com
(cherry picked from commit 14921f3cef85b0167a9145e5f29b9dfc3b2a84dc)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# bcc2661e 18-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only show debug for state changes when banning

Since we trigger 10,000s of hangs and resets during selftesting, we emit
many, many thousands of lines of useless debug messages. Reduce the
frequency by only logging a change in state of a guilty context.

Fixes: 14921f3cef85 ("drm/i915: Fix context ban and hang accounting for client")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180618073135.10849-1-chris@chris-wilson.co.uk


# 4fdd5b4e 16-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix fallout of fake reset along resume

commit b2209e62a450 ("drm/i915/execlists: Reset the CSB head tracking on
reset/sanitization") and commit 1288786b18f7 ("drm/i915: Move GEM sanitize
from resume_early to resume") show the conflicting requirements on the
code. We must reset the GPU before trashing live state on a fast resume
(hibernation debug, or error paths), but we must only reset our state
tracking iff the GPU is reset (or power cycled). This is tricky if we
are disabling GPU reset to simulate broken hardware; we reset our state
tracking but the GPU is left intact and recovers from its stale state.

v2: Again without the assertion for forcewake, no longer required since
commit b3ee09a4de33 ("drm/i915/ringbuffer: Fix context restore upon reset")
as the contexts are reset from the CS ensuring everything is powered up.

Fixes: b2209e62a450 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization")
Fixes: 1288786b18f7 ("drm/i915: Move GEM sanitize from resume_early to resume")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk


# 042ed2db 15-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Be irqsafe inside reset

As we want to be able to call i915_reset_engine and co from a softirq or
timer context, we need to be irqsafe at all times. So we have to forgo
the simple spin_lock_irq for the full spin_lock_irqsave.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615093137.14270-1-chris@chris-wilson.co.uk
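
For illustration, the difference between the two lock flavours (the
lock name is hypothetical); only the irqsave form is safe when the
caller may already be running with interrupts disabled, e.g. from a
softirq or timer:

    /* process-context only: unconditionally re-enables interrupts */
    spin_lock_irq(&reset_lock);
    /* ... reset bookkeeping ... */
    spin_unlock_irq(&reset_lock);

    /* safe from any context: restores the caller's interrupt state */
    unsigned long flags;

    spin_lock_irqsave(&reset_lock, flags);
    /* ... reset bookkeeping ... */
    spin_unlock_irqrestore(&reset_lock, flags);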


# 14921f3c 15-Jun-2018 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Fix context ban and hang accounting for client

If a client is smart or lucky enough to create a new context
after each hang, our context banning mechanism will never
catch up, and as a result it will be spared from client
banning. This can result in a never-ending streak of GPU
hangs caused by a bad or malicious client, preventing access
by other legitimate GPU clients.

Fix this by always incrementing the per-client ban score if
it hangs in short succession, regardless of context ban
scoring. The exception is non-bannable contexts; they remain
detached from the client ban scoring mechanism.

v2: xchg timestamp, tidyup (Chris)
v3: comment, bannable & banned together (Chris)

Fixes: b083a0870c79 ("drm/i915: Add per client max context ban limit")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180615104429.31477-1-mika.kuoppala@linux.intel.com


# 697b9a87 12-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make closing request flush mandatory

For symmetry, simplicity and ensuring the request is always truly idle
upon its completion, always emit the closing flush prior to emitting the
request breadcrumb. Previously, we would only emit the flush if we had
started a user batch, but this just leaves all the other paths open to
speculation (do they affect the GPU caches or not?). With mm switching,
a key requirement is that the GPU is flushed and invalidated beforehand,
so for absolute safety we want that closing flush to be mandatory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180612105135.4459-1-chris@chris-wilson.co.uk


# acd1c1e6 11-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor unsetting obj->mm.pages

As i915_gem_object_phys_attach() wants to play dirty and mess around
with obj->mm.pages itself (replacing the shmemfs with a DMA allocation),
refactor the gubbins into i915_gem_object_unset_pages() so that we don't
have to duplicate all the secrets.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611075532.26534-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/152871104647.1718.8796913290418060204@jlahtine-desk.ger.corp.intel.com


# 51c18bf7 08-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Squash GEM load failure message (again)

Due to a silent conflict (silent because we are trying to fix the CI
test that is meant to exercise these failures!) between commit
51e645b6652c ("drm/i915: Mark the GPU as wedged without error on fault
injection") and commit 8571a05a9dd0 ("drm/i915: Use GEM suspend when
aborting initialisation"), we failed to actually squash the error
message after injecting the load failure.

Rearrange the code to export i915_load_failure() for better logging of
real errors (and quiet logging of injected errors).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180609111058.2660-1-chris@chris-wilson.co.uk


# 51e645b6 07-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark the GPU as wedged without error on fault injection

If we have been instructed (by CI) to inject a fault to load the module
with a wedged GPU, do so quietly lest we upset CI.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180607134558.31150-1-chris@chris-wilson.co.uk


# 52137010 06-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Change i915_gem_fault() to return vm_fault_t

In preparation for vm_fault_t becoming a distinct type, convert the
fault handler (i915_gem_fault()) over to the new interface.

Based on a patch by Souptick Joarder

References: 1c8f422059ae ("mm: change return type to vm_fault_t")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180606214520.20220-1-chris@chris-wilson.co.uk
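
For illustration, a minimal sketch of the new interface; the helper and
error mapping here are hypothetical, not the i915 code:

    static vm_fault_t my_gem_fault(struct vm_fault *vmf)
    {
            int err = map_object_into_vma(vmf);     /* hypothetical helper */

            switch (err) {
            case 0:
                    return VM_FAULT_NOPAGE; /* PTE installed by the helper */
            case -ENOMEM:
                    return VM_FAULT_OOM;
            default:
                    return VM_FAULT_SIGBUS;
            }
    }

    static const struct vm_operations_struct my_gem_vm_ops = {
            .fault = my_gem_fault,
    };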


# 8571a05a 06-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use GEM suspend when aborting initialisation

As part of our GEM initialisation now, we send a request to the hardware
in order to record the initial GPU state. This coupled with deferred
idle workers, makes aborting on error tricky. We already have the
mechanism in place to wait on the GPU and cancel all the deferred
workers for suspend, so let's reuse it during the error teardown. It is
already used in places for later init error handling, but doing so at
this point is slightly ugly due to the mutex dance (it's ok, the module
load is still single threaded).

Testcase: igt/drv_module_reload/basic-reload-inject
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180606145441.4460-1-chris@chris-wilson.co.uk


# 82ad6443 05-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/gtt: Rename i915_hw_ppgtt base member

In the near future, I want to subclass gen6_hw_ppgtt as it contains a
few specialised members and I wish to add more. To avoid the ugliness of
using ppgtt->base.base, rename the i915_hw_ppgtt base member
(i915_address_space) as vm, which is our common shorthand for an
i915_address_space local.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605153758.18422-1-chris@chris-wilson.co.uk


# 420980ca 05-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Swap magics and use SZ_1M

Since the kernel provides SZ_1M, use it in preference of 1 << 20.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605135746.8020-1-chris@chris-wilson.co.uk
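
For illustration (the constant name is hypothetical):

    #include <linux/sizes.h>

    #define MY_RESERVED_SPACE SZ_1M         /* instead of (1 << 20) */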


# b96f6ebf 04-Jun-2018 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915: Correctly handle error path in i915_gem_init_hw

In gem_init_hw() we call uc_init_hw(), but in case of an error later
in the function we fail to call the matching uc_fini_hw().

v2: pulled out from the series

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605122443.23776-1-michal.wajdeczko@intel.com
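
For illustration, the usual unwind pattern the fix restores; the
function and later init step are hypothetical placeholders:

    static int example_init_hw(struct drm_i915_private *dev_priv)
    {
            int err;

            err = uc_init_hw(dev_priv);
            if (err)
                    return err;

            err = later_init_step(dev_priv);  /* hypothetical later step */
            if (err)
                    goto err_uc;              /* unwind what succeeded */

            return 0;

    err_uc:
            uc_fini_hw(dev_priv);
            return err;
    }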


# 52b2416c 08-May-2018 Changbin Du <changbin.du@intel.com>

drm/i915: Add new vGPU cap info bit VGT_CAPS_HUGE_GTT

This adds a new vGPU cap info bit, VGT_CAPS_HUGE_GTT, which is used to
detect whether the host supports shadowing of huge GTT pages. If the
host does support it, remove the page-size restriction for vGPU.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1525770425-5373-1-git-send-email-changbin.du@intel.com


# 8979187a 04-Jun-2018 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915: Move i915_gem_fini to i915_gem.c

We should keep i915_gem_init/fini functions together for easier
tracking of their symmetry.

v2: rebased, pulled out from the series

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180604090032.20840-1-michal.wajdeczko@intel.com


# 95c778da 01-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply the full CPU domain markup before freezing

Let's not take any chances by using a shortcut to mark the objects as in
the CPU domain upon freezing (all pages will be written to disk and so
on restore all objects will start from the CPU domain). Currently, we
simply mark the objects as being in the CPU domain, bypassing the
flushes. Let's call the full domain transfer function so that we have
less special case code (and symmetry with the suspend path) even though
it will be mostly redundant.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180601144125.18026-2-chris@chris-wilson.co.uk


# 9776f472 01-Jun-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush all writes before suspend

As we have already suspended the device, this should be a no-op except
for marking that all writes are indeed complete. The downside is that
we then have to walk all the lists of objects for what should be a no-op
(in some cases there will be an mmio read to ensure the GGTT writes are
indeed flushed, and clflushes to ensure that cpu writes are in memory).

It seems prudent and the safer course for us to ensure all writes are
flushed to memory before suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180601144125.18026-1-chris@chris-wilson.co.uk


# 1934f5de 31-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Assert we idle in the kernel context

Now that we always switch to the kernel context upon idling, we can
make that assertion.

References: 4dfacb0bcbee ("drm/i915: Switch to kernel context before idling at runtime")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531224057.6036-1-chris@chris-wilson.co.uk


# ec92ad00 31-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only sanitize GEM from late suspend

During testing we encounter a conflict between SUSPEND_TEST_DEVICES and
disabling reset (gem_eio/suspend). This results in the device continuing
on without being reset, but since it has gone through HW sanitization to
account for the suspend/resume cycle, we have to assume the device has
been reset to its defaults. A simple way around this is to skip the
sanitize phase for SUSPEND_TEST_DEVICES by moving it to suspend-late.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-4-chris@chris-wilson.co.uk


# c3160da9 31-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: After reset on sanitization, reset the engine backends

As we reset the GPU on suspend/resume, we also do need to reset the
engine state tracking so call into the engine backends. This is
especially important so that we can also sanitize the state tracking
across resume.

References: https://bugs.freedesktop.org/show_bug.cgi?id=106702
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-3-chris@chris-wilson.co.uk


# 0606035f 31-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: "Race-to-idle" after switching to the kernel context

During suspend we want to flush out all active contexts and their
rendering. To do so we queue a request from the kernel's context; once
we know that request is done, we know the GPU is completely idle. To
speed up that switch, bump the GPU clocks.

Switching to the kernel context prior to idling is also used to enforce
a barrier before changing OA properties, and when evicting active
rendering from the global GTT. All cases where we do want to
race-to-idle.

v2: Limit the boosting to only the switch before suspend.
v3: Limit it to the wait-for-idle on suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Tested-by: David Weinehall <david.weinehall@linux.intel.com> #v1
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-2-chris@chris-wilson.co.uk


# 4dfacb0b 31-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Switch to kernel context before idling at runtime

We can reduce our exposure to random neutrinos by resting on the kernel
context having flushed out the user contexts to system memory and
beyond. The corollary is that we then require two passes through the
idle handler to go to sleep, which on a truly idle system involves an
extra pass through the slow and irregular retire work handler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180531082246.9763-1-chris@chris-wilson.co.uk


# bc61ec46 29-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove stale asserts from i915_gem_find_active_request()

Since we use i915_gem_find_active_request() from inside
intel_engine_dump() and may call that at any time, we do not guarantee
that the engine is paused nor that the signal kthreads and irq handler
are suspended, so we cannot assert that the breadcrumb doesn't advance
and that the irq hasn't happened on another CPU signaling the request we
believe to be idle.

The second assert removed (that request->engine == engine) remains
valid, but is now more rigorously checked during retirement.

Fixes: f636edb214a5 ("drm/i915: Make i915_engine_info pretty printer to standalone")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180529132922.6831-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
(cherry picked from commit cc7cc5343584d90e74b7c929ff2c9a2ec8b49cfe)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# cc7cc534 29-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove stale asserts from i915_gem_find_active_request()

Since we use i915_gem_find_active_request() from inside
intel_engine_dump() and may call that at any time, we do not guarantee
that the engine is paused nor that the signal kthreads and irq handler
are suspended, so we cannot assert that the breadcrumb doesn't advance
and that the irq hasn't happened on another CPU signaling the request we
believe to be idle.

The second assert removed (that request->engine == engine) remains
valid, but is now more rigorously checked during retirement.

Fixes: f636edb214a5 ("drm/i915: Make i915_engine_info pretty printer to standalone")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180529132922.6831-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 09a4c02e 24-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Look for an active kernel context before switching

We were not very carefully checking to see if an older request on the
engine was an earlier switch-to-kernel-context before deciding to emit a
new switch. The end result would be that we could get into a permanent
loop of trying to emit a new request to perform the switch simply to
flush the existing switch.

What we need is a means of tracking the completion of each timeline
versus the kernel context, that is to detect if a more recent request
has been submitted that would result in a switch away from the kernel
context. To realise this, we need only to look in our syncmap on the
kernel context and check that we have synchronized against all active
rings.

v2: Since all ringbuffer clients currently share the same timeline, we do
have to use the gem_context to distinguish clients.

As a bonus, include all the tracing used to debug the death inside
suspend.

v3: Test, test, test. Construct a selftest to exercise and assert the
expected behaviour that multiple switch-to-contexts do not emit
redundant requests.

Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180524081135.15278-1-chris@chris-wilson.co.uk


# 1fc44d9b 17-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store a pointer to intel_context in i915_request

To ease the frequent and ugly pointer dance of
&request->gem_context->engine[request->engine->id] during request
submission, store that pointer as request->hw_context. One major
advantage that we will exploit later is that this decouples the logical
context state from the engine itself.

v2: Set mock_context->ops so we don't crash and burn in selftests.
Cleanups from Tvrtko.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk


# 4e0d64db 17-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move request->ctx aside

In the next patch, we want to store the intel_context pointer inside
i915_request, as it is frequently accessed via a convoluted dance when
submitting the request to hw. Having two context pointers inside
i915_request leads to confusion, so first rename the existing
i915_gem_context pointer to i915_request.gem_context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-1-chris@chris-wilson.co.uk


# 3f6e9822 16-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop parking the signaler around reset

We cannot call kthread_park() from softirq context, so let's avoid it
entirely during the reset. We wanted to suspend the signaler so that it
would not mark a request as complete at the same time as we marked it as
being in error. Instead of parking the signaling, stop the engine from
advancing so that the GPU doesn't emit the breadcrumb for our chosen
"guilty" request.

v2: Refactor setting STOP_RING so that we don't have the same code thrice

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-8-chris@chris-wilson.co.uk


# 5adfb772 16-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move engine reset prepare/finish to backends

In preparation to more carefully handling incomplete preemption during
reset by execlists, we move the existing code wholesale to the backends
under a couple of new reset vfuncs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-4-chris@chris-wilson.co.uk


# f351d087 16-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only sync tasklets once for recursive reset preparation

When setting up reset, we may need to recursively prepare an engine. In
which case we should only synchronously flush the tasklets on the outer
most call, the inner calls will then be inside an atomic section where
the tasklet will never be run (and so the sync will never complete).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-2-chris@chris-wilson.co.uk


# b8444cf8 16-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove tasklet flush before disable

The idea was to try and let the existing tasklet run to completion
before we began the reset, but it involves a racy check against anything
else that tries to run the tasklet. Rather than acknowledge and ignore
the race, let it be and don't try and be too clever.

The tasklet will resume execution after reset (after spinning a bit
during reset), but before we allow it to resume we will have cleared all
the pending state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-1-chris@chris-wilson.co.uk


# 0c591a40 12-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up nested spinlocks

When we process the outstanding requests upon banning a context, we need
to acquire both the engine and the client's timeline, nesting the locks.
This requires explicit markup as the two timelines are now of the same
class, since commit a89d1f921c15 ("drm/i915: Split i915_gem_timeline into
individual timelines").

Testcase: igt/gem_eio/banned
Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180512084957.9829-1-chris@chris-wilson.co.uk
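
For illustration, the nested acquisition markup (timeline names are
hypothetical); without the _nested annotation lockdep would emit a
false recursive-locking report, as both locks now share a class:

    spin_lock(&engine_tl->lock);
    spin_lock_nested(&client_tl->lock, SINGLE_DEPTH_NESTING);
    /* ... process the outstanding requests ... */
    spin_unlock(&client_tl->lock);
    spin_unlock(&engine_tl->lock);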


# 4f6d8fcf 07-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush submission tasklet after bumping priority

When called from process context tasklet_schedule() defers itself to
ksoftirqd. From experience this may cause unacceptable latencies of over
200ms in executing the submission tasklet. Our goal is to reprioritise
the HW execution queue and trigger HW preemption immediately, so disable
bh over the call to schedule and force the tasklet to run afterwards if
scheduled.

v2: Keep rcu_read_lock() around for PREEMPT_RCU

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180507135731.10587-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
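
For illustration (names are hypothetical): disabling bottom halves
around the scheduling point stops tasklet_hi_schedule() from punting to
ksoftirqd, and local_bh_enable() then runs the pending softirq, and
hence the submission tasklet, immediately on this CPU:

    local_bh_disable();
    bump_request_priority(rq, prio);        /* hypothetical reprioritise */
    tasklet_hi_schedule(&engine->execlists.tasklet);
    local_bh_enable();                      /* kicks the tasklet now */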


# 3365e226 03-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lazily unbind vma on close

When userspace is passing around swapbuffers using DRI, we frequently
have to open and close the same object in the foreign address space.
This shows itself as the same object being rebound at roughly 30fps
(with a second object also being rebound at 30fps), which involves us
having to rewrite the page tables and maintain the drm_mm range manager
every time.

However, since the object still exists and it is only the local handle
that disappears, if we are lazy and do not unbind the VMA immediately
when the local user closes the object but defer it until the GPU is
idle, then we can reuse the same VMA binding. We still have to be
careful to mark the handle and lookup tables as closed to maintain the
uABI, just allowing the underlying VMA to be resurrected if the user is
able to access the same object from the same context again.

If the object itself is destroyed (with userspace no longer keeping a
handle to it), the VMA will be reaped immediately as usual.

In the future, this will be even more useful as instantiating a new VMA
for use on the GPU will become heavier. A nuisance indeed, so nip it in
the bud.

v2: s/__i915_vma_final_close/i915_vma_destroy/ etc.
v3: Leave a hint as to why we deferred the unbind on close.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-1-chris@chris-wilson.co.uk


# a89d1f92 02-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split i915_gem_timeline into individual timelines

We need to move to a more flexible timeline that doesn't assume one
fence context per engine, and so allow for a single timeline to be used
across a combination of engines. This means that preallocating a fence
context per engine is now a hindrance, and so we want to introduce the
singular timeline. From the code perspective, this has the notable
advantage of clearing up a lot of murky semantics and some clumsy
pointer chasing.

By splitting the timeline up into a single entity rather than an array
of per-engine timelines, we can realise the goal of the previous patch
of tracking the timeline alongside the ring.

v2: Tweak wait_for_idle to stop the compiler thinking that ret may be
uninitialised.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk


# 65fcb806 02-May-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move timeline from GTT to ring

In the future, we want to move a request between engines. To achieve
this, we first realise that we have two timelines in effect here. The
first runs through the GTT is required for ordering vma access, which is
tracked currently by engine. The second is implied by sequential
execution of commands inside the ringbuffer. This timeline is one that
maps to userspace's expectations when submitting requests (i.e. given the
same context, batch A is executed before batch B). As the rings's
timelines map to userspace and the GTT timeline an implementation
detail, move the timeline from the GTT into the ring itself (per-context
in logical-ring-contexts/execlists, or a global per-engine timeline for
the shared ringbuffers in legacy submission.

The two timelines are still assumed to be equivalent at the moment (no
migrating requests between engines yet) and so we can simply move from
one to the other without adding extra ordering.

v2: Reinforce that one isn't allowed to mix the engine execution
timeline with the client timeline from userspace (on the ring).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk


# 643b450a 30-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only track live rings for retiring

We don't need to track every ring for its lifetime as they are managed
by the contexts/engines. What we do want to track are the live rings so
that we can sporadically clean up requests if userspace falls behind. We
can simply restrict the gt->rings list to being only gt->live_rings.

v2: s/live/active/ for consistency with gt.active_requests

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-4-chris@chris-wilson.co.uk


# b887d615 30-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Retire requests along rings

In the next patch, rings are the central timeline as requests may jump
between engines. Therefore in the future as we retire in order along the
engine timeline, we may retire out-of-order within a ring (as the ring now
occurs along multiple engines), leading to much hilarity in miscomputing
the position of ring->head.

As an added bonus, retiring along the ring reduces the penalty of having
one execlists client do cleanup for another (old legacy submission
shares a ring between all clients). The downside is that slow and
irregular (off the critical path) process of cleaning up stale requests
after userspace becomes a modicum less efficient.

In the long run, it will become apparent that the ordered
ring->request_list matches the ring->timeline, a fun challenge for the
future will be unifying the two lists to avoid duplication!

v2: We need both engine-order and ring-order processing to maintain our
knowledge of where individual rings have completed up to, as well as
knowing what was last executing on any engine. And finally by decoupling
retiring the contexts on the engine and the timelines along the rings,
we do have to keep a reference to the context on each request
(previously it was guaranteed by the context being pinned).

v3: Not just a reference to the context, but we need to keep it pinned
as we manipulate the rings; i.e. we need a pin for both the manipulation
of the engine state during its retirements, and a separate pin for the
manipulation of the ring state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk


# ab82a063 30-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap engine->context_pin() and engine->context_unpin()

Make life easier in upcoming patches by moving the context_pin and
context_unpin vfuncs into inline helpers.

v2: Fixup mock_engine to mark the context as pinned on use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk


# 7f961d79 26-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Compile out engine debug for release

The majority of the engine state dumping is too voluminous to be useful
outside of a controlled setup, though a few do accompany severe errors.
Keep the debug dumps next to the errors, but hide the others behind a CI
compile flag. This becomes more useful when adding more dumps to latency
sensitive paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180426103219.22181-1-chris@chris-wilson.co.uk
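
For illustration, a hypothetical sketch of a compile-time guard of this
kind; the config symbol and helper are assumptions, not the in-tree
names:

    #if IS_ENABLED(CONFIG_DRM_I915_DEBUG)
    #define ENGINE_TRACE_DUMP(engine) dump_engine_state(engine)
    #else
    #define ENGINE_TRACE_DUMP(engine) do { } while (0)
    #endif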


# b7268c5e 18-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pack params to engine->schedule() into a struct

Today we only want to pass along the priority to engine->schedule(), but
in the future we want to have much more control over the various aspects
of the GPU during a context's execution, for example controlling the
frequency allowed. As we need an ever growing number of parameters for
scheduling, move those into a struct for convenience.

v2: Move the anonymous struct into its own function for legibility and
ye olde gcc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-3-chris@chris-wilson.co.uk
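
For illustration, a sketch of the shape this takes; any fields beyond
the priority are speculative:

    struct i915_sched_attr {
            int priority;
            /* room to grow: frequency limits, deadlines, ... */
    };

    /* engine->schedule() now takes the attr struct, not a bare priority */
    engine->schedule(rq, &(struct i915_sched_attr){ .priority = prio });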


# 59b449d5 10-Apr-2018 Oscar Mateo <oscar.mateo@intel.com>

drm/i915: Split out functions for different kinds of workarounds

There are different kind of workarounds (those that modify registers that
live in the context image, those that modify global registers, those that
whitelist registers, etc...) and they have different requirements in terms
of where they are applied and how. Also, by splitting them apart, it should
be easier to decide where a new workaround should go.

v2:
- Add multiple MISSING_CASE
- Rebased

v3:
- Rename mmio_workarounds to gt_workarounds (Chris, Mika)
- Create empty placeholders for BDW and CHV GT WAs
- Rebased

v4: Rebased

v5:
- Rebased
- FORCE_TO_NONPRIV register exists since BDW, so make a path
for it to achieve universality, even if empty (Chris)

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: appease checkpatch]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1523376767-18480-2-git-send-email-oscar.mateo@intel.com


# 3834dc1f 10-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't fiddle with rps/rc6 across GPU reset

Resetting the GPU doesn't affect the RPS/RC6 state, so we can stop
forcibly reloading the registers.

Ville suggested this many moons ago; I said at the time that sanitizing
did no harm and meant that our bookkeeping was kept consistent with the
HW. However, in a forthcoming series, we want to split rps/rc6 GT
power management, and one of the key simplifications is the control of
when we enable it. Performing a crude sanitize in the middle of
i915_gem_reset() is then a huge wart.

Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180410133354.13425-1-chris@chris-wilson.co.uk


# d0667e9c 06-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pass the set of guilty engines to i915_reset()

Currently, we rely on inspecting the hangcheck state from within the
i915_reset() routines to determine which engines were guilty of the
hang. This is problematic for cases where we want to run
i915_handle_error() and call i915_reset() independently of hangcheck.
Instead of relying on the indirect parameter passing, turn it into an
explicit parameter providing the set of stalled engines which then are
treated as guilty until proven innocent.

While we are removing the implicit stalled parameter, also make the
reason into an explicit parameter to i915_reset(). We still need a
back-channel for i915_handle_error() to hand over the task to the locked
waiter, but let's keep that its own channel rather than incriminate
another.

This leaves stalled/seqno as being private to hangcheck, with no more
nefarious snooping by reset, be it whole-device or per-engine. \o/

The only real issue now is that this makes it crystal clear that we
don't actually do any testing of hangcheck per se in
drv_selftest/live_hangcheck, merely of resets!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406220354.18911-2-chris@chris-wilson.co.uk


# bba0869b 06-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Treat i915_reset_engine() as guilty until proven innocent

If we are resetting just one engine, we know it has stalled. So we can
pass the stalled parameter directly to i915_gem_reset_engine(), which
alleviates the necessity to poke at the generic engine->hangcheck.stalled
magic variable, leaving that under control of hangcheck as its name
implies. Other than simplifying by removing the indirect parameter along
this path, this allows us to introduce new reset mechanisms that run
independently of hangcheck.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406220354.18911-1-chris@chris-wilson.co.uk


# e4d2006f 06-Apr-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split out parking from the idle worker for reuse

We will want to park GEM before disengaging the drive^W^W^W unwedging.
Since we already do the work for idling, expose the guts as a new
function that we can then reuse.

v2: Just skip if already parked; makes it more forgiving to use by
future callers.
v3: Extract mark_busy, rename it to i915_gem_unpark and place it next to
i915_gem_park so that we can evaluate it for symmetry more easily.
Calling GEM from inside i915_request looks to be a bit of a layering
violation; for the moment I am imagining them as being notify_cb.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406155144.27791-1-chris@chris-wilson.co.uk


# a0de908d 22-Mar-2018 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915: Reorder early initialization

In an upcoming patch, we want to perform more actions in the early
initialization of the uC. This reordering will help resolve the new
dependencies that will be introduced by that future patch.

v2: s/i915_gem_load_init/i915_gem_init_early (Chris)
v3: s/i915_gem_load_cleanup/i915_gem_cleanup_early (Michal)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180323123451.59244-1-michal.wajdeczko@intel.com


# 73e2232a 07-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only call tasklet_kill() on the first prepare_reset

tasklet_kill() will spin waiting for the current tasklet to be executed.
However, if tasklet_disable() has been called, then the tasklet is never
executed but permanently put back onto the runlist until
tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
disable/enable pair. This is the case when we call set-wedge from inside
i915_reset(), and another request was submitted to us concurrent to the
reset.

Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-6-chris@chris-wilson.co.uk
(cherry picked from commit 68ad361285a9cc73b259f59adbaafde196c15987)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 7e9d3a4a 07-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection

Similar to the staging around handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.

Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-5-chris@chris-wilson.co.uk
(cherry picked from commit 47650db02dd52267953df81438c93cf8a0eb0e5e)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# ac697ae8 15-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop engines when declaring the machine wedged

If we fail to reset the GPU, we declare the machine wedged. However, the
GPU may well still be running in the background with an in-flight
request. So despite our efforts in cleaning up the request queue and
faking the breadcrumb in the HWSP, the GPU may eventually write the
in-flight seqno there, breaking all of our assumptions and throwing the
driver into a deep turmoil, wedging beyond wedged.

To avoid this we ideally want to reset the GPU. Since that has already
failed, make sure the rings have the stop bit set instead. This is part
of the normal GPU reset sequence, but that is actually disabled by
igt/gem_eio to force the wedged state. If we assume the worst, we must
poke at the bit again before we give up.

v2: Move the intel_gpu_reset() from set-wedged in the reset error path
into i915_gem_set_wedged() itself. Even if the reset fails (e.g. if it is
disabled by gem_eio), it still tries to make sure the engines are
stopped. For i915_gem_set_wedged() callers from outside of i915_reset(),
this should make sure the GPU is disabled while the driver is marked as
being wedged.

Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180315151015.22741-1-chris@chris-wilson.co.uk


# d9b13c4d 15-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Trace GEM steps between submit and wedging

We still have an odd race with wedging/unwedging as shown by igt/gem_eio
that defies expectations. Add some more trace_printks to try and
visualize the flow over the precipice.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180315131451.4060-1-chris@chris-wilson.co.uk


# f08e2035 13-Mar-2018 Jackie Li <yaodong.li@intel.com>

drm/i915/guc: Check the locking status of GuC WOPCM registers

GuC WOPCM registers are write-once registers. Current driver code
accesses these registers without checking whether they are accessible,
which will lead to unpredictable driver behaviour if these registers
were touched by other components (such as faulty BIOS code).

This patch moves the GuC WOPCM registers updating code into intel_wopcm.c
and adds check before and after the update to GuC WOPCM registers so that
we can make sure the driver is in a known state after writing to these
write-once registers.

v6:
- Made sure module reloading won't bug the kernel while doing
locking status checking

v7:
- Fixed patch format issues

v8:
- Fixed coding style issue on register lock bit macro definition (Sagar)

v9:
- Avoided to use redundant !! to cast uint to bool (Chris)
- Return error code instead of GEM_BUG_ON for locked with invalid register
values case (Sagar)
- Updated guc_wopcm_hw_init to use guc_wopcm as first parameter (Michal)
- Added code to set and validate the HuC_LOADING_AGENT_GUC bit in GuC
WOPCM offset register based on the presence of HuC firmware (Michal)
- Use bit fields instead of macros for GuC WOPCM flags (Michal)

v10:
- Refined variable names, removed redundant comments (Joonas)
- Introduced lockable_reg to handle the write once register write and
propagate the write error to caller (Joonas)
- Used lockable_reg abstraction to avoid locking bit check on generic
i915_reg_t (Michal)
- Added log message for error paths (Michal)
- Removed hw_updated flag and only relies on real hardware status

v11:
- Replaced lockable_reg with simplified function (Michal)
- Used new macros for locking bits of WOPCM size/offset registers instead
of using BIT(0) directly (Michal)
- use intel_wopcm_init_hw() called from intel_gem_init_hw() to do GuC
WOPCM register setup instead of calling from intel_uc_init_hw() (Michal)

v12:
- Updated function kernel-doc to align with code changes (Michal)
- Updated code to use wopcm pointer directly (Michal)

v13:
- Updated the ordering of s-o-b/cc/r-b tags (Sagar)

BSpec: 10875, 10833

Signed-off-by: Jackie Li <yaodong.li@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> (v11)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v12)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1520987574-19351-5-git-send-email-yaodong.li@intel.com


# 6b0478fb 13-Mar-2018 Jackie Li <yaodong.li@intel.com>

drm/i915: Implement dynamic GuC WOPCM offset and size calculation

Hardware may have specific restrictions on GuC WOPCM offset and size. On
Gen9, the value of the GuC WOPCM size register needs to be larger than the
value of GuC WOPCM offset register + a Gen9 specific offset (144KB) for
reserved GuC WOPCM. Failing to enforce such a restriction on the GuC WOPCM size
will lead to GuC firmware execution failures. On the other hand, with
current static GuC WOPCM offset and size values (512KB for both offset and
size), the GuC WOPCM size verification will fail on Gen9 even if it can be
fixed by lowering the GuC WOPCM offset by calculating its value based on
HuC firmware size (which is likely less than 200KB on Gen9), so that we can
have a GuC WOPCM size value which is large enough to pass the GuC WOPCM
size check.

This patch updates the reserved GuC WOPCM size for RC6 context on Gen9 to
24KB to strictly align with the Gen9 GuC WOPCM layout. It also adds support
to verify the GuC WOPCM size against the Gen9 hardware restrictions. To
meet all above requirements, let's provide dynamic partitioning of the
WOPCM that will be based on platform specific HuC/GuC firmware sizes.

v2:
- Removed intel_wopcm_init (Ville/Sagar/Joonas)
- Renamed and Moved the intel_wopcm_partition into intel_guc (Sagar)
- Removed unnecessary function calls (Joonas)
- Init GuC WOPCM partition as soon as firmware fetching is completed

v3:
- Fixed indentation issues (Chris)
- Removed layering violation code (Chris/Michal)
- Created separate files for GuC wopcm code (Michal)
- Used inline function to avoid code duplication (Michal)

v4:
- Preset the GuC WOPCM top during early GuC init (Chris)
- Fail intel_uc_init_hw() as soon as GuC WOPCM partitioning failed

v5:
- Moved GuC DMA WOPCM register updating code into intel_wopcm.c
- Took care of the locking status before writing to GuC DMA
Write-Once registers. (Joonas)

v6:
- Made sure the GuC WOPCM size is a multiple of 4K (4K aligned)

v8:
- Updated comments and fixed naming issues (Sagar/Joonas)
- Updated commit message to include more description about the hardware
restriction on GuC WOPCM size (Sagar)

v9:
- Minor changes to variable names and code comments (Sagar)
- Added detailed GuC WOPCM layout drawing (Sagar/Michal)
- Refined macro definitions to be reader friendly (Michal)
- Removed redundant check of the valid flag (Michal)
- Unified first parameter for exported GuC WOPCM functions (Michal)
- Refined the name and parameter list of hardware restriction checking
functions (Michal)

v10:
- Used shorter function name for internal functions (Joonas)
- Moved init-early function into the .c file (Joonas)
- Consolidated and removed redundant size checks (Joonas/Michal)
- Removed unnecessary unlikely() from code which is only called once
during boot (Joonas)
- More fixes to kernel-doc format and content (Michal)
- Avoided the use of PAGE_MASK for 4K pages (Michal)
- Added error log messages to error paths (Michal)

v11:
- Replaced intel_guc_wopcm with the more generic intel_wopcm and attached
intel_wopcm to drm_i915_private instead of intel_guc (Michal)
- Dynamic calculation of the GuC non-WOPCM memory start (a.k.a. WOPCM top
offset from the GuC WOPCM base) (Michal)
- Moved WOPCM macro definitions into the .c source file (Michal)
- Exported WOPCM layout diagram as kernel-doc (Michal)

v12:
- Updated naming, function kernel-doc to align with new changes (Michal)

v13:
- Updated the ordering of s-o-b/cc/r-b tags (Sagar)
- Corrected one tense error in comment (Sagar)
- Corrected typos and removed spurious comments (Joonas)

Bspec: 12690

Signed-off-by: Jackie Li <yaodong.li@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Spotswood <john.a.spotswood@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v8)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v9)
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> (v11)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v12)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1520987574-19351-2-git-send-email-yaodong.li@intel.com


# 629820fc 09-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Show GEM_TRACE when detecting a failed GPU idle

If we timeout waiting for the GPU to idle, something went seriously
wrong. We currently dump the engine state, but we can also dump the
ftrace buffer showing our last operations (when available).

In passing, note that since commit 559e040f1f08 ("drm/i915: Show the GPU
state when declaring wedged"), we now show the engine state twice, once
in detecting the failed idle and then again on declaring wedged.

v2: ftrace_dump() takes a parameter specifying whether to dump all cpu
buffers or the local cpu's.
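
As a sketch of the idea (DUMP_ALL/DUMP_ORIG are the standard
ftrace_dump_mode values; the surrounding wait-for-idle check is illustrative):

/* If the GPU fails to idle, dump whatever the ftrace ring buffers hold. */
if (err == -ETIMEDOUT) {
	pr_err("i915: GPU failed to idle\n");
	ftrace_dump(DUMP_ALL);	/* DUMP_ORIG dumps only the local cpu's buffer */
}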

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180309101114.1138-1-chris@chris-wilson.co.uk


# 07bcd99b 06-Mar-2018 Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>

drm/i915/frontbuffer: Pull frontbuffer_flush out of gem_obj_pin_to_display

i915_gem_obj_pin_to_display() calls frontbuffer_flush with origin set to
DIRTYFB. The callers however are at a vantage point to decide if hardware
frontbuffer tracking can do the flush for us. For example, legacy cursor
updates, like flips, write to MMIO registers, which then triggers PSR flush
by the hardware. Moving frontbuffer_flush out will enable us to skip a
software initiated flush by setting origin to FLIP. Thanks to Chris for the
idea.
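
A hypothetical caller-side sketch (intel_fb_obj_flush() and the ORIGIN_*
values are shown purely to illustrate the idea):

/* Legacy cursor/flip path: the MMIO write triggers the PSR flush in hardware. */
intel_fb_obj_flush(obj, ORIGIN_FLIP);

/* Other pin-to-display callers keep the software-initiated flush. */
intel_fb_obj_flush(obj, ORIGIN_DIRTYFB);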

v2:
Rebased due to Ville adding intel_plane_pin_fb().
Minor code reordering as fb_obj_flush doesn't need struct_mutex (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307033420.3086-1-dhinakaran.pandiyan@intel.com


# c37d5728 12-Mar-2018 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915/uc: Sanitize uC together with GEM

Instead of dancing around uC on reset/suspend/resume scenarios,
explicitly sanitize uC when we sanitize GEM to force uC reload
and start from a known state.

v2: don't forget about reset path (Daniele)
sanitize uc before gem initiated full reset (Daniele)
v3: drop redundant disable_communication in init_hw (Daniele)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180312130308.22952-3-michal.wajdeczko@intel.com


# 82813ba9 07-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only prune fences after wait-for-all

Currently, we only allow ourselves to prune the fences so long as all
the waits completed (i.e. all the fences we checked were signaled) and
the reservation snapshot did not change across the wait. However, if we
only waited for a subset of the reservation object, i.e. just waiting
for the last writer to complete as opposed to all readers as well, then
we would erroneously conclude we could prune the fences: although all of
our waits were successful, they did not represent the totality of the
reservation object.

v2: We only need to check the shared fences due to construction (i.e.
all of the shared fences will be later than the exclusive fence, if
any).

Fixes: e54ca9774777 ("drm/i915: Remove completed fences after a wait")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307171303.29466-1-chris@chris-wilson.co.uk
(cherry picked from commit fa73055b8442c97b3ba7cd0aa57cd2ad32124201)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 68ad3612 07-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only call tasklet_kill() on the first prepare_reset

tasklet_kill() will spin waiting for the current tasklet to be executed.
However, if tasklet_disable() has been called, then the tasklet is never
executed but permanently put back onto the runlist until
tasklet_enable() is called. Ergo, we cannot use tasklet_kill() inside a
disable/enable pair. This is the case when we call set-wedge from inside
i915_reset(), and another request was submitted to us concurrent to the
reset.
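
The problematic pattern, for illustration (the execlists tasklet is just an
example of a tasklet used inside a disable/enable pair):

tasklet_disable(&engine->execlists.tasklet);
/* The tasklet can no longer run, only be rescheduled... */
tasklet_kill(&engine->execlists.tasklet);	/* ...so this may spin forever */
tasklet_enable(&engine->execlists.tasklet);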

Fixes: 963ddd63c314 ("drm/i915: Suspend submission tasklets around wedging")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-6-chris@chris-wilson.co.uk


# 47650db0 07-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection

Similar to the staging around handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.
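
A sketch of the protection (the callback pointer and arguments follow the
driver's conventions of the time, shown here only for illustration):

rcu_read_lock();
if (engine->schedule)			/* may be cleared by set-wedge */
	engine->schedule(request, prio);
rcu_read_unlock();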

Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-5-chris@chris-wilson.co.uk


# 2d4ecace 07-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Finish the wait-for-wedge by retiring all the inflight requests

Before we reset the GPU after marking the device as wedged, we wait for
all the remaining requests to be completed (and marked as EIO).
Afterwards, we should flush the request lists so the next batch starts
with the driver in an idle state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-1-chris@chris-wilson.co.uk


# fa73055b 07-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only prune fences after wait-for-all

Currently, we only allow ourselves to prune the fences so long as all
the waits completed (i.e. all the fences we checked were signaled) and
the reservation snapshot did not change across the wait. However, if we
only waited for a subset of the reservation object, i.e. just waiting
for the last writer to complete as opposed to all readers as well, then
we would erroneously conclude we could prune the fences: although all of
our waits were successful, they did not represent the totality of the
reservation object.

v2: We only need to check the shared fences due to construction (i.e.
all of the shared fences will be later than the exclusive fence, if
any).

Fixes: e54ca9774777 ("drm/i915: Remove completed fences after a wait")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307171303.29466-1-chris@chris-wilson.co.uk


# 88d3dfb6 02-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Suspend submission tasklets around wedging

After staring hard at sequences like

[ 28.199013] systemd-1 2..s. 26062228us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
[ 28.199095] systemd-1 2..s. 26062229us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000, active=0x1
[ 28.199177] systemd-1 2..s. 26062230us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, seqno=3, prio=-1024
[ 28.199258] systemd-1 2..s. 26062231us : execlists_submission_tasklet: rcs0 completed ctx=0
[ 28.199340] gem_eio-829 1..s1 26066853us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.1, seqno=1, prio=0
[ 28.199421] <idle>-0 2..s. 26066863us : execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=2 [2?]
[ 28.199503] <idle>-0 2..s. 26066865us : execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
[ 28.199585] gem_eio-829 1..s1 26067077us : execlists_submission_tasklet: rcs0 in[1]: ctx=3.1, seqno=2, prio=0
[ 28.199667] gem_eio-829 1..s1 26067078us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.2, seqno=1, prio=0
[ 28.199749] <idle>-0 2..s. 26067084us : execlists_submission_tasklet: rcs0 cs-irq head=2 [2?], tail=3 [3?]
[ 28.199830] <idle>-0 2..s. 26067085us : execlists_submission_tasklet: rcs0 csb[3]: status=0x00008002:0x00000001, active=0x1
[ 28.199912] <idle>-0 2..s. 26067086us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.2, seqno=1, prio=0
[ 28.199994] gem_eio-829 2..s. 28246084us : execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?]
[ 28.200096] gem_eio-829 2..s. 28246088us : execlists_submission_tasklet: rcs0 csb[4]: status=0x00000014:0x00000001, active=0x5
[ 28.200178] gem_eio-829 2..s. 28246089us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0, prio=0
[ 28.200260] gem_eio-829 2..s. 28246127us : execlists_submission_tasklet: execlists_submission_tasklet:886 GEM_BUG_ON(buf[2 * head + 1] != port->context_id)

the conclusion is that the only place where the ports are reset to zero
is from engine->cancel_requests, called during i915_gem_set_wedged().

The race is horrible as it results from calling set-wedged on active HW
(the GPU reset failed) and as such we need to be careful as the HW state
changes beneath us. Fortunately, it's the same scary conditions as
affect normal reset, so we can reuse the same machinery to disable state
tracking as we clobber it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104945
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302113324.23189-2-chris@chris-wilson.co.uk
(cherry picked from commit 963ddd63c314e9b5d9cd999873d473a93aed5380)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 7cfca4af 02-Mar-2018 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915/uc: Introduce intel_uc_suspend|resume

We want to use higher level 'uc' functions as the main entry points to
the GuC/HuC code to hide some details and keep code layered.

While here, move the call to disable_guc_interrupts after sending the suspend
action to the GuC, to allow it to work also with CTB as the comm mechanism.

v2: update commit msg (Sagar)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302111550.21328-1-michal.wajdeczko@intel.com


# 963ddd63 02-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Suspend submission tasklets around wedging

After staring hard at sequences like

[ 28.199013] systemd-1 2..s. 26062228us : execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
[ 28.199095] systemd-1 2..s. 26062229us : execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000, active=0x1
[ 28.199177] systemd-1 2..s. 26062230us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, seqno=3, prio=-1024
[ 28.199258] systemd-1 2..s. 26062231us : execlists_submission_tasklet: rcs0 completed ctx=0
[ 28.199340] gem_eio-829 1..s1 26066853us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.1, seqno=1, prio=0
[ 28.199421] <idle>-0 2..s. 26066863us : execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=2 [2?]
[ 28.199503] <idle>-0 2..s. 26066865us : execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000, active=0x1
[ 28.199585] gem_eio-829 1..s1 26067077us : execlists_submission_tasklet: rcs0 in[1]: ctx=3.1, seqno=2, prio=0
[ 28.199667] gem_eio-829 1..s1 26067078us : execlists_submission_tasklet: rcs0 in[0]: ctx=1.2, seqno=1, prio=0
[ 28.199749] <idle>-0 2..s. 26067084us : execlists_submission_tasklet: rcs0 cs-irq head=2 [2?], tail=3 [3?]
[ 28.199830] <idle>-0 2..s. 26067085us : execlists_submission_tasklet: rcs0 csb[3]: status=0x00008002:0x00000001, active=0x1
[ 28.199912] <idle>-0 2..s. 26067086us : execlists_submission_tasklet: rcs0 out[0]: ctx=1.2, seqno=1, prio=0
[ 28.199994] gem_eio-829 2..s. 28246084us : execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?]
[ 28.200096] gem_eio-829 2..s. 28246088us : execlists_submission_tasklet: rcs0 csb[4]: status=0x00000014:0x00000001, active=0x5
[ 28.200178] gem_eio-829 2..s. 28246089us : execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0, prio=0
[ 28.200260] gem_eio-829 2..s. 28246127us : execlists_submission_tasklet: execlists_submission_tasklet:886 GEM_BUG_ON(buf[2 * head + 1] != port->context_id)

the conclusion is that the only place where the ports are reset to zero
is from engine->cancel_requests, called during i915_gem_set_wedged().

The race is horrible as it results from calling set-wedged on active HW
(the GPU reset failed) and as such we need to be careful as the HW state
changes beneath us. Fortunately, it's the same scary conditions as
affect normal reset, so we can reuse the same machinery to disable state
tracking as we clobber it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104945
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302113324.23189-2-chris@chris-wilson.co.uk


# ffed7bd2 01-Mar-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace open-coded wait-for loop

Now that we can pass arbitrary commands into the base __wait_for()
macro, we can reimplement the open-coded wait-for inside
i915_gem_idle_work_handler() using the new macro. This means that instead
of using ktime, we now use jiffies, and benefit from the exponential sleep
backoff that allows a fast response if the HW settles quickly.
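
For example (sketch only, with a hypothetical hw_is_settled() condition):

/* Before: open-coded ktime-based polling. */
end = ktime_add_ms(ktime_get(), timeout_ms);
do {
	if (hw_is_settled(i915))
		break;
	usleep_range(100, 200);
} while (ktime_before(ktime_get(), end));

/* After: the common helper handles the exponential backoff and timeout. */
if (wait_for(hw_is_settled(i915), timeout_ms))
	DRM_ERROR("timed out waiting for HW to settle\n");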

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180301103338.5380-1-chris@chris-wilson.co.uk


# e61e0f51 21-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename drm_i915_gem_request to i915_request

We want to de-emphasize the link between the request (dependency,
execution and fence tracking) from GEM and so rename the struct from
drm_i915_gem_request to i915_request. That is we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)

In short, the spatch:
@@

@@
- struct drm_i915_gem_request
+ struct i915_request

A corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space is of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 5935485f 20-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the policy for placement of the GGTT vma into the caller

Currently we make the unilateral decision inside
i915_gem_object_pin_to_display() where the VMA should reside (inside
the fence and mappable region or above?). This is not our decision to
make, as it impacts how the display engine can use the resulting
scanout object, and it would rather instruct us where to place the VMA so
that it can enable the features it wants. As such, make the pin flags an
argument to i915_gem_object_pin_to_display() and control them from
intel_pin_and_fence_fb_obj().

Whilst taking control of the mapping for ourselves, start tracking how
we use it to avoid trying to free a fence we never claimed:

<3>[ 227.151869] GEM_BUG_ON(vma->fence->pin_count <= 0)
<4>[ 227.152064] ------------[ cut here ]------------
<2>[ 227.152068] kernel BUG at drivers/gpu/drm/i915/i915_vma.h:391!
<4>[ 227.152084] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
<0>[ 227.152092] Dumping ftrace buffer:
<0>[ 227.152099] (ftrace buffer empty)
<4>[ 227.152102] Modules linked in: i915 snd_hda_codec_analog snd_hda_codec_generic coretemp snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm lpc_ich e1000e mei_me mei prime_numbers
<4>[ 227.152131] CPU: 1 PID: 1587 Comm: kworker/u16:49 Tainted: G U 4.16.0-rc1-gbab67b2f6177-kasan_7+ #1
<4>[ 227.152134] Hardware name: Dell Inc. OptiPlex 755 /0PU052, BIOS A08 02/19/2008
<4>[ 227.152236] Workqueue: events_unbound intel_atomic_commit_work [i915]
<4>[ 227.152292] RIP: 0010:intel_unpin_fb_vma+0x23a/0x2a0 [i915]
<4>[ 227.152295] RSP: 0018:ffff88005aad7b68 EFLAGS: 00010286
<4>[ 227.152300] RAX: 0000000000000026 RBX: ffff88005c359580 RCX: 0000000000000000
<4>[ 227.152304] RDX: 0000000000000026 RSI: ffffffff8707d840 RDI: ffffed000b55af63
<4>[ 227.152307] RBP: ffff880056817e58 R08: 0000000000000001 R09: 0000000000000000
<4>[ 227.152311] R10: ffff88005aad7b88 R11: 0000000000000000 R12: ffff8800568184d0
<4>[ 227.152314] R13: ffff880065b5ab08 R14: 0000000000000000 R15: dffffc0000000000
<4>[ 227.152318] FS: 0000000000000000(0000) GS:ffff88006ac40000(0000) knlGS:0000000000000000
<4>[ 227.152322] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 227.152325] CR2: 00007f5fb25550a8 CR3: 0000000068c78000 CR4: 00000000000006e0
<4>[ 227.152328] Call Trace:
<4>[ 227.152385] intel_cleanup_plane_fb+0x6b/0xd0 [i915]
<4>[ 227.152395] drm_atomic_helper_cleanup_planes+0x166/0x280
<4>[ 227.152452] intel_atomic_commit_tail+0x159d/0x3380 [i915]
<4>[ 227.152463] ? process_one_work+0x66e/0x1460
<4>[ 227.152516] ? skl_update_crtcs+0x9c0/0x9c0 [i915]
<4>[ 227.152523] ? lock_acquire+0x13d/0x390
<4>[ 227.152527] ? lock_acquire+0x13d/0x390
<4>[ 227.152534] process_one_work+0x71a/0x1460
<4>[ 227.152540] ? __schedule+0x815/0x1e20
<4>[ 227.152547] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
<4>[ 227.152553] ? _raw_spin_lock_irq+0xa/0x40
<4>[ 227.152559] worker_thread+0xdf/0xf60
<4>[ 227.152569] ? process_one_work+0x1460/0x1460
<4>[ 227.152573] kthread+0x2cf/0x3c0
<4>[ 227.152578] ? _kthread_create_on_node+0xa0/0xa0
<4>[ 227.152583] ret_from_fork+0x3a/0x50
<4>[ 227.152591] Code: c6 00 11 86 c0 48 c7 c7 e0 bd 85 c0 e8 60 e7 a9 c4 0f ff e9 1f fe ff ff 48 c7 c6 40 10 86 c0 48 c7 c7 e0 ca 85 c0 e8 2b 95 bd c4 <0f> 0b 48 89 ef e8 4c 44 e8 c4 e9 ef fd ff ff e8 42 44 e8 c4 e9
<1>[ 227.152720] RIP: intel_unpin_fb_vma+0x23a/0x2a0 [i915] RSP: ffff88005aad7b68

v2: i915_vma_pin_fence() is a no-op if a fence isn't required, so check
vma->fence as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180220134208.24988-2-chris@chris-wilson.co.uk


# ac87a6fd 20-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Also check view->type for a normal GGTT view

We cannot simply use !view as shorthand for all normal GGTT views as a
few callers will always populate an i915_ggtt_view struct and set the
type to NORMAL instead. So check for (!view || view->type == NORMAL)
inside i915_gem_object_ggtt_pin().
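
The resulting condition is roughly (sketch):

/* Both "no view" and an explicit NORMAL view mean the normal GGTT view. */
bool is_normal_view = !view || view->type == I915_GGTT_VIEW_NORMAL;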

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180220134208.24988-1-chris@chris-wilson.co.uk


# c9c70471 19-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track number of pending freed objects

During igt, we frequently call into the driver to reset both HW and
driver state (idling the device, waiting for it to become idle and
freeing off old objects) to ensure that we start each test/subtest/pass
from known state. This process incurs an RCU barrier or two to ensure
that any such pending frees are indeed flushed before we return.
However, unconditionally waiting on the RCU barrier adds needless delay
to many callers, which adds up to several seconds when repeated thousands
of times. By tracking how many outstanding frees we have, we can skip
the rcu_barrier() when we know there are none.

The same path is used during suspend, where we may be able to avoid the
unconditional RCU barrier.

To put it into perspective with a completely meaningless
microbenchmark, igt/gem_sync/idle is improved from 50ms to 30us on bdw.
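
The gist, as a sketch (free_count stands for whatever counter of pending
frees the driver keeps; the exact field names here are illustrative):

/* Only pay for the RCU barrier when frees are actually outstanding. */
if (atomic_read(&i915->mm.free_count)) {
	rcu_barrier();
	flush_work(&i915->mm.free_work);
}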

v2: Remove the extra synchronize_rcu() inside i915_drop_caches_set()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180219220631.25001-1-chris@chris-wilson.co.uk


# c0a51fd0 16-Feb-2018 Christian König <ckoenig.leichtzumerken@gmail.com>

drm: move read_domains and write_domain into i915

i915 is the only driver using those fields in the drm_gem_object
structure, so they only waste memory for all other drivers.

Move the fields into drm_i915_gem_object instead and patch the i915 code
with the following sed commands:

sed -i "s/obj->base.read_domains/obj->read_domains/g" drivers/gpu/drm/i915/*.c drivers/gpu/drm/i915/*/*.c
sed -i "s/obj->base.write_domain/obj->write_domain/g" drivers/gpu/drm/i915/*.c drivers/gpu/drm/i915/*/*.c

Change is only compile tested.

v2: move fields around as suggested by Chris.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180216124338.9087-1-christian.koenig@amd.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# c56b89f1 09-Feb-2018 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Use INTEL_GEN everywhere

Coccinelle patch:

@@
identifier p;
@@
-INTEL_INFO(p)->gen
+INTEL_GEN(p)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180208130606.15556-12-tvrtko.ursulin@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180209215847.6660-1-chris@chris-wilson.co.uk


# 0d73e7a0 07-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark the device as wedged from the beginning of set-wedged

Reduce the window of opportunity for set-wedged being called
concurrently with reset (after i915_reset() has performed the
i915_gem_unset_wedged()) by moving the set_bit(I915_WEDGED) to before we
complete the inflight requests. When i915_reset() is being blocked on a
request, such completion may allow it to start and begin resetting
the GPU before i915_gem_set_wedged() has finished (and so before
set-wedge will have marked the device as wedged). As such,
i915_gem_init_hw() may see a wedged device even from inside
i915_reset().

References: 36703e79a982 ("drm/i915: Break modeset deadlocks on reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207151350.20883-1-chris@chris-wilson.co.uk


# ce1599a4 07-Feb-2018 Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

drm/i915: do not stop engines on sanitize if i915.reset=0

Since commit 5896a5c8c9c0 (drm/i915: Always stop the rings before a
missing GPU reset) we attempt to stop the engines during gem_sanitize
even if reset=0 and nothing bad happened on the gpu.
The spec says that the STOP_RINGS bit needs to be cleared to resume
normal operation, but for some reason the value of the bit seems to be
changing without us writing to it (maybe rc6 entry/exit?), so normal
operation resumes correctly. However, it still feels incorrect to stop
the engines if there hasn't been any issue, so skip the whole reset
call in gem_sanitize if i915.reset=0.
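
In code terms the idea is simply (sketch, using the module parameter naming
of the time):

/* Skip the sanitizing reset entirely when GPU resets are disabled. */
if (i915_modparams.reset && intel_has_gpu_reset(i915))
	WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));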

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207212440.13438-1-daniele.ceraolospurio@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 3fed1808 07-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the scheduler feature bits into the purview of the engines

Rather than having the high level ioctl interface guess the underlying
implementation details, have the implementation declare what
capabilities it exports. We define an intel_driver_caps, similar to the
intel_device_info, which instead of trying to describe the HW gives
details on what the driver itself supports. This is then populated by
the engine backend for the new scheduler capability field for use
elsewhere.

v2: Use caps.scheduler for validating CONTEXT_PARAM_SET_PRIORITY (Mika)
One less assumption of engine[RCS] \o/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207210544.26351-2-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>


# 8177e112 07-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Tidy up some error messages around reset failure

On blb and pnv, we are seeing sporadic

i915 0000:00:02.0: Resetting chip after gpu hang
[drm:intel_gpu_reset [i915]] rcs0: timed out on STOP_RING
[drm:i915_reset [i915]] *ERROR* Failed hw init on reset -5

which notably lack the actual root cause of the error. Ostensibly it
should be the init_ring_common() that failed, but its error paths are
covered by DRM_ERROR.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207111545.17078-1-chris@chris-wilson.co.uk


# 01b8fdc5 05-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip post-reset request emission if the engine is not idle

Since commit 7b6da818d86f ("drm/i915: Restore the kernel context after a
GPU reset on an idle engine") we submit a request following the engine
reset. The intent is that we don't submit a request if the engine is
busy (as it will restart active by itself) but we only checked to see if
there were remaining requests in flight on the hardware and skipped
checking to see if there were any ready requests that would be
immediately submitted on restart (the same time as our new request would
be). Having convinced the engine to appear idle in the previous patch,
we can use intel_engine_is_idle() as a better test to only submit a new
request if there are no pending requests.
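
Roughly (sketch; names follow the request API of the time):

/* Only emit the post-reset request when the engine really has nothing
 * ready to run; a busy engine will restart on its own. */
if (intel_engine_is_idle(engine))
	rq = i915_gem_request_alloc(engine, i915->kernel_context);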

As it happens, this is tripping up igt/drv_selftest/live_hangcheck in CI
as we overfill the kernel_context ringbuffer, triggering an infinite
recursion from within the reset.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104786
References: 7b6da818d86f ("drm/i915: Restore the kernel context after a GPU reset on an idle engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205152431.12163-4-chris@chris-wilson.co.uk


# 24eae08d 05-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove unbannable context spam from reset

During testing, we trigger a lot of resets on an unbannable context
leading to massive amounts of irrelevant debug spam. Remove the
ban_score accounting and message for the unbannable context so that we
improve the signal:noise in the log messages for when the unexpected
occurs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205092201.19476-7-chris@chris-wilson.co.uk


# 559e040f 05-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Show the GPU state when declaring wedged

Dump each engine state when i915_gem_set_wedged() is called to give us
some more clues as to why we had to terminate the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205092201.19476-5-chris@chris-wilson.co.uk


# 9e519bc8 05-Feb-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add some newlines to intel_engine_dump() headers

The headers should be on a separate line for consistency, so add the
missing trailing newline in a few intel_engine_dump() callers.

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205100618.11001-1-chris@chris-wilson.co.uk


# b26a32a8 29-Jan-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always run hangcheck while the GPU is busy

Previously, we relied on only running the hangcheck while somebody was
waiting on the GPU, in order to minimise the amount of time hangcheck
had to run. (If nobody was watching the GPU, nobody would notice if the
GPU wasn't responding -- eventually somebody would care and so kick
hangcheck into action.) However, this falls apart from around commit
4680816be336 ("drm/i915: Wait first for submission, before waiting for
request completion"), as not all waiters declare themselves to hangcheck
and so we could switch off hangcheck and miss GPU hangs even when
waiting under the struct_mutex.

If we enable hangcheck from the first request submission, and let it run
until the GPU is idle again, we forgo all the complexity involved with
only enabling around waiters. We just have to remember to be careful that
we do not declare a GPU hang when idly waiting for the next request to
become ready, as we will run hangcheck continuously even when the
engines are stalled waiting for external events. This should be true
already as we should only be tracking requests submitted to hardware for
execution as an indicator that the engine is busy.
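
The first half, as a sketch (field and helper names are illustrative):

/* On the idle->busy transition, kick off hangcheck; it re-arms itself
 * until the GPU goes idle again. */
if (!i915->gt.awake) {
	i915->gt.awake = true;
	i915_queue_hangcheck(i915);
}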

Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104840
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180129144104.3921-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
(cherry picked from commit 889230489b6b138ba97ba2f13fc9644a3d16d0d2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# c950af50 10-Jan-2018 Sagar Arun Kamble <sagar.a.kamble@intel.com>

drm/i915/guc: Add uc_fini_wq in gem_init unwind path

While moving code around for solving the lockdep issue for the GuC log relay,
we spotted that uc_fini_wq is not being called in the failure path of
gem_init. It was missed in the commit below; add it.

v2: Removed GEM_BUG_ON(!HAS_GUC()) from intel_uc_fini_wq as init happens
only based on enable_guc module parameter and does not consider has_guc
capability. (Michal)

Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Fixes: 3176ff49bc3e ("drm/i915/guc: Move GuC workqueue allocations outside of the mutex")
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1515588857-10283-1-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit da943b5ab071584fcb9cfa896dc8c643d376f362)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 88923048 29-Jan-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always run hangcheck while the GPU is busy

Previously, we relied on only running the hangcheck while somebody was
waiting on the GPU, in order to minimise the amount of time hangcheck
had to run. (If nobody was watching the GPU, nobody would notice if the
GPU wasn't responding -- eventually somebody would care and so kick
hangcheck into action.) However, this falls apart from around commit
4680816be336 ("drm/i915: Wait first for submission, before waiting for
request completion"), as not all waiters declare themselves to hangcheck
and so we could switch off hangcheck and miss GPU hangs even when
waiting under the struct_mutex.

If we enable hangcheck from the first request submission, and let it run
until the GPU is idle again, we forgo all the complexity involved with
only enabling around waiters. We just have to remember to be careful that
we do not declare a GPU hang when idly waiting for the next request to
become ready, as we will run hangcheck continuously even when the
engines are stalled waiting for external events. This should be true
already as we should only be tracking requests submitted to hardware for
execution as an indicator that the engine is busy.

Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104840
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180129144104.3921-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


# 70deeadd 24-Jan-2018 Sagar Arun Kamble <sagar.a.kamble@intel.com>

drm/i915/guc: Fix lockdep due to log relay channel handling under struct_mutex

This patch fixes a lockdep issue due to a circular locking dependency between
struct_mutex, i_mutex_key, mmap_sem and relay_channels_mutex.
For the GuC log relay channel we create a debugfs file that requires the
i_mutex_key lock, and we are doing that under struct_mutex. So we introduced
a new dependency:
&dev->struct_mutex --> &sb->s_type->i_mutex_key#3 --> &mm->mmap_sem
However, there is a dependency from mmap_sem to struct_mutex. Hence we
separate the relay create/destroy operations from under struct_mutex.
Also added a runtime check of the relay buffer status.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc6-CI-Patchwork_7614+ #1 Not tainted
------------------------------------------------------
debugfs_test/1388 is trying to acquire lock:
(&dev->struct_mutex){+.+.}, at: [<00000000d5e1d915>] i915_mutex_lock_interruptible+0x47/0x130 [i915]

but task is already holding lock:
(&mm->mmap_sem){++++}, at: [<0000000029a9c131>] __do_page_fault+0x106/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&mm->mmap_sem){++++}:
_copy_to_user+0x1e/0x70
filldir+0x8c/0xf0
dcache_readdir+0xeb/0x160
iterate_dir+0xdc/0x140
SyS_getdents+0xa0/0x130
entry_SYSCALL_64_fastpath+0x1c/0x89

-> #2 (&sb->s_type->i_mutex_key#3){++++}:
start_creating+0x59/0x110
__debugfs_create_file+0x2e/0xe0
relay_create_buf_file+0x62/0x80
relay_late_setup_files+0x84/0x250
guc_log_late_setup+0x4f/0x110 [i915]
i915_guc_log_register+0x32/0x40 [i915]
i915_driver_load+0x7b6/0x1720 [i915]
i915_pci_probe+0x2e/0x90 [i915]
pci_device_probe+0x9c/0x120
driver_probe_device+0x2a3/0x480
__driver_attach+0xd9/0xe0
bus_for_each_dev+0x57/0x90
bus_add_driver+0x168/0x260
driver_register+0x52/0xc0
do_one_initcall+0x39/0x150
do_init_module+0x56/0x1ef
load_module+0x231c/0x2d70
SyS_finit_module+0xa5/0xe0
entry_SYSCALL_64_fastpath+0x1c/0x89

-> #1 (relay_channels_mutex){+.+.}:
relay_open+0x12c/0x2b0
intel_guc_log_runtime_create+0xab/0x230 [i915]
intel_guc_init+0x81/0x120 [i915]
intel_uc_init+0x29/0xa0 [i915]
i915_gem_init+0x182/0x530 [i915]
i915_driver_load+0xaa9/0x1720 [i915]
i915_pci_probe+0x2e/0x90 [i915]
pci_device_probe+0x9c/0x120
driver_probe_device+0x2a3/0x480
__driver_attach+0xd9/0xe0
bus_for_each_dev+0x57/0x90
bus_add_driver+0x168/0x260
driver_register+0x52/0xc0
do_one_initcall+0x39/0x150
do_init_module+0x56/0x1ef
load_module+0x231c/0x2d70
SyS_finit_module+0xa5/0xe0
entry_SYSCALL_64_fastpath+0x1c/0x89

-> #0 (&dev->struct_mutex){+.+.}:
__mutex_lock+0x81/0x9b0
i915_mutex_lock_interruptible+0x47/0x130 [i915]
i915_gem_fault+0x201/0x790 [i915]
__do_fault+0x15/0x70
__handle_mm_fault+0x677/0xdc0
handle_mm_fault+0x14f/0x2f0
__do_page_fault+0x2d1/0x560
page_fault+0x4c/0x60

other info that might help us debug this:

Chain exists of:
&dev->struct_mutex --> &sb->s_type->i_mutex_key#3 --> &mm->mmap_sem

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#3);
                               lock(&mm->mmap_sem);
  lock(&dev->struct_mutex);

*** DEADLOCK ***

1 lock held by debugfs_test/1388:
#0: (&mm->mmap_sem){++++}, at: [<0000000029a9c131>] __do_page_fault+0x106/0x560

stack backtrace:
CPU: 2 PID: 1388 Comm: debugfs_test Not tainted 4.15.0-rc6-CI-Patchwork_7614+ #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4205-ITX, BIOS P1.10 09/29/2016
Call Trace:
dump_stack+0x5f/0x86
print_circular_bug.isra.18+0x1d0/0x2c0
__lock_acquire+0x14ae/0x1b60
? lock_acquire+0xaf/0x200
lock_acquire+0xaf/0x200
? i915_mutex_lock_interruptible+0x47/0x130 [i915]
__mutex_lock+0x81/0x9b0
? i915_mutex_lock_interruptible+0x47/0x130 [i915]
? i915_mutex_lock_interruptible+0x47/0x130 [i915]
? i915_mutex_lock_interruptible+0x47/0x130 [i915]
i915_mutex_lock_interruptible+0x47/0x130 [i915]
? __pm_runtime_resume+0x4f/0x80
i915_gem_fault+0x201/0x790 [i915]
__do_fault+0x15/0x70
? _raw_spin_unlock+0x29/0x40
__handle_mm_fault+0x677/0xdc0
handle_mm_fault+0x14f/0x2f0
__do_page_fault+0x2d1/0x560
? page_fault+0x36/0x60
page_fault+0x4c/0x60

v2: Added lock protection to guc->log.runtime.relay_chan (Chris)
Fixed locking inside guc_flush_logs uncovered by new lockdep.

v3: Locking guc_read_update_log_buffer entirely with relay_lock. (Chris)
Prepared intel_guc_init_early. Moved relay_lock inside relay_create
relay_destroy, relay_file_create, guc_read_update_log_buffer. (Michal)
Removed struct_mutex lock around guc_log_flush and removed usage
of guc_log_has_relay() from runtime_create path as it needs
struct_mutex lock.

v4: Handle NULL relay sub buffer pointer earlier in read_update_log_buffer
(Chris). Fixed comment suffix **/. (Michal)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104693
Testcase: igt/debugfs_test/read_all_entries # with enable_guc=1 and guc_log_level=1
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1516808821-3638-3-git-send-email-sagar.a.kamble@intel.com


# 84a10749 24-Jan-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Shrink the GEM kmem_caches upon idling

When we finally decide the gpu is idle, that is a good time to shrink
our kmem_caches.
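
For instance (sketch; the exact set of caches and the deferral policy are
covered by the version notes below):

/* Once idle, return unused slab pages from our private caches. */
kmem_cache_shrink(i915->requests);
kmem_cache_shrink(i915->vmas);
kmem_cache_shrink(i915->objects);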

v3: Defer until an rcu grace period after we idle.
v4: Think about epoch wraparound and how likely that is.
v5: Use I915_EPOCH_INVALID magic.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180124113608.14909-2-chris@chris-wilson.co.uk


# e9af4ea2 18-Jan-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid waitboosting on the active request

Watching a light workload on Baytrail (running glxgears and a 1080p
decode), instead of the system remaining at low frequency, the glxgears
would regularly trigger waitboosting after which it would have to spend
a few seconds throttling back down. In this case, the waitboosting is
counterproductive, as the minimal wait for glxgears doesn't prevent it
from functioning correctly and delivering frames on time. In this case,
glxgears happens to almost always be waiting on the current request,
which we already expect to complete quickly (see i915_spin_request) and
so avoiding the waitboost on the active request and spinning instead
provides the best latency without overcommitting to upclocking.
However, if the system falls behind we still force the waitboost.
Similarly, we will also trigger upclocking if we detect the system is
not delivering frames on time - again using a mechanism that tries to
detect a miss and not preemptively upclock.

v2: Also skip boosting for after missed vblank if the desired request is
already active.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180118131609.16574-1-chris@chris-wilson.co.uk


# 2ef1e729 15-Jan-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rewrite some comments around RCU-deferred object free

Tvrtko noticed that the comments describing the interaction of RCU and
the deferred worker for freeing drm_i915_gem_object were a little
confusing, so attempt to bring some sense to them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180115205759.13884-1-chris@chris-wilson.co.uk


# beacbd16 14-Jan-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use our singlethreaded wq for freeing objects

As freeing the objects requires serialisation on struct_mutex, we should
prefer to use our singlethreaded driver wq that is dedicated to work
requiring struct_mutex (hence serialised). The benefit should be less
clutter on the system wq, allowing it to make progress even when the
driver/struct_mutex is heavily contended.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180115122846.15193-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 5005c851 06-Jan-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't adjust priority on an already signaled fence

When we retire a signaled fence, we free the dependency tree. However,
we skip clearing the list so that if we then try to adjust the priority
of the signaled fence, we may walk the list of freed dependencies.
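
The defensive check is conceptually (sketch; helper name per the request API
of the time):

/* A completed fence has had its dependency tree freed; nothing to bump. */
if (i915_gem_request_completed(request))
	return;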

[ 3083.156757] ==================================================================
[ 3083.156806] BUG: KASAN: use-after-free in execlists_schedule+0x199/0x660 [i915]
[ 3083.156810] Read of size 8 at addr ffff8806bf20f400 by task Xorg/831

[ 3083.156815] CPU: 0 PID: 831 Comm: Xorg Not tainted 4.15.0-rc6-no-psn+ #1
[ 3083.156817] Hardware name: Notebook N24_25BU/N24_25BU, BIOS 5.12 02/17/2017
[ 3083.156818] Call Trace:
[ 3083.156823] dump_stack+0x5c/0x7a
[ 3083.156827] print_address_description+0x6b/0x290
[ 3083.156830] kasan_report+0x28f/0x380
[ 3083.156872] ? execlists_schedule+0x199/0x660 [i915]
[ 3083.156914] execlists_schedule+0x199/0x660 [i915]
[ 3083.156956] ? intel_crtc_atomic_check+0x146/0x4e0 [i915]
[ 3083.156997] ? execlists_submit_request+0xe0/0xe0 [i915]
[ 3083.157038] ? i915_vma_misplaced.part.4+0x25/0xb0 [i915]
[ 3083.157079] ? __i915_vma_do_pin+0x7c8/0xc80 [i915]
[ 3083.157121] ? intel_atomic_state_alloc+0x44/0x60 [i915]
[ 3083.157130] ? drm_atomic_helper_page_flip+0x3e/0xb0 [drm_kms_helper]
[ 3083.157145] ? drm_mode_page_flip_ioctl+0x7d2/0x850 [drm]
[ 3083.157159] ? drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.157172] ? drm_ioctl+0x45b/0x560 [drm]
[ 3083.157211] i915_gem_object_wait_priority+0x14c/0x2c0 [i915]
[ 3083.157251] ? i915_gem_get_aperture_ioctl+0x150/0x150 [i915]
[ 3083.157290] ? i915_vma_pin_fence+0x1d8/0x320 [i915]
[ 3083.157331] ? intel_pin_and_fence_fb_obj+0x175/0x250 [i915]
[ 3083.157372] ? intel_rotation_info_size+0x60/0x60 [i915]
[ 3083.157413] ? intel_link_compute_m_n+0x80/0x80 [i915]
[ 3083.157428] ? drm_dev_printk+0x1b0/0x1b0 [drm]
[ 3083.157443] ? drm_dev_printk+0x1b0/0x1b0 [drm]
[ 3083.157485] intel_prepare_plane_fb+0x2f8/0x5a0 [i915]
[ 3083.157527] ? intel_crtc_get_vblank_counter+0x80/0x80 [i915]
[ 3083.157536] drm_atomic_helper_prepare_planes+0xa0/0x1c0 [drm_kms_helper]
[ 3083.157587] intel_atomic_commit+0x12e/0x4e0 [i915]
[ 3083.157605] drm_atomic_helper_page_flip+0xa2/0xb0 [drm_kms_helper]
[ 3083.157621] drm_mode_page_flip_ioctl+0x7d2/0x850 [drm]
[ 3083.157638] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
[ 3083.157652] ? drm_lease_owner+0x1a/0x30 [drm]
[ 3083.157668] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
[ 3083.157681] drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.157696] drm_ioctl+0x45b/0x560 [drm]
[ 3083.157711] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
[ 3083.157725] ? drm_getstats+0x20/0x20 [drm]
[ 3083.157729] ? timerqueue_del+0x49/0x80
[ 3083.157732] ? __remove_hrtimer+0x62/0xb0
[ 3083.157735] ? hrtimer_try_to_cancel+0x173/0x210
[ 3083.157738] do_vfs_ioctl+0x13b/0x880
[ 3083.157741] ? ioctl_preallocate+0x140/0x140
[ 3083.157744] ? _raw_spin_unlock_irq+0xe/0x30
[ 3083.157746] ? do_setitimer+0x234/0x370
[ 3083.157750] ? SyS_setitimer+0x19e/0x1b0
[ 3083.157752] ? SyS_alarm+0x140/0x140
[ 3083.157755] ? __rcu_read_unlock+0x66/0x80
[ 3083.157757] ? __fget+0xc4/0x100
[ 3083.157760] SyS_ioctl+0x74/0x80
[ 3083.157763] entry_SYSCALL_64_fastpath+0x1a/0x7d
[ 3083.157765] RIP: 0033:0x7f6135d0c6a7
[ 3083.157767] RSP: 002b:00007fff01451888 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 3083.157769] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f6135d0c6a7
[ 3083.157771] RDX: 00007fff01451950 RSI: 00000000c01864b0 RDI: 000000000000000c
[ 3083.157772] RBP: 00007f613076f600 R08: 0000000000000001 R09: 0000000000000000
[ 3083.157773] R10: 0000000000000060 R11: 0000000000003246 R12: 0000000000000000
[ 3083.157774] R13: 0000000000000060 R14: 000000000000001b R15: 0000000000000060

[ 3083.157779] Allocated by task 831:
[ 3083.157783] kmem_cache_alloc+0xc0/0x200
[ 3083.157822] i915_gem_request_await_dma_fence+0x2c4/0x5d0 [i915]
[ 3083.157861] i915_gem_request_await_object+0x321/0x370 [i915]
[ 3083.157900] i915_gem_do_execbuffer+0x1165/0x19c0 [i915]
[ 3083.157937] i915_gem_execbuffer2+0x1ad/0x550 [i915]
[ 3083.157950] drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.157962] drm_ioctl+0x45b/0x560 [drm]
[ 3083.157964] do_vfs_ioctl+0x13b/0x880
[ 3083.157966] SyS_ioctl+0x74/0x80
[ 3083.157968] entry_SYSCALL_64_fastpath+0x1a/0x7d

[ 3083.157971] Freed by task 831:
[ 3083.157973] kmem_cache_free+0x77/0x220
[ 3083.158012] i915_gem_request_retire+0x72c/0xa70 [i915]
[ 3083.158051] i915_gem_request_alloc+0x1e9/0x8b0 [i915]
[ 3083.158089] i915_gem_do_execbuffer+0xa96/0x19c0 [i915]
[ 3083.158127] i915_gem_execbuffer2+0x1ad/0x550 [i915]
[ 3083.158140] drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.158153] drm_ioctl+0x45b/0x560 [drm]
[ 3083.158155] do_vfs_ioctl+0x13b/0x880
[ 3083.158156] SyS_ioctl+0x74/0x80
[ 3083.158158] entry_SYSCALL_64_fastpath+0x1a/0x7d

[ 3083.158162] The buggy address belongs to the object at ffff8806bf20f400
which belongs to the cache i915_dependency of size 64
[ 3083.158166] The buggy address is located 0 bytes inside of
64-byte region [ffff8806bf20f400, ffff8806bf20f440)
[ 3083.158168] The buggy address belongs to the page:
[ 3083.158171] page:00000000d43decc4 count:1 mapcount:0 mapping: (null) index:0x0
[ 3083.158174] flags: 0x17ffe0000000100(slab)
[ 3083.158179] raw: 017ffe0000000100 0000000000000000 0000000000000000 0000000180200020
[ 3083.158182] raw: ffffea001afc16c0 0000000500000005 ffff880731b881c0 0000000000000000
[ 3083.158184] page dumped because: kasan: bad access detected

[ 3083.158187] Memory state around the buggy address:
[ 3083.158190] ffff8806bf20f300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158192] ffff8806bf20f380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158195] >ffff8806bf20f400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158196] ^
[ 3083.158199] ffff8806bf20f480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158201] ffff8806bf20f500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158203] ==================================================================

Reported-by: Alexandru Chirvasitu <achirvasub@gmail.com>
Reported-by: Mike Keehan <mike@keehan.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104436
Fixes: 1f181225f8ec ("drm/i915/execlists: Keep request->priority for its lifetime")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alexandru Chirvasitu <achirvasub@gmail.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180106105618.13532-1-chris@chris-wilson.co.uk
(cherry picked from commit c218ee03b9315073ce43992792554dafa0626eb8)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# da943b5a 10-Jan-2018 Sagar Arun Kamble <sagar.a.kamble@intel.com>

drm/i915/guc: Add uc_fini_wq in gem_init unwind path

While moving code around for solving the lockdep issue for the GuC log relay,
we spotted that uc_fini_wq is not being called in the failure path of
gem_init. It was missed in the commit below; add it.

v2: Removed GEM_BUG_ON(!HAS_GUC()) from intel_uc_fini_wq as init happens
only based on enable_guc module parameter and does not consider has_guc
capability. (Michal)

Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Fixes: 3176ff49bc3e ("drm/i915/guc: Move GuC workqueue allocations outside of the mutex")
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1515588857-10283-1-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# c218ee03 06-Jan-2018 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't adjust priority on an already signaled fence

When we retire a signaled fence, we free the dependency tree. However,
we skip clearing the list so that if we then try to adjust the priority
of the signaled fence, we may walk the list of freed dependencies.

[ 3083.156757] ==================================================================
[ 3083.156806] BUG: KASAN: use-after-free in execlists_schedule+0x199/0x660 [i915]
[ 3083.156810] Read of size 8 at addr ffff8806bf20f400 by task Xorg/831

[ 3083.156815] CPU: 0 PID: 831 Comm: Xorg Not tainted 4.15.0-rc6-no-psn+ #1
[ 3083.156817] Hardware name: Notebook N24_25BU/N24_25BU, BIOS 5.12 02/17/2017
[ 3083.156818] Call Trace:
[ 3083.156823] dump_stack+0x5c/0x7a
[ 3083.156827] print_address_description+0x6b/0x290
[ 3083.156830] kasan_report+0x28f/0x380
[ 3083.156872] ? execlists_schedule+0x199/0x660 [i915]
[ 3083.156914] execlists_schedule+0x199/0x660 [i915]
[ 3083.156956] ? intel_crtc_atomic_check+0x146/0x4e0 [i915]
[ 3083.156997] ? execlists_submit_request+0xe0/0xe0 [i915]
[ 3083.157038] ? i915_vma_misplaced.part.4+0x25/0xb0 [i915]
[ 3083.157079] ? __i915_vma_do_pin+0x7c8/0xc80 [i915]
[ 3083.157121] ? intel_atomic_state_alloc+0x44/0x60 [i915]
[ 3083.157130] ? drm_atomic_helper_page_flip+0x3e/0xb0 [drm_kms_helper]
[ 3083.157145] ? drm_mode_page_flip_ioctl+0x7d2/0x850 [drm]
[ 3083.157159] ? drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.157172] ? drm_ioctl+0x45b/0x560 [drm]
[ 3083.157211] i915_gem_object_wait_priority+0x14c/0x2c0 [i915]
[ 3083.157251] ? i915_gem_get_aperture_ioctl+0x150/0x150 [i915]
[ 3083.157290] ? i915_vma_pin_fence+0x1d8/0x320 [i915]
[ 3083.157331] ? intel_pin_and_fence_fb_obj+0x175/0x250 [i915]
[ 3083.157372] ? intel_rotation_info_size+0x60/0x60 [i915]
[ 3083.157413] ? intel_link_compute_m_n+0x80/0x80 [i915]
[ 3083.157428] ? drm_dev_printk+0x1b0/0x1b0 [drm]
[ 3083.157443] ? drm_dev_printk+0x1b0/0x1b0 [drm]
[ 3083.157485] intel_prepare_plane_fb+0x2f8/0x5a0 [i915]
[ 3083.157527] ? intel_crtc_get_vblank_counter+0x80/0x80 [i915]
[ 3083.157536] drm_atomic_helper_prepare_planes+0xa0/0x1c0 [drm_kms_helper]
[ 3083.157587] intel_atomic_commit+0x12e/0x4e0 [i915]
[ 3083.157605] drm_atomic_helper_page_flip+0xa2/0xb0 [drm_kms_helper]
[ 3083.157621] drm_mode_page_flip_ioctl+0x7d2/0x850 [drm]
[ 3083.157638] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
[ 3083.157652] ? drm_lease_owner+0x1a/0x30 [drm]
[ 3083.157668] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
[ 3083.157681] drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.157696] drm_ioctl+0x45b/0x560 [drm]
[ 3083.157711] ? drm_mode_cursor2_ioctl+0x10/0x10 [drm]
[ 3083.157725] ? drm_getstats+0x20/0x20 [drm]
[ 3083.157729] ? timerqueue_del+0x49/0x80
[ 3083.157732] ? __remove_hrtimer+0x62/0xb0
[ 3083.157735] ? hrtimer_try_to_cancel+0x173/0x210
[ 3083.157738] do_vfs_ioctl+0x13b/0x880
[ 3083.157741] ? ioctl_preallocate+0x140/0x140
[ 3083.157744] ? _raw_spin_unlock_irq+0xe/0x30
[ 3083.157746] ? do_setitimer+0x234/0x370
[ 3083.157750] ? SyS_setitimer+0x19e/0x1b0
[ 3083.157752] ? SyS_alarm+0x140/0x140
[ 3083.157755] ? __rcu_read_unlock+0x66/0x80
[ 3083.157757] ? __fget+0xc4/0x100
[ 3083.157760] SyS_ioctl+0x74/0x80
[ 3083.157763] entry_SYSCALL_64_fastpath+0x1a/0x7d
[ 3083.157765] RIP: 0033:0x7f6135d0c6a7
[ 3083.157767] RSP: 002b:00007fff01451888 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 3083.157769] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f6135d0c6a7
[ 3083.157771] RDX: 00007fff01451950 RSI: 00000000c01864b0 RDI: 000000000000000c
[ 3083.157772] RBP: 00007f613076f600 R08: 0000000000000001 R09: 0000000000000000
[ 3083.157773] R10: 0000000000000060 R11: 0000000000003246 R12: 0000000000000000
[ 3083.157774] R13: 0000000000000060 R14: 000000000000001b R15: 0000000000000060

[ 3083.157779] Allocated by task 831:
[ 3083.157783] kmem_cache_alloc+0xc0/0x200
[ 3083.157822] i915_gem_request_await_dma_fence+0x2c4/0x5d0 [i915]
[ 3083.157861] i915_gem_request_await_object+0x321/0x370 [i915]
[ 3083.157900] i915_gem_do_execbuffer+0x1165/0x19c0 [i915]
[ 3083.157937] i915_gem_execbuffer2+0x1ad/0x550 [i915]
[ 3083.157950] drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.157962] drm_ioctl+0x45b/0x560 [drm]
[ 3083.157964] do_vfs_ioctl+0x13b/0x880
[ 3083.157966] SyS_ioctl+0x74/0x80
[ 3083.157968] entry_SYSCALL_64_fastpath+0x1a/0x7d

[ 3083.157971] Freed by task 831:
[ 3083.157973] kmem_cache_free+0x77/0x220
[ 3083.158012] i915_gem_request_retire+0x72c/0xa70 [i915]
[ 3083.158051] i915_gem_request_alloc+0x1e9/0x8b0 [i915]
[ 3083.158089] i915_gem_do_execbuffer+0xa96/0x19c0 [i915]
[ 3083.158127] i915_gem_execbuffer2+0x1ad/0x550 [i915]
[ 3083.158140] drm_ioctl_kernel+0xa7/0xf0 [drm]
[ 3083.158153] drm_ioctl+0x45b/0x560 [drm]
[ 3083.158155] do_vfs_ioctl+0x13b/0x880
[ 3083.158156] SyS_ioctl+0x74/0x80
[ 3083.158158] entry_SYSCALL_64_fastpath+0x1a/0x7d

[ 3083.158162] The buggy address belongs to the object at ffff8806bf20f400
which belongs to the cache i915_dependency of size 64
[ 3083.158166] The buggy address is located 0 bytes inside of
64-byte region [ffff8806bf20f400, ffff8806bf20f440)
[ 3083.158168] The buggy address belongs to the page:
[ 3083.158171] page:00000000d43decc4 count:1 mapcount:0 mapping: (null) index:0x0
[ 3083.158174] flags: 0x17ffe0000000100(slab)
[ 3083.158179] raw: 017ffe0000000100 0000000000000000 0000000000000000 0000000180200020
[ 3083.158182] raw: ffffea001afc16c0 0000000500000005 ffff880731b881c0 0000000000000000
[ 3083.158184] page dumped because: kasan: bad access detected

[ 3083.158187] Memory state around the buggy address:
[ 3083.158190] ffff8806bf20f300: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158192] ffff8806bf20f380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158195] >ffff8806bf20f400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158196] ^
[ 3083.158199] ffff8806bf20f480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158201] ffff8806bf20f500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 3083.158203] ==================================================================

Reported-by: Alexandru Chirvasitu <achirvasub@gmail.com>
Reported-by: Mike Keehan <mike@keehan.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104436
Fixes: 1f181225f8ec ("drm/i915/execlists: Keep request->priority for its lifetime")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alexandru Chirvasitu <achirvasub@gmail.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tested-by: Alexandru Chirvasitu <achirvasub@gmail.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180106105618.13532-1-chris@chris-wilson.co.uk
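
A minimal sketch of the hazard and one way to avoid it: unlink each dependency
as it is freed, so a later priority walk sees a valid empty list instead of
freed memory. The struct and function names below are illustrative, not the
driver's.

	#include <linux/list.h>
	#include <linux/slab.h>

	struct dep_sketch {
		struct list_head link;
	};

	/* Retiring a signaled fence: free the dependencies AND unlink them,
	 * leaving an empty list behind instead of dangling pointers. */
	static void retire_deps_sketch(struct list_head *deps)
	{
		struct dep_sketch *d, *next;

		list_for_each_entry_safe(d, next, deps, link) {
			list_del(&d->link);
			kfree(d);
		}
	}

	/* A later priority adjustment then walks an empty list and does nothing. */
	static void bump_priority_sketch(struct list_head *deps)
	{
		struct dep_sketch *d;

		list_for_each_entry(d, deps, link) {
			/* propagate the new priority to each dependency */
		}
	}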


# fcb1de54 19-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add a strong mb to resetting the has-CS-interrupt bit

After a reset, the state of the CSB registers is scrubbed and not valid
until a powercontext is reloaded. We only know when a powercontext has
been reloaded once we see a CS-interrupt; before then we must ignore the
CSB registers within the execlists_submission_tasklet. However, glk is
sporadically dying with an illegal CSB pointer value (both in the HSWP
and mmio) suggesting that it is running with the CS-interrupt bit set
before the powercontext has been reloaded. Make sure the clearing of
that bit is serialised on reset with the re-enabling of the tasklet.

References: https://bugs.freedesktop.org/show_bug.cgi?id=104262
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171219090110.11153-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
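
A sketch of the serialisation being described: clear_bit() is not a full
barrier, so an explicit smp_mb__after_atomic() is needed before the tasklet is
allowed to run again and inspect the bit. The flag word and bit name here are
stand-ins for the engine's irq state.

	#include <linux/atomic.h>
	#include <linux/bitops.h>
	#include <linux/interrupt.h>

	#define CS_IRQ_POSTED	0	/* illustrative bit number */

	static void reset_finish_sketch(unsigned long *irq_flags,
					struct tasklet_struct *t)
	{
		clear_bit(CS_IRQ_POSTED, irq_flags);
		smp_mb__after_atomic();	/* make the clear visible before ... */
		tasklet_enable(t);	/* ... the tasklet can observe the bit */
	}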


# b65a9b98 18-Dec-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915: prefer i915_gem_object_has_pages()

We have an existing helper for testing obj->mm.pages, so use it.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171218103855.25274-1-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
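
The change is about expressing intent through a helper rather than poking at
obj->mm.pages directly; a simplified version of that pattern follows, with the
struct layout and names as stand-ins.

	struct sg_table;

	struct obj_sketch {
		struct {
			struct sg_table *pages;
		} mm;
	};

	static inline bool obj_has_pages_sketch(const struct obj_sketch *obj)
	{
		return obj->mm.pages != NULL;
	}

	/* Call sites then read as a capability check rather than a field poke:
	 *	if (obj_has_pages_sketch(obj))
	 *		...
	 */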


# 7b6da818 15-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restore the kernel context after a GPU reset on an idle engine

Part of the system requirement for powersaving is that we always have
a context loaded. Upon boot and resume, we load the kernel_context to
ensure that some valid state is set before powersaving kicks in; we
should do so after a full GPU reset as well. We only need to do so for
an idle engine, as any active engines will restart by executing the
stuck request, loading its context. For the idle engine, we create a
new request to load the kernel_context instead.

For whatever reason, performing a dummy execute on the idle engine after
reset papers over a subsequent GPU hang in rare circumstances, even on
machines not using contexts (e.g. Pineview).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104259
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104261
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171216000334.8197-1-chris@chris-wilson.co.uk


# 2797c4a1 04-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush pending GTT writes before unbinding

From the shrinker paths, we want to relinquish the GPU and GGTT access to
the object, releasing the backing storage back to the system for
swapout. As a part of that process we would unpin the pages, marking
them for access by the CPU (for the swapout/swapin). However, if that
process was interrupted after unbinding the vma, we missed a flush of the
inflight GGTT writes before we made that GTT space available again for
reuse, with the prospect that we would redirect them to another page.

The bug dates back to the introduction of multiple GGTT vma, but the
code itself dates to commit 02bef8f98d26 ("drm/i915: Unbind closed vma
for i915_gem_object_unbind()").

Fixes: 02bef8f98d26 ("drm/i915: Unbind closed vma for i915_gem_object_unbind()")
Fixes: c5ad54cf7dd8 ("drm/i915: Use partial view in mmap fault handler")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171204132513.7303-1-chris@chris-wilson.co.uk
(cherry picked from commit 5888fc9eac3c2ff96e76aeeb865fdb46ab2d711e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
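
The essence of the fix is an ordering rule in the shrinker path: flush the
in-flight GGTT writes before the vma is unbound and its PTEs become reusable.
A schematic sketch, with all names assumed:

	struct vma_sketch;
	void flush_ggtt_writes_sketch(struct vma_sketch *vma);	/* push indirect writes out */
	int unbind_sketch(struct vma_sketch *vma);		/* releases the GTT space */

	static int shrink_vma_sketch(struct vma_sketch *vma)
	{
		/* Flush first: if we are interrupted after the unbind, the
		 * writes can no longer be redirected into whatever reuses
		 * the PTEs. */
		flush_ggtt_writes_sketch(vma);
		return unbind_sketch(vma);
	}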


# 61b5c158 13-Dec-2017 Michał Winiarski <michal.winiarski@intel.com>

drm/i915/guc: Extract guc_init from guc_init_hw

After GPU reset, GuC HW needs to be reinitialized (with FW reload).
Unfortunately, we're doing some extra work there (mostly allocating stuff),
work that can be moved to guc_init and called once at driver load time.

As a side effect we're no longer hitting an assert in
i915_ggtt_enable_guc on suspend/resume.

v2: Do not duplicate disable_communication / reset_guc_interrupts
v3: Add proper teardown after rebase

References: 04f7b24eccdf ("drm/i915/guc: Assert that we switch between known ggtt->invalidate functions")
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-3-michal.winiarski@intel.com


# 3176ff49 13-Dec-2017 Michał Winiarski <michal.winiarski@intel.com>

drm/i915/guc: Move GuC workqueue allocations outside of the mutex

This gets rid of the following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.15.0-rc2-CI-Patchwork_7428+ #1 Not tainted
------------------------------------------------------
debugfs_test/1351 is trying to acquire lock:
(&dev->struct_mutex){+.+.}, at: [<000000009d90d1a3>] i915_mutex_lock_interruptible+0x47/0x130 [i915]

but task is already holding lock:
(&mm->mmap_sem){++++}, at: [<000000005df01c1e>] __do_page_fault+0x106/0x560

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&mm->mmap_sem){++++}:
__might_fault+0x63/0x90
_copy_to_user+0x1e/0x70
filldir+0x8c/0xf0
dcache_readdir+0xeb/0x160
iterate_dir+0xe6/0x150
SyS_getdents+0xa0/0x130
entry_SYSCALL_64_fastpath+0x1c/0x89

-> #5 (&sb->s_type->i_mutex_key#5){++++}:
lockref_get+0x9/0x20

-> #4 ((completion)&req.done){+.+.}:
wait_for_common+0x54/0x210
devtmpfs_create_node+0x130/0x150
device_add+0x5ad/0x5e0
device_create_groups_vargs+0xd4/0xe0
device_create+0x35/0x40
msr_device_create+0x22/0x40
cpuhp_invoke_callback+0xc5/0xbf0
cpuhp_thread_fun+0x167/0x210
smpboot_thread_fn+0x17f/0x270
kthread+0x173/0x1b0
ret_from_fork+0x24/0x30

-> #3 (cpuhp_state-up){+.+.}:
cpuhp_issue_call+0x132/0x1c0
__cpuhp_setup_state_cpuslocked+0x12f/0x2a0
__cpuhp_setup_state+0x3a/0x50
page_writeback_init+0x3a/0x5c
start_kernel+0x393/0x3e2
secondary_startup_64+0xa5/0xb0

-> #2 (cpuhp_state_mutex){+.+.}:
__mutex_lock+0x81/0x9b0
__cpuhp_setup_state_cpuslocked+0x4b/0x2a0
__cpuhp_setup_state+0x3a/0x50
page_alloc_init+0x1f/0x26
start_kernel+0x139/0x3e2
secondary_startup_64+0xa5/0xb0

-> #1 (cpu_hotplug_lock.rw_sem){++++}:
cpus_read_lock+0x34/0xa0
apply_workqueue_attrs+0xd/0x40
__alloc_workqueue_key+0x2c7/0x4e1
intel_guc_submission_init+0x10c/0x650 [i915]
intel_uc_init_hw+0x29e/0x460 [i915]
i915_gem_init_hw+0xca/0x290 [i915]
i915_gem_init+0x115/0x3a0 [i915]
i915_driver_load+0x9a8/0x16c0 [i915]
i915_pci_probe+0x2e/0x90 [i915]
pci_device_probe+0x9c/0x120
driver_probe_device+0x2a3/0x480
__driver_attach+0xd9/0xe0
bus_for_each_dev+0x57/0x90
bus_add_driver+0x168/0x260
driver_register+0x52/0xc0
do_one_initcall+0x39/0x150
do_init_module+0x56/0x1ef
load_module+0x231c/0x2d70
SyS_finit_module+0xa5/0xe0
entry_SYSCALL_64_fastpath+0x1c/0x89

-> #0 (&dev->struct_mutex){+.+.}:
lock_acquire+0xaf/0x200
__mutex_lock+0x81/0x9b0
i915_mutex_lock_interruptible+0x47/0x130 [i915]
i915_gem_fault+0x201/0x760 [i915]
__do_fault+0x15/0x70
__handle_mm_fault+0x85b/0xe40
handle_mm_fault+0x14f/0x2f0
__do_page_fault+0x2d1/0x560
page_fault+0x22/0x30

other info that might help us debug this:

Chain exists of:
&dev->struct_mutex --> &sb->s_type->i_mutex_key#5 --> &mm->mmap_sem

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&mm->mmap_sem);
                               lock(&sb->s_type->i_mutex_key#5);
                               lock(&mm->mmap_sem);
  lock(&dev->struct_mutex);

*** DEADLOCK ***

1 lock held by debugfs_test/1351:
#0: (&mm->mmap_sem){++++}, at: [<000000005df01c1e>] __do_page_fault+0x106/0x560

stack backtrace:
CPU: 2 PID: 1351 Comm: debugfs_test Not tainted 4.15.0-rc2-CI-Patchwork_7428+ #1
Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0057.2017.0119.1758 01/19/2017
Call Trace:
dump_stack+0x5f/0x86
print_circular_bug+0x230/0x3b0
check_prev_add+0x439/0x7b0
? lockdep_init_map_crosslock+0x20/0x20
? unwind_get_return_address+0x16/0x30
? __lock_acquire+0x1385/0x15a0
__lock_acquire+0x1385/0x15a0
lock_acquire+0xaf/0x200
? i915_mutex_lock_interruptible+0x47/0x130 [i915]
__mutex_lock+0x81/0x9b0
? i915_mutex_lock_interruptible+0x47/0x130 [i915]
? i915_mutex_lock_interruptible+0x47/0x130 [i915]
? i915_mutex_lock_interruptible+0x47/0x130 [i915]
i915_mutex_lock_interruptible+0x47/0x130 [i915]
? __pm_runtime_resume+0x4f/0x80
i915_gem_fault+0x201/0x760 [i915]
__do_fault+0x15/0x70
__handle_mm_fault+0x85b/0xe40
handle_mm_fault+0x14f/0x2f0
__do_page_fault+0x2d1/0x560
page_fault+0x22/0x30
RIP: 0033:0x7f98d6f49116
RSP: 002b:00007ffd6ffc3278 EFLAGS: 00010283
RAX: 00007f98d39a2bc0 RBX: 0000000000000000 RCX: 0000000000001680
RDX: 0000000000001680 RSI: 00007ffd6ffc3400 RDI: 00007f98d39a2bc0
RBP: 00007ffd6ffc33a0 R08: 0000000000000000 R09: 00000000000005a0
R10: 000055e847c2a830 R11: 0000000000000002 R12: 0000000000000001
R13: 000055e847c1d040 R14: 00007ffd6ffc3400 R15: 00007f98d6752ba0

v2: Init preempt_work unconditionally (Chris)
v3: Mention that we need the enable_guc=1 for lockdep splat (Chris)

Testcase: igt/debugfs_test/read_all_entries # with i915.enable_guc=1
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213221352.7173-2-michal.winiarski@intel.com
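
The shape of the fix, reduced to its core: alloc_workqueue() takes
cpu_hotplug_lock internally, so it must be called before, not under, the
driver's big mutex. Apart from the workqueue API, the names below are
stand-ins.

	#include <linux/errno.h>
	#include <linux/mutex.h>
	#include <linux/workqueue.h>

	static DEFINE_MUTEX(big_lock_sketch);	/* stand-in for struct_mutex */

	struct guc_sketch {
		struct workqueue_struct *wq;
	};

	/* Called at driver load, before any struct_mutex critical section. */
	static int guc_init_sketch(struct guc_sketch *guc)
	{
		guc->wq = alloc_workqueue("guc-sketch", WQ_HIGHPRI, 0);
		return guc->wq ? 0 : -ENOMEM;
	}

	/* The HW (re)init path only uses the workqueue under the mutex. */
	static void guc_init_hw_sketch(struct guc_sketch *guc)
	{
		mutex_lock(&big_lock_sketch);
		/* ... program HW, queue work on guc->wq, no allocations ... */
		mutex_unlock(&big_lock_sketch);
	}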


# 6ca9a2be 13-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unwind i915_gem_init() failure

Since Michal introduced new user controllable errors other than -EIO
during i915_gem_init(), we need to actually unwind on the error path as
we have to abort the module load (and we expect to do so cleanly!).

As we now teardown key state and then mark the driver as wedged (on
EIO), we have to be careful to not allow ourselves to resume and
unwedge, thus attempting to use the uninitialised driver.

v2: Try not to free driver state for the suppressed EIO
v3: Use load-fault-injection to test both error/recovery paths.

References: 8620eb1dbbf2 ("drm/i915/uc: Don't use -EIO to report missing firmware")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171213134347.4608-1-chris@chris-wilson.co.uk


# d7dc4131 12-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't check #active_requests from i915_gem_wait_for_idle()

i915_gem_wait_for_idle() is called from inside the shrinker, to ensure
that we drain the last resources from the GPU in dire circumstances (OOM).
As we may allocate whilst building a request, it is then possible to hit
the shrinker with a request under construction, and so we must account
for the incomplete request whilst waiting. In particular, we
preincrement (in reserve_engine) the i915->gt.active_requests counter
and mark the GPU as busy; therefore we cannot use that counter for
shortcircuiting the wait-for-idle.

[ 950.859024] GEM_BUG_ON(i915->gt.active_requests)
[ 950.859041] WARNING: CPU: 2 PID: 2178 at drivers/gpu/drm/i915/i915_gem.c:3615 i915_gem_wait_for_idle.part.56+0x166/0x4e0
[ 950.859041] Modules linked in: ccm tun fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_codec btusb snd_hda_core btrtl btbcm iwlwifi snd_hwdep btintel bluetooth snd_seq snd_seq_device snd_pcm ecdh_generic x86_pkg_temp_thermal tpm_infineon coretemp tpm_tis crc32_pclmul wmi_bmof crc32c_intel iTCO_wdt hp_wmi snd_timer iTCO_vendor_support sparse_keymap tpm_tis_core mei_me cfg80211
[ 950.859082] snd joydev tpm mei rfkill pcspkr wmi soundcore lpc_ich hp_accel lis3lv02d input_polldev binfmt_misc e1000e ptp serio_raw pps_core
[ 950.859094] CPU: 2 PID: 2178 Comm: gem_exec_nop Tainted: G U 4.15.0-rc2+ #900
[ 950.859102] Hardware name: Hewlett-Packard HP ProBook 6360b/1620, BIOS 68SCF Ver. B.42 12/29/2010
[ 950.859107] task: c5119cb4 task.stack: f3ccb8d8
[ 950.859112] EIP: i915_gem_wait_for_idle.part.56+0x166/0x4e0
[ 950.859113] EFLAGS: 00010296 CPU: 2
[ 950.859114] EAX: 00000024 EBX: f36c1888 ECX: f777a044 EDX: 00000007
[ 950.859115] ESI: f36c1888 EDI: edd53958 EBP: edd53970 ESP: edd53938
[ 950.859116] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 950.859117] CR0: 80050033 CR2: b7f39000 CR3: 2f2b3000 CR4: 000406d0
[ 950.859118] Call Trace:
[ 950.859125] ? drm_printk+0x70/0x70
[ 950.859129] i915_gem_wait_for_idle+0x18/0x30
[ 950.859133] i915_gem_shrink+0x360/0x410
[ 950.859138] ? vmpressure+0xa8/0xf0
[ 950.859142] ? ktime_get+0x4a/0x100
[ 950.859147] i915_gem_shrink_all+0x21/0x40
[ 950.859151] i915_gem_shrinker_oom+0x23/0x130
[ 950.859156] notifier_call_chain+0x4e/0x70
[ 950.859160] __blocking_notifier_call_chain+0x2f/0x60
[ 950.859164] blocking_notifier_call_chain+0x11/0x20
[ 950.859169] out_of_memory+0x207/0x280
[ 950.859174] __alloc_pages_nodemask+0xd47/0xe60
[ 950.859179] new_slab+0x32d/0x450
[ 950.859183] ___slab_alloc.constprop.81+0x358/0x4e0
[ 950.859189] ? i915_sw_fence_await_dma_fence+0x53/0x160
[ 950.859193] ? __slab_free+0x1fe/0x310
[ 950.859197] ? native_sched_clock+0x1e/0xc0
[ 950.859201] ? i915_gem_request_alloc+0xcf/0x510
[ 950.859205] ? sched_clock+0x9/0x10
[ 950.859209] __slab_alloc.constprop.80+0x29/0x40
[ 950.859212] ? __slab_alloc.constprop.80+0x29/0x40
[ 950.859216] kmem_cache_alloc_trace+0x160/0x1a0
[ 950.859220] ? i915_sw_fence_await_dma_fence+0x53/0x160
[ 950.859224] i915_sw_fence_await_dma_fence+0x53/0x160
[ 950.859229] i915_gem_request_await_dma_fence+0x1eb/0x390
[ 950.859233] i915_gem_request_await_object+0xee/0x230
[ 950.859239] i915_gem_do_execbuffer+0xc16/0x1200
[ 950.859246] ? irqtime_account_irq+0x3e/0xc0
[ 950.859251] ? irq_exit+0x4f/0xb0
[ 950.859257] ? smp_apic_timer_interrupt+0x5f/0x110
[ 950.859261] ? apic_timer_interrupt+0x35/0x3c
[ 950.859266] i915_gem_execbuffer2_ioctl+0x212/0x440
[ 950.859270] ? apic_timer_interrupt+0x35/0x3c
[ 950.859274] ? i915_gem_do_execbuffer+0x1200/0x1200
[ 950.859279] ? insn_get_seg_base+0x1b/0x50
[ 950.859283] ? i915_gem_do_execbuffer+0x1200/0x1200
[ 950.859287] drm_ioctl_kernel+0x51/0xa0
[ 950.859291] drm_ioctl+0x2a3/0x350
[ 950.859294] ? i915_gem_do_execbuffer+0x1200/0x1200
[ 950.859300] ? sched_clock+0x9/0x10
[ 950.859303] ? drm_getunique+0x70/0x70
[ 950.859308] do_vfs_ioctl+0x7d/0x640
[ 950.859311] ? native_sched_clock+0x1e/0xc0
[ 950.859315] ? sched_clock+0x9/0x10
[ 950.859319] ? sched_clock_cpu+0x13/0x120
[ 950.859323] SyS_ioctl+0x4e/0x80
[ 950.859326] do_fast_syscall_32+0x75/0x250
[ 950.859331] ? irq_exit+0x4f/0xb0
[ 950.859334] entry_SYSENTER_32+0x47/0x71
[ 950.859338] EIP: 0xb7f81d11
[ 950.859339] EFLAGS: 00000296 CPU: 2
[ 950.859340] EAX: ffffffda EBX: 00000003 ECX: 40406469 EDX: bfde4c20
[ 950.859340] ESI: 00000003 EDI: 40406469 EBP: 00000003 ESP: bfde4b38
[ 950.859341] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 950.859343] Code: e8 30 60 01 00 83 c4 10 83 c3 04 39 f3 75 e0 8b 45 d8 8b 80 14 37 00 00 85 c0 74 13 68 dd 33 e4 c0 68 49 6f e3 c0 e8 4a 55 be ff <0f> ff 5e 5f b8 fe ff ff 3f bb 0a 00 00 00 e8 b7 14 c4 ff 8b 15

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171212132148.8124-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 59e4b19d 11-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Dump the engine state before declaring wedged from wait_for_engines()

If wait_for_engines() fails and we resort to declaring the HW wedged,
dump the engine state for debugging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211194135.27095-2-chris@chris-wilson.co.uk


# ee42c00e 11-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Bump timeout for wait_for_engines()

Extract the timeout we use in i915_gem_idle_work_handler() and reuse it
for wait_for_engines() in i915_gem_wait_for_idle(). It too has the same
problem of sometimes having to wait for an extended period before the HW
settles, so make use of the same timeout.

References: 5427f207852d ("drm/i915: Bump wait-times for the final CS interrupt before parking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211194135.27095-1-chris@chris-wilson.co.uk


# 73ebd503 11-Dec-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915: make mappable struct resource centric

Now that we are using struct resource to track the stolen region, it is
more convenient if we track the mappable region in a resource as well.

v2: prefer iomap and gmadr naming scheme
prefer DEFINE_RES_MEM

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171211151822.20953-8-matthew.auld@intel.com
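
A toy illustration of the struct-resource-centric tracking; the base address
and size are made up, and only DEFINE_RES_MEM() and resource_size() are real
kernel helpers.

	#include <linux/ioport.h>

	/* Mappable aperture ("gmadr") tracked as a resource; placeholder values. */
	static struct resource gmadr_sketch = DEFINE_RES_MEM(0xe0000000, 256 << 20);

	static resource_size_t gmadr_size_sketch(void)
	{
		return resource_size(&gmadr_sketch);	/* end - start + 1 */
	}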


# b6876374 05-Dec-2017 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Restore GT performance in headless mode with DMC loaded

It seems that the DMC likes to transition between the DC states a lot when
there are no connected displays (no active power domains) during command
submission.

This activity on DC states has a negative impact on the performance of the
chip with huge latencies observed in the interrupt handlers and elsewhere.
Simple tests like igt/gem_latency -n 0 are slowed down by a factor of
eight.

Work around it by introducing a new power domain named,
POWER_DOMAIN_GT_IRQ, associated with the "DC off" power well, which is
held for the duration of command submission activity.

CNL has the same problem which will be addressed as a follow-up. Doing
that requires a fix for a DC6 context corruption problem in the CNL DMC
firmware which is yet to be released.

v2:
* Add commit text as comment in i915_gem_mark_busy. (Chris Wilson)
* Protect macro body with braces. (Jani Nikula)

v3:
* Add dedicated power domain for clarity. (Chris, Imre)
* Commit message and comment text updates.
* Apply to all big-core GEN9 parts apart for Skylake which is pending DMC
firmware release.

v4:
* Power domain should be inner to device runtime pm. (Chris)
* Simplify NEEDS_CSR_GT_PERF_WA macro. (Chris)
* Handle async DMC loading by moving the GT_IRQ power domain logic into
intel_runtime_pm. (Daniel, Chris)
* Include small core GEN9 as well. (Imre)

v5
* Special handling for async DMC load is not needed since on failure the
power domain reference is kept permanently taken. (Imre)

v6:
* Drop the NEEDS_CSR_GT_PERF_WA macro since all firmwares have now been
deployed. (Imre, Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100572
Testcase: igt/gem_exec_nop/headless
Cc: Imre Deak <imre.deak@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v5)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[Imre: Add note about applying the WA on CNL as a follow-up]
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171205132854.26380-1-tvrtko.ursulin@linux.intel.com
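
Schematically, the workaround pairs a power-domain reference with the
busy/idle transitions of command submission, so the DMC cannot bounce DC
states while requests are in flight. All names below are stand-ins for the
real power-domain calls.

	struct power_sketch;
	void power_domain_get_sketch(struct power_sketch *pm);	/* keeps "DC off" asserted */
	void power_domain_put_sketch(struct power_sketch *pm);

	/* First request submitted after idling: take the reference. */
	static void mark_busy_sketch(struct power_sketch *pm)
	{
		power_domain_get_sketch(pm);
	}

	/* Last request retired and the GT parks: drop it again. */
	static void mark_idle_sketch(struct power_sketch *pm)
	{
		power_domain_put_sketch(pm);
	}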


# e2189dd0 07-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor common list iteration over GGTT vma

In quite a few places, we have a list iteration over the vma on an
object that only wants to inspect GGTT vma. By construction, these are
placed at the start of the list, so we have copied that knowledge into
many callsites. Pull that knowledge back to i915_vma.h and provide a
for_each_ggtt_vma() to tidy up the code.

v2: Add a backreference from vma_create() to remind ourselves why we put
ggtt vma at the head of the obj->vma_list (and ppgtt vma at the tail).
v3: Fixup s/vma/V/

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171207211407.31549-1-chris@chris-wilson.co.uk
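
A minimal sketch of how such an iterator can exploit the "GGTT vma at the
head" invariant: stop at the first non-GGTT entry rather than filtering the
whole list at every caller. Types and names are simplified stand-ins, not the
driver's.

	#include <linux/list.h>
	#include <linux/types.h>

	struct vma_iter_sketch {
		struct list_head obj_link;
		bool is_ggtt;
	};

	struct obj_iter_sketch {
		struct list_head vma_list;	/* GGTT vma are kept at the head */
	};

	#define for_each_ggtt_vma_sketch(V, OBJ)			\
		list_for_each_entry(V, &(OBJ)->vma_list, obj_link)	\
			if (!(V)->is_ggtt)				\
				break;					\
			else

	/* Usage:
	 *	struct vma_iter_sketch *vma;
	 *	for_each_ggtt_vma_sketch(vma, obj)
	 *		do_something(vma);
	 */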


# 7125397b 05-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track GGTT writes on the vma

As writes through the GTT and GGTT PTE updates do not share the same
path, they are not strictly ordered and so we must explicitly flush the
indirect writes prior to modifying the PTE. We do track outstanding GGTT
writes on the object itself, but since the object may have multiple GGTT
vma, that is overly coarse as we can track and flush individual vma as
required.

Whilst here, update the GGTT flushing behaviour for Cannonlake.

v2: Hard-code ring offset to allow use during unload (after RCS may have
been freed, or never existed!)

References: https://bugs.freedesktop.org/show_bug.cgi?id=104002
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171206124914.19960-2-chris@chris-wilson.co.uk
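
Per-vma tracking of this kind boils down to a dirty flag that is set on a GGTT
write and tested-and-cleared on flush; a sketch with assumed names follows.

	#include <linux/bitops.h>

	struct vma_write_sketch {
		unsigned long flags;
	#define VMA_GGTT_WRITE_SKETCH	0
	};

	/* Mark the vma dirty when the CPU writes through its GGTT mapping. */
	static void set_ggtt_write_sketch(struct vma_write_sketch *vma)
	{
		set_bit(VMA_GGTT_WRITE_SKETCH, &vma->flags);
	}

	/* Flush only the vma that actually have outstanding writes, e.g.
	 * before their PTEs are rewritten. */
	static void flush_ggtt_write_sketch(struct vma_write_sketch *vma)
	{
		if (test_and_clear_bit(VMA_GGTT_WRITE_SKETCH, &vma->flags)) {
			/* posting read / wmb() against the GGTT would go here */
		}
	}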


# 010e3e68 05-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove vma from object on destroy, not close

Originally we translated from the object to the vma by walking
obj->vma_list to find the matching vm (for user lookups). Now we process
user lookups using the rbtree, and we only use obj->vma_list itself for
maintaining state (e.g. ensuring that all vma are flushed or rebound).
As such maintenance needs to go on beyond the user's awareness of the
vma, defer removal of the vma from the obj->vma_list from i915_vma_close()
to i915_vma_destroy().

Fixes: 5888fc9eac3c ("drm/i915: Flush pending GTT writes before unbinding")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104155
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171206124914.19960-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
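
The lifecycle change, in outline: close no longer unlinks the vma from the
object, destroy does. A simplified stand-in, not the driver's real code:

	#include <linux/list.h>
	#include <linux/slab.h>

	struct vma_life_sketch {
		struct list_head obj_link;	/* membership of obj->vma_list */
	};

	/* Close only ends the user's view of the vma; it stays on obj->vma_list
	 * so flush/rebind maintenance still finds it. */
	static void vma_close_sketch(struct vma_life_sketch *vma)
	{
		/* no list_del() here any more */
	}

	/* Destroy is where the vma finally leaves the object's list. */
	static void vma_destroy_sketch(struct vma_life_sketch *vma)
	{
		list_del(&vma->obj_link);
		kfree(vma);
	}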


# ef78970a 22-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Call i915_gem_init_userptr() before taking struct_mutex

We don't need struct_mutex to initialise userptr (it just allocates a
workqueue for itself etc), but we do need struct_mutex later on in
i915_gem_init() in order to feed requests onto the HW.

This should break the chain

[ 385.697902] ======================================================
[ 385.697907] WARNING: possible circular locking dependency detected
[ 385.697913] 4.14.0-CI-Patchwork_7234+ #1 Tainted: G U
[ 385.697917] ------------------------------------------------------
[ 385.697922] perf_pmu/2631 is trying to acquire lock:
[ 385.697927] (&mm->mmap_sem){++++}, at: [<ffffffff811bfe1e>] __might_fault+0x3e/0x90
[ 385.697941]
but task is already holding lock:
[ 385.697946] (&cpuctx_mutex){+.+.}, at: [<ffffffff8116fe8c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[ 385.697957]
which lock already depends on the new lock.

[ 385.697963]
the existing dependency chain (in reverse order) is:
[ 385.697970]
-> #4 (&cpuctx_mutex){+.+.}:
[ 385.697980] __mutex_lock+0x86/0x9b0
[ 385.697985] perf_event_init_cpu+0x5a/0x90
[ 385.697991] perf_event_init+0x178/0x1a4
[ 385.697997] start_kernel+0x27f/0x3f1
[ 385.698003] verify_cpu+0x0/0xfb
[ 385.698006]
-> #3 (pmus_lock){+.+.}:
[ 385.698015] __mutex_lock+0x86/0x9b0
[ 385.698020] perf_event_init_cpu+0x21/0x90
[ 385.698025] cpuhp_invoke_callback+0xca/0xc00
[ 385.698030] _cpu_up+0xa7/0x170
[ 385.698035] do_cpu_up+0x57/0x70
[ 385.698039] smp_init+0x62/0xa6
[ 385.698044] kernel_init_freeable+0x97/0x193
[ 385.698050] kernel_init+0xa/0x100
[ 385.698055] ret_from_fork+0x27/0x40
[ 385.698058]
-> #2 (cpu_hotplug_lock.rw_sem){++++}:
[ 385.698068] cpus_read_lock+0x39/0xa0
[ 385.698073] apply_workqueue_attrs+0x12/0x50
[ 385.698078] __alloc_workqueue_key+0x1d8/0x4d8
[ 385.698134] i915_gem_init_userptr+0x5f/0x80 [i915]
[ 385.698176] i915_gem_init+0x7c/0x390 [i915]
[ 385.698213] i915_driver_load+0x99e/0x15c0 [i915]
[ 385.698250] i915_pci_probe+0x33/0x90 [i915]
[ 385.698256] pci_device_probe+0xa1/0x130
[ 385.698262] driver_probe_device+0x293/0x440
[ 385.698267] __driver_attach+0xde/0xe0
[ 385.698272] bus_for_each_dev+0x5c/0x90
[ 385.698277] bus_add_driver+0x16d/0x260
[ 385.698282] driver_register+0x57/0xc0
[ 385.698287] do_one_initcall+0x3e/0x160
[ 385.698292] do_init_module+0x5b/0x1fa
[ 385.698297] load_module+0x2374/0x2dc0
[ 385.698302] SyS_finit_module+0xaa/0xe0
[ 385.698307] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 385.698311]
-> #1 (&dev->struct_mutex){+.+.}:
[ 385.698320] __mutex_lock+0x86/0x9b0
[ 385.698361] i915_mutex_lock_interruptible+0x4c/0x130 [i915]
[ 385.698403] i915_gem_fault+0x206/0x760 [i915]
[ 385.698409] __do_fault+0x1a/0x70
[ 385.698413] __handle_mm_fault+0x7c4/0xdb0
[ 385.698417] handle_mm_fault+0x154/0x300
[ 385.698440] __do_page_fault+0x2d6/0x570
[ 385.698445] page_fault+0x22/0x30
[ 385.698449]
-> #0 (&mm->mmap_sem){++++}:
[ 385.698459] lock_acquire+0xaf/0x200
[ 385.698464] __might_fault+0x68/0x90
[ 385.698470] _copy_to_user+0x1e/0x70
[ 385.698475] perf_read+0x1aa/0x290
[ 385.698480] __vfs_read+0x23/0x120
[ 385.698484] vfs_read+0xa3/0x150
[ 385.698488] SyS_read+0x45/0xb0
[ 385.698493] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 385.698497]
other info that might help us debug this:

[ 385.698505] Chain exists of:
&mm->mmap_sem --> pmus_lock --> &cpuctx_mutex

[ 385.698517] Possible unsafe locking scenario:

[ 385.698522]        CPU0                    CPU1
[ 385.698526]        ----                    ----
[ 385.698529]   lock(&cpuctx_mutex);
[ 385.698553]                                lock(pmus_lock);
[ 385.698558]                                lock(&cpuctx_mutex);
[ 385.698564]   lock(&mm->mmap_sem);
[ 385.698568]
*** DEADLOCK ***

[ 385.698574] 1 lock held by perf_pmu/2631:
[ 385.698578] #0: (&cpuctx_mutex){+.+.}, at: [<ffffffff8116fe8c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[ 385.698589]
stack backtrace:
[ 385.698595] CPU: 3 PID: 2631 Comm: perf_pmu Tainted: G U 4.14.0-CI-Patchwork_7234+ #1
[ 385.698602] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
[ 385.698609] Call Trace:
[ 385.698615] dump_stack+0x5f/0x86
[ 385.698621] print_circular_bug.isra.18+0x1d0/0x2c0
[ 385.698627] __lock_acquire+0x19c3/0x1b60
[ 385.698634] ? generic_exec_single+0x77/0xe0
[ 385.698640] ? lock_acquire+0xaf/0x200
[ 385.698644] lock_acquire+0xaf/0x200
[ 385.698650] ? __might_fault+0x3e/0x90
[ 385.698655] __might_fault+0x68/0x90
[ 385.698660] ? __might_fault+0x3e/0x90
[ 385.698665] _copy_to_user+0x1e/0x70
[ 385.698670] perf_read+0x1aa/0x290
[ 385.698675] __vfs_read+0x23/0x120
[ 385.698682] ? __fget+0x101/0x1f0
[ 385.698686] vfs_read+0xa3/0x150
[ 385.698691] SyS_read+0x45/0xb0
[ 385.698696] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 385.698701] RIP: 0033:0x7ff1c46876ed
[ 385.698705] RSP: 002b:00007fff13552f90 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
[ 385.698712] RAX: ffffffffffffffda RBX: ffffc90000647ff0 RCX: 00007ff1c46876ed
[ 385.698718] RDX: 0000000000000010 RSI: 00007fff13552fa0 RDI: 0000000000000005
[ 385.698723] RBP: 000056063d300580 R08: 0000000000000000 R09: 0000000000000060
[ 385.698729] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000046
[ 385.698734] R13: 00007fff13552c6f R14: 00007ff1c6279d00 R15: 00007ff1c6279a40

Testcase: igt/perf_pmu
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171122172621.16158-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
(cherry picked from commit ee48700dd57d9ce783ec40f035b324d0b75632e4)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
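
The reordering amounts to doing the workqueue allocation (and with it
cpu_hotplug_lock) before the big mutex is ever taken; a schematic sketch with
assumed names:

	#include <linux/mutex.h>

	struct drv_sketch;
	int init_userptr_sketch(struct drv_sketch *i915);	/* allocates its own workqueue */
	int init_requests_sketch(struct drv_sketch *i915);	/* genuinely needs the big lock */

	static DEFINE_MUTEX(struct_mutex_sketch);

	int gem_init_order_sketch(struct drv_sketch *i915)
	{
		int err;

		/* Take cpu_hotplug_lock (inside alloc_workqueue) before
		 * struct_mutex, so the two never nest the other way round. */
		err = init_userptr_sketch(i915);
		if (err)
			return err;

		mutex_lock(&struct_mutex_sketch);
		err = init_requests_sketch(i915);
		mutex_unlock(&struct_mutex_sketch);
		return err;
	}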


# 5888fc9e 04-Dec-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush pending GTT writes before unbinding

From the shrinker paths, we want to relinquish the GPU and GGTT access to
the object, releasing the backing storage back to the system for
swapout. As a part of that process we would unpin the pages, marking
them for access by the CPU (for the swapout/swapin). However, if that
process was interrupted after unbinding the vma, we missed a flush of the
inflight GGTT writes before we made that GTT space available again for
reuse, with the prospect that we would redirect them to another page.

The bug dates back to the introduction of multiple GGTT vma, but the
code itself dates to commit 02bef8f98d26 ("drm/i915: Unbind closed vma
for i915_gem_object_unbind()").

Fixes: 02bef8f98d26 ("drm/i915: Unbind closed vma for i915_gem_object_unbind()")
Fixes: c5ad54cf7dd8 ("drm/i915: Use partial view in mmap fault handler")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171204132513.7303-1-chris@chris-wilson.co.uk


# dda4b8f7 30-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip switch-to-kernel-context on suspend when wedged

If the HW is already wedged, attempting to submit a request will
generate an -EIO. If we tried this during suspend, we would abort
whereas all we want to do is go to sleep and throw away the corrupt
state.

Fixes: 5ab57c702069 ("drm/i915: Flush logical context image out to memory upon suspend")
Testcase: igt/gem_eio/suspend
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171130102951.14965-1-chris@chris-wilson.co.uk
(cherry picked from commit ecf73eb2d27d43b2153bb80671768a06d35521f1)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# ecf73eb2 30-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip switch-to-kernel-context on suspend when wedged

If the HW is already wedged, attempting to submit a request will
generate an -EIO. If we tried this during suspend, we would abort
whereas all we want to do is go to sleep and throw away the corrupt
state.

Fixes: 5ab57c702069 ("drm/i915: Flush logical context image out to memory upon suspend")
Testcase: igt/gem_eio/suspend
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171130102951.14965-1-chris@chris-wilson.co.uk


# d02a1d83 26-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename i915_gem_timelines_mark_idle

The kerneldoc markup for i915_gem_timelines_mark_idle() was incorrect,
so take the opportunity to also convert it from the "mark_idle" to "park"
naming scheme.

drivers/gpu/drm/i915/i915_gem_timeline.c:120: warning: No description found for parameter 'i915'

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171127123054.20966-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# ee48700d 22-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Call i915_gem_init_userptr() before taking struct_mutex

We don't need struct_mutex to initialise userptr (it just allocates a
workqueue for itself etc), but we do need struct_mutex later on in
i915_gem_init() in order to feed requests onto the HW.

This should break the chain

[ 385.697902] ======================================================
[ 385.697907] WARNING: possible circular locking dependency detected
[ 385.697913] 4.14.0-CI-Patchwork_7234+ #1 Tainted: G U
[ 385.697917] ------------------------------------------------------
[ 385.697922] perf_pmu/2631 is trying to acquire lock:
[ 385.697927] (&mm->mmap_sem){++++}, at: [<ffffffff811bfe1e>] __might_fault+0x3e/0x90
[ 385.697941]
but task is already holding lock:
[ 385.697946] (&cpuctx_mutex){+.+.}, at: [<ffffffff8116fe8c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[ 385.697957]
which lock already depends on the new lock.

[ 385.697963]
the existing dependency chain (in reverse order) is:
[ 385.697970]
-> #4 (&cpuctx_mutex){+.+.}:
[ 385.697980] __mutex_lock+0x86/0x9b0
[ 385.697985] perf_event_init_cpu+0x5a/0x90
[ 385.697991] perf_event_init+0x178/0x1a4
[ 385.697997] start_kernel+0x27f/0x3f1
[ 385.698003] verify_cpu+0x0/0xfb
[ 385.698006]
-> #3 (pmus_lock){+.+.}:
[ 385.698015] __mutex_lock+0x86/0x9b0
[ 385.698020] perf_event_init_cpu+0x21/0x90
[ 385.698025] cpuhp_invoke_callback+0xca/0xc00
[ 385.698030] _cpu_up+0xa7/0x170
[ 385.698035] do_cpu_up+0x57/0x70
[ 385.698039] smp_init+0x62/0xa6
[ 385.698044] kernel_init_freeable+0x97/0x193
[ 385.698050] kernel_init+0xa/0x100
[ 385.698055] ret_from_fork+0x27/0x40
[ 385.698058]
-> #2 (cpu_hotplug_lock.rw_sem){++++}:
[ 385.698068] cpus_read_lock+0x39/0xa0
[ 385.698073] apply_workqueue_attrs+0x12/0x50
[ 385.698078] __alloc_workqueue_key+0x1d8/0x4d8
[ 385.698134] i915_gem_init_userptr+0x5f/0x80 [i915]
[ 385.698176] i915_gem_init+0x7c/0x390 [i915]
[ 385.698213] i915_driver_load+0x99e/0x15c0 [i915]
[ 385.698250] i915_pci_probe+0x33/0x90 [i915]
[ 385.698256] pci_device_probe+0xa1/0x130
[ 385.698262] driver_probe_device+0x293/0x440
[ 385.698267] __driver_attach+0xde/0xe0
[ 385.698272] bus_for_each_dev+0x5c/0x90
[ 385.698277] bus_add_driver+0x16d/0x260
[ 385.698282] driver_register+0x57/0xc0
[ 385.698287] do_one_initcall+0x3e/0x160
[ 385.698292] do_init_module+0x5b/0x1fa
[ 385.698297] load_module+0x2374/0x2dc0
[ 385.698302] SyS_finit_module+0xaa/0xe0
[ 385.698307] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 385.698311]
-> #1 (&dev->struct_mutex){+.+.}:
[ 385.698320] __mutex_lock+0x86/0x9b0
[ 385.698361] i915_mutex_lock_interruptible+0x4c/0x130 [i915]
[ 385.698403] i915_gem_fault+0x206/0x760 [i915]
[ 385.698409] __do_fault+0x1a/0x70
[ 385.698413] __handle_mm_fault+0x7c4/0xdb0
[ 385.698417] handle_mm_fault+0x154/0x300
[ 385.698440] __do_page_fault+0x2d6/0x570
[ 385.698445] page_fault+0x22/0x30
[ 385.698449]
-> #0 (&mm->mmap_sem){++++}:
[ 385.698459] lock_acquire+0xaf/0x200
[ 385.698464] __might_fault+0x68/0x90
[ 385.698470] _copy_to_user+0x1e/0x70
[ 385.698475] perf_read+0x1aa/0x290
[ 385.698480] __vfs_read+0x23/0x120
[ 385.698484] vfs_read+0xa3/0x150
[ 385.698488] SyS_read+0x45/0xb0
[ 385.698493] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 385.698497]
other info that might help us debug this:

[ 385.698505] Chain exists of:
&mm->mmap_sem --> pmus_lock --> &cpuctx_mutex

[ 385.698517] Possible unsafe locking scenario:

[ 385.698522]        CPU0                    CPU1
[ 385.698526]        ----                    ----
[ 385.698529]   lock(&cpuctx_mutex);
[ 385.698553]                                lock(pmus_lock);
[ 385.698558]                                lock(&cpuctx_mutex);
[ 385.698564]   lock(&mm->mmap_sem);
[ 385.698568]
*** DEADLOCK ***

[ 385.698574] 1 lock held by perf_pmu/2631:
[ 385.698578] #0: (&cpuctx_mutex){+.+.}, at: [<ffffffff8116fe8c>] perf_event_ctx_lock_nested+0xbc/0x1d0
[ 385.698589]
stack backtrace:
[ 385.698595] CPU: 3 PID: 2631 Comm: perf_pmu Tainted: G U 4.14.0-CI-Patchwork_7234+ #1
[ 385.698602] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
[ 385.698609] Call Trace:
[ 385.698615] dump_stack+0x5f/0x86
[ 385.698621] print_circular_bug.isra.18+0x1d0/0x2c0
[ 385.698627] __lock_acquire+0x19c3/0x1b60
[ 385.698634] ? generic_exec_single+0x77/0xe0
[ 385.698640] ? lock_acquire+0xaf/0x200
[ 385.698644] lock_acquire+0xaf/0x200
[ 385.698650] ? __might_fault+0x3e/0x90
[ 385.698655] __might_fault+0x68/0x90
[ 385.698660] ? __might_fault+0x3e/0x90
[ 385.698665] _copy_to_user+0x1e/0x70
[ 385.698670] perf_read+0x1aa/0x290
[ 385.698675] __vfs_read+0x23/0x120
[ 385.698682] ? __fget+0x101/0x1f0
[ 385.698686] vfs_read+0xa3/0x150
[ 385.698691] SyS_read+0x45/0xb0
[ 385.698696] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 385.698701] RIP: 0033:0x7ff1c46876ed
[ 385.698705] RSP: 002b:00007fff13552f90 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
[ 385.698712] RAX: ffffffffffffffda RBX: ffffc90000647ff0 RCX: 00007ff1c46876ed
[ 385.698718] RDX: 0000000000000010 RSI: 00007fff13552fa0 RDI: 0000000000000005
[ 385.698723] RBP: 000056063d300580 R08: 0000000000000000 R09: 0000000000000060
[ 385.698729] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000046
[ 385.698734] R13: 00007fff13552c6f R14: 00007ff1c6279d00 R15: 00007ff1c6279a40

Testcase: igt/perf_pmu
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171122172621.16158-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# feff0dc6 21-Nov-2017 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915/pmu: Suspend sampling when GPU is idle

If only a subset of events is enabled we can afford to suspend
the sampling timer when the GPU is idle and so save some cycles
and power.

v2: Rebase and limit timer even more.
v3: Rebase.
v4: Rebase.
v5: Skip action if perf PMU failed to register.
v6: Checkpatch cleanup.
v7:
* Add a common helper to start the timer if needed. (Chris Wilson)
* Add comment explaining bitwise logic in pmu_needs_timer.
v8: Fix some comments styles. (Chris Wilson)
v9: Rebase.
v10: Move function declarations to i915_pmu.h.
v11: Rename functions to i915_pmu_gt_(un)parked. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-3-tvrtko.ursulin@linux.intel.com
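
The park/unpark hooks effectively gate a sampling hrtimer on whether any
events are enabled; a sketch of that idea, with the struct, period and names
as assumptions rather than the driver's actual PMU code.

	#include <linux/hrtimer.h>
	#include <linux/ktime.h>
	#include <linux/types.h>

	struct pmu_sketch {
		struct hrtimer timer;
		u64 enable_mask;	/* which sampled events are on */
		bool timer_enabled;
	};

	static void pmu_gt_parked_sketch(struct pmu_sketch *pmu)
	{
		if (pmu->timer_enabled) {
			hrtimer_cancel(&pmu->timer);	/* GPU idle: save the wakeups */
			pmu->timer_enabled = false;
		}
	}

	static void pmu_gt_unparked_sketch(struct pmu_sketch *pmu)
	{
		if (pmu->enable_mask && !pmu->timer_enabled) {
			hrtimer_start(&pmu->timer, ms_to_ktime(5), HRTIMER_MODE_REL);
			pmu->timer_enabled = true;
		}
	}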


# 93c6e966 20-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove i915.semaphores modparam

Having disabled the broken semaphores on Sandybridge, there is no need
for a modparam any more, so remove it in favour of a simple
HAS_LEGACY_SEMAPHORES() guard.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-5-chris@chris-wilson.co.uk
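
Illustration of the replacement pattern: a hardware-capability guard instead
of a module parameter. The macro body below is a guess at the spirit of the
check, not the real definition.

	struct info_sketch {
		int gen;	/* illustrative; the real macro keys off device info */
	};

	#define HAS_LEGACY_SEMAPHORES_SKETCH(info)	((info)->gen == 7)

	/* Call sites ask the hardware, not a modparam:
	 *	if (HAS_LEGACY_SEMAPHORES_SKETCH(info))
	 *		setup_hw_semaphores();
	 */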


# 0da715ee 20-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable semaphores on Sandybridge

I should have admitted defeat long ago as there has been a rare but
persistent error on Sandybridge where semaphore signaling did not
propagate to the waiter, leading to a GPU hang.

With the work on fence signaling for v4.9, the impact of using CPU driven
signaling was greatly reduced with respect to the latency of GPU semaphores,
though without logical rings support, the benefit of reordering work to
avoid bubbles is not realised (i.e. as it stands fence signaling is just
a slower, more costly version of HW semaphores; but works more
consistently). As a rough indicator of the difference,

with semaphores:
Sequential (3 engines, 1 processes): average 5.470us per cycle [expected 4.988us]
w/o semaphores:
Sequential (3 engines, 1 processes): average 15.771us per cycle [expected 4.923us]

In comparison, v3.4:
with semaphores:
Sequential (3 engines, 1 processes): average 16.066us per cycle [expected 11.842us]
w/o semaphores:
Sequential (3 engines, 1 processes): average 23.460us per cycle [expected 11.839us]

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54226 #and 100+ dupes
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-3-chris@chris-wilson.co.uk


# 79e6770c 20-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove obsolete ringbuffer emission for gen8+

Since removing the module parameter to force selection of ringbuffer
emission for gen8, the code is defunct. Remove it.

To put the difference into perspective, a couple of microbenchmarks
(bdw i7-5557u, 20170324):
                                        ring         execlists
exec continuous nops on all rings:   1.491us          2.223us
exec sequential nops on each ring:  12.508us         53.682us
single nop + sync:                   9.272us         30.291us

vblank_mode=0 glxgears:            ~11000fps         ~9000fps

Since the earlier submission, gen8 ringbuffer submission has fallen
further and further behind in features. So while ringbuffer may hold the
throughput crown, in terms of interactive latency, execlists is much
better. Alas, we have no convenient metrics for such, other than
demonstrating things we can do with execlists but cannot do using
legacy ringbuffer submission.

We have made a few improvements to lowlevel execlists throughput,
and ringbuffer currently panics on boot! (bdw i7-5557u, 20171026):

                                        ring         execlists
exec continuous nops on all rings:       n/a          1.921us
exec sequential nops on each ring:       n/a         44.621us
single nop + sync:                       n/a         21.953us

vblank_mode=0 glxgears:                  n/a        ~18500fps

References: https://bugs.freedesktop.org/show_bug.cgi?id=87725
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once-upon-a-time-Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-2-chris@chris-wilson.co.uk


# fb5c551a 20-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove i915.enable_execlists module parameter

Execlists and legacy ringbuffer submission are no longer feature
comparable (execlists now offer greater functionality that should
overcome their performance hit), which obsoletes the unsafe module
parameter, i.e. comparing the two modes of execution is no longer
useful, so remove the debug tool.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #i915_perf.c
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-1-chris@chris-wilson.co.uk


# 3fef5cda 20-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Automatic i915_switch_context for legacy

During request construction, after pinning the context we know whether
or not we have to emit a context switch. So move this common operation
from every caller into i915_gem_request_alloc() itself.

v2: Always submit the request if we emitted some commands during request
construction, as typically it also involves changes in global state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120102002.22254-2-chris@chris-wilson.co.uk


# c6dce8f1 16-Nov-2017 Sagar Arun Kamble <sagar.a.kamble@intel.com>

drm/i915: Update execlists tasklet naming

intel_lrc_irq_handler and i915_guc_irq_handler are HW submission related
tasklet functions. Name them with a "submission_tasklet" suffix and
remove the intel/i915 prefix as they are static. Also rename irq_tasklet
as just tasklet for clarity.

v2: s/_bh/_tasklet (Chris)

Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1510839162-25197-2-git-send-email-sagar.a.kamble@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 7469c62c 14-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Resume GuC before using GEM

Resuming GEM presumes it can talk to hw, in particular to ensure the
kernel context is loaded upon resume for powersaving. If the GuC is
still asleep at this point, we upset the HW. Rearrange the resume such
that we restore the original order of init-hw, resume-guc, use-gem.

Fixes: 37cd33006d02 ("drm/i915: Remove redundant intel_autoenable_gt_powersave()")
References: a1c419941453 ("drm/i915/guc: Add host2guc notification for suspend and resume")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114130300.25677-2-chris@chris-wilson.co.uk
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>


# a03f395a 14-Nov-2017 Tina Zhang <tina.zhang@intel.com>

drm/i915: Introduce GEM proxy

A GEM proxy is a kind of GEM object whose backing physical memory is pinned
and produced by the guest VM and is used by the host as read only. With a GEM
proxy, the host is able to access guest physical memory through the GEM object
interface. As a GEM proxy is such a special kind of GEM object, a new flag,
I915_GEM_OBJECT_IS_PROXY, is introduced to ban the host from changing the
backing storage of a GEM proxy.

v3:
- update "Reviewed-by". (Joonas)

v2:
- return -ENXIO when pinning and mapping pages of a GEM proxy to kernel space.
(Chris)

Here are the histories of this patch in "Dma-buf support for Gvt-g"
patch-set:

v14:
- return -ENXIO when gem proxy object is banned by ioctl.
(Chris) (Daniel)

v13:
- add comments to GEM proxy. (Chris)
- don't ban GEM proxy in i915_gem_sw_finish_ioctl. (Chris)
- check GEM proxy bar after finishing i915_gem_object_wait. (Chris)
- remove GEM proxy bar in i915_gem_madvise_ioctl.

v6:
- add gem proxy barrier in the following ioctls. (Chris)
i915_gem_set_caching_ioctl
i915_gem_set_domain_ioctl
i915_gem_sw_finish_ioctl
i915_gem_set_tiling_ioctl
i915_gem_madvise_ioctl

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1510555798-21079-2-git-send-email-tina.zhang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114102513.22269-2-chris@chris-wilson.co.uk
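
A simplified illustration of banning storage-changing ioctls on proxy objects;
the flag and helper names follow the commit text, but the code itself is a
sketch rather than the driver's implementation.

	#include <linux/errno.h>

	struct gem_obj_sketch {
		unsigned int flags;
	#define OBJ_IS_PROXY_SKETCH	(1u << 0)
	};

	static inline bool obj_is_proxy_sketch(const struct gem_obj_sketch *obj)
	{
		return obj->flags & OBJ_IS_PROXY_SKETCH;
	}

	static int set_tiling_sketch(struct gem_obj_sketch *obj)
	{
		if (obj_is_proxy_sketch(obj))
			return -ENXIO;	/* host must not alter guest-owned backing store */

		/* ... normal tiling update ... */
		return 0;
	}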


# 274b2462 14-Nov-2017 Tina Zhang <tina.zhang@intel.com>

drm/i915: Object w/o backing storage is banned by -ENXIO

-ENXIO should be returned when an operation that would change the backing
storage is attempted on an object that has no backing storage.

v4:
- update "Reviewed-by". (Joonas)

v3:
- separate this patch from "Introduce GEM proxy" patch-set. (Joonas)

v2:
- update the patch description and subject to just mention objects w/o
backing storage, instead of "GEM proxy". (Joonas)

Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1510555798-21079-1-git-send-email-tina.zhang@intel.com
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171114102513.22269-1-chris@chris-wilson.co.uk


# 37cd3300 12-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove redundant intel_autoenable_gt_powersave()

Now that we always execute a context switch upon module load, there is
no need to queue a delayed task for doing so. The purpose of the delayed
task is to enable GT powersaving, for which we need the HW state to be
valid (i.e. having loaded a context and initialised basic state). We
used to defer this operation as historically it was slow (due to slow
register polling, fixed with commit 1758b90e38f5 ("drm/i915: Use a hybrid
scheme for fast register waits")) but now we have a requirement to save
the default HW state.

v2: Load the kernel context (to provide the power context) upon resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171112112738.1463-3-chris@chris-wilson.co.uk


# 9c52d1c8 10-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/selftests: Yet another forgotten mock_i915->mm initialiser

Move all of the i915->mm initialisation to a private function that can
be reused by the mock i915 device, to avoid forgetting any more steps.

For example,
<7>[ 1542.046332] [IGT] drv_selftest: starting subtest mock_objects
<4>[ 1542.123924] Setting dangerous option mock_selftests - tainting kernel
<6>[ 1542.167941] i915: Performing mock selftests with st_random_seed=0x246f5ab5 st_timeout=1000
<4>[ 1542.178012] INFO: trying to register non-static key.
<4>[ 1542.178027] the code is fine but needs lockdep annotation.
<4>[ 1542.178032] turning off the locking correctness validator.
<4>[ 1542.178041] CPU: 3 PID: 6008 Comm: kworker/3:7 Tainted: G U 4.14.0-rc8-CI-CI_DRM_3332+ #1
<4>[ 1542.178049] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
<4>[ 1542.178144] Workqueue: events __i915_gem_free_work [i915]
<4>[ 1542.178152] Call Trace:
<4>[ 1542.178163] dump_stack+0x68/0x9f
<4>[ 1542.178170] register_lock_class+0x3fd/0x580
<4>[ 1542.178177] ? unwind_next_frame+0x14/0x20
<4>[ 1542.178184] ? __save_stack_trace+0x73/0xd0
<4>[ 1542.178191] __lock_acquire+0xa4/0x1b00
<4>[ 1542.178254] ? __i915_gem_free_work+0x28/0xa0 [i915]
<4>[ 1542.178261] ? __lock_acquire+0x4ab/0x1b00
<4>[ 1542.178268] lock_acquire+0xb0/0x200
<4>[ 1542.178273] ? lock_acquire+0xb0/0x200
<4>[ 1542.178336] ? __i915_gem_free_work+0x28/0xa0 [i915]
<4>[ 1542.178344] _raw_spin_lock+0x32/0x50
<4>[ 1542.178405] ? __i915_gem_free_work+0x28/0xa0 [i915]
<4>[ 1542.178468] __i915_gem_free_work+0x28/0xa0 [i915]
<4>[ 1542.178476] process_one_work+0x221/0x650
<4>[ 1542.178483] worker_thread+0x4e/0x3c0
<4>[ 1542.178489] kthread+0x114/0x150
<4>[ 1542.178494] ? process_one_work+0x650/0x650
<4>[ 1542.178499] ? kthread_create_on_node+0x40/0x40
<4>[ 1542.178506] ret_from_fork+0x27/0x40

v2: Fish out i915->mm.object_stat_lock which was being inited over in
i915_drv.c (Matthew)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110232447.21618-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# a0a8b1cf 09-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Kerneldoc typo s/rps/rps_client/

Update the kerneldoc parameter name to match the real parameter name.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171109140644.10805-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# d2b4b979 10-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Record the default hw state after reset upon load

Take a copy of the HW state after a reset upon module loading by
executing a context switch from a blank context to the kernel context,
thus saving the default hw state over the blank context image.
We can then use the default hw state to initialise any future context,
ensuring that each starts with the default view of hw state.

v2: Unmap our default state from the GTT after stealing it from the
context. This should stop us from accidentally overwriting it via the
GTT (and frees up some precious GTT space).

Testcase: igt/gem_ctx_isolation
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk


# cc6a818a 10-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move intel_init_clock_gating() to i915_gem_init()

Despite its name, intel_init_clock_gating applies not only display clock
gating workarounds but also GT mmio workarounds and the occasional GT power
context workaround. Worse, sometimes it includes a context register workaround
which we need to apply before we record the default HW state for all
contexts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-4-chris@chris-wilson.co.uk


# f58d13d5 10-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GT powersaving init to i915_gem_init()

GT powersaving is tightly coupled to the request infrastructure. To
avoid complications with the order of initialisation in the next patch
(where we want to send requests to hw during GEM init) move the
powersaving initialisation into the purview of i915_gem_init().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-3-chris@chris-wilson.co.uk


# ae6c4574 10-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Force the switch to the i915->kernel_context

In the next few patches, we will have a hard requirement that we emit a
context-switch to the perma-pinned i915->kernel_context (so that we can
save the HW state using that context-switch). As the first context
itself may be classed as a kernel context, we want to be explicit in our
comparison. For an extra layer of finesse, we can check the last
unretired context on the engine, as well as the last retired context
when idle.

v2: verbose verbosity
v3: Always force the switch, even when the engine is idle, and update
the assert that this happens before suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-2-chris@chris-wilson.co.uk


# 0f763ff3 06-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lock llist_del_first() vs llist_del_all()

An oversight in commit 87701b4b5593 ("drm/i915: Only free the oldest
stale object before a fresh allocation") was that not only do we have to
serialise concurrent users of llist_del_first(), but we also have to
lock llist_del_first() vs llist_del_all().

From llist.h,

* This can be summarized as follows:
*
* | add | del_first | del_all
* add | - | - | -
* del_first | | L | L
* del_all | | | -
*
* Where, a particular row's operation can happen concurrently with a column's
* operation, with "-" being no lock needed, while "L" being lock is needed.

This should hopefully explain:

<4>[ 89.287106] general protection fault: 0000 [#1] PREEMPT SMP
<4>[ 89.287126] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii mei_me mei snd_pcm prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[ 89.287226] CPU: 2 PID: 23 Comm: ksoftirqd/2 Tainted: G U 4.14.0-rc8-CI-CI_DRM_3315+ #1
<4>[ 89.287247] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 89.287270] task: ffff88017ab34ec0 task.stack: ffffc90000128000
<4>[ 89.287290] RIP: 0010:llist_add_batch+0x4/0x20
<4>[ 89.287301] RSP: 0018:ffffc9000012bdb8 EFLAGS: 00010296
<4>[ 89.287314] RAX: ffffffff811017ad RBX: 6e468801a1560000 RCX: ef3e53fceecdeb81
<4>[ 89.287330] RDX: 6e468801a1566130 RSI: ffff880103d73d98 RDI: ffff880103d73d98
<4>[ 89.287346] RBP: ffffc9000012bdb8 R08: ffff88017ab35780 R09: 0000000000000000
<4>[ 89.287361] R10: ffffc9000012bd68 R11: 00000000abb18c3d R12: ffffffffa01369e0
<4>[ 89.287377] R13: ffff88017fd1b8f8 R14: ffff88017ab34ec0 R15: 000000000000000a
<4>[ 89.287393] FS: 0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
<4>[ 89.287411] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 89.287424] CR2: 00007ff0c0755018 CR3: 000000016df9b000 CR4: 00000000003406e0
<4>[ 89.287440] Call Trace:
<4>[ 89.287511] __i915_gem_free_object_rcu+0x20/0x40 [i915]
<4>[ 89.287527] rcu_process_callbacks+0x27a/0x730
<4>[ 89.287544] __do_softirq+0xc0/0x4ae
<4>[ 89.287559] ? smpboot_thread_fn+0x2d/0x280
<4>[ 89.287571] run_ksoftirqd+0x1f/0x70
<4>[ 89.287582] smpboot_thread_fn+0x18a/0x280
<4>[ 89.287595] kthread+0x114/0x150
<4>[ 89.287605] ? sort_range+0x30/0x30
<4>[ 89.287615] ? kthread_create_on_node+0x40/0x40
<4>[ 89.287628] ret_from_fork+0x27/0x40
<4>[ 89.287641] Code: 0d 48 83 ea 01 4c 89 c1 48 83 fa ff 74 12 48 23 0c d7 74 ed 48 c1 e2 06 48 0f bd c9 48 8d 04 0a 5d c3 90 90 90 90 90 55 48 89 e5 <48> 8b 0a 48 89 0e 48 89 c8 f0 48 0f b1 3a 48 39 c1 75 ed 48 85
<1>[ 89.287774] RIP: llist_add_batch+0x4/0x20 RSP: ffffc9000012bdb8
<4>[ 89.287826] ---[ end trace e775d15174d8ae02 ]---

(Lockless lists are only easy (and lockless) when only using
llist_add/llist_del_all!)
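
As an illustrative sketch of the locking now required (the list and lock
here are placeholders, not the actual i915 members):

#include <linux/llist.h>
#include <linux/spinlock.h>

static LLIST_HEAD(free_list);
static DEFINE_SPINLOCK(free_lock);

static struct llist_node *reap_one(void)
{
	struct llist_node *first;

	/* llist_del_first() must be serialised against both another
	 * llist_del_first() and a concurrent llist_del_all().
	 */
	spin_lock(&free_lock);
	first = llist_del_first(&free_list);
	spin_unlock(&free_lock);

	return first; /* may be NULL; caller frees the object behind it */
}

static struct llist_node *reap_all(void)
{
	struct llist_node *freed;

	/* llist_del_all() takes the same lock so it excludes reap_one(). */
	spin_lock(&free_lock);
	freed = llist_del_all(&free_list);
	spin_unlock(&free_lock);

	return freed; /* detached list; caller walks and frees it */
}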

Fixes: 87701b4b5593 ("drm/i915: Only free the oldest stale object before a fresh allocation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171106111508.11941-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
(cherry picked from commit f991c492aa55fb1c6834882c5d786d5bb3b25f07)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# f991c492 06-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lock llist_del_first() vs llist_del_all()

An oversight in commit 87701b4b5593 ("drm/i915: Only free the oldest
stale object before a fresh allocation") was that not only do we have to
serialise concurrent users of llist_del_first(), but we also have to
lock llist_del_first() vs llist_del_all().

From llist.h,

* This can be summarized as follows:
*
* | add | del_first | del_all
* add | - | - | -
* del_first | | L | L
* del_all | | | -
*
* Where, a particular row's operation can happen concurrently with a column's
* operation, with "-" being no lock needed, while "L" being lock is needed.

This should hopefully explain:

<4>[ 89.287106] general protection fault: 0000 [#1] PREEMPT SMP
<4>[ 89.287126] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii mei_me mei snd_pcm prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
<4>[ 89.287226] CPU: 2 PID: 23 Comm: ksoftirqd/2 Tainted: G U 4.14.0-rc8-CI-CI_DRM_3315+ #1
<4>[ 89.287247] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[ 89.287270] task: ffff88017ab34ec0 task.stack: ffffc90000128000
<4>[ 89.287290] RIP: 0010:llist_add_batch+0x4/0x20
<4>[ 89.287301] RSP: 0018:ffffc9000012bdb8 EFLAGS: 00010296
<4>[ 89.287314] RAX: ffffffff811017ad RBX: 6e468801a1560000 RCX: ef3e53fceecdeb81
<4>[ 89.287330] RDX: 6e468801a1566130 RSI: ffff880103d73d98 RDI: ffff880103d73d98
<4>[ 89.287346] RBP: ffffc9000012bdb8 R08: ffff88017ab35780 R09: 0000000000000000
<4>[ 89.287361] R10: ffffc9000012bd68 R11: 00000000abb18c3d R12: ffffffffa01369e0
<4>[ 89.287377] R13: ffff88017fd1b8f8 R14: ffff88017ab34ec0 R15: 000000000000000a
<4>[ 89.287393] FS: 0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
<4>[ 89.287411] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 89.287424] CR2: 00007ff0c0755018 CR3: 000000016df9b000 CR4: 00000000003406e0
<4>[ 89.287440] Call Trace:
<4>[ 89.287511] __i915_gem_free_object_rcu+0x20/0x40 [i915]
<4>[ 89.287527] rcu_process_callbacks+0x27a/0x730
<4>[ 89.287544] __do_softirq+0xc0/0x4ae
<4>[ 89.287559] ? smpboot_thread_fn+0x2d/0x280
<4>[ 89.287571] run_ksoftirqd+0x1f/0x70
<4>[ 89.287582] smpboot_thread_fn+0x18a/0x280
<4>[ 89.287595] kthread+0x114/0x150
<4>[ 89.287605] ? sort_range+0x30/0x30
<4>[ 89.287615] ? kthread_create_on_node+0x40/0x40
<4>[ 89.287628] ret_from_fork+0x27/0x40
<4>[ 89.287641] Code: 0d 48 83 ea 01 4c 89 c1 48 83 fa ff 74 12 48 23 0c d7 74 ed 48 c1 e2 06 48 0f bd c9 48 8d 04 0a 5d c3 90 90 90 90 90 55 48 89 e5 <48> 8b 0a 48 89 0e 48 89 c8 f0 48 0f b1 3a 48 39 c1 75 ed 48 85
<1>[ 89.287774] RIP: llist_add_batch+0x4/0x20 RSP: ffffc9000012bdb8
<4>[ 89.287826] ---[ end trace e775d15174d8ae02 ]---

(Lockless lists are only easy (and lockless) when only using
llist_add/llist_del_all!)

Fixes: 87701b4b5593 ("drm/i915: Only free the oldest stale object before a fresh allocation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171106111508.11941-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


# 136109c6 02-Nov-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Set up mocs tables before restarting the engines

After a reset, we may immediately begin executing requests on restarting
the engines. Ergo this has to be the last step, with all re-initialisation
completed beforehand. The mocs setup was after we started executing the
requests; do it earlier!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171102131430.22328-1-chris@chris-wilson.co.uk


# 68027387 26-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move parking-while-active warning to intel_engines_park()

We will want to break this down to give detailed per-engine warnings as
to why we still think we are active as we attempt to park the engines.
For the first step, just move the warning verbatim from the idle-worker
to intel_engines_park().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171027110617.31745-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


# 23e87338 26-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold rcu_read_lock when iterating over the radixtree (objects)

Kasan spotted

[IGT] gem_tiled_pread_pwrite: exiting, ret=0
==================================================================
BUG: KASAN: use-after-free in __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
Read of size 8 at addr ffff8801359da310 by task kworker/3:2/182

CPU: 3 PID: 182 Comm: kworker/3:2 Tainted: G U 4.14.0-rc6-CI-Custom_3340+ #1
Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
Workqueue: events __i915_gem_free_work [i915]
Call Trace:
dump_stack+0x68/0xa0
print_address_description+0x78/0x290
? __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
kasan_report+0x23d/0x350
__asan_report_load8_noabort+0x19/0x20
__i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
? i915_gem_object_truncate+0x100/0x100 [i915]
? lock_acquire+0x380/0x380
__i915_gem_object_put_pages+0x30d/0x530 [i915]
__i915_gem_free_objects+0x551/0xbd0 [i915]
? lock_acquire+0x13e/0x380
__i915_gem_free_work+0x4e/0x70 [i915]
process_one_work+0x6f6/0x1590
? pwq_dec_nr_in_flight+0x2b0/0x2b0
worker_thread+0xe6/0xe90
? pci_mmcfg_check_reserved+0x110/0x110
kthread+0x309/0x410
? process_one_work+0x1590/0x1590
? kthread_create_on_node+0xb0/0xb0
ret_from_fork+0x27/0x40

Allocated by task 1801:
save_stack_trace+0x1b/0x20
kasan_kmalloc+0xee/0x190
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0xdc/0x2e0
radix_tree_node_alloc.constprop.12+0x48/0x330
__radix_tree_create+0x274/0x480
__radix_tree_insert+0xa2/0x610
i915_gem_object_get_sg+0x224/0x670 [i915]
i915_gem_object_get_page+0xb5/0x1c0 [i915]
i915_gem_pread_ioctl+0x822/0xf60 [i915]
drm_ioctl_kernel+0x13f/0x1c0
drm_ioctl+0x6cf/0x980
do_vfs_ioctl+0x184/0xf30
SyS_ioctl+0x41/0x70
entry_SYSCALL_64_fastpath+0x1c/0xb1

Freed by task 37:
save_stack_trace+0x1b/0x20
kasan_slab_free+0xaf/0x190
kmem_cache_free+0xbf/0x340
radix_tree_node_rcu_free+0x79/0x90
rcu_process_callbacks+0x46d/0xf40
__do_softirq+0x21c/0x8d3

The buggy address belongs to the object at ffff8801359da0f0
which belongs to the cache radix_tree_node of size 576
The buggy address is located 544 bytes inside of
576-byte region [ffff8801359da0f0, ffff8801359da330)
The buggy address belongs to the page:
page:ffffea0004d67600 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x8000000000008100(slab|head)
raw: 8000000000008100 0000000000000000 0000000000000000 0000000100110011
raw: ffffea0004b52920 ffffea0004b38020 ffff88015b416a80 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8801359da200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801359da280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801359da300: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
^
ffff8801359da380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801359da400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint

which looks like the slab containing the radixtree iter was freed as we
traversed the tree; taking the rcu read lock across the loop should
prevent that (deferring all the frees until the end).
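
A sketch of the resulting pattern (the radixtree root is a stand-in for the
object's page-lookup tree):

#include <linux/radix-tree.h>
#include <linux/rcupdate.h>

static void reset_page_iter(struct radix_tree_root *radix)
{
	struct radix_tree_iter iter;
	void __rcu **slot;

	/* Hold the RCU read lock across the walk so that the radix_tree_node
	 * frees triggered by radix_tree_delete() are deferred until after
	 * the loop, rather than pulled out from under the iterator.
	 */
	rcu_read_lock();
	radix_tree_for_each_slot(slot, radix, &iter, 0)
		radix_tree_delete(radix, iter.index);
	rcu_read_unlock();
}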

Reported-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Fixes: 96d776345277 ("drm/i915: Use a radixtree for random access to the object's backing storage")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026130032.10677-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
(cherry picked from commit bea6e987c1ff358224e7bef7084be7650f5d1c38)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# bea6e987 26-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold rcu_read_lock when iterating over the radixtree (objects)

Kasan spotted

[IGT] gem_tiled_pread_pwrite: exiting, ret=0
==================================================================
BUG: KASAN: use-after-free in __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
Read of size 8 at addr ffff8801359da310 by task kworker/3:2/182

CPU: 3 PID: 182 Comm: kworker/3:2 Tainted: G U 4.14.0-rc6-CI-Custom_3340+ #1
Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
Workqueue: events __i915_gem_free_work [i915]
Call Trace:
dump_stack+0x68/0xa0
print_address_description+0x78/0x290
? __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
kasan_report+0x23d/0x350
__asan_report_load8_noabort+0x19/0x20
__i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
? i915_gem_object_truncate+0x100/0x100 [i915]
? lock_acquire+0x380/0x380
__i915_gem_object_put_pages+0x30d/0x530 [i915]
__i915_gem_free_objects+0x551/0xbd0 [i915]
? lock_acquire+0x13e/0x380
__i915_gem_free_work+0x4e/0x70 [i915]
process_one_work+0x6f6/0x1590
? pwq_dec_nr_in_flight+0x2b0/0x2b0
worker_thread+0xe6/0xe90
? pci_mmcfg_check_reserved+0x110/0x110
kthread+0x309/0x410
? process_one_work+0x1590/0x1590
? kthread_create_on_node+0xb0/0xb0
ret_from_fork+0x27/0x40

Allocated by task 1801:
save_stack_trace+0x1b/0x20
kasan_kmalloc+0xee/0x190
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0xdc/0x2e0
radix_tree_node_alloc.constprop.12+0x48/0x330
__radix_tree_create+0x274/0x480
__radix_tree_insert+0xa2/0x610
i915_gem_object_get_sg+0x224/0x670 [i915]
i915_gem_object_get_page+0xb5/0x1c0 [i915]
i915_gem_pread_ioctl+0x822/0xf60 [i915]
drm_ioctl_kernel+0x13f/0x1c0
drm_ioctl+0x6cf/0x980
do_vfs_ioctl+0x184/0xf30
SyS_ioctl+0x41/0x70
entry_SYSCALL_64_fastpath+0x1c/0xb1

Freed by task 37:
save_stack_trace+0x1b/0x20
kasan_slab_free+0xaf/0x190
kmem_cache_free+0xbf/0x340
radix_tree_node_rcu_free+0x79/0x90
rcu_process_callbacks+0x46d/0xf40
__do_softirq+0x21c/0x8d3

The buggy address belongs to the object at ffff8801359da0f0
which belongs to the cache radix_tree_node of size 576
The buggy address is located 544 bytes inside of
576-byte region [ffff8801359da0f0, ffff8801359da330)
The buggy address belongs to the page:
page:ffffea0004d67600 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x8000000000008100(slab|head)
raw: 8000000000008100 0000000000000000 0000000000000000 0000000100110011
raw: ffffea0004b52920 ffffea0004b38020 ffff88015b416a80 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8801359da200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801359da280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801359da300: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
^
ffff8801359da380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801359da400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint

which looks like the slab containing the radixtree iter was freed as we
traversed the tree; taking the rcu read lock across the loop should
prevent that (deferring all the frees until the end).

Reported-by: Tomi Sarvela <tomi.p.sarvela@intel.com>
Fixes: 96d776345277 ("drm/i915: Use a radixtree for random access to the object's backing storage")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026130032.10677-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>


# c41937fd 26-Oct-2017 Michał Winiarski <michal.winiarski@intel.com>

drm/i915/guc: Preemption! With GuC

Pretty similar to what we have on execlists.
We're reusing most of the GEM code; however, due to GuC quirks we need a
couple of extra bits.
Preemption is implemented as a GuC action, and actions can be pretty slow.
Because of that, we're using a mutex to serialize them. Since we're
requesting preemption from the tasklet, the task of creating a workitem
and wrapping it in GuC action is delegated to a worker.

To distinguish that preemption has finished, we're using an additional
piece of the HWSP, and since we're not getting context switch interrupts,
we're also adding a user interrupt.

The fact that our special preempt context has completed unfortunately
doesn't mean that we're ready to submit new work. We also need to wait
for GuC to finish its own processing.

v2: Don't compile out the wait for GuC, handle workqueue flush on reset,
no need for ordered workqueue, put on a reviewer hat when looking at my own
patches (Chris)
Move struct work around in intel_guc, move user interrupt outside of
conditional (Michał)
Keep ring around rather than chase through intel_context

v3: Extract WA for flushing ggtt writes to a helper (Chris)
Keep work_struct in intel_guc rather than engine (Michał)
Use ordered workqueue for inject_preempt worker to avoid GuC quirks.

v4: Drop now unused INTEL_GUC_PREEMPT_OPTION_IMMEDIATE (Daniele)
Drop stray newlines, use container_of for intel_guc in worker,
check for presence of workqueue when flushing it, rather than
enable_guc_submission modparam, reorder preempt postprocessing (Chris)

v5: Make wq NULL after destroying it

v6: Swap struct guc_preempt_work members (Michał)

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171026133558.19580-1-michal.winiarski@intel.com


# 9bdc3573 25-Oct-2017 Michał Winiarski <michal.winiarski@intel.com>

drm/i915/guc: Initialize GuC before restarting engines

Now that we're handling request resubmission the same way as regular
submission (from the tasklet), we can move GuC initialization earlier,
before restarting the engines. This way, we're no longer in a
state of flux during engine restart; we're already in the user-requested
submission mode.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025172519.10670-5-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# aba5e278 25-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add a hook for making the engines idle (parking) and unparking

In the next patch, we will want to install a callback when the engines
(GT as a whole) become idle and similarly when they first become busy.
To enable that callback, first rename intel_engines_mark_idle() to
intel_engines_park() and provide the companion intel_engines_unpark().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-2-chris@chris-wilson.co.uk


# ff320d6e 23-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Synchronize irq before parking each engine

When we park the engine (upon idling), we kill the irq tasklet. However,
to be sure that it is not restarted by a spurious interrupt after doing so,
flush the interrupt handler before parking. As we only park the engines
when we believe the system is idle, there should not be any interrupts to
disturb us; so flushing the final in-flight interrupt should be sufficient.
(However, we are still dependent on the HW behaving in an orderly and
timely fashion, which we shall endeavour to improve upon later.)
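
A sketch of the ordering (the irq line and tasklet are passed in here as
placeholders for the device interrupt and the engine's tasklet):

#include <linux/interrupt.h>

static void park_engine_irq(unsigned int irq, struct tasklet_struct *t)
{
	/* Flush any interrupt handler still running on another CPU so it
	 * cannot re-schedule the tasklet behind our back...
	 */
	synchronize_irq(irq);

	/* ...and only then kill the engine's irq tasklet. */
	tasklet_kill(t);
}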

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171023213237.26536-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


# 5427f207 23-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Bump wait-times for the final CS interrupt before parking

In the idle worker we drop the prolonged GT wakeref used to cover such
essentials as interrupt delivery. (When a CS interrupt arrives, we also
assert that the GT is awake.) However, it turns out that 10ms is not
long enough to be assured that the last CS interrupt has been delivered,
so bump that to 200ms, and move the entirety of that wait to before we
take the struct_mutex to avoid blocking. As this is now a potentially
long wait, restore the earlier behaviour of bailing out early when a new
request arrives.

v2: Break out the repeated check for new requests into its own little
helper to try and improve the self-commentary.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171023213237.26536-1-chris@chris-wilson.co.uk


# 8bd81815 19-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip waking the device to service pwrite

If the device is in runtime suspend, resuming takes time and reduces our
powersaving. If this was for a small write into an object, that resume
will take longer than any savings in using the indirect GGTT access to
avoid the cpu cache.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20171019063733.31620-1-chris@chris-wilson.co.uk


# ca8d7822 16-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Report -EFAULT before pwrite fast path into shmemfs

When pwriting into shmemfs, the fast path pagecache_write does not
notice when it is writing to beyond the end of the truncated shmemfs
inode. Report -EFAULT directly when we try to use pwrite into the
!I915_MADV_WILLNEED object.
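
A minimal sketch of the guard before the pagecache-write fast path (not a
verbatim excerpt from the patch):

/* The object was marked DONTNEED and its shmemfs inode truncated; a
 * pagecache write would silently land beyond EOF, so fail early.
 */
if (obj->mm.madv != I915_MADV_WILLNEED)
	return -EFAULT;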

Fixes: 7c55e2c5772d ("drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl")
Testcase: igt/gem_madvise/dontneed-before-pwrite
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171016202732.25459-1-chris@chris-wilson.co.uk
(cherry picked from commit a6d65e451cc4e7127698384868a4447ee7be7d16)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# a6d65e45 16-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Report -EFAULT before pwrite fast path into shmemfs

When pwriting into shmemfs, the fast path pagecache_write does not
notice when it is writing to beyond the end of the truncated shmemfs
inode. Report -EFAULT directly when we try to use pwrite into the
!I915_MADV_WILLNEED object.

Fixes: 7c55e2c5772d ("drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl")
Testcase: igt/gem_madvise/dontneed-before-pwrite
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171016202732.25459-1-chris@chris-wilson.co.uk


# 6f74b36b 15-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip HW reinitialisation on resume if still wedged

If we fail to recover the HW state upon resume (i.e. our attempt to
clear the wedged bit and reset during i915_gem_sanitize() fails), then
skip the HW restart inside i915_gem_init_hw(). We will ultimately do the
HW restart when successfully unwedging and resetting the HW later,
but attempting to restore a wedged device upon resume is risky as the HW
is in an unknown state.

v2: Suppress the error message when detecting the already wedged HW.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103240
Testcase: igt/gem_eio/in-flight-suspend
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171015143725.27764-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


# cc731f5a 13-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Trim struct_mutex hold duration for i915_gem_free_objects

We free objects in bulk after they wait for their RCU grace period.
Currently, we take struct_mutex and unbind all the objects. This can lead
to a long lock duration during which time those objects have their pages
unfreeable (i.e. the shrinker is prevented from reaping those pages). If
we only process a single object under the struct_mutex and then free the
pages, the number of objects locked away from the shrinker is minimal
and we allow regular clients better access to struct_mutex if they need
it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-9-chris@chris-wilson.co.uk


# 87701b4b 13-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only free the oldest stale object before a fresh allocation

Inspired by Tvrtko's critique of the reaping of the stale contexts
before allocating a new one, also limit the freed object reaping to the
oldest stale object before allocating a fresh object. Unlike contexts,
objects may have radically different sizes of backing storage, but
similar to contexts, while we want to prevent starvation due to
excessive freed lists, we also do not want to delay fresh allocations
for too long. Only freeing the oldest on the freed object list before
each allocation is a reasonable compromise.

v2: Only a single consumer of llist_del_first() is allowed (although
multiple llist_add are still allowed in parallel). Unlike
i915_gem_context, i915_gem_flush_free_objects() is itself not serialized
and so we need to add our own spinlock. Otherwise KASAN eventually spots
a use-after-free for the race on *first->next.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v1
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-8-chris@chris-wilson.co.uk


# f2123818 15-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move dev_priv->mm.[un]bound_list to its own lock

Remove the struct_mutex requirement around dev_priv->mm.bound_list and
dev_priv->mm.unbound_list by giving it its own spinlock. This reduces
one more requirement for struct_mutex and in the process gives us
slightly more accurate unbound_list tracking, which should improve the
shrinker - but the drawback is that we drop the retirement before
counting so i915_gem_object_is_active() may be stale and lead us to
underestimate the number of objects that may be shrunk (see commit
bed50aea61df ("drm/i915/shrinker: Flush active on objects before
counting")).

v2: Crosslink the spinlock to the lists it protects, and btw this
changes s/obj->global_link/obj->mm.link/
v3: Fix decoupling of old links in i915_gem_object_attach_phys()
v3.1: Fix the fix, only unlink if it was linked
v3.2: Use a local for to_i915(obj->base.dev)->mm.obj_lock

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171016114037.5556-1-chris@chris-wilson.co.uk


# bd3d2252 13-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename obj->pin_display to obj->pin_global

In the next patch, we want to extend use of the global pin counter for
semi-permanent pinning of context/ring objects. Given that we plan to
extend the usage to encompass a disparate set of objects, we want a name
that reflects both and should entail less confusion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-2-chris@chris-wilson.co.uk


# f1fa4f44 13-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor testing obj->mm.pages

Since we occasionally stuff an error pointer into obj->mm.pages for a
semi-permanent or even permanent failure, we have to be more careful and
not just test against NULL when deciding if the object has a complete
set of its concurrent pages.
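
The helper this suggests looks roughly like the following sketch:

static inline bool
i915_gem_object_has_pages(struct drm_i915_gem_object *obj)
{
	/* An error pointer parked in obj->mm.pages marks a (semi-)permanent
	 * failure and must not count as a complete set of pages.
	 */
	return !IS_ERR_OR_NULL(READ_ONCE(obj->mm.pages));
}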

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-1-chris@chris-wilson.co.uk


# 5d031f4e 12-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop asserting on set-wedged vs nop_submit_request ordering

Since the removal of the stop_machine(), it is allowed and expected for
the nop_submit_request() and nop_complete_submit_request() to run in
parallel to the i915_gem_set_wedged() processing. As such we can no
longer assert that i915_gem_set_wedged() has completed inside the
stop_machine prior to the individual nop_submit_request execution.

Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171012204019.3557-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# af7a8ffa 11-Oct-2017 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Use rcu instead of stop_machine in set_wedged

stop_machine is not really a locking primitive we should use, except
when the hw folks tell us the hw is broken and that's the only way to
work around it.

This patch tries to address the locking abuse of stop_machine() from

commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Nov 22 14:41:21 2016 +0000

drm/i915: Stop the machine as we install the wedged submit_request handler

Chris said part of the reason for going with stop_machine() was that
it has no overhead for the fast-path. But these callbacks use irqsave
spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

To stay as close as possible to the stop_machine semantics we first
update all the submit function pointers to the nop handler, then call
synchronize_rcu() to make sure no new requests can be submitted. This
should give us exactly the huge barrier we want.

I pondered whether we should annotate engine->submit_request as __rcu
and use rcu_assign_pointer and rcu_dereference on it. But the reason
behind those is to make sure the compiler/cpu barriers are there for
when you have an actual data structure you point at, to make sure all
the writes are seen correctly on the read side. But we just have a
function pointer, and .text isn't changed, so no need for these
barriers and hence no need for annotations.

Unfortunately there's a complication with the call to
intel_engine_init_global_seqno:

- Without stop_machine we must hold the corresponding spinlock.

- Without stop_machine we must ensure that all requests are marked as
having failed with dma_fence_set_error() before we call it. That
means we need to split the nop request submission into two phases,
both synchronized with rcu:

1. Only stop submitting the requests to hw and mark them as failed.

2. After all pending requests in the scheduler/ring are suitably
marked up as failed and we can force complete them all, also force
complete by calling intel_engine_init_global_seqno().
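
The overall shape of that sequence, as a sketch (the handler names follow
the description above; the timeline locking details are elided):

/* Phase 1: stop submitting to hw; new and pending requests get -EIO. */
for_each_engine(engine, i915, id)
	engine->submit_request = nop_submit_request;
synchronize_rcu(); /* nothing is still running through the old hook */

/* Phase 2: switch to the completing handler, then force-complete
 * everything via intel_engine_init_global_seqno() under the timeline
 * spinlock.
 */
for_each_engine(engine, i915, id)
	engine->submit_request = nop_complete_submit_request;
synchronize_rcu();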

This should fix the following lockdep splat:

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G U
------------------------------------------------------
kworker/3:4/562 is trying to acquire lock:
(cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40

but task is already holding lock:
(&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #6 (&dev->struct_mutex){+.+.}:
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
__mutex_lock+0x86/0x9b0
mutex_lock_interruptible_nested+0x1b/0x20
i915_mutex_lock_interruptible+0x51/0x130 [i915]
i915_gem_fault+0x209/0x650 [i915]
__do_fault+0x1e/0x80
__handle_mm_fault+0xa08/0xed0
handle_mm_fault+0x156/0x300
__do_page_fault+0x2c5/0x570
do_page_fault+0x28/0x250
page_fault+0x22/0x30

-> #5 (&mm->mmap_sem){++++}:
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
__might_fault+0x68/0x90
_copy_to_user+0x23/0x70
filldir+0xa5/0x120
dcache_readdir+0xf9/0x170
iterate_dir+0x69/0x1a0
SyS_getdents+0xa5/0x140
entry_SYSCALL_64_fastpath+0x1c/0xb1

-> #4 (&sb->s_type->i_mutex_key#5){++++}:
down_write+0x3b/0x70
handle_create+0xcb/0x1e0
devtmpfsd+0x139/0x180
kthread+0x152/0x190
ret_from_fork+0x27/0x40

-> #3 ((complete)&req.done){+.+.}:
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
wait_for_common+0x58/0x210
wait_for_completion+0x1d/0x20
devtmpfs_create_node+0x13d/0x160
device_add+0x5eb/0x620
device_create_groups_vargs+0xe0/0xf0
device_create+0x3a/0x40
msr_device_create+0x2b/0x40
cpuhp_invoke_callback+0xc9/0xbf0
cpuhp_thread_fun+0x17b/0x240
smpboot_thread_fn+0x18a/0x280
kthread+0x152/0x190
ret_from_fork+0x27/0x40

-> #2 (cpuhp_state-up){+.+.}:
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
cpuhp_issue_call+0x133/0x1c0
__cpuhp_setup_state_cpuslocked+0x139/0x2a0
__cpuhp_setup_state+0x46/0x60
page_writeback_init+0x43/0x67
pagecache_init+0x3d/0x42
start_kernel+0x3a8/0x3fc
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x6d/0x70
verify_cpu+0x0/0xfb

-> #1 (cpuhp_state_mutex){+.+.}:
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
__mutex_lock+0x86/0x9b0
mutex_lock_nested+0x1b/0x20
__cpuhp_setup_state_cpuslocked+0x53/0x2a0
__cpuhp_setup_state+0x46/0x60
page_alloc_init+0x28/0x30
start_kernel+0x145/0x3fc
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x6d/0x70
verify_cpu+0x0/0xfb

-> #0 (cpu_hotplug_lock.rw_sem){++++}:
check_prev_add+0x430/0x840
__lock_acquire+0x1420/0x15e0
lock_acquire+0xb0/0x200
cpus_read_lock+0x3d/0xb0
stop_machine+0x1c/0x40
i915_gem_set_wedged+0x1a/0x20 [i915]
i915_reset+0xb9/0x230 [i915]
i915_reset_device+0x1f6/0x260 [i915]
i915_handle_error+0x2d8/0x430 [i915]
hangcheck_declare_hang+0xd3/0xf0 [i915]
i915_hangcheck_elapsed+0x262/0x2d0 [i915]
process_one_work+0x233/0x660
worker_thread+0x4e/0x3b0
kthread+0x152/0x190
ret_from_fork+0x27/0x40

other info that might help us debug this:

Chain exists of:
cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&dev->struct_mutex);
lock(&mm->mmap_sem);
lock(&dev->struct_mutex);
lock(cpu_hotplug_lock.rw_sem);

*** DEADLOCK ***

3 locks held by kworker/3:4/562:
#0: ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
#1: ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
#2: (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]

stack backtrace:
CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G U 4.14.0-rc3-CI-CI_DRM_3179+ #1
Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
Workqueue: events_long i915_hangcheck_elapsed [i915]
Call Trace:
dump_stack+0x68/0x9f
print_circular_bug+0x235/0x3c0
? lockdep_init_map_crosslock+0x20/0x20
check_prev_add+0x430/0x840
? irq_work_queue+0x86/0xe0
? wake_up_klogd+0x53/0x70
__lock_acquire+0x1420/0x15e0
? __lock_acquire+0x1420/0x15e0
? lockdep_init_map_crosslock+0x20/0x20
lock_acquire+0xb0/0x200
? stop_machine+0x1c/0x40
? i915_gem_object_truncate+0x50/0x50 [i915]
cpus_read_lock+0x3d/0xb0
? stop_machine+0x1c/0x40
stop_machine+0x1c/0x40
i915_gem_set_wedged+0x1a/0x20 [i915]
i915_reset+0xb9/0x230 [i915]
i915_reset_device+0x1f6/0x260 [i915]
? gen8_gt_irq_ack+0x170/0x170 [i915]
? work_on_cpu_safe+0x60/0x60
i915_handle_error+0x2d8/0x430 [i915]
? vsnprintf+0xd1/0x4b0
? scnprintf+0x3a/0x70
hangcheck_declare_hang+0xd3/0xf0 [i915]
? intel_runtime_pm_put+0x56/0xa0 [i915]
i915_hangcheck_elapsed+0x262/0x2d0 [i915]
process_one_work+0x233/0x660
worker_thread+0x4e/0x3b0
kthread+0x152/0x190
? process_one_work+0x660/0x660
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x27/0x40
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang
Setting dangerous option reset - tainting kernel
i915 0000:00:02.0: Resetting chip after gpu hang

v2: Have 1 global synchronize_rcu() barrier across all engines, and
improve commit message.

v3: We need to protect the seqno update with the timeline spinlock (in
set_wedged) to avoid racing with other updates of the seqno, like we
already do in nop_submit_request (Chris).

v4: Use two-phase sequence to plug the race Chris spotted where we can
complete requests before they're marked up with -EIO.

v5: Review from Chris:
- simplify nop_submit_request.
- Add comment to rcu_read_lock section.
- Align comments with the new style.

v6: Remove unused variable to appease CI.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171011091019.1425-1-daniel.vetter@ffwll.ch


# 562d9bae 10-Oct-2017 Sagar Arun Kamble <sagar.a.kamble@intel.com>

drm/i915: Name structure in dev_priv that contains RPS/RC6 state as "gt_pm"

Prepare a substructure, rps, for RPS-related state. autoenable_work is
used for RC6 too, hence it is defined outside the rps structure. In doing
this, many functions are refactored to use struct intel_rps *rps to access
RPS-related members, and intel_rps_client pointer variables are renamed
to rps_client in various functions.

v2: Rebase.

v3: s/pm/gt_pm (Chris)
Refactored access to rps structure by declaring struct intel_rps * in
many functions.

Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com> #1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1507360055-19948-9-git-send-email-sagar.a.kamble@intel.com
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171010213010.7415-8-chris@chris-wilson.co.uk


# b85577b7 05-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Order two completing nop_submit_request

If two nop's (requests in-flight following a wedged device) complete at
the same time, the global_seqno value written to the HWSP is undefined
as the two threads are not serialized.

v2: Use irqsafe spinlock. We expect the callback may be called from
inside another irq spinlock, so we can't unconditionally restore irqs.

Fixes: ce1135c7de64 ("drm/i915: Complete requests in nop_submit_request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006115617.18432-1-chris@chris-wilson.co.uk
(cherry picked from commit 8d550824c6f52506754f11cb6be51aa153cc580d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 84e8978e 08-Oct-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915: s/sg_mask/sg_page_sizes/

It's a little unclear what the sg_mask actually is, so prefer the more
meaningful name of sg_page_sizes.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009110024.29114-1-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 43ae70d9 09-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Early rejection of mappable GGTT pin attempts for large bo

Currently, we reject attempting to pin a large bo into the mappable
aperture, but only after trying to create the vma. Under debug kernels,
repeatedly creating and freeing that vma for an oversized bo consumes
one-third of the runtime for pwrite/pread tests as it is spent on
kmalloc/kfree tracking. If we move the rejection to before creating that
vma, we lose some accuracy of checking against the fence_size as opposed
to object size, though the fence can never be smaller than the object.
Note that the vma creation itself will reject an attempt to create a vma
larger than the GTT so we can remove one redundant test.
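
As a sketch, the early rejection amounts to something like the following
(field names are indicative of that era's i915 and may not be exact):

/* Reject before constructing the vma: an object larger than the mappable
 * aperture can never be pinned there, so skip the vma churn entirely.
 */
if (flags & PIN_MAPPABLE && obj->base.size > dev_priv->ggtt.mappable_end)
	return ERR_PTR(-E2BIG);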

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-7-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# a3259ca9 09-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid evicting user fault mappable vma for pread/pwrite

Both pread/pwrite GTT paths provide a fast fallback in case we cannot
map the whole object at a time. Currently, we use the fallback for very
large objects and for active objects that would require remapping, but
we can also add active fault mappable objects to the list that we want
to avoid evicting. The rationale is that such fault mappable objects are
in active use, and evicting them requires tearing down the CPU PTE and
forcing a page fault on the next access; that is more costly, and interferes
with other processes more, than our per-page GTT fallback.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-6-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# a65adaf8 09-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track user GTT faulting per-vma

We don't wish to refault the entire object (other vma) when unbinding
one partial vma. To do this, track which vmas have been faulted into the
user's address space.

v2: Use a local vma_offset to tidy up a multiline unmap_mapping_range().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 3bd40735 09-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Consolidate get_fence with pin_fence

Following the pattern now used for obj->mm.pages, use just pin_fence and
unpin_fence to control access to the fence registers. I.e. instead of
calling get_fence(); pin_fence(), we now just need to call pin_fence().
This will make it easier to reduce the locking requirements around
fence registers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009084401.29090-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 1749d90f 08-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold forcewake for the duration of reset+restart

Resetting the engine requires us to hold the forcewake wakeref to
prevent RC6 trying to happen in the middle of the reset sequence. The
consequence of an unwanted RC6 event in the middle is that random state
is then saved to the powercontext and restored later, which may
overwrite the mmio state we need to preserve (e.g. PD_DIR_BASE in the
legacy ringbuffer reset_ring_common()).

This was noticed in the live_hangcheck selftests when Haswell would
sporadically fail to restart during igt_reset_queue().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171009110301.21705-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


# da9fe3f3 06-Oct-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915: disable platform support for vGPU huge gtt pages

Currently gvt gtt handling doesn't support huge page entries, so disable
for now.

v2: remove useless 48b PPGTT check

Suggested-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-20-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-19-chris@chris-wilson.co.uk


# 4049866f 06-Oct-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915/selftests: huge page tests

v2: mock test page support configurations and add MI_STORE_DWORD test

v3: run all mockable huge page tests on all platforms via the mock_device

v4: add pin_update regression test
various improvements suggested by Chris

v5: fix issues reported by kbuild
test single sg spanning multiple page sizes
don't explode when running the live-tests through the appgtt

v6: lots of improvements from Chris

v7: run on each engine for igt_write_huge
add simple tmpfs fallback test

v8: size_t is bad
don't break the i386 build

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-18-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-17-chris@chris-wilson.co.uk


# a5c08166 06-Oct-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915: introduce page_size members

In preparation for supporting huge gtt pages for the ppgtt, we introduce
page size members for gem objects. We fill in the page sizes by
scanning the sg table.
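
As a sketch of the idea (a generic scatterlist walk; the real helper,
i915_sg_page_sizes per the changelog below, may differ in detail):

#include <linux/scatterlist.h>

/* OR together the (page-aligned) segment lengths; the set bits then tell
 * us which physical page sizes the backing storage can provide.
 */
static unsigned int sg_mask_from_table(struct scatterlist *sgl)
{
	struct scatterlist *sg;
	unsigned int page_sizes = 0;

	for (sg = sgl; sg; sg = sg_next(sg))
		page_sizes |= sg->length;

	return page_sizes;
}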

v2: pass the sg_mask to set_pages

v3: calculate the sg_mask inline with populating the sg_table where
possible, and pass to set_pages along with the pages.

v4: bunch of improvements from Joonas

v5: fix num_pages blunder
introduce i915_sg_page_sizes helper

v6: prefer GEM_BUG_ON(sizes == 0)

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-7-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-6-chris@chris-wilson.co.uk


# b91b09ee 06-Oct-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915: push set_pages down to the callers

Each backend is now responsible for calling __i915_gem_object_set_pages
upon successfully gathering its backing storage. This eliminates the
inconsistency between the async and sync paths, which stands out even
more when we start throwing around an sg_mask in a later patch.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-6-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-5-chris@chris-wilson.co.uk


# 465c403c 06-Oct-2017 Matthew Auld <matthew.auld@intel.com>

drm/i915: introduce simple gemfs

Not a full-blown gemfs, just our very own tmpfs kernel mount. Doing so
moves us away from the shmemfs shm_mnt, and gives us the much needed
flexibility to do things like set our own mount options, namely huge=
which should allow us to enable the use of transparent-huge-pages for
our shmem backed objects.
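
A sketch of the core of such a private mount (error handling and the huge=
remount are elided; illustrative only):

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/mount.h>

static struct vfsmount *gemfs_mount_sketch(void)
{
	struct file_system_type *type = get_fs_type("tmpfs");

	if (!type)
		return ERR_PTR(-ENODEV);

	/* A kernel-internal tmpfs instance of our own, independent of the
	 * global shm_mnt, so that we control its mount options.
	 */
	return kern_mount(type);
}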

v2: various improvements suggested by Joonas

v3: move gemfs instance to i915.mm and simplify now that we have
file_setup_with_mnt

v4: fallback to tmpfs shm_mnt upon failure to setup gemfs

v5: make tmpfs fallback kinder

v5: better gemfs failure message
flags variable

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: linux-mm@kvack.org
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006145041.21673-3-matthew.auld@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006221833.32439-2-chris@chris-wilson.co.uk


# 8d550824 05-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Order two completing nop_submit_request

If two nop's (requests in-flight following a wedged device) complete at
the same time, the global_seqno value written to the HWSP is undefined
as the two threads are not serialized.

v2: Use irqsafe spinlock. We expect the callback may be called from
inside another irq spinlock, so we can't unconditionally restore irqs.

Fixes: ce1135c7de64 ("drm/i915: Complete requests in nop_submit_request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006115617.18432-1-chris@chris-wilson.co.uk


# 7c26240e 06-Oct-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Try harder to finish the idle-worker

If a worker requeues itself, it may switch to a different kworker pool,
which flush_work() considers as complete. To be strict, we then need to
keep flushing the work until it is no longer pending.

References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171006104038.22337-1-chris@chris-wilson.co.uk


# 269e6ea9 28-Sep-2017 Sagar Arun Kamble <sagar.a.kamble@intel.com>

drm/i915: Move i915_gem_restore_fences to i915_gem_resume

i915_gem_restore_fences is a GEM resumption task, hence it is moved to
i915_gem_resume from i915_restore_state.

Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1506661116-12106-1-git-send-email-sagar.a.kamble@intel.com
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b620e870 22-Sep-2017 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Make own struct for execlist items

Engine's execlist related items have been increasing to
a point where a separate struct is warranted. Carve execlist
specific items to a dedicated struct to add clarity.

v2: add kerneldoc and fix whitespace (Joonas, Chris)
v3: csb_mmio changes, rebase
v4: s/\b(el|execlist)\b/execlists/ (Joonas)

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> (v3)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-1-mika.kuoppala@intel.com


# 4f044a88 19-Sep-2017 Michal Wajdeczko <michal.wajdeczko@intel.com>

drm/i915: Rename global i915 to i915_modparams

Our global struct with params is named exactly the same way
as the new preferred name for the drm_i915_private function parameter.
To avoid such name reuse, let's use a different name for the global.

v5: pure rename
v6: fix

Credits-to: Coccinelle

@@
identifier n;
@@
(
- i915.n
+ i915_modparams.n
)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com


# 27a5f61b 15-Sep-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Cancel all ready but queued requests when wedging

When wedging the hw, we want to mark all in-flight requests as -EIO.
This is made slightly more complex by execlists who store the ready but
not yet submitted-to-hw requests on a private queue (an rbtree
priolist). Call into execlists to cancel not only the ELSP tracking for
the submitted requests, but also the queue of unsubmitted requests.

v2: Move the majority of engine_set_wedged to the backends (both legacy
ringbuffer and execlists handling their own lists).

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Testcase: igt/gem_eio/in-flight-contexts
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>


# 0ee931c4 13-Sep-2017 Michal Hocko <mhocko@suse.com>

mm: treewide: remove GFP_TEMPORARY allocation flag

GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. Its
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation. As much as this sounds
like a reasonable semantic, it becomes much less clear when to use the
high-level GFP_TEMPORARY allocation flag. How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing callers provide a way to reclaim the allocated memory. So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification. I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring. This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with high-level semantics should be clearly defined to prevent
confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL. Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect short-term
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
seems to be a heuristic without any measured advantage for most (if not
all) its current users. The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers. So rather than trying to tweak existing users into a
semantic which they haven't expected, I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
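
On a hypothetical caller, the treewide conversion described above is the
purely mechanical substitution below (shown diff-style, in keeping with
the Coccinelle rename elsewhere in this log):

-       ptr = kmalloc(len, GFP_TEMPORARY);
+       ptr = kmalloc(len, GFP_KERNEL);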


# c5ba5b24 07-Sep-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply the GTT write flush for all !llc machines

We also see the delayed GTT write issue on i915g/i915gm, so let's
presume that it is a universal problem for all !llc machines, and that we
just haven't yet noticed on g33, gen4 and gen5 machines.

v2: Use a register that exists on all platforms

Testcase: igt/gem_mmap_gtt/coherency # i915gm
References: https://bugs.freedesktop.org/show_bug.cgi?id=102577
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170907184520.5032-1-chris@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>


# 750fae23 07-Sep-2017 Ville Syrjälä <ville.syrjala@linux.intel.com>

i915: Fix obj size vs. alignment for drm_pci_alloc()

drm_pci_alloc() refuses to cooperate if the passed alignment exceeds the
object size. So round up the obj size to the next power of two as well
to make this actually work.

Obviously things work just fine as long as the size was a power of two
to begin with. However, kms_cursor_crc doesn't always use power-of-two
sizes, so we hit a failure when we try to allocate the phys memory.

Testcase: igt/kms_cursor_crc
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170907143203.13055-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
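
A minimal standalone sketch of the rounding described above; the kernel
already provides roundup_pow_of_two() for this, the loop here is only to
illustrate the fix:

#include <stddef.h>

/* Round a size up to the next power of two so it is never smaller than a
 * power-of-two alignment passed alongside it (assumes n does not exceed
 * half of SIZE_MAX). */
static size_t round_up_pow2(size_t n)
{
        size_t p = 1;

        while (p < n)
                p <<= 1;
        return p;
}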


# 5602452e 03-Aug-2017 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Use __sg_alloc_table_from_pages for userptr allocations

With the addition of __sg_alloc_table_from_pages we can control
the maximum coalescing size and eliminate a separate path for
allocating backing store here.

Similar to 871dfbd67d4e ("drm/i915: Allow compaction upto
SWIOTLB max segment size") this enables more compact sg lists to
be created and so has a beneficial effect on workloads with many
and/or large objects of this class.

v2:
* Rename helper to i915_sg_segment_size and fix swiotlb override.
* Commit message update.

v3:
* Actually include the swiotlb override fix.

v4:
* Regroup parameters a bit. (Chris Wilson)

v5:
* Rebase for swiotlb_max_segment.
* Add DMA map failure handling as in abb0deacb5a6
("drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping").

v6: Handle swiotlb_max_segment() returning 1. (Joonas Lahtinen)

v7: Rebase.
v8: Commit spelling fix.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803091417.23677-1-tvrtko.ursulin@linux.intel.com


# 912d572d 06-Sep-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: wire up shrinkctl->nr_scanned

shrink_slab() allows us to report back the number of objects we
successfully scanned (out of the target shrinkctl->nr_to_scan). As we
report the number of pages owned by each GEM object as a separate item
to the shrinker, we cannot precisely control the number of shrinker
objects we scan on each pass; and indeed may free more than requested.
If we fail to tell the shrinker about the number of objects we process,
it will continue to hold a grudge against us as any objects left
unscanned are added to the next reclaim -- and so we will keep on
"unfairly" shrinking our own slab in comparison to other slabs.

Link: http://lkml.kernel.org/r/20170822135325.9191-2-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shaohua Li <shli@fb.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
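
A hedged, self-contained sketch of the accounting contract described
above; the field names echo shrink_control, but the types, the helper and
the "pinned object" rule are invented for the example:

struct scan_ctl {
        unsigned long nr_to_scan;   /* target handed down by the core shrinker */
        unsigned long nr_scanned;   /* how much we actually examined */
};

/* Count every object we examine as scanned, even when it turns out to be
 * unreclaimable, so the core shrinker does not carry the shortfall over
 * to the next reclaim pass. */
static unsigned long scan_objects(struct scan_ctl *sc,
                                  const unsigned long *obj_pages,
                                  unsigned int nobj)
{
        unsigned long freed = 0;
        unsigned int i;

        for (i = 0; i < nobj && sc->nr_scanned < sc->nr_to_scan; i++) {
                sc->nr_scanned += obj_pages[i];
                if (obj_pages[i] & 1)   /* pretend odd page counts are pinned */
                        continue;
                freed += obj_pages[i];
        }

        return freed;
}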


# 88c880bb 06-Sep-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lift has-pinned-pages assert to caller of ____i915_gem_object_get_pages

i915_gem_object_attach_phys() is trying to swap out its shmemfs pages
for a new set of physically contiguous pages, but unfortunately triggers
an assert inside get-pages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102561
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170906135220.13508-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# 6910d852 01-Sep-2017 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Add __rcu to radix tree slot pointer

radix_tree_for_each_slot() wants an __rcu annotated pointer for the
slot. So let's add the annotation.

Fixes the following sparse warnings:
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9: expected void **slot
i915_gem.c:2217:9: got void [noderef] <asn:4>**
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9: expected void **slot
i915_gem.c:2217:9: got void [noderef] <asn:4>**
i915_gem.c:2217:9: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:2217:9: expected void [noderef] <asn:4>**slot
i915_gem.c:2217:9: got void **slot
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9: expected void **slot
i915_gem.c:2217:9: got void [noderef] <asn:4>**

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 96d776345277 ("drm/i915: Use a radixtree for random access to the object's backing storage")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit c23aa71bcfe8a9d597ae5fe4c1527fac20254d0a)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# afe722be 01-Sep-2017 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: io unmap functions want __iomem

Don't cast away the __iomem from the io_mapping functions so that
sparse won't be so unhappy when we pass the pointer to the unmap
functions. Instead let's move the cast to where we actually use the
pointer.

Fixes the following sparse warnings:
i915_gem.c:1022:33: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1022:33: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1022:33: got void *[assigned] vaddr
i915_gem.c:1027:34: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1027:34: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1027:34: got void *[assigned] vaddr
i915_gem.c:1199:33: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1199:33: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1199:33: got void *[assigned] vaddr
i915_gem.c:1204:34: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:1204:34: expected void [noderef] <asn:2>*vaddr
i915_gem.c:1204:34: got void *[assigned] vaddr

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-2-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# c23aa71b 01-Sep-2017 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Add __rcu to radix tree slot pointer

radix_tree_for_each_slot() wants an __rcu annotated pointer for the
slot. So let's add the annotation.

Fixes the following sparse warnings:
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9: expected void **slot
i915_gem.c:2217:9: got void [noderef] <asn:4>**
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9: expected void **slot
i915_gem.c:2217:9: got void [noderef] <asn:4>**
i915_gem.c:2217:9: warning: incorrect type in argument 1 (different address spaces)
i915_gem.c:2217:9: expected void [noderef] <asn:4>**slot
i915_gem.c:2217:9: got void **slot
i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
i915_gem.c:2217:9: expected void **slot
i915_gem.c:2217:9: got void [noderef] <asn:4>**

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 96d776345277 ("drm/i915: Use a radixtree for random access to the object's backing storage")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# fa3722f6 21-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Ignore duplicate VMA stored within the per-object handle LUT

By using drm_gem_flink/drm_gem_open on an object using the same fd, it
is possible for a client to create multiple handles pointing to the same
object (tied to the same contexts and VMA), as exemplified by
igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible
since forever, we cannot assume that the handle:(fpriv, object) is
unique and so must handle the multiple users of a single VMA.

v2: Added commentary noise.

Testcase: igt/gem_close
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355
Fixes: d1b48c1e7184 ("drm/i915: Replace execbuf vma ht with an idr")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
(cherry-picked from commit 3ffff01749928ea5ffdae2cecad561898c3b0f71)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 0168bdfc 29-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always wake the device to flush the GTT

Since we hold the device wakeref when writing through the GTT (otherwise
the writes would fail), we presumed that before the device sleeps those
writes would naturally be flushed and that we wouldn't need our mmio
read trick. However, that presumption seems false and a sleepy bxt seems
to require us to always manually flush the GTT writes prior to direct
access.

Fixes: e2a2aa36a509 ("drm/i915: Check we have an wake device before flushing GTT writes")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170829192546.1087-1-chris@chris-wilson.co.uk
(cherry picked from commit b69a784f5e2308d6360a76eceae450e96751f3e4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 3b24e7e8 28-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Recreate vmapping even when the object is pinned

Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.

This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).

We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.

v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.

Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit a575c6761757232ea2c7dc9f370640754b90cc69)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# b69a784f 29-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always wake the device to flush the GTT

Since we hold the device wakeref when writing through the GTT (otherwise
the writes would fail), we presumed that before the device sleeps those
writes would naturally be flushed and that we wouldn't need our mmio
read trick. However, that presumption seems false and a sleepy bxt seems
to require us to always manually flush the GTT writes prior to direct
access.

Fixes: e2a2aa36a509 ("drm/i915: Check we have an wake device before flushing GTT writes")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170829192546.1087-1-chris@chris-wilson.co.uk


# fc692bd3 25-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Discard the request queue if we fail to sleep before suspend

If we fail to clear the outstanding request queue before suspending,
mark those requests as lost.

References: https://bugs.freedesktop.org/show_bug.cgi?id=102037
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# f36325f3 25-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clear wedged status upon resume

When we wake up from suspend, the device has been powered down and
should come back afresh. We should be able to safely remove the wedged
status from the previous session and start afresh.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# cad9946c 25-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always sanity check engine state upon idling

When we do a locked idle we know that afterwards all requests have been
completed and the engines have been cleared of tasks. For whatever
reason, this doesn't always happen and we may go into a suspend with
ELSP still full, and this causes an issue upon resume as we get very,
very confused.

If the engines refuse to idle, mark the device as wedged. In the process
we get rid of the maybe unused open-coded version of wait_for_engines
reported by Nick Desaulniers and Matthias Kaehlcke.

v2: Suppress the -EIO before suspend, but keep it for seqno wrap.

References: https://bugs.freedesktop.org/show_bug.cgi?id=101891
References: https://bugs.freedesktop.org/show_bug.cgi?id=102456
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20170826110935.10237-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


# a575c676 28-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Recreate vmapping even when the object is pinned

Sometimes we know we are the only user of the bo, but since we take a
protective pin_pages early on, an attempt to change the vmap on the
object is denied because it is busy. i915_gem_object_pin_map() cannot
tell from our single pin_count if the operation is safe. Instead we must
pass that information down from the caller in the manner of
I915_MAP_OVERRIDE.

This issue has existed from the introduction of the mapping, but was
never noticed as the only place where this conflict might happen is for
cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
Until recently there was only a single user (the cmdparser) so no
conflicts ever occurred. However, we now use it to allocate batches for
different operations (using MAP_WC on !llc for writes) in addition to the
existing shadow batch (using MAP_WB for reads).

We could either keep both mappings cached, or use a different write
mechanism if we detect a MAP_WB already exists (i.e. clflush
afterwards), but as we haven't seen this issue in the wild (it requires
hitting the GPU reloc path in addition to the cmdparser) for simplicity
just allow the mappings to be recreated.

v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
about all the valid values.

Fixes: 7dd4f6729f92 ("drm/i915: Async GPU relocation processing")
Testcase: igt/gem_lut_handle # byt, completely by accident
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 3ffff017 21-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Ignore duplicate VMA stored within the per-object handle LUT

By using drm_gem_flink/drm_gem_open on an object using the same fd, it
is possible for a client to create multiple handles pointing to the same
object (tied to the same contexts and VMA), as exemplified by
igt::gem_handle_to_libdrm_bo(). Since this duplication has been possible
since forever, we cannot assume that the handle:(fpriv, object) is
unique and so must handle the multiple users of a single VMA.

v2: Added commentary noise.

Testcase: igt/gem_close
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102355
Fixes: d1b48c1e7184 ("drm/i915: Replace execbuf vma ht with an idr")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-3-chris@chris-wilson.co.uk
Tested-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>


# 67b48040 21-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Assert that the handle->vma lut is empty on object close

Make sure that we are not leaking an entry in the ctx->handles_lut by
asserting that the object was removed prior to being freed. This should
be enforced by all such handles being removed by i915_gem_close_object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-2-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>


# 432295d7 21-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Assert the context is not closed on object-close

During the context-close, we should be decoupling all the vma from the
object so that upon object-closing we shouldn't see any vma from the
already closed contexts. So include a check upon closing the object that
the context is still open.

v2: Eek, the fpriv check is required for shared objects. Double eek, BAT
passed?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170822110517.22277-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>


# d1b48c1e 16-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace execbuf vma ht with an idr

This was the competing idea long ago, but it was only with the rewrite
of the idr as a radixtree and using the radixtree directly ourselves,
along with the realisation that we can store the vma directly in the
radixtree and only need a list for the reverse mapping, that made the
patch performant enough to displace using a hashtable. Though the vma ht
is fast and doesn't require any extra allocation (as we can embed the node
inside the vma), it does require a thread for resizing and serialization
and will have the occasional slow lookup. That is hairy enough to
investigate alternatives and favour them if equivalent in peak performance.
One advantage of allocating an indirection entry is that we can support a
single shared bo between many clients, something that was done on a
first-come first-serve basis for shared GGTT vma previously. To offset
the extra allocations, we create yet another kmem_cache for them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk


# b8050148 11-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Handle full s64 precision for wait-ioctl

The wait-ioctl is optionally supplied a timeout with nanosecond
precision in a s64 field. We use nsecs_to_jiffies64() to convert that
into the jiffies consumed by the scheduler, but internally
nsecs_to_jiffies64() does not guard against overflow (as its purpose is
for use by the scheduler and not drivers!). So we must guard against the
overflow ourselves, and in the process note that we may then return
much earlier than the timeout selected by the user, so don't report
ETIME unless we do hit the timeout. (Woe betide us though if the user
waits for a year (32bit) and the request is still not complete!)

v2: Refine overflow detection (to not include an overflow itself)

Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170811105731.9482-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
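
A standalone sketch of the guarded conversion discussed above; the macro
values assume HZ == 100 and are illustrative, not the kernel's real
constants (in the wait-ioctl a negative timeout means wait indefinitely):

#include <stdint.h>

#define MAX_WAIT_JIFFIES ((unsigned long)~0UL >> 1)   /* illustrative cap */
#define NSEC_PER_JIFFY   10000000ULL                  /* 1e9 / HZ, HZ == 100 */

/* Clamp the user-supplied nanosecond timeout before converting it to
 * jiffies, so a huge value cannot overflow the arithmetic; callers must
 * then only report ETIME when the clamped timeout really expired. */
static unsigned long timeout_to_jiffies(int64_t timeout_ns)
{
        uint64_t j;

        if (timeout_ns < 0)
                return MAX_WAIT_JIFFIES;        /* wait "forever" */

        j = (uint64_t)timeout_ns / NSEC_PER_JIFFY;
        return j > MAX_WAIT_JIFFIES ? MAX_WAIT_JIFFIES : (unsigned long)j;
}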


# b8f55be6 10-Aug-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split obj->cache_coherent to track r/w

Another month, another story in the cache coherency saga. This time, we
come to the realisation that i915_gem_object_is_coherent() has been
reporting whether we can read from the target without requiring a cache
invalidate; but we were using it in places for testing whether we could
write into the object without requiring a cache flush. So split the
tracking into two, one to decide before reads, one after writes.

See commit e27ab73d17ef ("drm/i915: Mark CPU cache as dirty on every
transition for CPU writes") for the previous entry in this saga.

v2: Be verbose
v3: Remove unused function (i915_gem_object_is_coherent)
v4: Fix inverted coherency check prior to execbuf (from v2)
v5: Add comment for nasty code where we are optimising on gcc's behalf.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101109
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101555
Testcase: igt/kms_mmap_write_crc
Testcase: igt/kms_pwrite_crc
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170811111116.10373-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
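
A small sketch of the read/write split described above; the bit values
and the helper are illustrative, not the driver's actual definitions:

#define COHERENT_FOR_READ  (1u << 0)  /* no invalidate needed before CPU reads */
#define COHERENT_FOR_WRITE (1u << 1)  /* no flush needed after CPU writes */

struct obj_state {
        unsigned int cache_coherent;  /* mask of the two bits above */
        unsigned int cache_dirty;
};

/* After a CPU write, only objects lacking write coherency need to be
 * remembered as dirty so they get flushed before the GPU samples them. */
static void mark_after_cpu_write(struct obj_state *obj)
{
        if (!(obj->cache_coherent & COHERENT_FOR_WRITE))
                obj->cache_dirty = 1;
}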


# 8fb6a5df 26-Jul-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Call the unlocked version of i915_gem_object_get_pages()

When we hold for the lock for swapping out the shmem pages for the
physically contiguous pages, we have to call the unlocked version of
get_pages!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101934
Fixes: 35d23516946e ("drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-2-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8eeb7906 26-Jul-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move i915_gem_object_phys_attach()

Prevent a forward declaration in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170726181602.23527-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6ea1d55d 26-Jul-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make i915_gem_object_phys_attach() use obj->mm.lock more appropriately

Actually transferring from shmemfs to the physically contiguous set of
pages should be wholly guarded by its obj->mm.lock!

v2: Remember to free the old pages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170726160038.29487-2-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d1d1ebf4 21-Jul-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't touch fence->error when resetting an innocent request

If the request has been completed before the reset took effect, we don't
need to mark it up as being a victim. Touching fence->error after the
fence has been signaled is detected by dma_fence_set_error() and
triggers a BUG:

[ 231.743133] kernel BUG at ./include/linux/dma-fence.h:434!
[ 231.743156] invalid opcode: 0000 [#1] SMP KASAN
[ 231.743172] Modules linked in: i915 drm_kms_helper drm iptable_nat nf_nat_ipv4 nf_nat x86_pkg_temp_thermal iosf_mbi i2c_algo_bit cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font fbdev [last unloaded: drm]
[ 231.743221] CPU: 2 PID: 20 Comm: kworker/2:0 Tainted: G U 4.13.0-rc1+ #52
[ 231.743236] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011
[ 231.743363] Workqueue: events_long i915_hangcheck_elapsed [i915]
[ 231.743382] task: ffff8801f42e9780 task.stack: ffff8801f42f8000
[ 231.743489] RIP: 0010:i915_gem_reset_engine+0x45a/0x460 [i915]
[ 231.743505] RSP: 0018:ffff8801f42ff770 EFLAGS: 00010202
[ 231.743521] RAX: 0000000000000007 RBX: ffff8801bf6b1880 RCX: ffffffffa02881a6
[ 231.743537] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff8801bf6b18c8
[ 231.743551] RBP: ffff8801f42ff7c8 R08: 0000000000000001 R09: 0000000000000000
[ 231.743566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801edb02d00
[ 231.743581] R13: ffff8801e19d4200 R14: 000000000000001d R15: ffff8801ce2a4000
[ 231.743599] FS: 0000000000000000(0000) GS:ffff8801f5a80000(0000) knlGS:0000000000000000
[ 231.743614] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 231.743629] CR2: 00007f0ebd1add10 CR3: 0000000002621000 CR4: 00000000000406e0
[ 231.743643] Call Trace:
[ 231.743752] i915_gem_reset+0x6c/0x150 [i915]
[ 231.743853] i915_reset+0x175/0x210 [i915]
[ 231.743958] i915_reset_device+0x33b/0x350 [i915]
[ 231.744061] ? valleyview_pipestat_irq_handler+0xe0/0xe0 [i915]
[ 231.744081] ? trace_hardirqs_off_caller+0x70/0x110
[ 231.744102] ? _raw_spin_unlock_irqrestore+0x46/0x50
[ 231.744120] ? find_held_lock+0x119/0x150
[ 231.744138] ? mark_lock+0x6d/0x850
[ 231.744241] ? gen8_gt_irq_ack+0x1f0/0x1f0 [i915]
[ 231.744262] ? work_on_cpu_safe+0x60/0x60
[ 231.744284] ? rcu_read_lock_sched_held+0x57/0xa0
[ 231.744400] ? gen6_read32+0x2ba/0x320 [i915]
[ 231.744506] i915_handle_error+0x382/0x5f0 [i915]
[ 231.744611] ? gen6_rps_reset_ei+0x20/0x20 [i915]
[ 231.744630] ? vsnprintf+0x128/0x8e0
[ 231.744649] ? pointer+0x6b0/0x6b0
[ 231.744667] ? debug_check_no_locks_freed+0x1a0/0x1a0
[ 231.744688] ? scnprintf+0x92/0xe0
[ 231.744706] ? snprintf+0xb0/0xb0
[ 231.744820] hangcheck_declare_hang+0x15a/0x1a0 [i915]
[ 231.744932] ? engine_stuck+0x440/0x440 [i915]
[ 231.744951] ? rcu_read_lock_sched_held+0x57/0xa0
[ 231.745062] ? gen6_read32+0x2ba/0x320 [i915]
[ 231.745173] ? gen6_read16+0x320/0x320 [i915]
[ 231.745284] ? intel_engine_get_active_head+0x91/0x170 [i915]
[ 231.745401] i915_hangcheck_elapsed+0x3d8/0x400 [i915]
[ 231.745424] process_one_work+0x3e8/0xac0
[ 231.745444] ? pwq_dec_nr_in_flight+0x110/0x110
[ 231.745464] ? do_raw_spin_lock+0x8e/0x120
[ 231.745484] worker_thread+0x8d/0x720
[ 231.745506] kthread+0x19e/0x1f0
[ 231.745524] ? process_one_work+0xac0/0xac0
[ 231.745541] ? kthread_create_on_node+0xa0/0xa0
[ 231.745560] ret_from_fork+0x27/0x40
[ 231.745581] Code: 8b 7d c8 e8 49 0d 02 e1 49 8b 7f 38 48 8b 75 b8 48 83 c7 10 e8 b8 89 be e1 e9 95 fc ff ff 4c 89 e7 e8 4b b9 ff ff e9 30 ff ff ff <0f> 0b 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fe
[ 231.745767] RIP: i915_gem_reset_engine+0x45a/0x460 [i915] RSP: ffff8801f42ff770

At first glance this looks to be related to commit c64992e035d7
("drm/i915: Look for active requests earlier in the reset path"), but it
could easily happen before as well. On the other hand, we no longer
logged victims due to the active_request being dropped earlier.

v2: Be trickier to unwind the incomplete request as we cannot rely on
request retirement for the lockless per-engine reset.
v3: Reprobe the active request at the time of the reset.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes: c64992e035d7 ("drm/i915: Look for active requests earlier in the reset path")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-15-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 77b25a97 21-Jul-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make i915_gem_context_mark_guilty() safe for unlocked updates

Since we make call i915_gem_context_mark_guilty() concurrently when
resetting different engines in parallel, we need to make sure that our
updates are safe for the unlocked access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-12-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ed454f2c 21-Jul-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clear engine irq posted following a reset

When the GPU is reset, we want to discard all pending notifications as
either we have manually completed them, or they are no longer
applicable. Make sure we do reset the engine->irq_posted prior to
re-enabling the engine (e.g. the interrupt tasklets) in
i915_gem_reset_finish_engine().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-11-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bf2eac3b 21-Jul-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Assert that machine is wedged for nop_submit_request

We should only ever do nop_submit_request when the machine is wedged, so
assert it is so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-10-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3d7adbbf 21-Jul-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wake up waiters after setting the WEDGED bit

After setting the WEDGED bit, make sure that we do wake up waiters as
they may not be waiting for a request completion yet, just for its
execution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-9-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5e32d748 21-Jul-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clear execlist port[] before updating seqno on wedging

When we wedge the device, we clear out the in-flight requests and
advance the breadcrumb to indicate they are complete. However, the
breadcrumb advance includes an assert that the engine is idle, so that
advancement needs to be the last step to ensure we pass our own sanity
checks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-7-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8b5d27b9 20-Jul-2017 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Remove intel_flip_work infrastructure

This gets rid of all the interactions between the legacy flip code and
the modeset code. Yay!

This highlights an omission in the atomic paths, where we fail to
apply a boost to the pending rendering when we miss the target vblank.
But the existing code is still dead and can be removed.

v2: Note that the boosting doesn't work in atomic (Chris).

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170720175754.30751-7-daniel.vetter@ffwll.ch


# dbb32956 12-Jul-2017 Michal Hocko <mhocko@suse.com>

drm/i915: use __GFP_RETRY_MAYFAIL

Commit 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than
incur the oom for gfx allocations") has tried to remove disruptive OOM
killer because the userspace should be able to cope with allocation
failures.

At the time only __GFP_NORETRY could achieve that and it turned out that
this would fail the allocations just too easily. So "drm/i915: Remove
__GFP_NORETRY from our buffer allocator" removed it and hoped for a
better solution. __GFP_RETRY_MAYFAIL is that solution. It will keep
retrying the allocation until there is no more progress and we would go
OOM. Instead we fail the allocation and let the caller to deal with it.

Link: http://lkml.kernel.org/r/20170623085345.11304-6-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alex Belits <alex.belits@cavium.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
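
A hedged illustration of the resulting allocation policy on a made-up
helper (kernel-style C, so it only builds in-tree):

#include <linux/mm.h>

/* Ask the allocator to keep retrying reclaim until no progress can be
 * made, then fail instead of invoking the OOM killer; the NULL return is
 * what eventually becomes the ENOMEM reported back to userspace. */
static void *alloc_backing_store(size_t size)
{
        return kvmalloc(size, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
}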


# 7b92c1bd 28-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid keeping waitboost active for signaling threads

Once a client has requested a waitboost, we keep that waitboost active
until all clients are no longer waiting. This is because we don't
distinguish which waiter deserves the boost. However, with the advent of
fence signaling, the signaler threads appear as waiters to the RPS
interrupt handler. So instead of using a single boolean to track when to
keep the waitboost active, use a counter of all outstanding waitboosted
requests.

At this point, I have removed all vestiges of the rate limiting on
clients. Whilst this means that compositors should remain more fluid,
it also means that boosts are more prevalent. See commit b29c19b64528
("drm/i915: Boost RPS frequency for CPU stalls") for a longer discussion
on the pros and cons of both approaches.

A drawback of this implementation is that it requires constant request
submission to keep the waitboost trimmed (as it is now cancelled when the
request is completed). This will be fine for a busy system, but near
idle the boosts may be kept for longer than desired (effectively tens of
vblanks worstcase) and there is a reliance on rc6 instead.

v2: Remove defunct rps.client_lock

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170628123548.9236-1-chris@chris-wilson.co.uk
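
The boolean-to-counter change described above boils down to the sketch
below (C11 atomics standing in for the driver's locking; names are
invented):

#include <stdatomic.h>

static atomic_int nr_boosted;   /* outstanding waitboosted requests */

static void request_waitboosted(void) { atomic_fetch_add(&nr_boosted, 1); }
static void request_completed(void)   { atomic_fetch_sub(&nr_boosted, 1); }

/* Keep the boost only while at least one boosted request is outstanding,
 * rather than until every waiter (including signaler threads) is gone. */
static int boost_active(void)
{
        return atomic_load(&nr_boosted) > 0;
}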


# 13f8458f 27-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop flushing of the object free list/worker from i915_gem_suspend

i915_gem_suspend() is called from all of our finalization paths
(suspend, hibernate, unload). i915_gem_drain_freed_objects() adds an
arbitrary delay as it uses an rcu_barrier() to ensure that there are no
more freed objects in flight, and this delay causes a large amount of
variability in suspend timings. For S3 suspend, we do not need to free
pages as doing so does not impact at all upon the system in its
suspended state, unlike S4 hibernation where we do want the hibernation
image to be as small as possible. Therefore we can forgo waiting inside
i915_gem_suspend(), so long as we ensure that we do cleanup before
unload (see i915_gem_load_cleanup()) and prefer to reap our objects
prior to hibernation (see i915_gem_freeze()).

Removing the rcu_barrier() from i915_gem_suspend() improves S3 latency
by about 30ms on Skylake (ymmv).

Reported-by: David Weinehall <david.weinehall@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170627173731.11566-1-chris@chris-wilson.co.uk
Tested-by: David Weinehall <david.weinehall@linux.intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>


# 36703e79 22-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Break modeset deadlocks on reset

Trying to do a modeset from within a reset is fraught with danger. We
can fall into a cyclic deadlock where the modeset is waiting on a
previous modeset that is waiting on a request, and since the GPU hung
that request completion is waiting on the reset. As modesetting doesn't
allow its locks to be broken and restarted, or for its *own* reset
mechanism to take over the display, we have to do something very
evil instead. If we detect that we are stuck waiting to prepare the
display reset (by using a very simple timeout), resort to cancelling all
in-flight requests and throwing the user data into /dev/null, which is
marginally better than the driver locking up and keeping that data to
itself.

This is not a fix; this is just a workaround that unbreaks machines
until we can resolve the deadlock in a way that doesn't lose data!

v2: Move the retirement from set-wedged to the i915_reset() error path,
after which we no longer need any delayed worker cleanup for
i915_handle_error()
v3: C abuse for syntactic sugar
v4: Cover all waits with the timeout to catch more driver breakage

References: https://bugs.freedesktop.org/show_bug.cgi?id=99093
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170622105625.16952-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 4ee056f4 21-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Cancel pending execlist tasklet upon wedging

Highly unlikely, but if the stop_machine() did suspend the tasklet, we
want to make sure that when it wakes it finds there is nothing to do.
Otherwise, it will loudly complain that the ELSP port tracking no longer
matches the hardware, and we will be mightily confused.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170621124804.4529-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# a1ef70e1 20-Jun-2017 Michel Thierry <michel.thierry@intel.com>

drm/i915: Add support for per engine reset recovery

This change implements support for per-engine reset as an initial, less
intrusive hang recovery option to be attempted before falling back to the
legacy full GPU reset recovery mode if necessary. This is only supported
from Gen8 onwards.

Hangchecker determines which engines are hung and invokes error handler to
recover from it. Error handler schedules recovery for each of those engines
that are hung. The recovery procedure is as follows,
- identifies the request that caused the hang and it is dropped
- force engine to idle: this is done by issuing a reset request
- reset the engine
- re-init the engine to resume submissions.

If engine reset fails then we fall back to a heavyweight full GPU reset
which resets all engines and reinitializes the complete state of HW and SW.

v2: Rebase.
v3: s/*engine_reset*/*reset_engine*/; freeze engine and irqs before
calling i915_gem_reset_engine (Chris).
v4: Rebase, modify i915_gem_reset_prepare to use a ring mask and
reuse the function for reset_engine.
v5: intel_reset_engine_start/cancel instead of request/unrequest_reset.
v6: Clean up reset_engine function to not require mutex, i.e. no need to call
revoke/restore_fences and _retire_requests (Chris).
v7: Remove leftovers from v5, i.e. no need to disable irq, hold
forcewake or wakeup the handoff bit (Chris).
v8: engine_retire_requests should be (and it was) static; explain that
we have to re-init the engine after reset, which is why the init_hw call
is needed; check reset-in-progress flag (Chris).
v9: Rebase, include code to pass the active request to gem_reset_engine
(as it is already done in full reset). Remove unnecessary
intel_reset_engine_start/cancel, these are executed as part of the
reset.
v10: Rebase, use the right I915_RESET_ENGINE flag.
v11: Fixup to call reset_finish_engine even on error.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-6-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-6-chris@chris-wilson.co.uk


# c64992e0 20-Jun-2017 Michel Thierry <michel.thierry@intel.com>

drm/i915: Look for active requests earlier in the reset path

And store the active request so that we only search for it once.

v2: Check for request completion inside _prepare_engine, don't use
ECANCELED, remove unnecessary null checks (Chris).

v3: Capture active requests during reset_prepare and store it the
engine hangcheck obj.

v4: Rename commit, change i915_gem_reset_request to just confirm the
active_request is still incomplete, instead of engine_stalled (Chris).

v5: With style; pass the active request to gem_reset_engine, keep single
return in reset_prepare_engine (Chris).

v6: Moved before reset-engine code appears (Chris)

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-2-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-3-chris@chris-wilson.co.uk


# 829a0af2 19-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Group all the global context information together

Create a substruct to hold all the global context state under
drm_i915_private.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk


# ce2c5872 08-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove __GFP_NORETRY from our buffer allocator

I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
struggles with handling reclaim of our dirty buffers and relies on
reclaim via kswapd. As a result, a single pass of direct reclaim is
unreliable when i915 occupies the majority of available memory, and the
only means of effectively waiting on kswapd to make progress is by not
setting the __GFP_NORETRY flag and looping. That leaves us with the
dilemma of invoking the oomkiller instead of propagating the allocation
failure back to userspace where it can be handled more gracefully (one
hopes). In the future we may have __GFP_MAYFAIL to allow repeats up until
we genuinely run out of memory and the oomkiller would have been invoked.
Until then, let the oomkiller wreak havoc.

v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
v3: Update comments that direct reclaim only appears to be ignoring our
dirty buffers!

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Testcase: igt/gem_tiled_swapping
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Hocko <mhocko@suse.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit eaf41801559a687cc7511c04dc712984765c9dd7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# b8d5a9cc 08-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Encourage our shrinker more when our shmemfs allocations fails

Commit 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than
incur the oom for gfx allocations") made the bold decision to try and
avoid the oomkiller by reporting -ENOMEM to userspace if our allocation
failed after attempting to free enough buffer objects. In short, it
appears we were giving up too easily (even before we start wondering if
one pass of reclaim is as strong as we would like). Part of the problem
is that if we only shrink just enough pages for our expected allocation,
the likelihood of those pages becoming available to us is less than 100%.
To counteract that we ask for twice the number of pages to be made
available. Furthermore, we allow the shrinker to pull pages from the
active list in later passes.

v2: Be a little more cautious in paging out gfx buffers, and leave that
to a more balanced approach from shrink_slab(). Important when combined
with "drm/i915: Start writeback from the shrinker" as anything shrunk is
immediately swapped out and so should be more conservative.

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-1-chris@chris-wilson.co.uk
(cherry picked from commit 4846bf0ca8cb4304dde6140eff33a92b3fe8ef24)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 7dd4f672 16-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Async GPU relocation processing

If the user requires patching of their batch or auxiliary buffers, we
currently make the alterations on the cpu. If they are active on the GPU
at the time, we wait under the struct_mutex for them to finish executing
before we rewrite the contents. This happens if shared relocation trees
are used between different contexts with separate address spaces (and the
buffers then have different addresses in each), as the 3D state will need
to be adjusted between execution on each context. However, we don't need
to use the CPU to do the relocation patching, as we could queue commands
to the GPU to perform it and use fences to serialise the operation with
the current activity and future - so the operation on the GPU appears
just as atomic as performing it immediately. Performing the relocation
rewrites on the GPU is not free: in terms of pure throughput, the number
of relocations/s is about halved - but, more importantly, so is the time
under the struct_mutex.

v2: Break out the request/batch allocation for clearer error flow.
v3: A few asserts to ensure rq ordering is maintained

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 8a2421bd 16-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wait upon userptr get-user-pages within execbuffer

This simply hides the EAGAIN caused by userptr when userspace causes
resource contention. However, it is quite beneficial with highly
contended userptr users as we avoid repeating the setup costs and
kernel-user context switches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>


# 4ff4b44c 16-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store a direct lookup from object handle to vma

The advent of full-ppgtt led to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizable hashtable to jump from the object to the per-ctx vma.
rhashtable was considered but we don't need the online resizing feature
and the extra complexity proved to undermine its usefulness. Instead, we
simply reallocate the hashtable on demand in a background task and
serialize it before iterating.

In non-full-ppgtt modes, multiple files and multiple contexts can share
the same vma. This leads to having multiple possible handle->vma links,
so we only use the first to establish the fast path. The majority of
buffers are not shared and so we should still be able to realise
speedups with multiple clients.

v2: Prettier names, more magic.
v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
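
In outline, the fast path is a per-file hash lookup keyed on the execbuffer
handle; the sketch below uses a fixed-size table for brevity (the commit's
table is resized on demand) and all names are illustrative:

#include <linux/hashtable.h>
#include <linux/slab.h>

struct i915_vma;	/* driver type, opaque for this sketch */

struct handle_vma_entry {
	struct hlist_node node;
	u32 handle;
	struct i915_vma *vma;
};

struct handle_lut {
	DECLARE_HASHTABLE(buckets, 6);	/* hash_init() once at creation */
};

static struct i915_vma *lut_lookup(struct handle_lut *lut, u32 handle)
{
	struct handle_vma_entry *e;

	hash_for_each_possible(lut->buckets, e, node, handle)
		if (e->handle == handle)
			return e->vma;

	return NULL; /* slow path: resolve the object and insert below */
}

static int lut_insert(struct handle_lut *lut, u32 handle,
		      struct i915_vma *vma)
{
	struct handle_vma_entry *e = kmalloc(sizeof(*e), GFP_KERNEL);

	if (!e)
		return -ENOMEM;

	e->handle = handle;
	e->vma = vma;
	hash_add(lut->buckets, &e->node, handle);
	return 0;
}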


# 7fc92e96 16-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty

For ease of use (i.e. avoiding a few checks and function calls), store
the object's cache coherency next to the cache is dirty bit.

Specifically this patch aims to reduce the frequency of no-op calls to
i915_gem_object_clflush() to counteract the increase of such calls for
GPU-only objects in the previous patch.

v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
!cache_coherent as gcc generates much better code for the latter
(Tvrtko)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
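
The flush test then reduces to a pair of adjacent bits; a tiny sketch of the
idea (field and function names are illustrative):

#include <linux/types.h>

struct obj_cache_state {
	unsigned int cache_dirty:1;
	unsigned int cache_coherent:1;
};

/* The logical form (&& !) is what the v2 note recommends over the bitwise
 * cache_dirty & ~cache_coherent, as gcc generates better code for it.
 */
static bool needs_clflush(const struct obj_cache_state *s)
{
	return s->cache_dirty && !s->cache_coherent;
}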


# e27ab73d 15-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark CPU cache as dirty on every transition for CPU writes

Currently, we only mark the CPU cache as dirty if we skip a clflush.
This leads to some confusion where we have to ask if the object is in
the write domain or missed a clflush. If we always mark the cache as
dirty, this becomes a much simpler question to answer.

The goal remains to do as few clflushes as required and to do them as
late as possible, in the hope of deferring the work to a kthread and not
blocking the caller (e.g. execbuf, flips).

v2: Always call clflush before GPU execution when the cache_dirty flag
is set. This may cause some extra work on llc systems that migrate dirty
buffers back and forth - but we do try to limit that by only setting
cache_dirty at the end of the gpu sequence.

v3: Always mark the cache as dirty upon a level change, as we need to
invalidate any stale cachelines due to external writes.

Reported-by: Dongwon Kim <dongwon.kim@intel.com>
Fixes: a6a7cc4b7db6 ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 0f6ab55d 08-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only restrict noreclaim in the early shrink passes

In our first pass, we do not want to use reclaim at all as we want to
solely reap the i915 buffer caches (its purgeable pages). But we don't
mind if it initiates IO or pulls via the FS (but it shouldn't anyway, as we
say no to reclaim!). Just drop the GFP_IO constraint for simplicity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# eaf41801 08-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove __GFP_NORETRY from our buffer allocator

I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
struggles with handling reclaim of our dirty buffers and relies on
reclaim via kswapd. As a result, a single pass of direct reclaim is
unreliable when i915 occupies the majority of available memory, and the
only means of effectively waiting on kswapd to make progress is by not
setting the __GFP_NORETRY flag and looping. That leaves us with the
dilemma of invoking the oomkiller instead of propagating the allocation
failure back to userspace where it can be handled more gracefully (one
hopes). In the future we may have __GFP_MAYFAIL to allow repeats up until
we genuinely run out of memory and the oomkiller would have been invoked.
Until then, let the oomkiller wreak havoc.

v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
v3: Update comments that direct reclaim only appears to be ignoring our
dirty buffers!

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Testcase: igt/gem_tiled_swapping
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michal Hocko <mhocko@suse.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 4846bf0c 08-Jun-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Encourage our shrinker more when our shmemfs allocations fail

Commit 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than
incur the oom for gfx allocations") made the bold decision to try and
avoid the oomkiller by reporting -ENOMEM to userspace if our allocation
failed after attempting to free enough buffer objects. In short, it
appears we were giving up too easily (even before we start wondering if
one pass of reclaim is as strong as we would like). Part of the problem
is that if we only shrink just enough pages for our expected allocation,
the likelihood of those pages becoming available to us is less than 100%.
To counteract that, we ask for twice the number of pages to be made
available. Furthermore, we allow the shrinker to pull pages from the
active list in later passes.

v2: Be a little more cautious in paging out gfx buffers, and leave that
to a more balanced approach from shrink_slab(). Important when combined
with "drm/i915: Start writeback from the shrinker" as anything shrunk is
immediately swapped out and so should be more conservative.

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-1-chris@chris-wilson.co.uk


# e0da1963 30-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Short-circuit i915_gem_wait_for_idle() if already idle

If the device is asleep (no GT wakeref), we know the GPU is already idle.
If we add an early return, we can avoid touching registers and checking
hw state outside of the assumed GT wakelock. This prevents causing such
errors whilst debugging:

[ 2613.401647] RPM wakelock ref not held during HW access
[ 2613.401684] ------------[ cut here ]------------
[ 2613.401720] WARNING: CPU: 5 PID: 7739 at drivers/gpu/drm/i915/intel_drv.h:1787 gen6_read32+0x21f/0x2b0 [i915]
[ 2613.401731] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me lpc_ich mei prime_numbers [last unloaded: i915]
[ 2613.401823] CPU: 5 PID: 7739 Comm: drv_missed_irq Tainted: G U 4.12.0-rc2-CI-CI_DRM_421+ #1
[ 2613.401825] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
[ 2613.401840] task: ffff880409e3a740 task.stack: ffffc900084dc000
[ 2613.401861] RIP: 0010:gen6_read32+0x21f/0x2b0 [i915]
[ 2613.401863] RSP: 0018:ffffc900084dfce8 EFLAGS: 00010292
[ 2613.401869] RAX: 000000000000002a RBX: ffff8804016a8000 RCX: 0000000000000006
[ 2613.401871] RDX: 0000000000000006 RSI: ffffffff81cbf2d9 RDI: ffffffff81c9e3a7
[ 2613.401874] RBP: ffffc900084dfd18 R08: ffff880409e3afc8 R09: 0000000000000000
[ 2613.401877] R10: 000000008a1c483f R11: 0000000000000000 R12: 000000000000209c
[ 2613.401879] R13: 0000000000000001 R14: ffff8804016a8000 R15: ffff8804016ac150
[ 2613.401882] FS: 00007f39ef3dd8c0(0000) GS:ffff88041fb40000(0000) knlGS:0000000000000000
[ 2613.401885] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2613.401887] CR2: 00000000023717c8 CR3: 00000002e7b34000 CR4: 00000000001406e0
[ 2613.401889] Call Trace:
[ 2613.401912] intel_engine_is_idle+0x76/0x90 [i915]
[ 2613.401931] i915_gem_wait_for_idle+0xe6/0x1e0 [i915]
[ 2613.401951] fault_irq_set+0x40/0x90 [i915]
[ 2613.401970] i915_ring_test_irq_set+0x42/0x50 [i915]
[ 2613.401976] simple_attr_write+0xc7/0xe0
[ 2613.401981] full_proxy_write+0x4f/0x70
[ 2613.401987] __vfs_write+0x23/0x120
[ 2613.401992] ? rcu_read_lock_sched_held+0x75/0x80
[ 2613.401996] ? rcu_sync_lockdep_assert+0x2a/0x50
[ 2613.401999] ? __sb_start_write+0xfa/0x1f0
[ 2613.402004] vfs_write+0xc5/0x1d0
[ 2613.402008] ? trace_hardirqs_on_caller+0xe7/0x1c0
[ 2613.402013] SyS_write+0x44/0xb0
[ 2613.402020] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 2613.402022] RIP: 0033:0x7f39eded6670
[ 2613.402025] RSP: 002b:00007fffdcdcb1a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2613.402030] RAX: ffffffffffffffda RBX: ffffffff81470203 RCX: 00007f39eded6670
[ 2613.402033] RDX: 0000000000000001 RSI: 000000000041bc33 RDI: 0000000000000006
[ 2613.402036] RBP: ffffc900084dff88 R08: 00007f39ef3dd8c0 R09: 0000000000000001
[ 2613.402038] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000041bc33
[ 2613.402041] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
[ 2613.402046] ? __this_cpu_preempt_check+0x13/0x20
[ 2613.402052] Code: 01 9b fa e0 0f ff e9 28 fe ff ff 80 3d 6a dd 0e 00 00 0f 85 29 fe ff ff 48 c7 c7 48 19 29 a0 c6 05 56 dd 0e 00 01 e8 da 9a fa e0 <0f> ff e9 0f fe ff ff b9 01 00 00 00 ba 01 00 00 00 44 89 e6 48
[ 2613.402199] ---[ end trace 31f0cfa93ab632bf ]---

Fixes: 25112b64b3d2 ("drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170530121334.17364-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 863e9fde1a7061dad09bb299c65bed5f1ccb44ff)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# ff8f7975 30-May-2017 Weinan Li <weinan.z.li@intel.com>

drm/i915: return the correct usable aperture size under gvt environment

The I915_GEM_GET_APERTURE ioctl is used to probe the aperture size from userspace.
In a gvt environment, each vm only uses the ballooned part of the aperture, so we
should return the correct available aperture size, excluding the part reserved
by the balloon.

v2: add 'reserved' in struct i915_address_space to record the reserved size
in ggtt (Chris)

v3: keep aper_size as the total, adjust aper_available_size to exclude reserved
and pinned. The UMD driver needs to adjust the max allocation size according to
the available aperture size rather than the total size. The KMD returns the
correct usable aperture size at any time (Chris, Joonas)

v4: decrease reserved in deballoon (Joonas)

v5: add onion teardown in balloon, add vgt_deballoon_space (Joonas)

v6: change title name (Zhenyu)

v7: code style refine (Joonas)

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496198152-14175-1-git-send-email-weinan.z.li@intel.com
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 863e9fde 30-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Short-circuit i915_gem_wait_for_idle() if already idle

If the device is asleep (no GT wakeref), we know the GPU is already idle.
If we add an early return, we can avoid touching registers and checking
hw state outside of the assumed GT wakelock. This prevents causing such
errors whilst debugging:

[ 2613.401647] RPM wakelock ref not held during HW access
[ 2613.401684] ------------[ cut here ]------------
[ 2613.401720] WARNING: CPU: 5 PID: 7739 at drivers/gpu/drm/i915/intel_drv.h:1787 gen6_read32+0x21f/0x2b0 [i915]
[ 2613.401731] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii mei_me lpc_ich mei prime_numbers [last unloaded: i915]
[ 2613.401823] CPU: 5 PID: 7739 Comm: drv_missed_irq Tainted: G U 4.12.0-rc2-CI-CI_DRM_421+ #1
[ 2613.401825] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016
[ 2613.401840] task: ffff880409e3a740 task.stack: ffffc900084dc000
[ 2613.401861] RIP: 0010:gen6_read32+0x21f/0x2b0 [i915]
[ 2613.401863] RSP: 0018:ffffc900084dfce8 EFLAGS: 00010292
[ 2613.401869] RAX: 000000000000002a RBX: ffff8804016a8000 RCX: 0000000000000006
[ 2613.401871] RDX: 0000000000000006 RSI: ffffffff81cbf2d9 RDI: ffffffff81c9e3a7
[ 2613.401874] RBP: ffffc900084dfd18 R08: ffff880409e3afc8 R09: 0000000000000000
[ 2613.401877] R10: 000000008a1c483f R11: 0000000000000000 R12: 000000000000209c
[ 2613.401879] R13: 0000000000000001 R14: ffff8804016a8000 R15: ffff8804016ac150
[ 2613.401882] FS: 00007f39ef3dd8c0(0000) GS:ffff88041fb40000(0000) knlGS:0000000000000000
[ 2613.401885] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2613.401887] CR2: 00000000023717c8 CR3: 00000002e7b34000 CR4: 00000000001406e0
[ 2613.401889] Call Trace:
[ 2613.401912] intel_engine_is_idle+0x76/0x90 [i915]
[ 2613.401931] i915_gem_wait_for_idle+0xe6/0x1e0 [i915]
[ 2613.401951] fault_irq_set+0x40/0x90 [i915]
[ 2613.401970] i915_ring_test_irq_set+0x42/0x50 [i915]
[ 2613.401976] simple_attr_write+0xc7/0xe0
[ 2613.401981] full_proxy_write+0x4f/0x70
[ 2613.401987] __vfs_write+0x23/0x120
[ 2613.401992] ? rcu_read_lock_sched_held+0x75/0x80
[ 2613.401996] ? rcu_sync_lockdep_assert+0x2a/0x50
[ 2613.401999] ? __sb_start_write+0xfa/0x1f0
[ 2613.402004] vfs_write+0xc5/0x1d0
[ 2613.402008] ? trace_hardirqs_on_caller+0xe7/0x1c0
[ 2613.402013] SyS_write+0x44/0xb0
[ 2613.402020] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 2613.402022] RIP: 0033:0x7f39eded6670
[ 2613.402025] RSP: 002b:00007fffdcdcb1a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2613.402030] RAX: ffffffffffffffda RBX: ffffffff81470203 RCX: 00007f39eded6670
[ 2613.402033] RDX: 0000000000000001 RSI: 000000000041bc33 RDI: 0000000000000006
[ 2613.402036] RBP: ffffc900084dff88 R08: 00007f39ef3dd8c0 R09: 0000000000000001
[ 2613.402038] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000041bc33
[ 2613.402041] R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
[ 2613.402046] ? __this_cpu_preempt_check+0x13/0x20
[ 2613.402052] Code: 01 9b fa e0 0f ff e9 28 fe ff ff 80 3d 6a dd 0e 00 00 0f 85 29 fe ff ff 48 c7 c7 48 19 29 a0 c6 05 56 dd 0e 00 01 e8 da 9a fa e0 <0f> ff e9 0f fe ff ff b9 01 00 00 00 ba 01 00 00 00 44 89 e6 48
[ 2613.402199] ---[ end trace 31f0cfa93ab632bf ]---

Fixes: 25112b64b3d2 ("drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170530121334.17364-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
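
The early return amounts to a single check before any mmio is touched; a
sketch, with the awake flag and the heavy-weight path named illustratively:

#include <linux/compiler.h>

/* Hypothetical heavy path that polls engine/ring state over mmio. */
extern int wait_for_engines_idle(struct drm_i915_private *i915);

static int wait_for_idle(struct drm_i915_private *i915)
{
	/* No GT wakeref means no work in flight: skip the register reads
	 * that would otherwise trip the RPM wakelock assertion above.
	 */
	if (!READ_ONCE(i915->gt.awake))
		return 0;

	return wait_for_engines_idle(i915);
}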


# 80debff8 25-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Consolidate #ifdef CONFIG_INTEL_IOMMU

We depend on intel_iommu_gfx_mapped for various workarounds, but that is
only available under an #ifdef CONFIG_INTEL_IOMMU. Refactor all the
cut-and-paste ifdefs to a common routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170525121612.2190-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
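
The consolidation pattern is a single predicate hiding the #ifdef; a sketch
(the helper name here is illustrative of the approach, not necessarily the
one merged):

#include <linux/types.h>
#ifdef CONFIG_INTEL_IOMMU
#include <linux/intel-iommu.h>	/* declares intel_iommu_gfx_mapped */
#endif

static inline bool intel_gfx_iommu_active(void)
{
#ifdef CONFIG_INTEL_IOMMU
	/* Only set once the IOMMU has actually claimed the gfx device. */
	if (intel_iommu_gfx_mapped)
		return true;
#endif
	return false;
}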


# 2098105e 17-May-2017 Michal Hocko <mhocko@kernel.org>

drm: drop drm_[cm]alloc* helpers

Now that the drm_[cm]alloc* helpers are simple one-line wrappers around
kvmalloc_array and drm_free_large is just a kvfree alias, we can drop
them and replace them with their native forms.

This shouldn't introduce any functional change.

Changes since v1
- fix typo in drivers/gpu//drm/etnaviv/etnaviv_gem.c - noticed by 0day
build robot

Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Michal Hocko <mhocko@suse.com>
[danvet: Fixup vgem which grew another user very recently.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517122312.GK18247@dhcp22.suse.cz
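
The conversion is mechanical; for example, replacing the wrappers with their
underlying calls (the surrounding helper functions are illustrative):

#include <linux/mm.h>
#include <linux/slab.h>

/* Before:  list = drm_malloc_ab(count, sizeof(*list));
 *          ...
 *          drm_free_large(list);
 * After, using the native forms directly:
 */
static u64 *alloc_offset_array(unsigned int count)
{
	return kvmalloc_array(count, sizeof(u64), GFP_KERNEL);
}

static void free_offset_array(u64 *offsets)
{
	kvfree(offsets);
}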


# c5cf9a91 17-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Create a kmem_cache to allocate struct i915_priolist from

The i915_priolist are allocated within an atomic context on a path where
we wish to minimise latency. If we use a dedicated kmem_cache, we have
the advantage of a local freelist from which to service new requests
that should keep the latency impact of an allocation small. Though
currently we expect the majority of requests to be at default priority
(and so hit the preallocated priolist), once userspace starts using
priorities they are likely to use many fine-grained policies, improving
the utilisation of a private slab.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
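
A sketch of the pattern, assuming the allocation happens under a spinlock
(struct and function names are illustrative):

#include <linux/errno.h>
#include <linux/list.h>
#include <linux/slab.h>

struct priolist_example {
	struct list_head requests;
	int priority;
};

static struct kmem_cache *priolist_cache;

static int priolists_init(void)
{
	priolist_cache = KMEM_CACHE(priolist_example, 0);
	return priolist_cache ? 0 : -ENOMEM;
}

/* Called in atomic context: the slab's local freelist keeps this cheap. */
static struct priolist_example *priolist_alloc(int priority)
{
	struct priolist_example *p;

	p = kmem_cache_alloc(priolist_cache, GFP_ATOMIC);
	if (!p)
		return NULL;

	INIT_LIST_HEAD(&p->requests);
	p->priority = priority;
	return p;
}

static void priolist_free(struct priolist_example *p)
{
	kmem_cache_free(priolist_cache, p);
}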


# 6c067579 17-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split execlist priority queue into rbtree + linked list

All the requests at the same priority are executed in FIFO order. They
do not need to be stored in the rbtree themselves, as they are a simple
list within a level. If we move the requests at one priority into a list,
we can then reduce the rbtree to the set of priorities. This should keep
the height of the rbtree small, as the number of active priorities can not
exceed the number of active requests and should be typically only a few.

Currently, we have ~2k possible different priority levels, that may
increase to allow even more fine grained selection. Allocating those in
advance seems a waste (and may be impossible), so we opt for allocating
upon first use, and freeing after its requests are depleted. To avoid
the possibility of an allocation failure causing us to lose a request,
we preallocate the default priority (0) and bump any request to that
priority if we fail to allocate the appropriate plist for it. Having a
request (that is ready to run, so not leading to corruption) execute
out-of-order is better than leaking the request (and its dependency
tree) entirely.

There should be a benefit to reducing execlists_dequeue() to principally
using a simple list (and reducing the frequency of both rbtree iteration
and balancing on erase) but for typical workloads, request coalescing
should be small enough that we don't notice any change. The main gain is
from improving PI calls to schedule, and the explicit list within a
level should make request unwinding simpler (we just need to insert at
the head of the list rather than the tail and not have to make the
rbtree search more complicated).

v2: Avoid use-after-free when deleting a depleted priolist

v3: Michał found the solution to handling the allocation failure
gracefully. If we disable all priority scheduling following the
allocation failure, those requests will be executed in fifo and we will
ensure that this request and its dependencies are in strict fifo (even
when it doesn't realise it is only a single list). Normal scheduling is
restored once we know the device is idle, until the next failure!
Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
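
Structurally, the queue becomes an rbtree keyed by priority with a FIFO list
of requests per node; a sketch (types and names are illustrative):

#include <linux/list.h>
#include <linux/rbtree.h>

struct prio_level {
	struct rb_node node;
	struct list_head requests;	/* FIFO within this priority */
	int priority;
};

/* Find the level for @prio, or link @new into the tree if it is absent. */
static struct prio_level *prio_level_get(struct rb_root *root,
					 struct prio_level *new, int prio)
{
	struct rb_node **p = &root->rb_node, *parent = NULL;

	while (*p) {
		struct prio_level *level;

		parent = *p;
		level = rb_entry(parent, struct prio_level, node);
		if (prio == level->priority)
			return level;		/* already allocated */

		if (prio < level->priority)
			p = &parent->rb_left;
		else
			p = &parent->rb_right;
	}

	new->priority = prio;
	INIT_LIST_HEAD(&new->requests);
	rb_link_node(&new->node, parent, p);
	rb_insert_color(&new->node, root);
	return new;
}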


# 77f0d0e9 17-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/execlists: Pack the count into the low bits of the port.request

add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
function old new delta
execlists_submit_ports 262 471 +209
port_assign.isra - 136 +136
capture 6344 6359 +15
reset_common_ring 438 452 +14
execlists_submit_request 228 238 +10
gen8_init_common_ring 334 341 +7
intel_engine_is_idle 106 105 -1
i915_engine_info 2314 2290 -24
__i915_gem_set_wedged_BKL 485 411 -74
intel_lrc_irq_handler 1789 1604 -185
execlists_update_context 294 - -294

The most important change there is the improvement to
intel_lrc_irq_handler and execlists_submit_ports (a net improvement since
execlists_update_context is now inlined).

v2: Use the port_api() for guc as well (even though currently we do not
pack any counters in there, yet) and hide all port->request_count inside
the helpers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk


# 0ce81788 17-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Redefine ptr_pack_bits() and friends

Rebrand the current (pointer | bits) pack/unpack utility macros as
explicit bit twiddling for PAGE_SIZE so that we can use the more
flexible underlying macros for different bits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-4-chris@chris-wilson.co.uk
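
The underlying trick is stealing the low bits of a sufficiently aligned
pointer; a generic sketch of the idea (these are not the driver's exact
macros):

#include <linux/types.h>

#define LOW_BITS	2	/* requires at least 4-byte alignment */
#define LOW_MASK	((1UL << LOW_BITS) - 1)

static inline void *pack_ptr_bits(void *ptr, unsigned long bits)
{
	/* The caller guarantees ptr's alignment leaves the low bits clear. */
	return (void *)((unsigned long)ptr | (bits & LOW_MASK));
}

static inline void *unpack_ptr_bits(void *packed, unsigned long *bits)
{
	*bits = (unsigned long)packed & LOW_MASK;
	return (void *)((unsigned long)packed & ~LOW_MASK);
}

With a couple of spare bits available, the execlists port can carry both the
request pointer and the small submission count described in the previous
entry without growing the structure.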


# 991bfc64 17-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make ptr_unpack_bits() more function-like

ptr_unpack_bits() is a function-like macro, as such it is meant to be
replaceable by a function. In this case, we should be passing in the
out-param as a pointer.

Bizarrely this does affect code generation:

function old new delta
i915_gem_object_pin_map 409 389 -20

An improvement(?) in this case, but one can't help wonder what
strict-aliasing optimisations we are preventing.

The generated code looks identical in using ptr_unpack_bits (no extra
motions to stack, the pointer and bits appear to be kept in registers),
the difference appears to be code ordering and with a reorder it is able
to use smaller forward jumps.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-3-chris@chris-wilson.co.uk


# 47979480 03-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Squash repeated awaits on the same fence

Track the latest fence waited upon on each context, and only add a new
asynchronous wait if the new fence is more recent than the recorded
fence for that context. This requires us to filter out unordered
timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the
absence of a universal identifier, we have to use our own
i915->mm.unordered_timeline token.

v2: Throw around the debug crutches
v3: Inline the likely case of the pre-allocation cache being full.
v4: Drop the pre-allocation support, we can lose the most recent fence
in case of allocation failure -- it just means we may emit more awaits
than strictly necessary but will not break.
v5: Trim allocation size for leaf nodes, they only need an array of u32
not pointers.
v6: Create mock_timeline to tidy selftest writing
v7: s/intel_timeline_sync_get/intel_timeline_sync_is_later/ (Tvrtko)
v8: Prune the stale sync points when we idle.
v9: Include a small benchmark in the kselftests
v10: Separate the idr implementation into its own compartment. (Tvrtko)
v11: Refactor igt_sync kselftests to avoid deep nesting (Tvrtko)
v12: __sync_leaf_idx() to assert that p->height is 0 when checking leaves
v13: kselftests to investigate struct i915_syncmap itself (Tvrtko)
v14: Foray into ascii art graphs
v15: Take into account that the random lookup/insert does 2 prng calls,
not 1, when benchmarking, and use for_each_set_bit() (Tvrtko)
v16: Improved ascii art

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-4-chris@chris-wilson.co.uk
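
Conceptually, the filter keeps the most recent seqno seen per fence context
and skips awaits already covered by it; a sketch, with lookup_latest()
standing in for the driver's syncmap:

#include <linux/types.h>

/* Hypothetical: returns true and fills *latest if @context has been seen. */
extern bool lookup_latest(u64 context, u32 *latest);

static bool await_is_redundant(u64 context, u32 seqno)
{
	u32 latest;

	if (!context)			/* unordered timeline: never skip */
		return false;

	if (!lookup_latest(context, &latest))
		return false;

	/* Standard modular u32 comparison: true if @latest already covers
	 * @seqno, i.e. the wait has been emitted before.
	 */
	return (s32)(latest - seqno) >= 0;
}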


# 94312828 03-May-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up clflushes as belonging to an unordered timeline

2 clflushes on two different objects are not ordered, and so do not
belong to the same timeline (context). Either we use a unique context
for each, or we reserve a special global context to mean unordered.
Ideally, we would reserve 0 to mean unordered (DMA_FENCE_NO_CONTEXT) to
have the same semantics everywhere.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-1-chris@chris-wilson.co.uk


# ea117b8d 28-Apr-2017 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Reset ILK during GEM sanitization

ILK should survive a reset without display corruption.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# f2e4d76e 28-Apr-2017 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Eliminate HAS_HW_CONTEXTS

HAS_HW_CONTEXTS is a misleading condition for GPU reset and CCID;
replace it with Gen-specific checks (to be updated in the next patches).

HAS_HW_CONTEXTS in i915_l3_write is bogus because each HAS_L3_DPF
match also has .has_hw_contexts = 1 set.

This leads to us being able to get rid of the property completely.

v2:
- Keep the checks at Gen6 for no functional change (Ville)

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>


# bdb57b8d 05-Apr-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use the right mapping_gfp_mask for final shmem allocation

Many sightings report the greater prevalence of allocation failures.
This is all due to the incorrect use of mapping_gfp_constraint(), so
remove it in favour of just querying the mapping_gfp_mask(), which is
the exact gfp_t we wanted in the first place.

We still do expect a higher chance of reporting ENOMEM, as that is the
intention of using __GFP_NORETRY -- to fail rather than oom after having
reclaimed from our bo caches, and having done a direct|kswapd reclaim
pass.

Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100594
Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170405221514.23251-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit b268d9fe0f10544f5f7a1b7015e2b97075e6215d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 5f0d5a3a 18-Jan-2017 Paul E. McKenney <paulmck@kernel.org>

mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU

A group of Linux kernel hackers reported chasing a bug that resulted
from their assumption that SLAB_DESTROY_BY_RCU provided an existence
guarantee, that is, that no block from such a slab would be reallocated
during an RCU read-side critical section. Of course, that is not the
case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
slab of blocks.

However, there is a phrase for this, namely "type safety". This commit
therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
to avoid future instances of this sort of confusion.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
[ paulmck: Add comments mentioning the old name, as requested by Eric
Dumazet, in order to help people familiar with the old name find
the new one. ]
Acked-by: David Rientjes <rientjes@google.com>


# e22d8e3c 11-Apr-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Treat WC a separate cache domain

When discussing a new WC mmap, we based the interface upon the
assumption that GTT was fully coherent. How naive! Commits 3b5724d702ef
("drm/i915: Wait for writes through the GTT to land before reading
back") and ed4596ea992d ("drm/i915/guc: WA to address the Ringbuffer
coherency issue") demonstrate that writes through the GTT are indeed
delayed and may be overtaken by direct WC access. To be safe, if
userspace is mixing WC mmaps with other potential GTT access (pwrite,
GTT mmaps) it should use set_domain(WC).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96563
Testcase: igt/gem_pwrite/small-gtt*
Testcase: igt/drv_selftest/coherency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170412110111.26626-2-chris@chris-wilson.co.uk


# ef74921b 11-Apr-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Combine write_domain flushes to a single function

In the next patch, we will introduce a new cache domain for
differentiating between GTT access and direct WC access. This will
require us to include WC in our write_domain flushes. Rather than
duplicate a third function, combine the existing two into one and
flushing WC writes will then be automatically handled as well.

v2: Be smarter and clearer by passing in the write domains to flush (Joonas)
v3: One missed ~ in v2 conversion

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170412110111.26626-1-chris@chris-wilson.co.uk


# 1d39f281 11-Apr-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename intel_engine_cs.exec_id to uabi_id

We want to refer to the index of the engine consistently throughout the
userspace ABI. We already have such an index through the execbuffer
engine specifier, which needs to be able to refer to each engine
specifically, so rename the index to uabi_id to reflect its
generality beyond execbuf.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170411124306.15448-1-chris@chris-wilson.co.uk


# 63987bfe 05-Apr-2017 Sagar Arun Kamble <sagar.a.kamble@intel.com>

drm/i915: Suspend GuC prior to GPU Reset during GEM suspend

i915 is currently doing a full GPU reset at the end of
i915_gem_suspend() followed by GuC suspend in i915_drm_suspend(). This
GPU reset clobbers the GuC, causing the suspend request to then fail,
leaving the GuC in an undefined state. We need to tell the GuC to
suspend before we do the direct intel_gpu_reset().

v2: Commit message update. (Chris, Daniele)

Fixes: 1c777c5d1dcd ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491387710-20553-1-git-send-email-sagar.a.kamble@intel.com
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit fd08923384385400101c71ac0d21d37d6b23b00d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# f2be9d68 07-Apr-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Insert cond_resched() into i915_gem_free_objects

As we may have very many objects to free, check to see if the task needs
to be rescheduled whilst freeing them.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 5ad08be7 07-Apr-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Break up long runs of freeing objects

Before freeing the next batch of objects from the worker, check if the
worker's timeslice has expired and if so, defer the next batch to the
next invocation of the worker.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
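
Taken together with the previous entry, the freeing worker ends up looking
roughly like the sketch below; free_one() and requeue_worker() are stand-ins
for the driver's helpers:

#include <linux/list.h>
#include <linux/sched.h>

struct free_item {
	struct list_head link;
};

extern void free_one(struct free_item *item);		/* hypothetical */
extern void requeue_worker(struct list_head *freed);	/* hypothetical */

static void free_objects_worker(struct list_head *freed)
{
	struct free_item *item, *next;

	list_for_each_entry_safe(item, next, freed, link) {
		list_del(&item->link);
		free_one(item);		/* may be expensive */

		/* Yield between objects if someone else needs the CPU... */
		cond_resched();

		/* ...and if we are still being pressed for time, defer the
		 * remainder to the next invocation of the worker.
		 */
		if (need_resched() && !list_empty(freed)) {
			requeue_worker(freed);
			return;
		}
	}
}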


# e92075ff 07-Apr-2017 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Simplify shrinker locking

By using the same structure for both interruptible and
uninterruptible locking in shrinker code, combined with the
information that mm.interruptible is only being written to, the
code can be greatly simplified.

Also removing the i915_gem_ prefix from the locking functions so
that nobody in their wildest dreams considers exporting them.

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1491562175-27680-1-git-send-email-joonas.lahtinen@linux.intel.com


# 17b93c40 07-Apr-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drain any freed objects prior to hibernation

As we call into the shrinker during freeze, we may have freed more
objects since we idled during i915_gem_suspend. Make sure we flush the
i915_gem_free_objects worker prior to saving the unwanted pages into the
hibernation image.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# d0aa301a 07-Apr-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: The shrinker already acquires struct_mutex, so call it unlocked

The shrinker is prepared to be called unlocked (and at other times with
struct_mutex held for DIRECT_RECLAIM) so we can skip acquiring the
struct_mutex prior to calling the shrinker during freeze. This improves
our ability to shrink as we can be more aggressive when we know the
caller isn't holding struct_mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407102552.5781-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# b268d9fe 05-Apr-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use the right mapping_gfp_mask for final shmem allocation

Many sightings report the greater prevalence of allocation failures.
This is all due to the incorrect use of mapping_gfp_constraint(), so
remove it in favour of just querying the mapping_gfp_mask(), which is
the exact gfp_t we wanted in the first place.

We still do expect a higher chance of reporting ENOMEM, as that is the
intention of using __GFP_NORETRY -- to fail rather than oom after having
reclaimed from our bo caches, and having done a direct|kswapd reclaim
pass.

Reported-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100594
Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170405221514.23251-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# fd089233 05-Apr-2017 Sagar Arun Kamble <sagar.a.kamble@intel.com>

drm/i915: Suspend GuC prior to GPU Reset during GEM suspend

i915 is currently doing a full GPU reset at the end of
i915_gem_suspend() followed by GuC suspend in i915_drm_suspend(). This
GPU reset clobbers the GuC, causing the suspend request to then fail,
leaving the GuC in an undefined state. We need to tell the GuC to
suspend before we do the direct intel_gpu_reset().

v2: Commit message update. (Chris, Daniele)

Fixes: 1c777c5d1dcd ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491387710-20553-1-git-send-email-sagar.a.kamble@intel.com
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 7a3ee5de 30-Mar-2017 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Remove user-triggerable WARN from i915_gem_object_create

Since this can be triggered by simply attempting a huge object,
a WARN_ON is not appropriate.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330163130.24141-1-tvrtko.ursulin@linux.intel.com


# 25112b64 30-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wait for all engines to be idle as part of i915_gem_wait_for_idle()

Make i915_gem_wait_for_idle() be a little heavier in order to try and
guarantee that the GPU is indeed idle (by checking each engine
individually is idle, i.e. all writes are complete and the rings
stopped) after waiting for in-flight requests to be completed.

v2: And return the final error.
v3: Break the wait_for() out from under the WARN -- the macro expansion
is hideous and unreadable in the warning message
v4: If wait_for_engine() fails the result is catastrophic, mark the
device as wedged and wait for the repair team.

References: https://bugs.freedesktop.org/show_bug.cgi?id=98836
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 72022a70 30-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move retire-requests into i915_gem_wait_for_idle()

As we now distinguish everywhere that can call
i915_gem_retire_requests() following a successful wait_for_idle, we can
remove the duplication by moving that call into i915_gem_wait_for_idle()
itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 8490ae20 30-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Suppress busy status for engines if wedged

If the driver is wedged, HW state may be very inconsistent and
report that it is still busy, even though we have stopped using it. This
can lead to a double *ERROR* rather than a graceful cleanup after
wedging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-2-chris@chris-wilson.co.uk


# 2c170af7 30-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do request retirement before marking engines as wedged

As we declare an engine as wedged, we mark all of its active requests as
in error. However, we don't want to mark successfully completed requests
as in error, which requires us to retire those requests first.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-1-chris@chris-wilson.co.uk


# b8991403 28-Mar-2017 Oscar Mateo <oscar.mateo@intel.com>

drm/i915/guc: Take enable_guc_loading check out of GEM core code

The check should happen as soon as possible, but always within the logic that
depends on it (and not interrupting the top-level driver control flow).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1490720027-23234-1-git-send-email-oscar.mateo@intel.com


# e2a2aa36 23-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Check we have an awake device before flushing GTT writes

We can assume that if the device is asleep then all pending GTT writes
will have been posted, and so we can defer the flush from
i915_gem_object_flush_gtt_write_domain().

[ 1957.462568] WARNING: CPU: 0 PID: 6132 at drivers/gpu/drm/i915/intel_drv.h:1742 fwtable_read32+0x123/0x150 [i915]
[ 1957.462582] RPM wakelock ref not held during HW access
[ 1957.462583] Modules linked in: i915 intel_gtt drm_kms_helper prime_numbers
[ 1957.462607] CPU: 0 PID: 6132 Comm: gem_concurrent_ Tainted: G U 4.11.0-rc1+ #464
[ 1957.462619] Hardware name: / , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[ 1957.462630] Call Trace:
[ 1957.462646] dump_stack+0x4d/0x6f
[ 1957.462657] __warn+0xc1/0xe0
[ 1957.462667] warn_slowpath_fmt+0x4a/0x50
[ 1957.462709] fwtable_read32+0x123/0x150 [i915]
[ 1957.462750] i915_gem_object_flush_gtt_write_domain+0x43/0x70 [i915]
[ 1957.462791] i915_gem_object_set_to_cpu_domain+0x46/0xa0 [i915]
[ 1957.462831] i915_gem_set_domain_ioctl+0x15d/0x220 [i915]
[ 1957.462843] drm_ioctl+0x1d7/0x440
[ 1957.462885] ? i915_gem_obj_prepare_shmem_write+0x1d0/0x1d0 [i915]
[ 1957.462896] ? pick_next_task_fair+0x436/0x440
[ 1957.462906] ? mntput+0x1f/0x30
[ 1957.462915] do_vfs_ioctl+0x8f/0x5c0
[ 1957.462925] ? __schedule+0x16f/0x5f0
[ 1957.462935] ? ____fput+0x9/0x10
[ 1957.462943] SyS_ioctl+0x3c/0x70
[ 1957.462952] entry_SYSCALL_64_fastpath+0x17/0x98
[ 1957.462961] RIP: 0033:0x7fc542179ca7
[ 1957.462968] RSP: 002b:00007ffeef12ff98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1957.462982] RAX: ffffffffffffffda RBX: 00007ffeef1301d0 RCX: 00007fc542179ca7
[ 1957.462990] RDX: 00007ffeef12ffd0 RSI: 00000000400c645f RDI: 0000000000000003
[ 1957.462999] RBP: 0000000000000003 R08: 000055f433bc7c40 R09: 000000000000002c
[ 1957.463006] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000018
[ 1957.463015] R13: 000055f432c89d20 R14: 000055f432c87690 R15: 0000000000000000

Fixes: 3b5724d702ef ("drm/i915: Wait for writes through the GTT to land before reading back")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170323150053.28582-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 3950bf3d 22-Mar-2017 Oscar Mateo <oscar.mateo@intel.com>

drm/i915/guc: Add onion teardown to the GuC setup

Starting with intel_guc_loader, down to intel_guc_submission
and finally to intel_guc_log.

v2:
- Null execbuf client outside guc_client_free (Daniele)
- Assert if things try to get allocated twice (Daniele/Joonas)
- Null guc->log.buf_addr when destroyed (Daniele)
- Newline between returning success and error labels (Joonas)
- Remove some unnecessary comments (Joonas)
- Keep guc_log_create_extras naming convention (Joonas)
- Helper function guc_log_has_extras (Joonas)
- No need for separate relay_channel create/destroy. It's just another extra.
- No need to nullify guc->log.flush_wq when destroyed (Joonas)
- Hoist the check for has_extras out of guc_log_create_extras (Joonas)
- Try to do i915_guc_log_register/unregister calls (kind of) symmetric (Daniele)
- Make sure intel_guc_fini is not called before init is ever called (Daniele)

v3:
- Remove unnecessary parenthesis (Joonas)
- Check for logs enabled on debugfs registration
- Rebase on top of Tvrtko's "Fix request re-submission after reset"

v4:
- Rebased
- Comment around enabling/disabling interrupts inside GuC logging (Joonas)

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 40149f00 22-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Actually pass the reclaim gfp_t along to shmemfs!

Words cannot describe the embarrassment of creating a new gfp_t reclaim to
only prevent the oomkiller but allow direct|kswapd reclaim, and then not
using it in shmem_read_mapping_page_gfp().

Fixes: 24f8e00a8a2e ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170322223447.7493-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 24f8e00a 22-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations

Since gfx allocations tend to be large, unmovable and disposable, report
the allocation failure back to userspace as an ENOMEM rather than incur
the oomkiller. We have already tried to make room by purging our own
cached gfx objects, and the oomkiller doesn't attribute ownership of gfx
objects so will likely pick the wrong candidate. Instead, let userspace
see the ENOMEM.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170322110521.29930-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 54ec12af 18-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip force-wake for uncached mmio flush of GGTT writes

The trick of using an uncached mmio read to ensure that the GGTT writes
are flushed does not require us to do the forcewake dance, so avoid it
in the hope of reducing the frequency that we do keep the device forced
awake.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170318104257.694-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# be062fa4 17-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Initialise i915_gem_object_create_from_data() directly

Use pagecache_write to avoid shmemfs clearing the pages prior to us
immediately overwriting them with our data.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170317194648.12468-2-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# ce8ff099 17-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: i915_gem_object_create_from_data() doesn't require struct_mutex

Both object creation and backing storage page allocation do not require
struct_mutex, so do not require the caller to take it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170317194648.12468-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# 2e8f9d32 16-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restore engine->submit_request before unwedging

When we wedge the device, we override engine->submit_request with a nop
to ensure that all in-flight requests are marked in error. However, igt
would like to unwedge the device to test -EIO handling. This requires us
to flush those in-flight requests and restore the original
engine->submit_request.

v2: Use a vfunc to unify enabling request submission to engines
v3: Split new vfunc to a separate patch.
v4: Make the wait interruptible -- the third party fences we wait upon
may be indefinitely broken, so allow the reset to be aborted.

Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_eio
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v3
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-3-chris@chris-wilson.co.uk


# 8c185eca 16-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split I915_RESET_IN_PROGRESS into two flags

I915_RESET_IN_PROGRESS is being used for both signaling the requirement
to i915_mutex_lock_interruptible() to avoid taking the struct_mutex and
to instruct a waiter (already holding the struct_mutex) to perform the
reset. To allow for a little more coordination, split these two meanings
into a couple of distinct flags. I915_RESET_BACKOFF tells
i915_mutex_lock_interruptible() not to acquire the mutex and
I915_RESET_HANDOFF tells the waiter to call i915_reset().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-1-chris@chris-wilson.co.uk


# 6cd5a72c 14-Mar-2017 Arkadiusz Hiler <arkadiusz.hiler@intel.com>

drm/i915/guc: Simplify intel_guc_init_hw()

The current version of intel_guc_init_hw() does a lot:
- cares about submission
- loads the HuC
- implements WAs

This change offloads some of the logic to intel_uc_init_hw(), which now
cares about the above.

v2: rename guc_hw_reset and fix typo in define name (M. Wajdeczko)
v3: rename once again
v4: remove spurious comments and add some style (J. Lahtinen)
v5: flow changes, got rid of dead checks (M. Wajdeczko)
v6: rebase
v7: rebase & onion teardown (J. Lahtinen)

Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 882d1db0 14-Mar-2017 Arkadiusz Hiler <arkadiusz.hiler@intel.com>

drm/i915/uc: Rename intel_?uc_{setup, load}() to _init_hw()

GuC historically has two "startup" functions called _init() and _setup()

Then HuC came with its _init() and _load().

This commit renames intel_guc_setup() and intel_huc_load() to
*uc_init_hw() as they are called from i915_gem_init_hw().

The aim is to be consistent in that entry points called during
particular driver init phases (e.g. init_hw) are all suffixed by that
phase. When reading the leaf functions, it should be clear at what stage
during the driver load it is called and therefore what operations are
legal at that point.

Also, since the functions start with intel_guc and intel_huc, they take
the appropriate structure.

v2: commit message update (Chris Wilson)
v3: change taken parameters to be more "semantic" (M. Wajdeczko)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 0f5418e5 13-Mar-2017 Kenneth Graunke <kenneth@whitecape.org>

drm/i915: Drop support for I915_EXEC_CONSTANTS_* execbuf parameters.

This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0
(indicating the optional feature is not supported), and makes execbuf
always return -EINVAL if the flags are used.

Apparently, no userspace ever shipped which used this optional feature:
I checked the git history of Mesa, xf86-video-intel, libva, and Beignet,
and there were zero commits showing a use of these flags. Kernel commit
72bfa19c8deb4 apparently introduced the feature prematurely. According
to Chris, the intention was to use this in cairo-drm, but "the use was
broken for gen6", so I don't think it ever happened.

'relative_constants_mode' has always been tracked per-device, but this
has actually been wrong ever since hardware contexts were introduced, as
the INSTPM register is saved (and automatically restored) as part of the
render ring context. The software per-device value could therefore get
out of sync with the hardware per-context value. This meant that using
them is actually unsafe: a client which tried to use them could damage
the state of other clients, causing the GPU to interpret their BO
offsets as absolute pointers, leading to bogus memory reads.

These flags were also never ported to execlist mode, making them no-ops
on Gen9+ (which requires execlists), and Gen8 in the default mode.

On Gen8+, userspace can write these registers directly, achieving the
same effect. On Gen6-7.5, it likely makes sense to extend the command
parser to support them. I don't think anyone wants this on Gen4-5.

Based on a patch by Dave Gordon.

v3: Return -ENODEV for the getparam, as this is what we do for other
obsolete features. Suggested by Chris Wilson.
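
A hypothetical userspace probe (the render-node path is only an example)
showing how the parameter is queried; after this change the query fails,
so a robust client treats any error, or a zero value, as "feature absent":

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <xf86drm.h>
#include <i915_drm.h>

int main(void)
{
        int fd = open("/dev/dri/renderD128", O_RDWR);
        int value = 0;
        struct drm_i915_getparam gp = {
                .param = I915_PARAM_HAS_EXEC_CONSTANTS,
                .value = &value,
        };

        if (fd < 0)
                return 1;

        if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) < 0)
                printf("query rejected: %s\n", strerror(errno));
        else
                printf("HAS_EXEC_CONSTANTS = %d\n", value);
        return 0;
}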

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170313170433.26843-1-chris@chris-wilson.co.uk
(cherry picked from commit ef0f411f51475f4eebf9fc1b19a85be698af19ff)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 4565bf58 13-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable engine->irq_tasklet around resets

When we restart the engines, and we have active requests, a request on
the first engine may complete and queue a request to the second engine
before we try to restart the second engine. That queueing of the
request may race with the engine to restart, and so may corrupt the
current state. Disabling the engine->irq_tasklet prevents the two paths
from writing into ELSP simultaneously (and modifying the execlists_port[]
at the same time).

Include fixup 1d309634bcf4 ("drm/i915: Kill the tasklet then disable")

Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_exec_fence/await-hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-3-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/20170313165958.13970-2-chris@chris-wilson.co.uk
(cherry picked from commit 1f7b847d72c3583df5048d83bd945d0c2c524c28)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# da9a796f 13-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split GEM resetting into 3 phases

Currently we do a reset prepare/finish around the call to reset the GPU,
but it looks like we need a later stage after the hw has been
reinitialised to allow GEM to restart itself. Start by splitting the 2
GEM phases into 3:

prepare - before the reset, check if GEM recovered, then stop GEM

reset - after the reset, update GEM bookkeeping

finish - after the re-initialisation following the reset, restart GEM
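
A minimal sketch of the intended ordering (the i915_gem_reset_* names
below simply mirror the prepare/reset/finish split described above and
are not guaranteed to match the final function names):

static void example_handle_reset(struct drm_i915_private *i915)
{
        i915_gem_reset_prepare(i915);   /* before the reset: stop GEM */

        intel_gpu_reset(i915, ALL_ENGINES);

        i915_gem_reset(i915);           /* after the reset: fix up bookkeeping */
        i915_gem_init_hw(i915);         /* reinitialise the hardware */
        i915_gem_reset_finish(i915);    /* after reinit: let GEM restart */
}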

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-2-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/20170313165958.13970-1-chris@chris-wilson.co.uk
(cherry picked from commit d80270931314a88d79d9bd5e0a5df93c12196375)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 7f5f95d8 09-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move whole object to CPU domain for coherent shmem access

If the object is coherent, we can simply update the cache domain on the
whole object rather than calculate the before/after clflushes. The
advantage is that we then get correct tracking of elided flushes when
changing coherency later.

Testcase: igt/gem_pwrite_snooped
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170310000942.11661-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 4e6fdafa 06-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl

Before we instantiate/pin the backing store for our use, we
can prepopulate the shmemfs filp efficiently using a write into the
pagecache. We avoid the penalty of instantiating all the pages, important
if the user is just writing to a few and never uses the object on the GPU,
and using a direct write into shmemfs allows us to avoid the cost of
retrieving a page (mostly the clear-before-use, but in theory we could
curtail swapin) before it is overwritten.

This can be extended later to provide additional specialisation for
other backends (other than shmemfs). For now it provides a defense
against very large write-only allocations from exhausting all of system
memory.

v2: Smelling fixes.
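
A minimal sketch of the prepopulation technique described above, assuming
a shmemfs-backed obj->base.filp and the pagecache_write_begin()/
pagecache_write_end() helpers available in kernels of this era:

static int example_shmem_pwrite(struct file *filp, loff_t pos,
                                const char __user *user_data, size_t len)
{
        struct address_space *mapping = filp->f_mapping;

        while (len) {
                unsigned int offset = offset_in_page(pos);
                unsigned int chunk = min_t(size_t, len, PAGE_SIZE - offset);
                unsigned long unwritten;
                struct page *page;
                void *vaddr, *fsdata;
                int err;

                err = pagecache_write_begin(filp, mapping, pos, chunk, 0,
                                            &page, &fsdata);
                if (err < 0)
                        return err;

                /* Copy directly into the pagecache; the page is never
                 * pinned by GEM, so reclaim/swap can still act on it. */
                vaddr = kmap(page);
                unwritten = copy_from_user(vaddr + offset, user_data, chunk);
                kunmap(page);

                err = pagecache_write_end(filp, mapping, pos, chunk,
                                          chunk - unwritten, page, fsdata);
                if (err < 0)
                        return err;
                if (unwritten)
                        return -EFAULT;

                user_data += chunk;
                pos += chunk;
                len -= chunk;
        }

        return 0;
}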

Fixes: fe115628d567 ("drm/i915: Implement pwrite without struct-mutex")
References: https://bugs.freedesktop.org/show_bug.cgi?id=99107
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307120338.7277-2-chris@chris-wilson.co.uk
(cherry picked from commit 7c55e2c5772dcf3cbacd0fa2bcfeefae416b73f7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 0d9dc306 07-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store a permanent error in obj->mm.pages

Once the object has been truncated, it is unrecoverable. To facilitate
detection of this state store the error in obj->mm.pages.

This is required for the next patch which should be applied to v4.10
(via stable), so we also need to mark this patch for backporting. In
that regard, let's consider this to be a fix/improvement too.

v2: Avoid dereferencing the ERR_PTR when freeing the object.
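
Illustrative use of the pattern (a sketch following the commit
description, not the exact driver code):

static void example_truncate(struct drm_i915_gem_object *obj)
{
        /* ... truncate the shmemfs backing store ... */

        /* Record the unrecoverable state; per v2, the free path must
         * check IS_ERR() before dereferencing this. */
        obj->mm.pages = ERR_PTR(-EFAULT);
}

static int example_get_pages(struct drm_i915_gem_object *obj)
{
        if (IS_ERR(obj->mm.pages))
                return PTR_ERR(obj->mm.pages);  /* permanently failed */

        /* ... acquire pages normally ... */
        return 0;
}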

Fixes: 1233e2db199d ("drm/i915: Move object backing storage manipulation to its own locking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170307132031.32461-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 4e5462ee843c883790e9609cf560d88960ea4227)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 89cf83d4 15-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Squelch any ktime/jiffie rounding errors for wait-ioctl

We wait upon jiffies, but report the time elapsed using a
high-resolution timer. This discrepancy can lead to us timing out the
wait prior to us reporting the elapsed time as complete.

This restores the squelching lost in commit e95433c73a11 ("drm/i915:
Rearrange i915_wait_request() accounting with callers").
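
A sketch of the idea, under the assumption that the caller re-checks the
wait condition before declaring failure: because we sleep with jiffy
granularity, the high-resolution clock may already have passed the
deadline even though the request did complete within the requested time.

static s64 example_report_remaining(bool completed, ktime_t deadline)
{
        s64 remaining_ns = ktime_to_ns(ktime_sub(deadline, ktime_get()));

        if (completed)
                /* Success must never be reported as a timeout, even if
                 * jiffy rounding made us oversleep slightly. */
                return max_t(s64, remaining_ns, 1);

        return remaining_ns > 0 ? remaining_ns : -ETIME;
}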

Fixes: e95433c73a11 ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170216125441.30923-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit c1d2061b28c2aa25ec39b60d9c248e6beebd7315)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 03d1cac6 08-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoiding recursing on ww_mutex inside shrinker

We have to avoid taking ww_mutex inside the shrinker as we use it as a
plain mutex type and so need to avoid recursive deadlocks:

[ 602.771969] =================================
[ 602.771970] [ INFO: inconsistent lock state ]
[ 602.771973] 4.10.0gpudebug+ #122 Not tainted
[ 602.771974] ---------------------------------
[ 602.771975] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 602.771978] kswapd0/40 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 602.771979] (reservation_ww_class_mutex){+.+.?.}, at: [<ffffffffa054680a>] i915_gem_object_wait+0x39a/0x410 [i915]
[ 602.772020] {RECLAIM_FS-ON-W} state was registered at:
[ 602.772024] mark_held_locks+0x76/0x90
[ 602.772026] lockdep_trace_alloc+0xb8/0xc0
[ 602.772028] __kmalloc_track_caller+0x5d/0x130
[ 602.772031] krealloc+0x89/0xb0
[ 602.772033] reservation_object_reserve_shared+0xaf/0xd0
[ 602.772055] i915_gem_do_execbuffer.isra.35+0x1413/0x18b0 [i915]
[ 602.772075] i915_gem_execbuffer2+0x10e/0x1d0 [i915]
[ 602.772078] drm_ioctl+0x291/0x480
[ 602.772079] do_vfs_ioctl+0x695/0x6f0
[ 602.772081] SyS_ioctl+0x3c/0x70
[ 602.772084] entry_SYSCALL_64_fastpath+0x18/0xad
[ 602.772085] irq event stamp: 5197423
[ 602.772088] hardirqs last enabled at (5197423): [<ffffffff8116751d>] kfree+0xdd/0x170
[ 602.772091] hardirqs last disabled at (5197422): [<ffffffff811674f9>] kfree+0xb9/0x170
[ 602.772095] softirqs last enabled at (5190992): [<ffffffff8107bfe1>] __do_softirq+0x221/0x280
[ 602.772097] softirqs last disabled at (5190575): [<ffffffff8107c294>] irq_exit+0x64/0xc0
[ 602.772099]
other info that might help us debug this:
[ 602.772100] Possible unsafe locking scenario:

[ 602.772101] CPU0
[ 602.772101] ----
[ 602.772102] lock(reservation_ww_class_mutex);
[ 602.772104] <Interrupt>
[ 602.772105] lock(reservation_ww_class_mutex);
[ 602.772107]
*** DEADLOCK ***

[ 602.772109] 2 locks held by kswapd0/40:
[ 602.772110] #0: (shrinker_rwsem){++++..}, at: [<ffffffff811337b5>] shrink_slab.constprop.62+0x35/0x280
[ 602.772116] #1: (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa0553957>] i915_gem_shrinker_lock+0x27/0x60 [i915]
[ 602.772141]
stack backtrace:
[ 602.772144] CPU: 2 PID: 40 Comm: kswapd0 Not tainted 4.10.0gpudebug+ #122
[ 602.772145] Hardware name: LENOVO 42433ZG/42433ZG, BIOS 8AET64WW (1.44 ) 07/26/2013
[ 602.772147] Call Trace:
[ 602.772151] dump_stack+0x68/0xa1
[ 602.772153] print_usage_bug+0x1d4/0x1f0
[ 602.772155] mark_lock+0x390/0x530
[ 602.772157] ? print_irq_inversion_bug+0x200/0x200
[ 602.772159] __lock_acquire+0x405/0x1260
[ 602.772181] ? i915_gem_object_wait+0x39a/0x410 [i915]
[ 602.772183] lock_acquire+0x60/0x80
[ 602.772205] ? i915_gem_object_wait+0x39a/0x410 [i915]
[ 602.772207] mutex_lock_nested+0x69/0x760
[ 602.772229] ? i915_gem_object_wait+0x39a/0x410 [i915]
[ 602.772231] ? kfree+0xdd/0x170
[ 602.772253] ? i915_gem_object_wait+0x163/0x410 [i915]
[ 602.772255] ? trace_hardirqs_on_caller+0x18d/0x1c0
[ 602.772256] ? trace_hardirqs_on+0xd/0x10
[ 602.772278] i915_gem_object_wait+0x39a/0x410 [i915]
[ 602.772300] i915_gem_object_unbind+0x5e/0x130 [i915]
[ 602.772323] i915_gem_shrink+0x22d/0x3d0 [i915]
[ 602.772347] i915_gem_shrinker_scan+0x3f/0x80 [i915]
[ 602.772349] shrink_slab.constprop.62+0x1ad/0x280
[ 602.772352] shrink_node+0x52/0x80
[ 602.772355] kswapd+0x427/0x5c0
[ 602.772358] kthread+0x122/0x130
[ 602.772360] ? try_to_free_pages+0x270/0x270
[ 602.772362] ? kthread_stop+0x70/0x70
[ 602.772365] ret_from_fork+0x2e/0x40

v2: Add commentary about the pruning being opportunistic

Reported-by: Jan Nordholz <jckn@gmx.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99977#c10
Fixes: e54ca9774777 ("drm/i915: Remove completed fences after a wait")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170308132629.7987-1-chris@chris-wilson.co.uk


# 7c55e2c5 06-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use pagecache write to prepopulate shmemfs from pwrite-ioctl

Before we instantiate/pin the backing store for our use, we
can prepopulate the shmemfs filp efficiently using a write into the
pagecache. We avoid the penalty of instantiating all the pages, important
if the user is just writing to a few and never uses the object on the GPU,
and using a direct write into shmemfs allows us to avoid the cost of
retrieving a page (mostly the clear-before-use, but in theory we could
curtail swapin) before it is overwritten.

This can be extended later to provide additional specialisation for
other backends (other than shmemfs). For now it provides a defense
against very large write-only allocations from exhausting all of system
memory.

v2: Smelling fixes.

Fixes: fe115628d567 ("drm/i915: Implement pwrite without struct-mutex")
References: https://bugs.freedesktop.org/show_bug.cgi?id=99107
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170307120338.7277-2-chris@chris-wilson.co.uk


# 4e5462ee 07-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store a permanent error in obj->mm.pages

Once the object has been truncated, it is unrecoverable. To facilitate
detection of this state store the error in obj->mm.pages.

This is required for the next patch which should be applied to v4.10
(via stable), so we also need to mark this patch for backporting. In
that regard, let's consider this to be a fix/improvement too.

v2: Avoid dereferencing the ERR_PTR when freeing the object.

Fixes: 1233e2db199d ("drm/i915: Move object backing storage manipulation to its own locking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170307132031.32461-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 05425249 02-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Generalise wait for execlists to be idle

The code to check for execlists completion is generic, so move it to
intel_engine_cs.c, where we can reuse the new intel_engine_is_idle().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170303121947.20482-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# c8659efa 01-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop spinlocks around adding to the client request list

Adding to the tail of the client request list is safe without the
spinlock, as the only other user is the throttle ioctl, which iterates
forwards over the list. The throttle only needs protection against
deletion of a request as it reads it; it simply won't see a new request
added to the end of the list, or that request would be too early and
rejected. We can further reduce the number of spinlocks required when
throttling by removing stale requests from the client_list as we
throttle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302122525.19675-1-chris@chris-wilson.co.uk


# c998e8a0 02-Mar-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold rpm during GEM suspend in driver unload/suspend

i915_gem_suspend() tries to access the device to ensure it is idle and
all writes from the device are flushed to memory. It assumed we already
held the runtime pm wakeref, but we should explicitly acquire it for our
access to be safe.

[ 619.926287] WARNING: CPU: 3 PID: 9353 at drivers/gpu/drm/i915/intel_drv.h:1750 gen6_write32+0x23e/0x2a0 [i915]
[ 619.926300] RPM wakelock ref not held during HW access
[ 619.926311] Modules linked in: vgem x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul snd_pcm mei_me mei lpc_ich ghash_clmulni_intel i915(-) sdhci_pci sdhci mmc_core e1000e ptp pps_core prime_numbers [last unloaded: snd_hda_intel]
[ 619.926578] CPU: 3 PID: 9353 Comm: drv_module_relo Tainted: G U 4.10.0-CI-Trybot_609+ #1
[ 619.926585] Hardware name: LENOVO 42962WU/42962WU, BIOS 8DET56WW (1.26 ) 12/01/2011
[ 619.926592] Call Trace:
[ 619.926609] dump_stack+0x67/0x92
[ 619.926625] __warn+0xc6/0xe0
[ 619.926640] warn_slowpath_fmt+0x4a/0x50
[ 619.926726] gen6_write32+0x23e/0x2a0 [i915]
[ 619.926801] gen6_mm_switch+0x38/0x70 [i915]
[ 619.926871] i915_switch_context+0xec/0xa10 [i915]
[ 619.926942] i915_gem_switch_to_kernel_context+0x13c/0x2b0 [i915]
[ 619.927019] i915_gem_suspend+0x2b/0x180 [i915]
[ 619.927079] i915_driver_unload+0x22/0x200 [i915]
[ 619.927093] ? __this_cpu_preempt_check+0x13/0x20
[ 619.927105] ? trace_hardirqs_on_caller+0xe7/0x200
[ 619.927118] ? trace_hardirqs_on+0xd/0x10
[ 619.927128] ? _raw_spin_unlock_irqrestore+0x3d/0x60
[ 619.927192] i915_pci_remove+0x14/0x20 [i915]
[ 619.927205] pci_device_remove+0x34/0xb0
[ 619.927219] device_release_driver_internal+0x158/0x210
[ 619.927234] driver_detach+0x3b/0x80
[ 619.927245] bus_remove_driver+0x53/0xd0
[ 619.927256] driver_unregister+0x27/0x50
[ 619.927267] pci_unregister_driver+0x25/0xa0
[ 619.927351] i915_exit+0x1a/0xb1a [i915]
[ 619.927362] SyS_delete_module+0x193/0x1e0
[ 619.927378] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 619.927386] RIP: 0033:0x7f82b46c5d37
[ 619.927393] RSP: 002b:00007ffdb6f610d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[ 619.927408] RAX: ffffffffffffffda RBX: ffffffff81481ff3 RCX: 00007f82b46c5d37
[ 619.927415] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 000000000224f558
[ 619.927422] RBP: ffffc90001187f88 R08: 0000000000000000 R09: 00007ffdb6f61100
[ 619.927428] R10: 000000000224f4e0 R11: 0000000000000246 R12: 0000000000000000
[ 619.927435] R13: 00007ffdb6f612b0 R14: 0000000000000000 R15: 0000000000000000
[ 619.927451] ? __this_cpu_preempt_check+0x13/0x20

or

[ 641.646590] WARNING: CPU: 1 PID: 8913 at drivers/gpu/drm/i915/intel_drv.h:1750 intel_runtime_pm_get_noresume+0x8b/0x90 [i915]
[ 641.646595] RPM wakelock ref not held during HW access
[ 641.646600] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ghash_clmulni_intel snd_pcm mei_me mei i915(-) r8169 mii prime_numbers i2c_hid [last unloaded: snd_hda_intel]
[ 641.646825] CPU: 1 PID: 8913 Comm: drv_module_relo Tainted: G U 4.10.0-CI-Trybot_609+ #1
[ 641.646836] Hardware name: TOSHIBA SATELLITE P50-C/06F4 , BIOS 1.20 10/08/2015
[ 641.646843] Call Trace:
[ 641.646857] dump_stack+0x67/0x92
[ 641.646869] __warn+0xc6/0xe0
[ 641.646880] warn_slowpath_fmt+0x4a/0x50
[ 641.646893] ? __this_cpu_preempt_check+0x13/0x20
[ 641.646904] ? trace_hardirqs_on_caller+0xe7/0x200
[ 641.646957] intel_runtime_pm_get_noresume+0x8b/0x90 [i915]
[ 641.647022] __i915_add_request+0x423/0x540 [i915]
[ 641.647080] i915_gem_switch_to_kernel_context+0x148/0x2b0 [i915]
[ 641.647145] i915_gem_suspend+0x2b/0x180 [i915]
[ 641.647189] i915_driver_unload+0x22/0x200 [i915]
[ 641.647200] ? __this_cpu_preempt_check+0x13/0x20
[ 641.647210] ? trace_hardirqs_on_caller+0xe7/0x200
[ 641.647220] ? trace_hardirqs_on+0xd/0x10
[ 641.647231] ? _raw_spin_unlock_irqrestore+0x3d/0x60
[ 641.647276] i915_pci_remove+0x14/0x20 [i915]
[ 641.647293] pci_device_remove+0x34/0xb0
[ 641.647307] device_release_driver_internal+0x158/0x210
[ 641.647321] driver_detach+0x3b/0x80
[ 641.647330] bus_remove_driver+0x53/0xd0
[ 641.647338] driver_unregister+0x27/0x50
[ 641.647348] pci_unregister_driver+0x25/0xa0
[ 641.647415] i915_exit+0x1a/0xb1a [i915]
[ 641.647429] SyS_delete_module+0x193/0x1e0
[ 641.647444] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 641.647453] RIP: 0033:0x7fc622bd2d37
[ 641.647463] RSP: 002b:00007ffff8ffb5c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[ 641.647475] RAX: ffffffffffffffda RBX: ffffffff81481ff3 RCX: 00007fc622bd2d37
[ 641.647480] RDX: 0000000000000001 RSI: 0000000000000800 RDI: 0000000000d49118
[ 641.647485] RBP: ffffc90000997f88 R08: 0000000000000000 R09: 00007ffff8ffb5f0
[ 641.647491] R10: 0000000000d490a0 R11: 0000000000000246 R12: 0000000000000000
[ 641.647498] R13: 00007ffff8ffb7a0 R14: 0000000000000000 R15: 0000000000000000
[ 641.647510] ? __this_cpu_preempt_check+0x13/0x20

v2: Keep holding rpm until the end to cover i915_gem_sanitize() as well.
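
A minimal sketch (assuming the intel_runtime_pm_get()/_put() helpers of
this era, which take a struct drm_i915_private *) of holding the wakeref
across the whole suspend sequence, including i915_gem_sanitize():

static int example_suspend(struct drm_i915_private *i915)
{
        int ret;

        intel_runtime_pm_get(i915);

        ret = i915_gem_suspend(i915);   /* idles the GPU, flushes writes */
        if (ret == 0)
                i915_gem_sanitize(i915);

        intel_runtime_pm_put(i915);
        return ret;
}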

Fixes: 5ab57c702069 ("drm/i915: Flush logical context image out to memory upon suspend")
Fixes: 1c777c5d1dc ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302083029.19576-1-chris@chris-wilson.co.uk
Reviewed-by: Imre Deak <imre.deak@intel.com>
Cc: <stable@vger.kernel.org> # v4.9+


# 67b807a8 27-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Delay disabling the user interrupt for breadcrumbs

A significant cost in setting up a wait is the overhead of enabling the
interrupt. As we disable the interrupt whenever the queue of waiters is
empty, if we are frequently waiting on alternating batches, we end up
re-enabling the interrupt on a frequent basis. We do want to disable the
interrupt during normal operations as under high load it may add several
thousand interrupts/s - we have been known in the past to occupy whole
cores with our interrupt handler after accidentally leaving user
interrupts enabled. As a compromise, leave the interrupt enabled until
the next IRQ, or until the system is idle. This gives a small window for a
waiter to keep the interrupt active and not be delayed by having to
re-enable the interrupt.

v2: Restore hangcheck/missed-irq detection for continuations
v3: Be more careful restoring the hangcheck timer after reset
v4: Be more careful restoring the fake irq after reset (if required!)
v5: Redo changes to intel_engine_wakeup()
v6: Factor out __intel_engine_wakeup()
v7: Improve commentary for declaring a missed wakeup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-4-chris@chris-wilson.co.uk


# 11bac800 24-Feb-2017 Dave Jiang <dave.jiang@intel.com>

mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf

->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.
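
For illustration, a driver callback after this change looks like the
sketch below (the example_ names are hypothetical); the vma is reached
through vmf->vma rather than a separate argument:

static int example_fault(struct vm_fault *vmf)
{
        struct vm_area_struct *vma = vmf->vma; /* was a separate parameter */

        /* ... look up and insert the page backing vmf->address ... */
        (void)vma;
        return VM_FAULT_SIGBUS;
}

static const struct vm_operations_struct example_vm_ops = {
        .fault = example_fault,
};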

[arnd@arndb.de: fix ARM build]
Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ef0f411f 15-Feb-2017 Kenneth Graunke <kenneth@whitecape.org>

drm/i915: Drop support for I915_EXEC_CONSTANTS_* execbuf parameters.

This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0
(indicating the optional feature is not supported), and makes execbuf
always return -EINVAL if the flags are used.

Apparently, no userspace ever shipped which used this optional feature:
I checked the git history of Mesa, xf86-video-intel, libva, and Beignet,
and there were zero commits showing a use of these flags. Kernel commit
72bfa19c8deb4 apparently introduced the feature prematurely. According
to Chris, the intention was to use this in cairo-drm, but "the use was
broken for gen6", so I don't think it ever happened.

'relative_constants_mode' has always been tracked per-device, but this
has actually been wrong ever since hardware contexts were introduced, as
the INSTPM register is saved (and automatically restored) as part of the
render ring context. The software per-device value could therefore get
out of sync with the hardware per-context value. This meant that using
them is actually unsafe: a client which tried to use them could damage
the state of other clients, causing the GPU to interpret their BO
offsets as absolute pointers, leading to bogus memory reads.

These flags were also never ported to execlist mode, making them no-ops
on Gen9+ (which requires execlists), and Gen8 in the default mode.

On Gen8+, userspace can write these registers directly, achieving the
same effect. On Gen6-7.5, it likely makes sense to extend the command
parser to support them. I don't think anyone wants this on Gen4-5.

Based on a patch by Dave Gordon.

v3: Return -ENODEV for the getparam, as this is what we do for other
obsolete features. Suggested by Chris Wilson.

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 754c9fd5 23-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Protect the request->global_seqno with the engine->timeline lock

A request is assigned a global seqno only when it is on the hardware
execution queue. The global seqno can be used to maintain a list of
requests on the same engine in retirement order, for example for
constructing a priority queue for waiting. Prior to its execution, or
if it is subsequently removed in the event of preemption, its global
seqno is zero. As both insertion and removal from the execution queue
may operate in IRQ context, it is not guarded by the usual struct_mutex
BKL. Instead those relying on the global seqno must be prepared for its
value to change between reads. Only when the request is complete can
the global seqno be stable (due to the memory barriers on submitting the
commands to the hardware to write the breadcrumb: if the HWS shows that
it has passed the global seqno, and the global seqno is unchanged after
the re-read, the request is indeed complete).
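
A sketch of that lock-free completion check (helper names such as
intel_engine_get_seqno() are assumptions about this era's API):

static bool example_request_completed(const struct drm_i915_gem_request *rq)
{
        u32 seqno = READ_ONCE(rq->global_seqno);

        if (!seqno)
                return false;   /* not (or no longer) on the execution queue */

        return i915_seqno_passed(intel_engine_get_seqno(rq->engine), seqno) &&
               seqno == READ_ONCE(rq->global_seqno);
}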

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-9-chris@chris-wilson.co.uk


# d59b21ec 22-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove 'retire' parameter from intel_fb_obj_flush

Setting retire=true is identical to using origin=ORIGIN_CS, so make the
same simplification to intel_fb_obj_flush() as already employed for
intel_fb_obj_invalidate().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-6-chris@chris-wilson.co.uk


# 57822dc6 22-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Perform object clflushing asynchronously

Flushing the cachelines for an object is slow; it can be as much as 100ms
for a large framebuffer. We currently do this under the struct_mutex BKL
on execution or on pageflip. But now with the ability to add fences to
obj->resv for both flips and execbuf (and we naturally wait on the fence
before CPU access), we can move the clflush operation to a workqueue and
signal a fence for completion, thereby doing the work asynchronously and
not blocking the driver or its clients.

v2: Introduce i915_gem_clflush.h and use a new name, split out some
extras into separate patches.

Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-5-chris@chris-wilson.co.uk


# f6aaba4d 22-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip clflushes for all non-page backed objects

Generalise the skip for physical and stolen objects by skipping anything
we do not have a valid address for inside the sg.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-4-chris@chris-wilson.co.uk


# 5a97bcc6 22-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Amalgamate flushing of display objects

We have three different paths by which userspace wants to flush the
display plane (i.e. objects with obj->pin_display). Use a common helper
to identify those paths and to simplify a later change.

v2: Include the conditional in the name, i915_gem_object_flush_if_display

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-3-chris@chris-wilson.co.uk


# e59dc172 22-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move cpu_cache_is_coherent() to header

For use in the next patch, take the current is-coherent helper and add
it to i915_gem_object.h

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-2-chris@chris-wilson.co.uk


# 208b84a3 22-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove change_domain tracepoint

The change_domain tracepoint has been inaccurate for a few years - it
doesn't fully capture the domains, especially with userspace bypassing
them. It is defunct, misleading and time to be removed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170222114049.28456-1-chris@chris-wilson.co.uk


# e54ca977 17-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove completed fences after a wait

If we wait upon the full (i.e. all shared fences, or upon an exclusive
fence) reservation object successfully, we know that all fences beneath
it have been signaled, so long as no new fences were added whilst we
slept. If the reservation_object remains the same, as detected by its
seqcount, we can then reap all the fences upon completion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217151304.16665-6-chris@chris-wilson.co.uk


# 581ab1fe 15-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unwind conversion to i915_gem_phys_ops on failure

The physical object is treated as permanently pinned. If we fail to take
this initial pin during i915_gem_object_attach_phys() we need to revert
it back to an ordinary shmemfs object before reporting the failure.

v2: git-add

Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170215163900.11606-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# c1d2061b 15-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Squelch any ktime/jiffie rounding errors for wait-ioctl

We wait upon jiffies, but report the time elapsed using a
high-resolution timer. This discrepancy can lead to us timing out the
wait prior to us reporting the elapsed time as complete.

This restores the squelching lost in commit e95433c73a11 ("drm/i915:
Rearrange i915_wait_request() accounting with callers").

Fixes: e95433c73a11 ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170216125441.30923-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 636deb5b 12-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pass timeout==0 on to i915_gem_object_wait_fence()

The i915_gem_object_wait_fence() uses an incoming timeout=0 to query
whether the current fence is busy or idle, without waiting. This can be
used by the wait-ioctl to implement a busy query.
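
A hypothetical userspace sketch of that busy query (mirroring what the
igt testcase exercises): with timeout_ns == 0 the ioctl returns 0 if the
object is idle and fails with ETIME if it is still busy.

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <xf86drm.h>
#include <i915_drm.h>

static bool gem_bo_busy(int fd, uint32_t handle)
{
        struct drm_i915_gem_wait wait = {
                .bo_handle = handle,
                .timeout_ns = 0,        /* query only, never block */
        };

        if (drmIoctl(fd, DRM_IOCTL_I915_GEM_WAIT, &wait) == 0)
                return false;           /* already idle */

        return errno == ETIME;          /* busy; anything else is an error */
}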

Fixes: e95433c73a11 ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Testcase: igt/gem_wait/basic-busy-write-all
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170212215344.16600-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit d892e9398ecf6defc7972a62227b77dad6be20bd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# ec62ed3e 07-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restore context and pd for ringbuffer submission after reset

Following a reset, the context and page directory registers are lost.
However, the queue of requests that we resubmit after the reset may
depend upon them - the registers are restored from a context image, but
that restore may be inhibited and may simply be absent from the request
if it was in the middle of a sequence using the same context. If we
prime the CCID/PD registers with the first request in the queue (even
for the hung request), we prevent invalid memory access for the
following requests (and continually hung engines).

v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
Ivybridge, but this bit probably exists for a reason.

Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
(cherry picked from commit c0dcb203fb009678e5be9e7782329dcfbbf16439)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# d892e939 12-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pass timeout==0 on to i915_gem_object_wait_fence()

The i915_gem_object_wait_fence() uses an incoming timeout=0 to query
whether the current fence is busy or idle, without waiting. This can be
used by the wait-ioctl to implement a busy query.

Fixes: e95433c73a11 ("drm/i915: Rearrange i915_wait_request() accounting with callers")
Testcase: igt/gem_wait/basic-busy-write-all
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170212215344.16600-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 17059450 13-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Test coherency of and barriers between cache domains

Write into an object using WB, WC, GTT, and GPU paths and make sure that
our internal API is sufficient to ensure coherent reads and writes.

v2: Avoid invalid free upon allocation error

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-21-chris@chris-wilson.co.uk


# 8335fd65 13-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add selftests for object allocation, phys

The phys object is a rarely used device (only very old machines require
a chunk of physically contiguous pages for a few hardware interactions).
As such, it is not exercised by CI and to combat that we want to add a
test that exercises the phys object on all platforms.

v2: Always set err on error paths and not rely on inheriting the err.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-17-chris@chris-wilson.co.uk


# 44653988 13-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Create a fake object for testing huge allocations

We would like to be able to exercise huge allocations even on memory
constrained devices. To do this we create an object that allocates only
a few pages and remaps them across its whole range - each page is reused
multiple times. We can therefore pretend we are rendering into a much
larger object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-9-chris@chris-wilson.co.uk


# 66d9cb5d 13-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mock the GEM device for self-testing

A simulacrum of drm_i915_private that lets us mock interactions with the
device.

v2: Tidy init error paths

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-6-chris@chris-wilson.co.uk


# 935a2f77 13-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add some selftests for sg_table manipulation

Start exercising the scattergather lists, especially looking at
iteration after coalescing.

v2: Comment on the peculiarity of table construction (i.e. why this
sg_table might be interesting).
v3: Added one __func__ to identify expect_pfn_sg()
v4: Loop until we have crossed the chain boundary (forcing sg_table to
do multiple allocations) before squelching a potential ENOMEM from oom.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-2-chris@chris-wilson.co.uk


# 2ae55738 12-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clear the last_retired_context following a hang/reset

Following a hang and reset, we know that the engine is idle and all
context state has been saved or lost. Consequently, we know that the
engine is no longer referencing the last context and we can relinquish
our tracking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-5-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# fe3288b5 12-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Park the breadcrumbs signaler across a GPU reset

The signal threads may be running concurrently with the GPU reset. The
completion from the GPU runs asynchronously with the reset, and the two
threads may see different snapshots of the state; the signaler may mark
a request as complete just as we try to reset it. We don't tolerate two
different views of the same state and complain if we try to mark a
request as failed when it is already complete. Disable the signal
threads during reset to prevent this conflict (even though the conflict
implies that the state we are resetting to is invalid, we have already
made our decision!).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99733
References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-4-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# 1d309634 12-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Kill the tasklet then disable

Disabling the tasklet leaves it, if already scheduled, on the
ready-to-run list until it is re-enabled. This will leave the ksoftirqd
thread spinning until satisfied. To prevent this situation on starting
the GPU reset, we want to kill the tasklet first and then disable it.
The same problem will
arise when a tasklet is scheduled from another device, so a better
solution is required for the general case.
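
A minimal sketch of the corrected ordering for an engine's tasklet:

static void example_quiesce_tasklet(struct tasklet_struct *t)
{
        tasklet_kill(t);        /* flush any already-scheduled run first */
        tasklet_disable(t);     /* then keep new schedules from executing */

        /* ... perform the reset ... */

        tasklet_enable(t);
}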

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1f7b847d72c3 ("drm/i915: Disable engine->irq_tasklet around resets")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# c00122f3 12-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Assert that the active request hasn't been signaled

As the request is not complete, it should not be signaled. Assert that
this is true before we process the request for a reset.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# 8c12d121 10-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the irq_barrier for reset earlier into reset_prepare

When updating the bookkeeping following the reset, we need the seqno to
be coherent on the CPU prior to trusting its result for deciding whether
any request is completed. We need the irq_barrier before we start making
these decisions, i.e. in reset_prepare.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99733
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210185214.23463-1-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>


# c4d4c1c6 10-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush the freed object queue on device release

As dmabufs may live beyond the PCI device removal, we need to flush the
freed object worker on device release, and include a warning in case
there is a leak.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170210163523.17533-3-chris@chris-wilson.co.uk


# 1f7b847d 08-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable engine->irq_tasklet around resets

When we restart the engines, and we have active requests, a request on
the first engine may complete and queue a request to the second engine
before we try to restart the second engine. That queueing of the
request may race with the engine to restart, and so may corrupt the
current state. Disabling the engine->irq_tasklet prevents the two paths
from writing into ELSP simultaneously (and modifying the execlists_port[]
at the same time).

Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
Testcase: igt/gem_exec_fence/await-hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-3-chris@chris-wilson.co.uk


# d8027093 08-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split GEM resetting into 3 phases

Currently we do a reset prepare/finish around the call to reset the GPU,
but it looks like we need a later stage after the hw has been
reinitialised to allow GEM to restart itself. Start by splitting the 2
GEM phases into 3:

prepare - before the reset, check if GEM recovered, then stop GEM

reset - after the reset, update GEM bookkeeping

finish - after the re-initialisation following the reset, restart GEM

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-2-chris@chris-wilson.co.uk


# 20a8a74a 08-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move calling engine->init_hw() to its own function

Just a simple refactor to move a loop and some support code out of
i915_gem_init_hw(). This is in preparation for avoiding a race between
the tasklet writing to ELSP whilst simultaneously being written by
engine->init_hw() following a GPU reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208143033.11651-1-chris@chris-wilson.co.uk


# 519d5249 08-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: i915_gem_shrink_all() needs an awake device

Since, to unbind an object, we may need a powered-up device to access
the GTT entries, we only shrink bound objects if awake. Callers of
i915_gem_shrink_all() had to take this into account and take the rpm
wakeref themselves, but we can move this wakeref into shrink_all itself
for convenience, making the function live up to its name.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170208104710.18089-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# 83bf6d55 02-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove overzealous fence warn on runtime suspend

The goal of the WARN was to catch when we are still actively using the
fence as we go into the runtime suspend. However, the reg->pin_count is
too coarse as it does not distinguish exclusive ownership of the fence
register from activity.

I've not improved on the WARN, nor have we captured this WARN in an
exact igt, but it is showing up regularly in the wild:

[ 1915.935332] WARNING: CPU: 1 PID: 10861 at drivers/gpu/drm/i915/i915_gem.c:2022 i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.935383] WARN_ON(reg->pin_count)[ 1915.935399] Modules linked in:
snd_hda_intel i915 drm_kms_helper vgem netconsole scsi_transport_iscsi fuse vfat fat x86_pkg_temp_thermal coretemp intel_cstate intel_uncore snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mei_me mei serio_raw intel_rapl_perf intel_pch_thermal soundcore wmi acpi_pad i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 mii video [last unloaded: drm_kms_helper]
[ 1915.935785] CPU: 1 PID: 10861 Comm: kworker/1:0 Tainted: G U W 4.9.0-rc5+ #170
[ 1915.935799] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
[ 1915.935822] Workqueue: pm pm_runtime_work
[ 1915.935845] ffffc900044fbbf0 ffffffffac3220bc ffffc900044fbc40 0000000000000000
[ 1915.935890] ffffc900044fbc30 ffffffffac059bcb 000007e6044fbc60 ffff8801626e3198
[ 1915.935937] ffff8801626e0000 0000000000000002 ffffffffc05e5d4e 0000000000000000
[ 1915.935985] Call Trace:
[ 1915.936013] [<ffffffffac3220bc>] dump_stack+0x4f/0x73
[ 1915.936038] [<ffffffffac059bcb>] __warn+0xcb/0xf0
[ 1915.936060] [<ffffffffac059c4f>] warn_slowpath_fmt+0x5f/0x80
[ 1915.936158] [<ffffffffc052d916>] i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.936251] [<ffffffffc04f1c74>] intel_runtime_suspend+0x64/0x280 [i915]
[ 1915.936277] [<ffffffffac0926f1>] ? dequeue_entity+0x241/0xbc0
[ 1915.936298] [<ffffffffac36bb85>] pci_pm_runtime_suspend+0x55/0x180
[ 1915.936317] [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936339] [<ffffffffac4514e2>] __rpm_callback+0x32/0x70
[ 1915.936356] [<ffffffffac451544>] rpm_callback+0x24/0x80
[ 1915.936375] [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936392] [<ffffffffac45222d>] rpm_suspend+0x12d/0x680
[ 1915.936415] [<ffffffffac69f6d7>] ? _raw_spin_unlock_irq+0x17/0x30
[ 1915.936435] [<ffffffffac0810b8>] ? finish_task_switch+0x88/0x220
[ 1915.936455] [<ffffffffac4534bf>] pm_runtime_work+0x6f/0xb0
[ 1915.936477] [<ffffffffac074353>] process_one_work+0x1f3/0x4d0
[ 1915.936501] [<ffffffffac074678>] worker_thread+0x48/0x4e0
[ 1915.936523] [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936542] [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936559] [<ffffffffac07a2c9>] kthread+0xd9/0xf0
[ 1915.936580] [<ffffffffac07a1f0>] ? kthread_park+0x60/0x60
[ 1915.936600] [<ffffffffac69fe62>] ret_from_fork+0x22/0x30

In the case the register is pinned, it should be present, and we will
need to invalidate it so that it is restored upon resume, as we cannot
expect the owner of the pin to call get_fence prior to use after resume.

Fixes: 7c108fd8feac ("drm/i915: Move fence cancellation to runtime suspend")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98804
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170203125717.8431-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit e0ec3ec698851a6c97a12d696407b3ff77700c23)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# e3818697 09-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush untouched framebuffers before display on !llc

On a non-llc system, the objects are created with .cache_level =
CACHE_NONE and so the transition to uncached for scanout is a no-op.
However, if the object was never written to, it will still be in the CPU
domain (having been zeroed out by shmemfs). Those cachelines need to be
flushed prior to display.

Reported-and-tested-by: Vito Caputo
Fixes: a6a7cc4b7db6 ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170109111932.6342-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit 69aeafeae9b30d797c439a30d1a4ccc8dc5b0eb0)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# c0dcb203 07-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restore context and pd for ringbuffer submission after reset

Following a reset, the context and page directory registers are lost.
However, the queue of requests that we resubmit after the reset may
depend upon them - the registers are restored from a context image, but
that restore may be inhibited and may simply be absent from the request
if it was in the middle of a sequence using the same context. If we
prime the CCID/PD registers with the first request in the queue (even
for the hung request), we prevent invalid memory access for the
following requests (and continually hung engines).

v2: Magic BIT(8), reserved for future use but still appears unused.
v3: Some commentary on handling innocent vs guilty requests
v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
Ivybridge, but this bit probably exists for a reason.

Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# e0ec3ec6 02-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove overzealous fence warn on runtime suspend

The goal of the WARN was to catch when we are still actively using the
fence as we go into the runtime suspend. However, the reg->pin_count is
too coarse as it does not distinguish exclusive ownership of the fence
register from activity.

I've not improved on the WARN, nor have we captured this WARN in an
exact igt, but it is showing up regularly in the wild:

[ 1915.935332] WARNING: CPU: 1 PID: 10861 at drivers/gpu/drm/i915/i915_gem.c:2022 i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.935383] WARN_ON(reg->pin_count)[ 1915.935399] Modules linked in:
snd_hda_intel i915 drm_kms_helper vgem netconsole scsi_transport_iscsi fuse vfat fat x86_pkg_temp_thermal coretemp intel_cstate intel_uncore snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mei_me mei serio_raw intel_rapl_perf intel_pch_thermal soundcore wmi acpi_pad i2c_algo_bit syscopyarea sysfillrect sysimgblt fb_sys_fops drm r8169 mii video [last unloaded: drm_kms_helper]
[ 1915.935785] CPU: 1 PID: 10861 Comm: kworker/1:0 Tainted: G U W 4.9.0-rc5+ #170
[ 1915.935799] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
[ 1915.935822] Workqueue: pm pm_runtime_work
[ 1915.935845] ffffc900044fbbf0 ffffffffac3220bc ffffc900044fbc40 0000000000000000
[ 1915.935890] ffffc900044fbc30 ffffffffac059bcb 000007e6044fbc60 ffff8801626e3198
[ 1915.935937] ffff8801626e0000 0000000000000002 ffffffffc05e5d4e 0000000000000000
[ 1915.935985] Call Trace:
[ 1915.936013] [<ffffffffac3220bc>] dump_stack+0x4f/0x73
[ 1915.936038] [<ffffffffac059bcb>] __warn+0xcb/0xf0
[ 1915.936060] [<ffffffffac059c4f>] warn_slowpath_fmt+0x5f/0x80
[ 1915.936158] [<ffffffffc052d916>] i915_gem_runtime_suspend+0x116/0x130 [i915]
[ 1915.936251] [<ffffffffc04f1c74>] intel_runtime_suspend+0x64/0x280 [i915]
[ 1915.936277] [<ffffffffac0926f1>] ? dequeue_entity+0x241/0xbc0
[ 1915.936298] [<ffffffffac36bb85>] pci_pm_runtime_suspend+0x55/0x180
[ 1915.936317] [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936339] [<ffffffffac4514e2>] __rpm_callback+0x32/0x70
[ 1915.936356] [<ffffffffac451544>] rpm_callback+0x24/0x80
[ 1915.936375] [<ffffffffac36bb30>] ? pci_pm_runtime_resume+0xa0/0xa0
[ 1915.936392] [<ffffffffac45222d>] rpm_suspend+0x12d/0x680
[ 1915.936415] [<ffffffffac69f6d7>] ? _raw_spin_unlock_irq+0x17/0x30
[ 1915.936435] [<ffffffffac0810b8>] ? finish_task_switch+0x88/0x220
[ 1915.936455] [<ffffffffac4534bf>] pm_runtime_work+0x6f/0xb0
[ 1915.936477] [<ffffffffac074353>] process_one_work+0x1f3/0x4d0
[ 1915.936501] [<ffffffffac074678>] worker_thread+0x48/0x4e0
[ 1915.936523] [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936542] [<ffffffffac074630>] ? process_one_work+0x4d0/0x4d0
[ 1915.936559] [<ffffffffac07a2c9>] kthread+0xd9/0xf0
[ 1915.936580] [<ffffffffac07a1f0>] ? kthread_park+0x60/0x60
[ 1915.936600] [<ffffffffac69fe62>] ret_from_fork+0x22/0x30

In the case the register is pinned, it should be present, and we will
need to invalidate it so that it is restored upon resume, as we cannot expect
the owner of the pin to call get_fence prior to use after resume.

Fixes: 7c108fd8feac ("drm/i915: Move fence cancellation to runtime suspend")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98804
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170203125717.8431-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 4e64e553 02-Feb-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm: Improve drm_mm search (and fix topdown allocation) with rbtrees

The drm_mm range manager claimed to support top-down insertion, but it
was neither searching for the top-most hole that could fit the
allocation request nor fitting the request to the hole correctly.

In order to search the range efficiently, we create a secondary index
for the holes using either their size or their address. This index
allows us to find the smallest hole or the hole at the bottom or top of
the range efficiently, whilst keeping the hole stack to rapidly service
evictions.

v2: Search for holes both high and low. Rename flags to mode.
v3: Discover rb_entry_safe() and use it!
v4: Kerneldoc for enum drm_mm_insert_mode.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx
Reviewed-by: Lucas Stach <l.stach@pengutronix.de> #etnaviv
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk
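
For illustration, with the rbtree-backed index an explicit insertion mode
selects which hole to prefer; the sketch below follows the upstream drm_mm
API as a rough guide rather than reproducing this patch:

    /* Sketch: take 'size' bytes from the top of the managed range. */
    struct drm_mm_node node = {};
    int err;

    err = drm_mm_insert_node_in_range(&mm, &node,
                                      size, alignment, 0 /* color */,
                                      0, U64_MAX,
                                      DRM_MM_INSERT_HIGH);
    if (err)        /* no hole large enough anywhere in the range */
        return err;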


# 69aeafea 09-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush untouched framebuffers before display on !llc

On a non-llc system, the objects are created with .cache_level =
CACHE_NONE and so the transition to uncached for scanout is a no-op.
However, if the object was never written to, it will still be in the CPU
domain (having been zeroed out by shmemfs). Those cachelines need to be
flushed prior to display.

Reported-and-tested-by: Vito Caputo
Fixes: a6a7cc4b7db6 ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.10-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20170109111932.6342-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 24145517 24-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reset the gpu on takeover

The GPU may be in an unknown state following resume and module load. The
previous occupant may have left contexts loaded, or other dangerous
state, which can cause an immediate GPU hang for us. The only safe
course of action is to reset the GPU prior to using it - similarly to
how we reset the GPU prior to unload (before a second user may be
affected by our leftover state).

We need to reset the GPU very early in our load/resume sequence so that
any stale HW pointers are revoked prior to any resource allocations we
make (that may conflict).

A reset should only be a couple of milliseconds on a slow device, a cost
we should easily be able to absorb into our initialisation times.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110135.6418-2-chris@chris-wilson.co.uk


# e0216b76 19-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Treat an error from i915_vma_instance() as unlikely

When pinning into the global GTT, an error from creating the VMA is
unlikely, so mark it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170119192659.31789-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# befedbb7 19-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use common LRU inactive vma bumping for unpin_from_display

Now that i915_gem_object_bump_inactive_ggtt() exists, also make use of
it for the LRU bumping from i915_gem_object_unpin_from_display()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170119192659.31789-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# d65415df 19-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do an unlocked wait before set-cache-level ioctl

Since a change in cache level is likely to trigger an unbind, avoid
waiting under the mutex by preemptively doing an unlocked wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170119082211.21257-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# 718659a6 16-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename some warts in the VMA API

Whilst writing testcases to exercise the VMA API, some oddities came to
light, such as i915_gem_obj_lookup_or_create(). Joonas suggested
i915_vma_instance() as a neat replacement, so rename them, move them to
i915_vma.c and add some kerneldoc as a sugary bonus.

s/i915_gem_obj_to_vma/i915_vma_lookup/
s/i915_gem_obj_lookup_or_create_vma/i915_vma_instance/

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-2-chris@chris-wilson.co.uk


# 71895a08 17-Jan-2017 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Add comment how we treat hung contexts

Explain in a comment how and why we treat hung context like we do.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-7-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 0e178aef 17-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Detect a failed GPU reset+recovery

If we can't recover the GPU after the reset, mark it as wedged to cancel
the outstanding tasks and to prevent new users from trying to use the
broken GPU.

v2: Check the same ring is hung again before declaring the reset broken.
v3: use engine_stalled (Mika)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-6-git-send-email-mika.kuoppala@intel.com


# 61da5362 17-Jan-2017 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Tidy up engine reset logic

Split engine reset for engine and request specific parts.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-5-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# bf2f0436 17-Jan-2017 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Introduce engine_stalled helper

Move the engine stalled/pardoned check into a helper function.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-4-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 211b12af 17-Jan-2017 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Cleanup request skip decision

Since we now only skip banned contexts, preventing the skip of default
contexts is no longer sensible. For a similar argument as before
'commit 7ec73b7e36d0 ("drm/i915: Only skip requests once a context is banned")'
we end up with an inconsistent API if we only mark future execbufs from
the default ctx as banned but fail to mark those currently executing
as failed.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-3-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 36193acd 17-Jan-2017 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Introduce engine_skip_context

Add a new function for skipping all pending requests
for a context in order to make engine reset flow more
readable.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-2-git-send-email-mika.kuoppala@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 4c965543 17-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move engine reset preparation to i915_gem_reset_prepare()

Now that we have prepare/finish routines for the GEM reset, move the
disabling of the engine->irq_tasklet into them to reduce repetition. The
device irq enable/disable is split out to ensure it is run first and
last always (even if the GPU reset fails).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484668747-9120-1-git-send-email-mika.kuoppala@intel.com


# f131e356 29-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip switch to kernel context if already done

Some engines are never used or are already sitting idle in the kernel
context and for those we can skip flushing the current context for
i915_gem_switch_to_kernel_context(). We used to perform this
optimisation but that was removed for convenience of converting over to
multiple timelines and handling the pending request queues.

From the perspective of writing selftests, reducing the number of
background operations on the engines makes defining assertions easier.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114162334.10271-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 47a8e3f6 13-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Eliminate superfluous i915_ggtt_view_normal

Since commit 058d88c4330f ("drm/i915: Track pinned VMA"), there is only
one user of the i915_ggtt_view_normal rodata. Just treat NULL as no special
view in pin_to_display() like everywhere else.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-7-chris@chris-wilson.co.uk


# 8bab1193 13-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert i915_ggtt_view to use an anonymous union

Reading the ggtt_views is much more pleasant without the extra
characters from specifying the union (i.e. ggtt_view.partial rather than
ggtt_view.params.partial). To make this work inside i915_vma_compare()
with only a single memcmp requires us to ensure that there are no
uninitialised bytes within each branch of the union (we make sure the
structs are packed) and we need to store the size of each branch.

v4: Rewrite changelog and add comments explaining the assert.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-5-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
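
The shape of the change is roughly the following (a sketch only; the real
struct carries more view types and explicit size bookkeeping for the memcmp):

    struct i915_ggtt_view {
        enum i915_ggtt_view_type type;
        union {
            /* Each branch is packed so that a memcmp() over the
             * in-use bytes never touches uninitialised padding. */
            struct intel_partial_info {
                u64 offset;
                unsigned int size;
            } partial;
            struct intel_rotation_info rotated;
        };
    };

    /* Access becomes view.partial.offset, not view.params.partial.offset. */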


# 3bf4d575 13-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop clearing i915_ggtt_view

As we now use a compact memcmp in i915_vma_compare(), we can forgo
clearing the entire view and only set the precise parameters used in
this view.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170114002827.31315-4-chris@chris-wilson.co.uk


# e4621b73 06-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix phys pwrite for struct_mutex-less operation

Since commit fe115628d567 ("drm/i915: Implement pwrite without
struct-mutex") the lowlevel pwrite calls are now called without the
protection of struct_mutex, but pwrite_phys was still asserting that it
held the struct_mutex and later tried to drop and relock it.

Fixes: fe115628d567 ("drm/i915: Implement pwrite without struct-mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
(cherry picked from commit 10466d2a59b23aa6d5ecd5310296c8cdb6458dac)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# f51455d4 10-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE

Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.

v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
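
As a minimal sketch of the distinction (today both values are 4096 bytes, but
the semantic split keeps future variable GTT page sizes tractable; the asserts
mirror the v4 note and the field names are assumptions):

    /* A physical page in main memory vs a virtual page in the GTT. */
    GEM_BUG_ON(!IS_ALIGNED(obj->base.size, PAGE_SIZE));
    GEM_BUG_ON(!IS_ALIGNED(vma->node.start, I915_GTT_PAGE_SIZE));
    GEM_BUG_ON(!IS_ALIGNED(vma->node.size, I915_GTT_PAGE_SIZE));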


# 2a20d6f8 10-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename i915_gem_engine_cleanup() to engine_set_wedged()

It has been some time since i915_gem_engine_cleanup was only called from
the module unload path, and now it is only called when the GPU is
wedged. Mika complained that the name is confusing, especially in light
of the existence of i915_gem_cleanup_engines().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-5-chris@chris-wilson.co.uk


# 3cd9442f 10-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark all incomplete requests as -EIO when wedged

Similarly to a normal reset, after we mark the GPU as wedged (completely
fubar and no more requests can be executed), set the error status on all
the in flight requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-4-chris@chris-wilson.co.uk


# 3c1b2847 10-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Set an error status for a resubmitted request

Let userspace know if its request was resubmitted due to it being
executed at the time of a global reset. In this case, the reset was for
a guilty request on another engine, and this request was an innocent
victim that will be re-executed upon restarting. However, since it was
running at the time of the reset, we can not guarantee that it suffered
no ill-effects from the reset (e.g. some context state may be lost, or
some self-modifying fragment shaders will be restarted from the final
state, not their initial state). To let userspace know that it has been
corrupted, we set a special value on the fence->error: -EAGAIN.

If the request does hang on resubmission, the error will be overwritten
with -EIO.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-3-chris@chris-wilson.co.uk


# c0d5f32c 10-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Set guilty-flag on fence after detecting a hang

The struct dma_fence carries a status field exposed to userspace by
sync_file. This is inspected after the fence is signaled and can convey
whether or not the request completed successfully, or in our case if we
detected a hang during the request (signaled via -EIO in
SYNC_IOC_FILE_INFO).

v2: Mark all cancelled requests as failed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-2-chris@chris-wilson.co.uk


# 2edc6e0d 10-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Consolidate reset_request()

Always reset the requests of the guilty context, including the hung
request that we tell the hardware to skip. This should help if the
reprogram fails entirely, but more importantly makes the guilty path
more uniform (and simplifies the subsequent patch to tweak the cancelled
requests).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-1-chris@chris-wilson.co.uk


# 8201c1fa 10-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clip the partial view against the object not vma

The VMA is later clipped against the vm_area_struct before insertion of
the faulting PTE so we are free to create the partial view as we desire.
If we use the object as the extents rather than the area, this partial view
can then be reused for other faulting areas.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110095633.6612-2-chris@chris-wilson.co.uk


# 2d4281bb 10-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Extract compute_partial_view()

In order to reuse the partial view for selftesting, extract the common
function for computing the view.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110095633.6612-1-chris@chris-wilson.co.uk
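
A hedged sketch of such a helper, clipping against the object as per the
previous patch (names and exact types are assumptions, not a verbatim copy):

    static struct i915_ggtt_view
    compute_partial_view(const struct drm_i915_gem_object *obj,
                         pgoff_t page_offset, unsigned int chunk)
    {
        struct i915_ggtt_view view;

        view.type = I915_GGTT_VIEW_PARTIAL;
        view.partial.offset = rounddown(page_offset, chunk);
        /* Clip the view against the object, not the faulting vma. */
        view.partial.size =
            min_t(unsigned int, chunk,
                  (obj->base.size >> PAGE_SHIFT) - view.partial.offset);

        return view;
    }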


# 91d4e0aa 09-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move ggtt fence/alignment to i915_gem_tiling.c

Rename i915_gem_get_ggtt_size() and i915_gem_get_ggtt_alignment() to
i915_gem_fence_size() and i915_gem_fence_alignment() respectively to
better match usage. Similarly move the pair of functions into
i915_gem_tiling.c next to the fence restrictions.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-6-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 944397f0 09-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store required fence size/alignment for GGTT vma

The fence size/alignment is a combination of the vma size plus object
tiling parameters. Those parameters are rarely changed, making the fence
size/alignment roughly constant for the lifetime of the VMA. We can
simplify subsequent calculations by precalculating the size/alignment
required for GGTT vma taking fencing into account (with an update if we
do change the tiling or stride).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-4-chris@chris-wilson.co.uk


# 5b30694b 09-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Align GGTT sizes to a fence tile row

Ensure the view occupies the full tile row so that reads/writes into the
VMA do not escape (via fenced detiling) into neighbouring objects - we
will pad the object with scratch pages to satisfy the fence. This
applies the lazy-tiling we employed on gen2/3 to gen4+.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-2-chris@chris-wilson.co.uk


# 6649a0b6 09-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Extract tile_row_size for fencing

Computing the tile row size of a tiled object (for use with fence
registers) is repeated, so extract it to a common helper.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
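
Roughly, the helper boils down to stride times tile height (a sketch assuming
gen4+ geometry, where X-tiles are 8 rows tall and Y-tiles 32 rows; the helper
name is illustrative):

    static u64 tile_row_size(const struct drm_i915_gem_object *obj)
    {
        unsigned int height =
            i915_gem_object_get_tiling(obj) == I915_TILING_Y ? 32 : 8;

        /* One fence-covered tile row spans a full stride of tiles. */
        return (u64)i915_gem_object_get_stride(obj) * height;
    }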


# 7453c549 20-Dec-2016 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

swiotlb: Export swiotlb_max_segment to users

So they can figure out the optimal number of pages
that can be contiguously stitched together without fear of
the bounce buffer.

We also expose a mechanism for sub-users of the SWIOTLB API, such
as Xen-SWIOTLB, to set the max segment value. And lastly,
if swiotlb=force is set (which mandates we bounce buffer everything),
we set max_segment so at least we can bounce buffer one 4K page
instead of a giant 512KB one for which we may not have space.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-Tested-by: Juergen Gross <jgross@suse.com>
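
Callers such as i915 can then bound their coalescing on the reported value,
roughly like this (a return of zero is taken to mean swiotlb imposes no limit;
the surrounding code is a sketch):

    unsigned int max_segment = swiotlb_max_segment();

    if (!max_segment)
        max_segment = rounddown(UINT_MAX, PAGE_SIZE);
    /* Never build an sg chunk larger than the bounce buffer can back. */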


# b42a13d9 06-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drain freed objects for mmap space exhaustion

As we now use a deferred free queue for objects, simply retiring the
active objects is not enough to immediately free them and recover their
mmap space - we must now also drain the freed object list.

Fixes: fbbd37b36fa5 ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-3-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 10466d2a 06-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix phys pwrite for struct_mutex-less operation

Since commit fe115628d567 ("drm/i915: Implement pwrite without
struct-mutex") the lowlevel pwrite calls are now called without the
protection of struct_mutex, but pwrite_phys was still asserting that it
held the struct_mutex and later tried to drop and relock it.

Fixes: fe115628d567 ("drm/i915: Implement pwrite without struct-mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152240.5793-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 984ff29f 06-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Simplify testing for am-I-the-kernel-context?

The kernel context (dev_priv->kernel_context) is unique in that it is
not associated with any user filp - it is the only one with
ctx->file_priv == NULL. This is a simpler test than comparing it against
dev_priv->kernel_context which involves some pointer dancing.

In checking that this is true, we notice that the gvt context is
allocating itself an i915_hw_ppgtt it doesn't use and not flagging that
its file_priv should be invalid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk
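
The resulting test is essentially a one-liner (helper name given as found in
the i915 tree of this era, but treat it as a sketch):

    /* The kernel context is the only context without a file_priv owner. */
    static inline bool
    i915_gem_context_is_kernel(struct i915_gem_context *ctx)
    {
        return !ctx->file_priv;
    }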


# 7ec73b7e 05-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only skip requests once a context is banned

If we skip before banning, we have an inconsistent interface between
execbuf still queueing valid request but those requests already queued
being cancelled. If we only cancel the pending requests once we stop
accepting new requests, the user interface is more consistent.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org> # v4.9+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105170059.344-1-chris@chris-wilson.co.uk


# 40b326ee 05-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move a few utility macros into a separate header

In order to defeat some circular dependencies between headers to allow use
of e.g. range_overflows() in a header, move the simple independent macros
into their own header.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-4-chris@chris-wilson.co.uk


# b1ed35d9 04-Jan-2017 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Revoke fenced GTT mmapings across GPU reset

The fence registers are clobbered by a GPU reset. If there is concurrent
user access to a fenced region via a GTT mmaping, the access will not be
fenced during the reset (until we restore the fences afterwards). In order
to prevent invalid access during the reset, before we clobber the fences
first we must invalidate the GTT mmapings. Access to the mmap will then
be forced to fault in the page, and in handling the fault, i915_gem_fault()
will take the struct_mutex and wait upon the reset to complete.

v2: Fix up commentary.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99274
Testcase: igt/gem_mmap_gtt/hang
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170104145110.1486-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 2471eb5f 23-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent timeline updates whilst performing reset

As the fence may be signaled concurrently from an interrupt on another
device, it is possible for the list of requests on the timeline to be
modified as we walk it. Take both (the context's timeline and the global
timeline) locks to prevent such modifications.

Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk
(cherry picked from commit 00c25e3f40083a6d5f1111955baccd287ee49258)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 64d1461c 23-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Silence allocation failure during sg_trim()

As trimming the sg table is merely an optimisation that gracefully fails
if we cannot allocate a new table, we do not need to report the failure
either.

Fixes: 0c40ce130e38 ("drm/i915: Trim the object sg table")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk
(cherry picked from commit 8bfc478fa455b4908f745df368355b415460c60e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# c3f923b5 23-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't clflush before release phys object

When we teardown the backing storage for the phys object, we copy from
the coherent contiguous block back to the shmemfs object, clflushing as
we go. Trying to clflush the invalid sg beforehand just oopses and would
be redundant (due to it already being coherent, and clflushed
afterwards).

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk
(cherry picked from commit e5facdf9644f4490520e0489a0252e8feaba3744)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 6095868a 31-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Complete kerneldoc for struct i915_gem_context

The existing kerneldoc was outdated, so time for a refresh.

v2: Use single line kdoc, mention functions for manipulation

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161231112012.29263-3-chris@chris-wilson.co.uk


# 00c25e3f 23-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent timeline updates whilst performing reset

As the fence may be signaled concurrently from an interrupt on another
device, it is possible for the list of requests on the timeline to be
modified as we walk it. Take both (the context's timeline and the global
timeline) locks to prevent such modifications.

Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-10-chris@chris-wilson.co.uk


# 8bfc478f 23-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Silence allocation failure during sg_trim()

As trimming the sg table is merely an optimisation that gracefully fails
if we cannot allocate a new table, we do not need to report the failure
either.

Fixes: 0c40ce130e38 ("drm/i915: Trim the object sg table")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-4-chris@chris-wilson.co.uk


# e5facdf9 23-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't clflush before release phys object

When we teardown the backing storage for the phys object, we copy from
the coherent contiguous block back to the shmemfs object, clflushing as
we go. Trying to clflush the invalid sg beforehand just oopses and would
be redundant (due to it already being coherent, and clflushed
afterwards).

Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-3-chris@chris-wilson.co.uk


# bdeb9785 23-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Repeat flush of idle work during suspend

The idle work handler is self-arming - if it detects that it needs to
run again it will queue itself from its work handler. Take greater care
when trying to drain the idle work, and double check that it is flushed.

The free worker has a similar issue where it is armed by an RCU task
which may be running concurrently with us.

This should hopefully help with the sporadic WARN_ON(dev_priv->gt.awake)
from i915_gem_suspend.

v2: Reuse drain_freed_objects.
v3: Don't try to flush the freed objects from the shrinker, as it may be
underneath the struct_mutex already.
v4: do while and comment upon the excess rcu_barrier in drain_freed_objects

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-2-chris@chris-wilson.co.uk


# 28f412e0 23-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Break after walking all GGTT vma in bump_inactive_ggtt

Since commit db6c2b4151f2 ("drm/i915: Store the vma in an rbtree under
the object") the vma are once again sorted into GGTT first, then ppGTT
so that the typical case of walking the GGTT vma can stop as soon as we
find a non-ppGTT. Apply that optimisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161223145804.6605-1-chris@chris-wilson.co.uk


# abb0deac 18-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping

If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:

i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)

Reported-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Fixes: 871dfbd67d4e ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit d766ef53006c2c38a7fe2bef0904105a793383f2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 057f803f 07-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reorder phys backing storage release

In commit a4f5ea64f0a8 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt to a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and it release to put_pages to prevent
the invalid access and to improve symmetry.

v2: Add commentary about always aligning to page size.

Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a4f5ea64f0a8 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk
(cherry picked from commit dbb4351bab0a8440f6b02895c142bce6c30b7097)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# c2dc6cc9 18-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add a test that we terminate the trimmed sgtable as expected

In commit 0c40ce130e38 ("drm/i915: Trim the object sg table"), we expect
to copy exactly orig_st->nents across and allocate the table thusly.
The copy loop should therefore end with the new_sg being NULL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-2-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# d766ef53 18-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping

If we at first do not succeed with attempting to remap our physical
pages using a coalesced scattergather list, try again with one
scattergather entry per page. This should help with swiotlb as it uses a
limited buffer size and only searches for contiguous chunks within its
buffer aligned up to the next boundary - i.e. we may prematurely cause a
failure as we are unable to utilize the unused space between large
chunks and trigger an error such as:

i915 0000:00:02.0: swiotlb buffer is full (sz: 1630208 bytes)

Reported-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Fixes: 871dfbd67d4e ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161219124346.550-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
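
Conceptually the retry looks like the loop below; this is a sketch, with
build_coalesced_sg() standing in for the real table construction, not code
from the patch:

    unsigned int max_segment =
        swiotlb_max_segment() ?: rounddown(UINT_MAX, PAGE_SIZE);
    struct sg_table *st;

    rebuild_st:
    st = build_coalesced_sg(obj, max_segment);      /* hypothetical helper */
    if (!dma_map_sg(dev, st->sgl, st->nents, DMA_BIDIRECTIONAL)) {
        sg_free_table(st);
        if (max_segment > PAGE_SIZE) {
            /* Retry with one scatterlist entry per page. */
            max_segment = PAGE_SIZE;
            goto rebuild_st;
        }
        return -ENOMEM;
    }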


# e8a9c58f 18-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unify active context tracking between legacy/execlists/guc

The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.

We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.

We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficulty.

And finally an ulterior motive for unifying context handling was to
prepare for mock requests.

v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk


# 966d5bf5 13-Dec-2016 Matthew Auld <matthew.auld@intel.com>

drm/i915: convert to using range_overflows

Convert some of the obvious hand-rolled ranged overflow sanity checks to
our shiny new range_overflows macro.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161213203222.32564-4-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
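
For reference, the check has roughly this shape (a sketch of the i915 utility
macro, omitting its type-compatibility asserts); a typical caller rejects user
ranges, such as pread/pwrite offsets, that would wrap:

    /* True if [start, start + size) does not fit entirely below max. */
    #define range_overflows(start, size, max) ({            \
        typeof(start) start__ = (start);                    \
        typeof(size) size__ = (size);                       \
        typeof(max) max__ = (max);                          \
        start__ >= max__ || size__ > max__ - start__;       \
    })

    if (range_overflows(args->offset, args->size, (u64)obj->base.size))
        return -EINVAL;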


# 1a29d85e 14-Dec-2016 Jan Kara <jack@suse.cz>

mm: use vmf->address instead of of vmf->virtual_address

Every single user of vmf->virtual_address typed that entry to unsigned
long before doing anything with it so the type of virtual_address does
not really provide us any additional safety. Just use masked
vmf->address which already has the appropriate type.

Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dbb4351b 07-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reorder phys backing storage release

In commit a4f5ea64f0a8 ("drm/i915: Refactor object page API"), I
reordered the object->pages teardown to be more friendly wrt a
separate obj->mm.lock. However, I overlooked the phys object and left it
with a dangling use-after-free of its phys_handle. Move the allocation
of the phys handle to get_pages and its release to put_pages to prevent
the invalid access and to improve symmetry.

v2: Add commentary about always aligning to page size.

Testcase: igt/drv_selftest/objects
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: a4f5ea64f0a8 ("drm/i915: Refactor object page API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207133411.8028-1-chris@chris-wilson.co.uk


# 73f67aa8 07-Dec-2016 Jani Nikula <jani.nikula@intel.com>

drm/i915: distinguish G33 and Pineview from each other

Pineview deserves to use its own platform enum (which was already added,
unused, previously). IS_G33() no longer matches Pineview, and gets
replaced by IS_G33() || IS_PINEVIEW() or equivalent. Pineview is no
longer an outlier among platform definitions.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1481143689-19672-1-git-send-email-jani.nikula@intel.com


# c0f86832 06-Dec-2016 Jani Nikula <jani.nikula@intel.com>

drm/i915: rename BROADWATER and CRESTLINE to I965G and I965GM, respectively

Add more consistency to our naming. Pineview remains the outlier. Keep
using code names for gen5+.

v2: rebased

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1481105584-23033-1-git-send-email-jani.nikula@intel.com


# 85fd4f58 05-Dec-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark all non-vma being inserted into the address spaces

We need to distinguish between full i915_vma structs and simple
drm_mm_nodes when considering eviction (i.e. we must be careful not to
treat a mere drm_mm_node as a much larger i915_vma causing memory
corruption, if we are lucky). To do this, color these not-a-vma with -1
(I915_COLOR_UNEVICTABLE).

v2...v200: New name for -1.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk
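
In code terms the colouring looks roughly like this when reserving a bare node
in the GTT (a sketch; field spelling follows the drm_mm/i915 structures of the
era):

    #define I915_COLOR_UNEVICTABLE (-1) /* not a real vma: never evict */

    node->start = offset;
    node->size = size;
    node->color = I915_COLOR_UNEVICTABLE;
    err = drm_mm_reserve_node(&ggtt->base.mm, node);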


# ce1135c7 22-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Complete requests in nop_submit_request

Since the submit/execute split in commit d55ac5bf97c6 ("drm/i915: Defer
transfer onto execution timeline to actual hw submission") the
global seqno advance was deferred until the submit_request callback.
After wedging the GPU, we were installing a nop_submit_request handler
(to avoid waking up the dead hw) but I had missed converting this over
to the new scheme. Under the new scheme, we have to explicitly call
i915_gem_submit_request() from the submit_request handler to mark the
request as on the hardware. If we don't, the request is always pending,
and any waiter will continue to wait indefinitely and hangcheck will not
be able to resolve the lockup.

References: https://bugs.freedesktop.org/show_bug.cgi?id=98748
Testcase: igt/gem_eio/in-flight
Fixes: d55ac5bf97c6 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk
(cherry picked from commit 3dcf93f7f23a61e867a5ccadaf651cb2d29229fd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# cb15d9f8 01-Dec-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: More GEM init dev_priv cleanup

Simplifies the code to pass the right parameter in.

v2: Commit message. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# bf9e8429 01-Dec-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make various init functions take dev_priv

Like GEM init, GUC init, MOCS init and context creation.

Enables them to lose dev_priv locals.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 12d79d78 01-Dec-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make GEM object create and create from data take dev_priv

Makes all GEM object constructors consistent.

v2: Fix compilation in GVT code.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)


# 187685cb 01-Dec-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make GEM object alloc/free and stolen created take dev_priv

Where it is more appropriate and also to be consistent with
the direction of the driver.

v2: Leave out object alloc/free inlining. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 2420489b 14-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error

On the DMA mapping error path, sg may be NULL (it has already been
marked as the last scatterlist entry), and we should avoid dereferencing
it again.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e227330223a7 ("drm/i915: avoid leaking DMA mappings")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20161114112930.2033-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
(cherry picked from commit b17993b7b29612369270567643bcff814f4b3d7f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 49d73912 29-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert vm->dev backpointer to vm->i915

99% of the time we access i915_address_space->dev we want the i915
device and not the drm device, so let's store the drm_i915_private
backpointer instead. The only real complications here are the inlines
in i915_vma.h where drm_i915_private is not yet defined and so we have
to choose an alternate path for our asserts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161129095008.32622-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 20e4933c 22-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop the machine as we install the wedged submit_request handler

In order to prevent a race between the old callback submitting an
incomplete request and i915_gem_set_wedged() installing its nop handler,
we must ensure that the swap occurs when the machine is idle
(stop_machine).

v2: move context lost from out of BKL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-4-chris@chris-wilson.co.uk


# 3dcf93f7 22-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Complete requests in nop_submit_request

Since the submit/execute split in commit d55ac5bf97c6 ("drm/i915: Defer
transfer onto execution timeline to actual hw submission") the
global seqno advance was deferred until the submit_request callback.
After wedging the GPU, we were installing a nop_submit_request handler
(to avoid waking up the dead hw) but I had missed converting this over
to the new scheme. Under the new scheme, we have to explicitly call
i915_gem_submit_request() from the submit_request handler to mark the
request as on the hardware. If we don't, the request is always pending,
and any waiter will continue to wait indefinitely and hangcheck will not
be able to resolve the lockup.

References: https://bugs.freedesktop.org/show_bug.cgi?id=98748
Testcase: igt/gem_eio/in-flight
Fixes: d55ac5bf97c6 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-3-chris@chris-wilson.co.uk


# d9e9da64 22-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't deref context->file_priv ERR_PTR upon reset

When a user context is closed, its file_priv backpointer is replaced by
ERR_PTR(-EBADF); be careful not to chase this invalid pointer after a
hang and a GPU reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: b083a0870c79 ("drm/i915: Add per client max context ban limit")
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161122144121.7379-1-chris@chris-wilson.co.uk


# bc1d53c6 16-Nov-2016 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Wipe hang stats as an embedded struct

Bannable property, banned status, guilty and active counts are
properties of i915_gem_context. Make them so.

v2: rebase

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1479309634-28574-1-git-send-email-mika.kuoppala@intel.com


# b083a087 18-Nov-2016 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Add per client max context ban limit

If we have a bad client submitting unfavourably across different
contexts, creating new ones, the per context scoring of badness
doesn't remove the root cause, the offending client.
To counter, keep track of per client context bans. Deny access if
client is responsible for more than 3 context bans in
it's lifetime.

v2: move ban check to context create ioctl (Chris)
v3: add commentary about hangs needed to reach client ban (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
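
Conceptually the gate in the context-create path is a simple lifetime counter
check; the names below are assumptions for illustration, not a quote of the
patch:

    /* Deny new contexts to a client that keeps getting contexts banned. */
    static bool client_is_banned(struct drm_i915_file_private *file_priv)
    {
        return file_priv->context_bans > 3;     /* lifetime limit */
    }

The create ioctl would then fail with -EIO for such a client instead of
scoring yet another fresh context.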


# 84102171 16-Nov-2016 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Add bannable context parameter

Now that the driver has per context scoring of 'hanging badness',
and subsequent hangs during short windows are allowed
if there is progress made in between, it does not make sense
to expose a ban timing window as a context parameter anymore.

Let the scoring be the sole indicator for ban policy and substitute
ban period context parameter as a boolean to get/set context
bannable property.

v2: allow non root to opt into being banned (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>


# e5e1fc47 16-Nov-2016 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Use request retirement as context progress

As the hangcheck score was removed, the active decay of the score
was removed also. This removed hangcheck's ability to detect
whether the gpu client was accidentally or maliciously causing intermittent
hangs. Reinstate the scoring as a per context property, so that if
one context starts to act unfavourably, ban it.

v2: ban_period_secs as a gate to score check (Chris)
v3: decay in proper spot. scores as tunables (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>


# 3fe3b030 18-Nov-2016 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Decouple hang detection from hangcheck period

Hangcheck state accumulation has gained more steps
over the years, like head movement and more recently the
subunit inactivity check. As the subunit sampling is only
done if the previous state check showed inactivity, we
have added more stages (and time) to reach a hang verdict.

Asymmetric engine states led to a different actual weight of
'one hangcheck unit' per engine, and it was demonstrated in some
hangs that, due to the difference in stages, simpler engines
were falsely accused of a hang as their score was much quicker
to accumulate above the hang threshold.

To completely decouple the hangcheck guilty score
from the hangcheck period, convert hangcheck score to a
rough period of inactivity measurement. As these are
tracked as jiffies, they are meaningful also across
reset boundaries. This makes finding a guilty engine
more accurate across multi engine activity scenarios,
especially across asymmetric engines.

We lose the ability to detect cross-batch malicious attempts
to hinder progress. The plan is to move this functionality
into context banning, which is a more natural fit, later in
the series.

v2: use time_before macros (Chris)
reinstate the pardoning of moving engine after hc (Chris)
v3: avoid global state for per engine stall detection (Chris)
v4: take timeline last retirement into account (Chris)
v5: do debug print on pardoning, split out retirement timestamp (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
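
A rough sketch of the jiffies-based verdict, with the timeout and parameter
names as assumptions; only the time_before()/time_after() usage mirrors the
v2 note above:

    #include <linux/jiffies.h>

    /* Hung if no progress has been observed for longer than the timeout. */
    static bool sketch_engine_stalled(unsigned long action_timestamp_jiffies,
                                      unsigned long timeout_jiffies)
    {
        return time_after(jiffies, action_timestamp_jiffies + timeout_jiffies);
    }

Because the comparison is between jiffies timestamps rather than accumulated
score units, the measurement keeps its meaning across reset boundaries and
across engines with different numbers of hangcheck stages.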


# 05c34837 18-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip final clflush if LLC is coherent

If the LLC is coherent with the object, we do not need to worry about
whether main memory and cache mismatch when we hand the object back to
the system.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-2-chris@chris-wilson.co.uk


# a6a7cc4b 18-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always flush the dirty CPU cache when pinning the scanout

Currently we only clflush the scanout if it is in the CPU domain. Also
flush if we have a pending CPU clflush. We also want to treat the
dirtyfb path similarly, and flush any pending writes there as well.

v2: Only send the fb flush message if flushing the dirt on flip
v3: Make flush-for-flip and dirtyfb look more alike since they serve
similar roles as end-of-frame marker.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v2
Link: http://patchwork.freedesktop.org/patch/msgid/20161118211747.25197-1-chris@chris-wilson.co.uk


# b17993b7 14-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error

On the DMA mapping error path, sg may be NULL (it has already been
marked as the last scatterlist entry), and we should avoid dereferencing
it again.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e227330223a7 ("drm/i915: avoid leaking DMA mappings")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/20161114112930.2033-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# 5b8c8aec 16-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move frontbuffer CS write tracking from ggtt vma to object

I tried to avoid having to track the write for every VMA by only
tracking writes to the ggtt. However, for the purposes of frontbuffer
tracking this is insufficient, as we need to invalidate around writes
not just to the ggtt but to all aliased ppgtt views of the framebuffer.
By moving the critical section to the object, and only doing so for
framebuffer writes, we can reduce the tracking even further by only
watching framebuffers and not vma.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161116190704.5293-1-chris@chris-wilson.co.uk
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# ea84aa77 17-Nov-2016 Matthew Auld <matthew.auld@intel.com>

drm/i915: don't leak global_timeline

We need to clean up the global_timeline in i915_gem_load_cleanup.

v2: don't forget about the struct_mutex, and also WARN_ON if we have any
remaining timelines before purging the global_timeline.

v3: it might be a good idea to first remove the global_timeline...duh!

Fixes: 73cb97010d4f ("drm/i915: Combine seqno + tracking into a global timeline struct")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1479415087-13216-1-git-send-email-matthew.auld@intel.com
Link: http://patchwork.freedesktop.org/patch/msgid/20161117210411.14044-2-chris@chris-wilson.co.uk


# c4c29d7b 09-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Demote i915_gem_open() debugging from DRIVER to USER

We use DRM_DEBUG() when reporting on user actions, to try and keep
intentional errors out of the CI dmesg. Demote the debug from
i915_gem_open() similarly so that it is only apparent with drm.debug & 1
like its brethren.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161109104507.21228-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 275a991c 16-Nov-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: dev_priv cleanup in i915_gem_gtt.c

Started with removing INTEL_INFO(dev), which cascaded into quite a
big trickle of function prototype changes. Still, I think it is
for the better.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 4362f4f6 16-Nov-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Use dev_priv in INTEL_INFO in i915_gem_fence_reg.c

Plus a small cascade of function prototype changes.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# c6be607a 16-Nov-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: dev_priv and a small cascade of cleanups in i915_gem.c

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 6b5e90f5 14-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/scheduler: Boost priorities for flips

Boost the priority of any rendering required to show the next pageflip
as we want to avoid missing the vblank by being delayed by invisible
workload. We prioritise avoiding jank and jitter in the GUI over
starving background tasks.

v2: Descend dma_fence_array when boosting priorities.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-10-chris@chris-wilson.co.uk
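
A sketch of the v2 point (descending a dma_fence_array when boosting); the
boost callback itself is a hypothetical placeholder, not the driver's code:

    #include <linux/dma-fence-array.h>

    static void sketch_boost_fence(struct dma_fence *fence)
    {
        if (dma_fence_is_array(fence)) {
            struct dma_fence_array *array = to_dma_fence_array(fence);
            unsigned int i;

            /* Boost every child so a composite fence does not hide work. */
            for (i = 0; i < array->num_fences; i++)
                sketch_boost_fence(array->fences[i]);
            return;
        }

        /* sketch_engine_boost_priority(fence); -- driver-specific, omitted */
    }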


# 20311bd3 14-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/scheduler: Execute requests in order of priorities

Track the priority of each request and use it to determine the order in
which we submit requests to the hardware via execlists.

The priority of the request is determined by the user (eventually via
the context) but may be overridden at any time by the driver. When we set
the priority of the request, we bump the priority of all of its
dependencies to match - so that a high priority drawing operation is not
stuck behind a background task.

When the request is ready to execute (i.e. we have signaled the submit
fence following completion of all its dependencies, including third
party fences), we put the request into a priority sorted rbtree to be
submitted to the hardware. If the request is higher priority than all
pending requests, it will be submitted on the next context-switch
interrupt as soon as the hardware has completed the current request. We
do not yet preempt the currently executing request in order to
immediately run a very high priority request.

One more limitation is that this first implementation is for
execlists only, so it is currently limited to gen8/gen9.

v2: Replace recursive priority inheritance bumping with an iterative
depth-first search list.
v3: list_next_entry() for walking lists
v4: Explain how the dfs solves the recursion problem with PI.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk
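
A minimal sketch of priority-ordered insertion into an rbtree, as described
above; the struct layout and helper name are illustrative assumptions:

    #include <linux/rbtree.h>

    struct sketch_request {
        struct rb_node node;
        int priority;
    };

    static void sketch_queue_request(struct rb_root *root,
                                     struct sketch_request *rq)
    {
        struct rb_node **p = &root->rb_node, *parent = NULL;

        while (*p) {
            struct sketch_request *pos;

            parent = *p;
            pos = rb_entry(parent, struct sketch_request, node);
            /* Higher priority sorts towards the side dequeued first. */
            if (rq->priority > pos->priority)
                p = &parent->rb_left;
            else
                p = &parent->rb_right;
        }

        rb_link_node(&rq->node, parent, p);
        rb_insert_color(&rq->node, root);
    }

On the next context-switch interrupt the submission code then simply takes
the first node of the tree, which is the highest priority ready request.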


# 52e54209 14-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/scheduler: Record all dependencies upon request construction

The scheduler needs to know the dependencies of each request for the
lifetime of the request, as it may choose to reschedule the requests at
any time and must ensure the dependency tree is not broken. This is in
addition to using the fence to only allow execution after all
dependencies have been completed.

One option was to extend the fence to support the bidirectional
dependency tracking required by the scheduler. However the mismatch in
lifetimes between the submit fence and the request essentially meant
that we had to build a completely separate struct (and we could not
simply reuse the existing waitqueue in the fence for one half of the
dependency tracking). The extra dependency tracking simply did not mesh
well with the fence, and keeping it separate both keeps the fence
implementation simpler and allows us to extend the dependency tracking
into a priority tree (whilst maintaining support for reordering the
tree).

To avoid the additional allocations and list manipulations, the use of
the priotree is disabled when there are no schedulers to use it.

v2: Create a dedicated slab for i915_dependency.
Rename the lists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk
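
In sketch form, the separate dependency bookkeeping amounts to a small edge
node threaded onto two lists, one per endpoint; the struct and field names
below are assumptions for illustration, not the driver's exact definitions:

    #include <linux/list.h>

    struct sketch_priotree {
        struct list_head signalers_list; /* requests this one must wait on */
        struct list_head waiters_list;   /* requests waiting on this one */
        int priority;
    };

    struct sketch_dependency {
        struct sketch_priotree *signaler;
        struct list_head signal_link;    /* entry on signaler->waiters_list */
        struct list_head wait_link;      /* entry on waiter->signalers_list */
    };

Keeping these edges outside the fence avoids the lifetime mismatch the commit
describes, and the two lists are what a later reschedule walks when bumping
the priority of a request's dependencies.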


# 663f71e7 14-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove engine->execlist_lock

The execlist_lock is now completely subsumed by the engine->timeline->lock,
and so we can remove the redundant layer of locking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-5-chris@chris-wilson.co.uk


# bb89485e 14-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Create distinct lockclasses for execution vs user timelines

In order to simplify the lockdep annotation, as they become more complex
in the future with deferred execution and multiple paths through the
same functions, create a separate lockclass for the user timeline and
the hardware execution timeline.

We should only ever be locking the user timeline and the execution
timeline in parallel so we only need to create two lock classes, rather
than a separate class for every timeline.

v2: Rename the lock classes to be more consistent with other lockdep.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-2-chris@chris-wilson.co.uk


# 2b3c8317 11-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop skipping the final clflush back to system pages

When we release the shmem backing storage, we make sure that the pages
are coherent with the cpu cache. However, our clflush routine was
skipping the flush as the object had no pages at release time. Fix this by
explicitly flushing the sg_table we are decoupling.

Fixes: 03ac84f1830e ("drm/i915: Pass around sg_table to get_pages/put_pages backend")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161111145809.9701-2-chris@chris-wilson.co.uk


# 9caa34aa 11-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only wait upon the execution timeline when unlocked

In order to walk the list of all timelines, we currently require the
struct_mutex. We are sometimes called prior to the struct_mutex being
taken by the caller (i.e. !I915_WAIT_LOCKED), in which case we can only
trust the global execution timelines (as these are owned by the device).
This means in the unlocked phase we can only wait upon the currently
executing requests and not all queued.

[ 175.743243] general protection fault: 0000 [#1] SMP
[ 175.743263] Modules linked in: nls_iso8859_1 intel_rapl x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iwlwifi aesni_intel aes_x86_64 lrw snd_soc_rt5640 gf128mul snd_soc_rl6231 snd_soc_core glue_helper snd_compress snd_pcm_dmaengine snd_hda_codec_hdmi ablk_helper snd_hda_codec_realtek cryptd snd_hda_codec_generic serio_raw cfg80211 snd_hda_intel snd_hda_codec ir_lirc_codec snd_hda_core lirc_dev snd_hwdep snd_pcm lpc_ich mei_me mei snd_seq_midi shpchp snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer rc_rc6_mce acpi_als nuvoton_cir kfifo_buf rc_core snd industrialio snd_soc_sst_acpi soundcore snd_soc_sst_match i2c_designware_platform 8250_dw i2c_designware_core dw_dmac spi_pxa2xx_platform mac_hid acpi_pad parport_pc ppdev lp parport
[ 175.743509] autofs4 i915 e1000e psmouse ptp pps_core xhci_pci ehci_pci ahci xhci_hcd ehci_hcd libahci video sdhci_acpi sdhci i2c_hid hid
[ 175.743560] CPU: 2 PID: 2386 Comm: wtdg_monitor.sh Tainted: G U 4.9.0-rc4-nightly+ #2
[ 175.743581] Hardware name: /NUC5i7RYB, BIOS RYBDWi35.86A.0358.2016.0606.1423 06/06/2016
[ 175.743603] task: ffff88024509ba80 task.stack: ffffc9007bd18000
[ 175.743618] RIP: 0010:[<ffffffffa01af29b>] [<ffffffffa01af29b>] i915_gem_wait_for_idle+0x3b/0x140 [i915]
[ 175.743660] RSP: 0000:ffffc9007bd1b9b8 EFLAGS: 00010297
[ 175.743674] RAX: ffff88024489d248 RBX: 0000000000000000 RCX: 0000000000000000
[ 175.743691] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880244898000
[ 175.743708] RBP: ffffc9007bd1b9f0 R08: 0000000000000000 R09: 0000000000000001
[ 175.743724] R10: 00000028eaf42792 R11: 0000000000000001 R12: dead000000000100
[ 175.743741] R13: dead000000000148 R14: ffffc9007bd1ba5f R15: 0000000000000005
[ 175.743758] FS: 00007f2638330700(0000) GS:ffff880256d00000(0000) knlGS:0000000000000000
[ 175.743777] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 175.743791] CR2: 00007f885c8cea40 CR3: 00000002416b5000 CR4: 00000000003406e0
[ 175.743808] Stack:
[ 175.743816] ffff88024489d248 000000004509ba80 ffff880244898000 ffff88024509ba80
[ 175.743840] 00000000ffff8b69 ffffc9007bd1ba5f ffffc9007bd1ba5e ffffc9007bd1ba28
[ 175.743863] ffffffffa01b661d 00000000ffffffff 0000000000000000 ffff880244898000
[ 175.743886] Call Trace:
[ 175.743906] [<ffffffffa01b661d>] i915_gem_shrinker_lock_uninterruptible.constprop.5+0x5d/0xc0 [i915]
[ 175.743937] [<ffffffffa01b6cd0>] i915_gem_shrinker_oom+0x30/0x1b0 [i915]
[ 175.743955] [<ffffffff8109ca79>] notifier_call_chain+0x49/0x70
[ 175.743971] [<ffffffff8109cd9d>] __blocking_notifier_call_chain+0x4d/0x70
[ 175.743988] [<ffffffff8109cdd6>] blocking_notifier_call_chain+0x16/0x20
[ 175.744005] [<ffffffff811885dc>] out_of_memory+0x22c/0x480
[ 175.744020] [<ffffffff81205542>] __alloc_pages_slowpath+0x851/0x8ec
[ 175.744037] [<ffffffff8118ca51>] __alloc_pages_nodemask+0x2c1/0x310
[ 175.744054] [<ffffffff811d8ea8>] alloc_pages_current+0x88/0x120
[ 175.744070] [<ffffffff811833a4>] __page_cache_alloc+0xb4/0xc0
[ 175.744086] [<ffffffff811865ca>] filemap_fault+0x29a/0x500
[ 175.744101] [<ffffffff81299aa6>] ext4_filemap_fault+0x36/0x50
[ 175.744117] [<ffffffff811b3d4a>] __do_fault+0x6a/0xe0
[ 175.744131] [<ffffffff811b97ee>] handle_mm_fault+0xd0e/0x1330
[ 175.744147] [<ffffffff8106738c>] __do_page_fault+0x23c/0x4d0
[ 175.744162] [<ffffffff81067650>] do_page_fault+0x30/0x80
[ 175.744177] [<ffffffff817ffbe8>] page_fault+0x28/0x30
[ 175.744191] Code: 41 57 41 56 41 55 41 54 53 48 83 ec 10 4c 8b a7 48 52 00 00 89 75 d4 48 89 45 c8 49 39 c4 74 78 4d 8d 6c 24 48 41 bf 05 00 00 00 <49> 8b 5d 00 48 85 db 74 50 8b 83 20 01 00 00 85 c0 74 15 48 8b
[ 175.744320] RIP [<ffffffffa01af29b>] i915_gem_wait_for_idle+0x3b/0x140 [i915]
[ 175.744351] RSP <ffffc9007bd1b9b8>

Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161111145809.9701-1-chris@chris-wilson.co.uk


# 0031fb96 04-Nov-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Assorted dev_priv cleanups

A small selection of macros which can only accept dev_priv from
now on and a resulting trickle of fixups.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>


# b42fe9ca 10-Nov-2016 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Split out i915_vma.c

As a side product, I had to split two other files:
- i915_gem_fence_reg.h
- i915_gem_object.h (only parts that needed immediate untanglement)

I tried to move the code in chunks as big as possible, to make review
easier. i915_vma_compare was moved to a header temporarily.

v2:
- Use i915_gem_fence_reg.{c,h}

v3:
- Rebased

v4:
- Fix building when DEBUG_GEM is enabled by reordering a bit.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478861034-30643-1-git-send-email-joonas.lahtinen@linux.intel.com


# 0c40ce13 09-Nov-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Trim the object sg table

At the moment we allocate enough sg table entries assuming we
will not be able to do any coalescing. But since in practice we
most often can, and very effectively at that, this ends up
wasting a lot of memory.

A simple and effective way of trimming the over-allocated
entries is to copy the table over to a new one allocated to the
exact size.

Experiments on my freshly logged-in and idle desktop (KDE) showed
that by doing this we can save approximately 1 MiB of RAM; when
running a typical benchmark like gl_manhattan I have even seen
a 6 MiB saving.

More complicated techniques such as only copying the last used
page and freeing the rest are left to the reader.

v2:
* Update commit message.
* Use temporary sg_table on stack. (Chris Wilson)

v3:
* Commit message update.
* Comment added.
* Replace memcpy with copy assignment.
(Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478704423-7447-1-git-send-email-tvrtko.ursulin@linux.intel.com
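
A sketch of the trimming approach, assuming a standalone helper name; it
follows the description above (copy the coalesced entries into an
exactly-sized table, then release the over-allocation), with page offsets
assumed to be zero for these backing pages:

    #include <linux/scatterlist.h>

    static void sketch_sg_trim(struct sg_table *st)
    {
        struct sg_table trimmed;
        struct scatterlist *sg, *dst;
        unsigned int i;

        if (st->nents == st->orig_nents)
            return; /* nothing was coalesced, nothing to reclaim */

        if (sg_alloc_table(&trimmed, st->nents, GFP_KERNEL))
            return; /* keep the over-allocated table on failure */

        dst = trimmed.sgl;
        for_each_sg(st->sgl, sg, st->nents, i) {
            sg_set_page(dst, sg_page(sg), sg->length, 0);
            dst = sg_next(dst);
        }

        sg_free_table(st);
        *st = trimmed; /* copy assignment of the trimmed table, as per v3 */
    }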


# d0da48cf 05-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove chipset flush after cache flush

We always flush the chipset prior to executing with the GPU, so we can
skip the flush during ordinary domain management.

This should help mitigate some of the potential (though likely trivial)
performance regressions from doing the flush unconditionally before
execbuf, introduced in commit dcd79934b0dd ("drm/i915: Unconditionally
flush any chipset buffers before execbuf")

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161106130001.9509-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 54905ab5 07-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Limit Valleyview and earlier to only using mappable scanout

Valleyview appears to be limited to only scanning out from the first 512MiB
of the Global GTT. Let's presume that this behaviour was inherited from the
display block copied from g4x (not Ironlake) and that all earlier generations
are similarly affected, though testing suggests different symptoms. For
simplicity, impose that these platforms must scan out from the mappable
region. (For extra simplicity, use HAS_GMCH_DISPLAY even though this
catches Cherryview, which does not appear to be limited to the low
aperture for its scanout.)

v2: Use HAS_GMCH_DISPLAY() to more clearly convey my intent about
limiting this workaround to the old style of display engine.

v3: Update changelog to reflect testing by Ville Syrjälä
v4: Include the changes to the comments as well

Reported-by: Luis Botello <luis.botello.ortega@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98036
Fixes: 2efb813d5388 ("drm/i915: Fallback to using unmappable memory for scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20161107110128.28762-1-chris@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
(cherry picked from commit 767a222e47cc13239d38018887f911fec06169ea)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# c4b8c570 07-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Round tile chunks up for constructing partial VMAs

When we split a large object up into chunks for GTT faulting (because we
can't fit the whole object into the aperture) we have to align our cuts
with the fence registers. Each partial VMA must cover a complete set of
tile rows or the offset into each partial VMA is not aligned with the
whole image. Currently we enforce a minimum size on each partial VMA,
but this minimum size itself was not aligned to the tile row causing
distortion.

Reported-by: Andreas Reis <andreas.reis@gmail.com>
Reported-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Norbert Preining <preining@logic.at>
Tested-by: Norbert Preining <preining@logic.at>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Fixes: 03af84fe7f48 ("drm/i915: Choose partial chunksize based on tile row size")
Fixes: a61007a83a46 ("drm/i915: Fix partial GGTT faulting") # enabling patch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98402
Testcase: igt/gem_mmap_gtt/medium-copy-odd
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20161107105443.27855-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 0ef723cbceb6dce8116e75d44c5b8679b2eba69a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 31ab49ab 07-Nov-2016 Imre Deak <imre.deak@intel.com>

drm/i915: Add assert for no pending GPU requests during suspend/resume in LR mode

During resume we will reset the SW/HW tracking of each ring's head/tail
pointers and so are not prepared to replay any pending requests (as
opposed to GPU reset time). Add an assert for this to both the suspend
and the resume code.

v2:
- Check for ELSP port idle already during suspend and check !gt.awake
during resume. (Chris)
v3:
- Move the !gt.awake check to i915_gem_resume().
v4:
- s/intel_lr_engines_idle/intel_execlists_idle/ (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-4-git-send-email-imre.deak@intel.com


# 0cb5670b 07-Nov-2016 Imre Deak <imre.deak@intel.com>

drm/i915: Make sure engines are idle during GPU idling in LR mode

We assume that the GPU is idle once we receive the seqno via the last
request's user interrupt. In execlist mode, however, the corresponding
context completed interrupt can be delayed, and until this latter
interrupt arrives we consider the request to be pending on the ELSP
submit port. This can cause a problem during system suspend where this
last request will be seen by the resume code as still pending. Such
pending requests are normally replayed after a GPU reset, but during
resume we reset both SW and HW tracking of the ring head/tail pointers,
so replaying the pending request with its stale tail pointer will leave
the ring in an inconsistent state. A subsequent request submission can
then lead to the GPU executing from an uninitialized area of the ring
behind the above stale tail pointer.

Fix this by making sure any pending request on the ELSP port is
completed before suspending. I used a polling wait since the completion
time I measured was <1ms and since normally we only need to wait during
system suspend. GPU idling during runtime suspend is scheduled with a
delay (currently 50-100ms) after the retirement of the last request at
which point the context completed interrupt must have arrived already.

The chance of this bug was increased by

commit 1c777c5d1dcdf8fa0223fcff35fb387b5bb9517a
Author: Imre Deak <imre.deak@intel.com>
Date: Wed Oct 12 17:46:37 2016 +0300

drm/i915/hsw: Fix GPU hang during resume from S3-devices state

but it could happen even without the explicit GPU reset, since we
disable interrupts afterwards during the suspend sequence.

v2:
- Do an unlocked poll-wait first. (Chris)
v3-4:
- s/intel_lr_engines_idle/intel_execlists_idle/ and move
i915.enable_execlists check to the new helper. (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98470
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-3-git-send-email-imre.deak@intel.com


# 93c97dc1 07-Nov-2016 Imre Deak <imre.deak@intel.com>

drm/i915: Avoid early GPU idling due to race with new request

There is a small race where a new request can be submitted and retired
after the idle worker has started to run, which leads to idling the GPU
too early. Fix this by deferring the idling to the pending instance of
the worker.

This scenario was pointed out by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-2-git-send-email-imre.deak@intel.com


# 767a222e 07-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Limit Valleyview and earlier to only using mappable scanout

Valleyview appears to be limited to only scanning out from the first 512MiB
of the Global GTT. Let's presume that this behaviour was inherited from the
display block copied from g4x (not Ironlake) and that all earlier generations
are similarly affected, though testing suggests different symptoms. For
simplicity, impose that these platforms must scan out from the mappable
region. (For extra simplicity, use HAS_GMCH_DISPLAY even though this
catches Cherryview, which does not appear to be limited to the low
aperture for its scanout.)

v2: Use HAS_GMCH_DISPLAY() to more clearly convey my intent about
limiting this workaround to the old style of display engine.

v3: Update changelog to reflect testing by Ville Syrjälä
v4: Include the changes to the comments as well

Reported-by: Luis Botello <luis.botello.ortega@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98036
Fixes: 2efb813d5388 ("drm/i915: Fallback to using unmappable memory for scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20161107110128.28762-1-chris@chris-wilson.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>


# 0ef723cb 07-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Round tile chunks up for constructing partial VMAs

When we split a large object up into chunks for GTT faulting (because we
can't fit the whole object into the aperture) we have to align our cuts
with the fence registers. Each partial VMA must cover a complete set of
tile rows or the offset into each partial VMA is not aligned with the
whole image. Currently we enforce a minimum size on each partial VMA,
but this minimum size itself was not aligned to the tile row causing
distortion.

Reported-by: Andreas Reis <andreas.reis@gmail.com>
Reported-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Norbert Preining <preining@logic.at>
Tested-by: Norbert Preining <preining@logic.at>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Fixes: 03af84fe7f48 ("drm/i915: Choose partial chunksize based on tile row size")
Fixes: a61007a83a46 ("drm/i915: Fix partial GGTT faulting") # enabling patch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98402
Testcase: igt/gem_mmap_gtt/medium-copy-odd
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Link: http://patchwork.freedesktop.org/patch/msgid/20161107105443.27855-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
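
The fix reduces to rounding the minimum chunk itself up to a whole number of
tile rows; a sketch, with the constant and parameter names as assumptions
(tile_row_pages must be non-zero):

    #include <linux/kernel.h>
    #include <linux/sizes.h>

    /* Hypothetical minimum partial-VMA chunk, expressed in pages. */
    #define SKETCH_MIN_CHUNK_PAGES (SZ_1M >> PAGE_SHIFT)

    static unsigned int sketch_partial_chunk_pages(unsigned int tile_row_pages)
    {
        /* Each partial VMA must cover complete tile rows, so round up. */
        return roundup(SKETCH_MIN_CHUNK_PAGES, tile_row_pages);
    }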


# 2c3a3f44 04-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix pages pin counting around swizzle quirk

commit bc0629a76726 ("drm/i915: Track pages pinned due to swizzling
quirk") fixed one problem, but revealed a whole lot more. The root cause
of the pin count mismatch for the swizzle quirk (for L-shaped memory on
gen3/4) was that we were incrementing the pages_pin_count upon getting
the backing pages but then overwriting the pages_pin_count to set it to
1 afterwards. With a little bit of adjustment to satisfy the GEM_BUG_ON
sanity checks, the fix is to replace the explicit atomic_set with an
atomic_inc.

v2: Consistently use atomics (not mix atomics and helpers) within the
lowlevel get_pages routines. This makes the atomic operations much
clearer.

Fixes: 1233e2db199d ("drm/i915: Move object backing storage manipulation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161104103001.27643-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
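
The gist of the fix, reduced to a sketch (the counter's field name is an
assumption): add to the existing pin count instead of overwriting it.

    #include <linux/atomic.h>

    static void sketch_pin_pages_for_quirk(atomic_t *pages_pin_count)
    {
        /* was: atomic_set(pages_pin_count, 1); -- clobbered earlier pins */
        atomic_inc(pages_pin_count);
    }

Overwriting the counter in a path that other callers increment is exactly
the kind of mismatch the GEM_BUG_ON sanity checks then trip over.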


# a933568e 02-Nov-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Tidy slab cache allocations

We can use the preferred KMEM_CACHE helper for brevity.

Also simplify the error unwind by only setting the ENOMEM
error code once.

v2: Add forgotten changes. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478099699-28652-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 56cea323 01-Nov-2016 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Unify global_list into global_link

$ sed -i -r 's/\bglobal_list\b/global_link/g' *.c *.h

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478081764-8058-1-git-send-email-joonas.lahtinen@linux.intel.com


# 3599a91c 01-Nov-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Allow shrinking of userptr objects once again

Commit 1bec9b0bda3d ("drm/i915/shrinker: Only shmemfs objects
are backed by swap") stopped considering the userptr objects
in shrinker callbacks.

Restore that so idle userptr objects can be discarded in order
to free up memory.

One change beyond what was introduced in 1bec9b0bda3d is to
start considering userptr objects in the OOM handler as well,
which should also be the correct thing to do.

v2: Introduce I915_GEM_OBJECT_IS_SHRINKABLE. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1bec9b0bda3d ("drm/i915/shrinker: Only shmemfs objects are backed by swap")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1478011450-6634-1-git-send-email-tvrtko.ursulin@linux.intel.com


# a26e5239 31-Oct-2016 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Pass dev_priv to IS_BROADWATER/IS_CRESTLINE

Unify our approach to things by passing around dev_priv instead of dev.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477946245-14134-21-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 548625ee 31-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Improve lockdep tracking for obj->mm.lock

The shrinker may appear to recurse into obj->mm.lock as the shrinker may
be called from a direct reclaim path whilst handling get_pages. We
filter out recursing on the same obj->mm.lock by inspecting
obj->mm.pages, but we do want to take the lock on a second object in
order to reap their pages. lockdep spots the recursion on the same
lockclass and needs annotation to avoid a false positive. To keep the
two paths distinct, create an enum to indicate which subclass of
obj->mm.lock we are using. This removes the false positive and avoids
masking real bugs.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101121134.27504-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
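
A sketch of the subclass annotation, assuming an enum with two entries and
hypothetical names; mutex_lock_nested() is the standard lockdep mechanism
for distinguishing nested acquisitions of one lock class:

    #include <linux/mutex.h>

    enum sketch_mm_subclass {
        SKETCH_MM_NORMAL = 0, /* ordinary get_pages/put_pages */
        SKETCH_MM_SHRINKER,   /* taken on a different object from reclaim */
    };

    static void sketch_lock_obj_pages(struct mutex *mm_lock,
                                      enum sketch_mm_subclass subclass)
    {
        mutex_lock_nested(mm_lock, subclass);
    }

The shrinker path passes the second subclass, so lockdep no longer reports
the obj->mm.lock of a victim object as recursion on the lock already held
for the object being populated.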


# db6c2b41 01-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store the vma in an rbtree under the object

With full-ppgtt one of the main bottlenecks is the lookup of the VMA
underneath the object. For execbuf there is merit in having a very fast
direct lookup of ctx:handle to the vma using a hashtree, but that still
leaves a large number of other lookups. One way to speed up the lookup
would be to use a rhashtable, but that requires extra allocations and
may exhibit poor worst-case behaviour. An alternative is to use an
embedded rbtree, i.e. no extra allocations and deterministic behaviour,
but at the slight cost of O(lgN) lookups (instead of O(1) for a
rhashtable). The majority of such a tree will be very shallow and so not
much slower, and it still scales much, much better than the current
unsorted list.

v2: Bump vma_compare() to return a long, as we return the result of
comparing two pointers.

References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101115400.15647-1-chris@chris-wilson.co.uk
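
A sketch of the v2 detail: the comparator returns a long because it reports
the difference of two pointers, which can exceed the range of an int. The
struct layout and names are assumptions for illustration:

    #include <linux/rbtree.h>

    struct sketch_vma {
        struct rb_node obj_node;
        const void *vm; /* address space this vma belongs to */
    };

    static long sketch_vma_compare(const struct sketch_vma *vma, const void *vm)
    {
        /* Widened to long so the full pointer difference is preserved. */
        return (long)((unsigned long)vma->vm - (unsigned long)vm);
    }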


# bc0629a7 01-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track pages pinned due to swizzling quirk

If we have a tiled object and an unknown CPU swizzle pattern, we pin the
pages to prevent the object from being swapped out (and us corrupting
the contents as we do not know the access pattern and so cannot convert
it to linear and back to tiled on reuse). This requires us to remember
to drop the extra pinning when freeing the object, or else we trigger
warnings about the pin leak. In commit fbbd37b36fa5 ("drm/i915: Move
object release to a freelist + worker"), the object free path was
deferred to a worker, but the unpinning of the quirk, along with marking
the object as reclaimable, was left on the immediate path (so that if
required we could reclaim the pages under memory pressure as early as
possible). However, this split introduced a bug where the pages were no
longer being unpinned if they were marked as unneeded.

[ 231.800401] WARNING: CPU: 1 PID: 90 at drivers/gpu/drm/i915/i915_gem.c:4275 __i915_gem_free_objects+0x326/0x3c0 [i915]
[ 231.800403] WARN_ON(i915_gem_object_has_pinned_pages(obj))
[ 231.800405] Modules linked in:
[ 231.800406] snd_hda_intel i915 snd_hda_codec_generic mei_me snd_hda_codec coretemp snd_hwdep mei lpc_ich snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: i915]
[ 231.800426] CPU: 1 PID: 90 Comm: kworker/1:4 Tainted: G U 4.9.0-rc2-CI-CI_DRM_1780+ #1
[ 231.800428] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009
[ 231.800456] Workqueue: events __i915_gem_free_work [i915]
[ 231.800459] ffffc9000034fc80 ffffffff8142dd65 ffffc9000034fcd0 0000000000000000
[ 231.800465] ffffc9000034fcc0 ffffffff8107e4e6 000010b300000001 0000000000001000
[ 231.800469] ffff88011d3db740 ffff880130ef0000 0000000000000000 ffff880130ef5ea0
[ 231.800474] Call Trace:
[ 231.800479] [<ffffffff8142dd65>] dump_stack+0x67/0x92
[ 231.800484] [<ffffffff8107e4e6>] __warn+0xc6/0xe0
[ 231.800487] [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
[ 231.800491] [<ffffffff811d12ac>] ? kmem_cache_free+0x2dc/0x340
[ 231.800520] [<ffffffffa009ef36>] __i915_gem_free_objects+0x326/0x3c0 [i915]
[ 231.800548] [<ffffffffa009effe>] __i915_gem_free_work+0x2e/0x50 [i915]
[ 231.800552] [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0
[ 231.800555] [<ffffffff8109c1f6>] ? process_one_work+0x166/0x6b0
[ 231.800558] [<ffffffff8109c789>] worker_thread+0x49/0x490
[ 231.800561] [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[ 231.800563] [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
[ 231.800566] [<ffffffff810a2aab>] kthread+0xeb/0x110
[ 231.800569] [<ffffffff810a29c0>] ? kthread_park+0x60/0x60
[ 231.800573] [<ffffffff818164a7>] ret_from_fork+0x27/0x40

Moving to a separate flag for tracking the quirked pin is overkill for
the bug (since we only have to interchange the two tests in
i915_gem_free_object) but it does reduce a complicated test on all
objects and provides a sanity check for uncommon code paths.

Fixes: fbbd37b36fa5 ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-2-chris@chris-wilson.co.uk


# cb399eab 01-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid accessing request->timeline outside of its lifetime

Whilst waiting on a request, we may do so without holding any locks or
any guards beyond a reference to the request. In order to avoid taking
locks within request deallocation, we drop references to its timeline
(via the context and ppgtt) upon retirement. We should avoid chasing
such pointers outside of their control, in particular we inspect the
request->timeline to see if we may restore the RPS waitboost for a
client. If we instead look at the engine->timeline, we will have similar
behaviour on both full-ppgtt and !full-ppgtt systems and reduce the
amount of reward we give towards stalling clients (i.e. only if the
client stalls and the GPU is uncontended does it reclaim its boost).
This restores behaviour back to pre-timelines, whilst fixing:

[ 645.078485] BUG: KASAN: use-after-free in i915_gem_object_wait_fence+0x1ee/0x2e0 at addr ffff8802335643a0
[ 645.078577] Read of size 4 by task gem_exec_schedu/28408
[ 645.078638] CPU: 1 PID: 28408 Comm: gem_exec_schedu Not tainted 4.9.0-rc2+ #64
[ 645.078724] Hardware name: / , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[ 645.078816] ffff88022daef9a0 ffffffff8143d059 ffff880235402a80 ffff880233564200
[ 645.078998] ffff88022daef9c8 ffffffff81229c5c ffff88022daefa48 ffff880233564200
[ 645.079172] ffff880235402a80 ffff88022daefa38 ffffffff81229ef0 000000008110a796
[ 645.079345] Call Trace:
[ 645.079404] [<ffffffff8143d059>] dump_stack+0x68/0x9f
[ 645.079467] [<ffffffff81229c5c>] kasan_object_err+0x1c/0x70
[ 645.079534] [<ffffffff81229ef0>] kasan_report_error+0x1f0/0x4b0
[ 645.079601] [<ffffffff8122a244>] kasan_report+0x34/0x40
[ 645.079676] [<ffffffff81634f5e>] ? i915_gem_object_wait_fence+0x1ee/0x2e0
[ 645.079741] [<ffffffff81229951>] __asan_load4+0x61/0x80
[ 645.079807] [<ffffffff81634f5e>] i915_gem_object_wait_fence+0x1ee/0x2e0
[ 645.079876] [<ffffffff816364bf>] i915_gem_object_wait+0x19f/0x590
[ 645.079944] [<ffffffff81636320>] ? i915_gem_object_wait_priority+0x500/0x500
[ 645.080016] [<ffffffff8110fb30>] ? debug_show_all_locks+0x1e0/0x1e0
[ 645.080084] [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
[ 645.080157] [<ffffffff8110a796>] ? __lock_is_held+0x46/0xc0
[ 645.080226] [<ffffffff8163bc61>] ? i915_gem_set_domain_ioctl+0x141/0x690
[ 645.080296] [<ffffffff8163bcc2>] i915_gem_set_domain_ioctl+0x1a2/0x690
[ 645.080366] [<ffffffff811f8f85>] ? __might_fault+0x75/0xe0
[ 645.080433] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
[ 645.080508] [<ffffffff8163bb20>] ? i915_gem_obj_prepare_shmem_write+0x3a0/0x3a0
[ 645.080603] [<ffffffff815a52d0>] ? drm_ioctl_permit+0x120/0x120
[ 645.080670] [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
[ 645.080738] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
[ 645.080804] [<ffffffff8120268c>] ? do_mmap+0x47c/0x580
[ 645.080871] [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
[ 645.080938] [<ffffffff812755f0>] ? ioctl_preallocate+0x150/0x150
[ 645.081011] [<ffffffff81108c53>] ? up_write+0x23/0x50
[ 645.081078] [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
[ 645.081145] [<ffffffff811da450>] ? vma_is_stack_for_current+0x90/0x90
[ 645.081214] [<ffffffff8110d853>] ? mark_held_locks+0x23/0xc0
[ 645.082030] [<ffffffff81288408>] ? __fget+0x168/0x250
[ 645.082106] [<ffffffff819ad517>] ? entry_SYSCALL_64_fastpath+0x5/0xb1
[ 645.082176] [<ffffffff81288592>] ? __fget_light+0xa2/0xc0
[ 645.082242] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
[ 645.082309] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 645.082374] Object at ffff880233564200, in cache kmalloc-8192 size: 8192
[ 645.082431] Allocated:
[ 645.082480] PID = 28408
[ 645.082535] [ 645.082566] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
[ 645.082623] [ 645.082656] [<ffffffff81228b06>] save_stack+0x46/0xd0
[ 645.082716] [ 645.082756] [<ffffffff812292fd>] kasan_kmalloc+0xad/0xe0
[ 645.082817] [ 645.082848] [<ffffffff81631752>] i915_ppgtt_create+0x52/0x220
[ 645.082908] [ 645.082941] [<ffffffff8161db96>] i915_gem_create_context+0x396/0x560
[ 645.083027] [ 645.083059] [<ffffffff8161f857>] i915_gem_context_create_ioctl+0x97/0xf0
[ 645.083152] [ 645.083183] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
[ 645.083243] [ 645.083274] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
[ 645.083334] [ 645.083372] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
[ 645.083432] [ 645.083464] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 645.083551] Freed:
[ 645.083599] PID = 27629
[ 645.083648] [ 645.083676] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
[ 645.083738] [ 645.083770] [<ffffffff81228b06>] save_stack+0x46/0xd0
[ 645.083830] [ 645.083862] [<ffffffff81229203>] kasan_slab_free+0x73/0xc0
[ 645.083922] [ 645.083961] [<ffffffff812279c9>] kfree+0xa9/0x170
[ 645.084021] [ 645.084053] [<ffffffff81629f60>] i915_ppgtt_release+0x100/0x180
[ 645.084139] [ 645.084171] [<ffffffff8161d414>] i915_gem_context_free+0x1b4/0x230
[ 645.084257] [ 645.084288] [<ffffffff816537b2>] intel_lr_context_unpin+0x192/0x230
[ 645.084380] [ 645.084413] [<ffffffff81645250>] i915_gem_request_retire+0x620/0x630
[ 645.084500] [ 645.085226] [<ffffffff816473d1>] i915_gem_retire_requests+0x181/0x280
[ 645.085313] [ 645.085352] [<ffffffff816352ba>] i915_gem_retire_work_handler+0xca/0xe0
[ 645.085440] [ 645.085471] [<ffffffff810c725b>] process_one_work+0x4fb/0x920
[ 645.085532] [ 645.085562] [<ffffffff810c770d>] worker_thread+0x8d/0x840
[ 645.085622] [ 645.085653] [<ffffffff810d21e5>] kthread+0x185/0x1b0
[ 645.085718] [ 645.085750] [<ffffffff819ad7a7>] ret_from_fork+0x27/0x40
[ 645.085811] Memory state around the buggy address:
[ 645.085869] ffff880233564280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 645.085956] ffff880233564300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 645.086053] >ffff880233564380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 645.086138] ^
[ 645.086193] ffff880233564400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 645.086283] ffff880233564480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

v2: Add a comment to document the hint like nature of
intel_engine_last_submit()

Fixes: 73cb97010d4f ("drm/i915: Combine seqno + tracking into a global timeline struct")
Fixes: 80b204bce8f2 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-1-chris@chris-wilson.co.uk


# 7d5d59e5 01-Nov-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use the full hammer when shutting down the rcu tasks

To flush all call_rcu() tasks (here from i915_gem_free_object()) we need
to call rcu_barrier() (not synchronize_rcu()). If we don't then we may
still have objects being freed as we continue to tear down the driver -
in particular, the recently released rings may race with the memory
manager shutdown resulting in sporadic:

[ 142.217186] WARNING: CPU: 7 PID: 6185 at drivers/gpu/drm/drm_mm.c:932 drm_mm_takedown+0x2e/0x40
[ 142.217187] Memory manager not clean during takedown.
[ 142.217187] Modules linked in: i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_realtek snd_hda_codec_generic mei_me mei snd_hda_codec_hdmi snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: snd_hda_intel]
[ 142.217199] CPU: 7 PID: 6185 Comm: rmmod Not tainted 4.9.0-rc2-CI-Trybot_242+ #1
[ 142.217199] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
[ 142.217200] ffffc90002ecfce0 ffffffff8142dd65 ffffc90002ecfd30 0000000000000000
[ 142.217202] ffffc90002ecfd20 ffffffff8107e4e6 000003a40778c2a8 ffff880401355c48
[ 142.217204] ffff88040778c2a8 ffffffffa040f3c0 ffffffffa040f4a0 00005621fbf8b1f0
[ 142.217206] Call Trace:
[ 142.217209] [<ffffffff8142dd65>] dump_stack+0x67/0x92
[ 142.217211] [<ffffffff8107e4e6>] __warn+0xc6/0xe0
[ 142.217213] [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
[ 142.217214] [<ffffffff81559e3e>] drm_mm_takedown+0x2e/0x40
[ 142.217236] [<ffffffffa035c02a>] i915_gem_cleanup_stolen+0x1a/0x20 [i915]
[ 142.217246] [<ffffffffa034c581>] i915_ggtt_cleanup_hw+0x31/0xb0 [i915]
[ 142.217253] [<ffffffffa0310311>] i915_driver_cleanup_hw+0x31/0x40 [i915]
[ 142.217260] [<ffffffffa0312001>] i915_driver_unload+0x141/0x1a0 [i915]
[ 142.217268] [<ffffffffa031c2c4>] i915_pci_remove+0x14/0x20 [i915]
[ 142.217269] [<ffffffff8147d214>] pci_device_remove+0x34/0xb0
[ 142.217271] [<ffffffff8157b14c>] __device_release_driver+0x9c/0x150
[ 142.217272] [<ffffffff8157bcc6>] driver_detach+0xb6/0xc0
[ 142.217273] [<ffffffff8157abe3>] bus_remove_driver+0x53/0xd0
[ 142.217274] [<ffffffff8157c787>] driver_unregister+0x27/0x50
[ 142.217276] [<ffffffff8147c265>] pci_unregister_driver+0x25/0x70
[ 142.217287] [<ffffffffa03d764c>] i915_exit+0x1a/0x71 [i915]
[ 142.217289] [<ffffffff811136b3>] SyS_delete_module+0x193/0x1e0
[ 142.217291] [<ffffffff818174ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 142.217292] ---[ end trace 6fd164859c154772 ]---
[ 142.217505] [drm:show_leaks] *ERROR* node [6b6b6b6b6b6b6b6b + 6b6b6b6b6b6b6b6b]: inserted at
[<ffffffff81559ff3>] save_stack.isra.1+0x53/0xa0
[<ffffffff8155a98d>] drm_mm_insert_node_in_range_generic+0x2ad/0x360
[<ffffffffa035bf23>] i915_gem_stolen_insert_node_in_range+0x93/0xe0 [i915]
[<ffffffffa035c855>] i915_gem_object_create_stolen+0x75/0xb0 [i915]
[<ffffffffa036a51a>] intel_engine_create_ring+0x9a/0x140 [i915]
[<ffffffffa036a921>] intel_init_ring_buffer+0xf1/0x440 [i915]
[<ffffffffa036be1b>] intel_init_render_ring_buffer+0xab/0x1b0 [i915]
[<ffffffffa0363d08>] intel_engines_init+0xc8/0x210 [i915]
[<ffffffffa0355d7c>] i915_gem_init+0xac/0xf0 [i915]
[<ffffffffa0311454>] i915_driver_load+0x9c4/0x1430 [i915]
[<ffffffffa031c2f8>] i915_pci_probe+0x28/0x40 [i915]
[<ffffffff8147d315>] pci_device_probe+0x85/0xf0
[<ffffffff8157b7ff>] driver_probe_device+0x21f/0x430
[<ffffffff8157baee>] __driver_attach+0xde/0xe0

In particular note that the node was being poisoned as we inspected the
list, a clear indication that the object is being freed as we make the
assertion.

v2: Don't loop, just assert that we do all the work required as that
will be better at detecting further errors.

Fixes: fbbd37b36fa5 ("drm/i915: Move object release to a freelist + worker")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161101084843.3961-1-chris@chris-wilson.co.uk
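
The distinction the commit leans on, in sketch form: synchronize_rcu() only
waits for readers, whereas rcu_barrier() also waits for every pending
call_rcu() callback to finish. The unload helper name is hypothetical:

    #include <linux/rcupdate.h>

    static void sketch_gem_unload(void)
    {
        /*
         * Objects freed via call_rcu() may still be queued; make sure every
         * callback has run before the memory manager underneath them is
         * torn down.
         */
        rcu_barrier();
    }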


# 80b204bc 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Enable multiple timelines

With the infrastructure converted over to tracking multiple timelines in
the GEM API whilst preserving the efficiency of using a single execution
timeline internally, we can now assign a separate timeline to every
context with full-ppgtt.

v2: Add a comment to indicate the xfer between timelines upon submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk


# 28176ef4 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reserve space in the global seqno during request allocation

A restriction on our global seqnos is that they cannot wrap and that we
cannot use the value 0. This allows us to detect when a request has not
yet been submitted (its global seqno is still 0), and ensures that
hardware semaphores are monotonic as required by older hardware. To
meet these restrictions when we defer the assignment of the global
seqno, we must check that we have an available slot in the global seqno
space during request construction. If that test fails, we wait for all
requests to be completed and reset the hardware back to 0.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk
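
In sketch form, the reservation test during request construction amounts to
an unsigned overflow check (names and types are illustrative assumptions);
if it fails, the driver waits for all requests and resets the seqno space,
as described above:

    #include <linux/types.h>

    /* True if handing out 'count' more seqnos would wrap back past zero. */
    static bool sketch_seqno_would_wrap(u32 next_seqno, u32 count)
    {
        return next_seqno + count < next_seqno;
    }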


# 65e4760e 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce a global_seqno for each request

Though we will have multiple timelines, we still have a single timeline
of execution. This we can use to provide an execution and retirement order
of requests. This keeps tracking the execution of requests simple, and is
vital for preserving a single waiter (i.e. so that we can order the waiters
such that only the earliest to wake up need be woken). To accomplish this we
distinguish the seqno used to order requests per-context (external) and
that used internally for execution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk


# 3033acab 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Queue the idling context switch after all other timelines

Before suspend, we wait for the switch to the kernel context. In order
for all the other context images to be complete upon suspend, that
switch must be the last operation by the GPU (i.e. this idling request
must not overtake any pending requests). To make this request execute last,
we make it depend on every other inflight request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-24-chris@chris-wilson.co.uk


# 73cb9701 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Combine seqno + tracking into a global timeline struct

Our timelines are more than just a seqno. They also provide an ordered
list of requests to be executed. Due to the restriction of handling
individual address spaces, we are limited to a timeline per address
space but we use a fence context per engine within.

Our first step to introducing independent timelines per context (i.e. to
allow each context to have a queue of requests to execute that have a
defined set of dependencies on other requests) is to provide a timeline
abstraction for the global execution queue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk


# d07f0e59 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GEM activity tracking into a common struct reservation_object

In preparation for supporting many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)

v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.

Caveats:

* busy-ioctl: busy-ioctl only reports on the native fences; it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.

* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"

* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).

* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.

* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk


# f0cd5182 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use lockless object free

Having moved the locked phase of freeing an object to a separate worker,
we can now declare to the core that we only need the unlocked variant of
driver->gem_free_object, and can use the simple unreference internally.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk


# fbbd37b3 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move object release to a freelist + worker

We want to hide the latency of releasing objects and their backing
storage from the submission, so we move the actual free to a worker.
This allows us to switch to struct_mutex freeing of the object in the
next patch.

Furthermore, if we know that the object we are dereferencing remains valid
for the duration of our access, we can forgo the usual synchronisation
barriers and atomic reference counting. To ensure this we defer freeing
an object until after an RCU grace period, such that any lookup of the
object within an RCU read critical section will remain valid until
after we exit that critical section. We also employ this delay for
rate-limiting the serialisation on reallocation - we have to slow down
object creation in order to prevent resource starvation (in particular,
files).

v2: Return early in i915_gem_tiling() ioctl to skip over superfluous
work on error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-19-chris@chris-wilson.co.uk
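
The pattern described above (freelist + worker, with the free deferred past an RCU grace period) can be sketched generically; every name below is hypothetical and only the kernel primitives (llist, call_rcu, workqueues) are real:

#include <linux/llist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* Hypothetical object carrying an RCU head and a freelist node. */
struct example_obj {
	struct rcu_head rcu;
	struct llist_node freed;
	/* ... backing storage, vma lists, etc. ... */
};

static LLIST_HEAD(example_free_list);

/* Worker: performs the expensive, locked part of the free off the
 * submission path. */
static void example_free_worker(struct work_struct *work)
{
	struct example_obj *obj, *next;

	llist_for_each_entry_safe(obj, next,
				  llist_del_all(&example_free_list), freed)
		kfree(obj);	/* release backing storage, then the struct */
}
static DECLARE_WORK(example_free_work, example_free_worker);

/* RCU callback: only after a grace period does the object reach the
 * freelist, so a lockless lookup that raced with the free still sees a
 * valid object until it leaves its read-side critical section. */
static void example_free_rcu(struct rcu_head *head)
{
	struct example_obj *obj = container_of(head, struct example_obj, rcu);

	if (llist_add(&obj->freed, &example_free_list))
		queue_work(system_wq, &example_free_work);
}

/* Called when the last reference is dropped. */
static void example_obj_free(struct example_obj *obj)
{
	call_rcu(&obj->rcu, example_free_rcu);
}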


# 40e62d5d 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Acquire the backing storage outside of struct_mutex in set-domain

As we can locklessly (well struct_mutex-lessly) acquire the backing
storage, do so in set-domain-ioctl to reduce the contention on the
struct_mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-18-chris@chris-wilson.co.uk


# fe115628 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Implement pwrite without struct-mutex

We only need struct_mutex within pwrite for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-17-chris@chris-wilson.co.uk


# bb6dc8d9 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Implement pread without struct-mutex

We only need struct_mutex within pread for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-16-chris@chris-wilson.co.uk


# 1233e2db 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move object backing storage manipulation to its own locking

Break the allocation of the backing storage away from struct_mutex into
a per-object lock. This allows parallel page allocation, provided we can
do so outside of struct_mutex (i.e. set-domain-ioctl, pwrite, GTT
fault), i.e. before execbuf! The increased cost of the atomic counters
is hidden behind i915_vma_pin() for the typical case of execbuf, i.e.
as the object is typically bound between execbufs, the page_pin_count is
static. The cost will be felt around set-domain and pwrite, but offset
by the improvement from reduced struct_mutex contention.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-14-chris@chris-wilson.co.uk


# 03ac84f1 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pass around sg_table to get_pages/put_pages backend

The plan is to move obj->pages out from under the struct_mutex into its
own per-object lock. We need to prune any assumption of the struct_mutex
from the get_pages/put_pages backends, and to make it easier we pass
around the sg_table to operate on rather than indirectly via the obj.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk


# a4f5ea64 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor object page API

The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
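
A minimal usage sketch of the resulting API, assuming helpers along the lines of i915_gem_object_pin_pages()/i915_gem_object_unpin_pages() (the function and caller below are illustrative, not driver code):

/* Callers bracket access to the backing storage with pin/unpin rather
 * than relying on struct_mutex to keep obj->pages alive. */
static int example_touch_backing_store(struct drm_i915_gem_object *obj)
{
	int ret;

	ret = i915_gem_object_pin_pages(obj);	/* acquire and hold the pages */
	if (ret)
		return ret;

	/* ... read/write the backing storage here ... */

	i915_gem_object_unpin_pages(obj);	/* drop our hold */
	return 0;
}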


# 96d77634 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use a radixtree for random access to the object's backing storage

A while ago we switched from a contiguous array of pages into an sglist,
for that was both more convenient for mapping to hardware and avoided
the requirement for a vmalloc array of pages on every object. However,
certain GEM API calls (like pwrite, pread as well as performing
relocations) do desire access to individual struct pages. A quick hack
was to introduce a cache of the last access such that finding the
following page was quick - this works so long as the caller desired
sequential access. Walking backwards, or multiple callers, still hits a
slow linear search for each page. One solution is to store each
successful lookup in a radix tree.

v2: Rewrite building the radixtree for clarity, hopefully.

v3: Rearrange execbuf to avoid calling i915_gem_object_get_sg() from
within an atomic section and so relax the allocation context to a simple
GFP_KERNEL and mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-10-chris@chris-wilson.co.uk
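
A hedged sketch of the caching idea (not the i915 implementation): cache each page found by the linear scatterlist walk in a radix tree keyed by page index, so that subsequent random lookups avoid re-walking. The cache struct and function are invented for illustration; the radix-tree and scatterlist helpers are standard kernel API:

#include <linux/mm.h>
#include <linux/radix-tree.h>
#include <linux/scatterlist.h>

struct example_page_cache {
	/* initialise with INIT_RADIX_TREE(&cache->pages, GFP_KERNEL) */
	struct radix_tree_root pages;	/* keyed by page index in the object */
};

static struct page *
example_lookup_page(struct example_page_cache *cache,
		    struct sg_table *st, unsigned long n)
{
	struct page *page = radix_tree_lookup(&cache->pages, n);
	struct scatterlist *sg;
	unsigned long idx = 0;
	unsigned int i;

	if (page)
		return page;	/* fast path: hit a previously cached lookup */

	/* slow path: linear walk of the scatterlist to locate page n */
	for_each_sg(st->sgl, sg, st->nents, i) {
		unsigned int count = sg->length >> PAGE_SHIFT;

		if (n < idx + count) {
			page = nth_page(sg_page(sg), n - idx);
			/* remember the result; insertion errors ignored here */
			radix_tree_insert(&cache->pages, n, page);
			return page;
		}
		idx += count;
	}

	return NULL;	/* n is beyond the end of the object */
}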


# 4c7d62c6 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Markup GEM API with lockdep asserts

Add lockdep_assert_held(struct_mutex) to the API preamble of the
internal GEM interfaces.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-9-chris@chris-wilson.co.uk
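
The markup pattern itself is a one-liner in each helper's preamble; a sketch with an invented function name (the dev_priv->drm embedding matches the driver of that era):

static void example_gem_internal(struct drm_i915_private *dev_priv)
{
	/* Documents the locking contract and, with lockdep enabled,
	 * complains loudly if a caller forgets to take struct_mutex. */
	lockdep_assert_held(&dev_priv->drm.struct_mutex);

	/* ... code that relies on struct_mutex being held ... */
}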


# e95433c7 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rearrange i915_wait_request() accounting with callers

Our low-level wait routine has evolved from our generic wait interface
that handled unlocked, RPS boosting, waits with time tracking. If we
push our GEM fence tracking to use reservation_objects (required for
handling multiple timelines), we lose the ability to pass the required
information down to i915_wait_request(). However, if we push the extra
functionality from i915_wait_request() to the individual callsites
(i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use
of those extras, we can both simplify our low level wait and prepare for
extending the GEM interface for use of reservation_objects.

v2: Rewrite i915_wait_request() kerneldocs

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk


# f8a7fde4 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defer active reference until required

We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
then pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk


# c92ac094 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove superfluous wait_for_error() from throttle-ioctl

The throttle-ioctl never touches the struct_mutex. It does, however, as
part of its ABI, report whether the hardware is terminally wedged. For
that purpose, it only has to report the current state and not incur the
cost of checking/waiting every invocation, as we do not have to wait for
a reset before waiting on a request to ensure completion (that is baked
into the wait request implementation).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-3-chris@chris-wilson.co.uk


# b52992c0 28-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Support asynchronous waits on struct fence from i915_gem_request

We will need to wait on DMA completion (as signaled via struct fence)
before executing our i915_gem_request. Therefore we want to expose a
method for adding the await on the fence itself to the request.

v2: Add a comment detailing a failure to handle a signal-on-any
fence-array.
v3: Pretend that magic numbers don't exist.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk


# 7e9b3f95 25-Oct-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Remove two invalid warns

Objects can have multiple VMAs used for display in which
case assertion that objects must not be pinned for display
more times than the current VMA is incorrect.

v2: Commit message update. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 058d88c4330f ("drm/i915: Track pinned VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1477413635-3876-1-git-send-email-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 3299e7e43484a85eeb5c7ec09958bff05c9d0543)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 6d9deb91 25-Oct-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Rotated view does not need a fence

We do not need to set up a fence for the rotated view.

Display does not need it and no one can access it.

v2: Move code to __i915_vma_set_map_and_fenceable. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 05a20d098db1 ("drm/i915: Move map-and-fenceable tracking to the VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 07ee2bce6a516e0218dba22581803cb8f11bcf82)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 3299e7e4 25-Oct-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Remove two invalid warns

Objects can have multiple VMAs used for display in which
case assertion that objects must not be pinned for display
more times than the current VMA is incorrect.

v2: Commit message update. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 058d88c4330f ("drm/i915: Track pinned VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1477413635-3876-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 07ee2bce 25-Oct-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Rotated view does not need a fence

We do not need to set up a fence for the rotated view.

Display does not need it and no one can access it.

v2: Move code to __i915_vma_set_map_and_fenceable. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 05a20d098db1 ("drm/i915: Move map-and-fenceable tracking to the VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# de867c20 25-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Include the kernel uptime in the error state

As well as knowing when the error occurred, it is more interesting to me
to know how long after booting the error occurred, and for good measure
record the time since last hw initialisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161025121602.1457-1-chris@chris-wilson.co.uk


# 7c108fd8 24-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move fence cancellation to runtime suspend

At the moment, we have dependency on the RPM as a barrier itself in both
i915_gem_release_all_mmaps() and i915_gem_restore_fences().
i915_gem_restore_fences() is also called along !runtime pm paths, but we
can move the markup of lost fences alongside releasing the mmaps into a
common i915_gem_runtime_suspend(). This has the advantage of locating
all the tricky barrier dependencies into one location.

v2: Just mark the fence as invalid (fence->dirty) so that upon waking we
will be sure to clear the fence after use, or restore it to the correct
value before use. This makes sure that if the fence is left intact
across the sleep, we do not leave it pointing to a region of GTT for the
next unsuspecting user.

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Imre Deak <imre.deak@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-5-chris@chris-wilson.co.uk


# 3594a3e2 24-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove superfluous locking around userfault_list

Now that we have reduced the access to the list to either (a) under the
struct_mutex whilst holding the RPM wakeref (so that concurrent writers to
the list are serialised by struct_mutex) and (b) under the atomic
runtime suspend (which cannot run concurrently with any other accessor due
to the atomic nature of the runtime suspend) we can remove the extra
locking around the list itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-3-chris@chris-wilson.co.uk


# 9c870d03 24-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use RPM as the barrier for controlling user mmap access

We can remove the false coupling between RPM and struct mutex by the
observation that we can use the RPM wakeref as the barrier around user
mmap access. That is, since we tear down the user's PTEs atomically from
within rpm suspend, and faulting in new PTEs requires the rpm wakeref,
no user access is possible through those PTEs without RPM being awake.
Having made that observation, we can then remove the
presumption of having to take rpm outside of struct_mutex and so allow
fine grained acquisition of a wakeref around hw access rather than
having to remember to acquire the wakeref early on.

v2: Rejig placement of the new intel_runtime_pm_get() to be as tight
as possible around the GTT pread/pwrite.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-2-chris@chris-wilson.co.uk
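
A hedged sketch of the resulting fine-grained pattern, assuming the intel_runtime_pm_get()/intel_runtime_pm_put() helpers of that era (the surrounding function is invented):

static int example_gtt_access(struct drm_i915_private *dev_priv)
{
	int ret = 0;

	/* Take the wakeref tightly around the hardware access instead of
	 * at ioctl entry; the PTE teardown in rpm suspend guarantees no
	 * user access can sneak in while the device is asleep. */
	intel_runtime_pm_get(dev_priv);

	/* ... read/write through the GTT aperture ... */

	intel_runtime_pm_put(dev_priv);
	return ret;
}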


# 275f039d 24-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move user fault tracking to a separate list

We want to decouple RPM and struct_mutex, but currently RPM has to walk
the list of bound objects and remove userspace mmapping before we
suspend (otherwise userspace may continue to access the GTT whilst it is
powered down). This currently requires the struct_mutex to walk the
bound_list, but if we move that to a separate list and lock we can take
the first step towards removing the struct_mutex.

v2: Split runtime suspend unmapping vs regular unmapping, to make the
locking (and barriers) clearer. Add the object to the userfault_list
prior to inserting the first PTE, the race between add/revoke depends
upon struct_mutex for regular unmappings and rpm for runtime-suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v1
Link: http://patchwork.freedesktop.org/patch/msgid/20161024124218.18252-1-chris@chris-wilson.co.uk


# 4ff340f0 18-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Limit the scattergather coalescing to 32bits

The scattergather list uses a 32bit size counter, we should avoid
exceeding it.

v2: Also we should use unsigned int to match sg->length.

Fixes: 871dfbd67d4e ("drm/i915: Allow compaction upto SWIOTLB max segment size")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-3-chris@chris-wilson.co.uk


# b4bcbe2a 18-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Document our internal limit on object size

In many places, we try to count pages using a 32 bit integer. That
implies that if we are asked to create an object whose size exceeds 43 bits, we will
subtly crash much later. Catch this on the boundary, and add a warning
to remind ourselves later on our exabyte systems.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-2-chris@chris-wilson.co.uk
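
An illustrative boundary check, not the driver's literal code (the exact limit and error code here are assumptions):

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/mm.h>

/* Reject object sizes whose page count would overflow the 32-bit
 * counters used for internal bookkeeping. */
static int example_check_object_size(u64 size)
{
	if (size >> PAGE_SHIFT > U32_MAX)
		return -E2BIG;

	return 0;
}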


# 3ef7f228 18-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Bump object bookkeeping to u64 from size_t

Internally we allow for using more objects than a single process can
allocate, i.e. we allow for a 64bit GPU address space even on a 32bit
system. Using size_t may overflow.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-1-chris@chris-wilson.co.uk


# 4fb84d99 13-Oct-2016 Michał Winiarski <michal.winiarski@intel.com>

drm/i915: Remove unused "valid" parameter from pte_encode

We never used any invalid ptes, those were put in place for
a possibility of doing gpu faults. However our batchbuffers are not
restricted in length, so everything needs to be pointing to something
and thus out-of-bounds is pointing to scratch.

Remove the valid flag as it is always true.

v2: Expand commit msg, patch reorder (Mika)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-1-git-send-email-michal.winiarski@intel.com


# 5db94019 13-Oct-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make IS_GEN macros only take dev_priv

Saves 1416 bytes of .rodata strings.

v2: Add parentheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476352990-2504-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 772c2a51 13-Oct-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make IS_HASWELL only take dev_priv

Saves 2432 bytes of .rodata strings.

v2: Add parentheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>


# 8652744b 13-Oct-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make IS_BROADWELL only take dev_priv

Saves 1808 bytes of .rodata strings.

v2: Add parentheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>


# fd6b8f43 14-Oct-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make IS_IVYBRIDGE only take dev_priv

Saves 848 bytes of .rodata strings.

v2: Add parentheses around dev_priv. (Ville Syrjala)
v3: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>


# 50a0bc90 13-Oct-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make INTEL_DEVID only take dev_priv

Saves 4472 bytes of .rodata strings.

v2: Add parentheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>


# 6e266956 13-Oct-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make INTEL_PCH_TYPE & co only take dev_priv

This saves 1872 bytes of .rodata strings.

v2:
* Rebase.
* Add parentheses around dev_priv. (Ville Syrjala)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>


# 3b3f1650 13-Oct-2016 Akash Goel <akash.goel@intel.com>

drm/i915: Allocate intel_engine_cs structure only for the enabled engines

With the possibility of many more rings being added in the future, the
drm_i915_private structure could bloat, as an array of type
intel_engine_cs is embedded inside it:
struct intel_engine_cs engine[I915_NUM_ENGINES];
Although this is still fine, since generally there is only a single instance
of the drm_i915_private structure, not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating the intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.

There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
the i915.o file (but for the i915.ko file the text size remains the same, 1193131 bytes).

v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)

v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.

v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().

v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)

v6:
- Rebase.

v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.

v8: Rebase.

v9: Rebase.

v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)

v11: Rebase.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
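
A hedged sketch of what iterating the new pointer array looks like (not the literal for_each_engine() macro; the surrounding function is invented):

static void example_for_each_engine(struct drm_i915_private *dev_priv)
{
	enum intel_engine_id id;

	for (id = 0; id < I915_NUM_ENGINES; id++) {
		struct intel_engine_cs *engine = dev_priv->engine[id];

		if (!engine)
			continue;	/* disabled engines are simply NULL */

		/* ... per-engine setup/cleanup ... */
	}
}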


# ad16d2ed 13-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip unbinding large unmappable global buffers

If the user requests a mappable binding to the global GTT, we will first
unbind an existing mapping if it doesn't match. We will unbind even if
there is no possibility that the object can fit in the mappable
aperture. This may lead to a ping-pong migration of the object, for
example igt/gem_exec_big.

v2: Comment upon the reasoning, or lack thereof!, behind the choice of
magic numbers.

Testcase: igt/gem_exec_big
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161013085504.30705-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 1c777c5d 12-Oct-2016 Imre Deak <imre.deak@intel.com>

drm/i915/hsw: Fix GPU hang during resume from S3-devices state

Currently resuming on HSW from S3 pm_test/devices state leads to an
unrecoverable GPU hang. Resetting the GPU during suspend fixes this. For
a full S3 cycle this change only means the reset happens earlier (before
reaching S3). For S4 the reset will happen now both during the freeze
and quiesce phases, which is a benefit since it will guarantee that the
GPU is idle before creating and loading the hibernation image.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1476283597-580-1-git-send-email-imre.deak@intel.com


# 908b1232 11-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert open-coded use of vma_pages()

If we want to know how many pages a VMA spans, we can use vma_pages() to
find out. We have one such invocation inside our faulthandler, so
convert it. (We have two others that want the size in bytes rather than
pages, food for future thought.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161011090656.29554-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
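
For reference, vma_pages() is just the open-coded expression wrapped in a helper; a trivial illustration (the function name is invented):

#include <linux/mm.h>

static unsigned long example_fault_span(struct vm_area_struct *vma)
{
	/* identical to (vma->vm_end - vma->vm_start) >> PAGE_SHIFT */
	return vma_pages(vma);
}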


# 871dfbd6 11-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow compaction upto SWIOTLB max segment size

commit 1625e7e549c5 ("drm/i915: make compact dma scatter lists creation
work with SWIOTLB backend") took a heavy handed approach to undo the
scatterlist compaction in the face of SWIOTLB. (The compaction hit a bug
whereby we tried to pass a segment larger than SWIOTLB could handle.) We
can be a little more intelligent and try compacting the scatterlist up
to the maximum SWIOTLB segment size (when using SWIOTLB).

v2: Tidy sg_mark_end() and cpp

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
CC: Imre Deak <imre.deak@intel.com>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161011082021.14606-2-chris@chris-wilson.co.uk
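
A hedged sketch of the coalescing rule (names and structure are illustrative, not the driver's loop): merge a physically contiguous page into the current scatterlist entry only while that entry stays under the maximum segment size, e.g. the SWIOTLB limit when SWIOTLB is in use:

#include <linux/mm.h>
#include <linux/scatterlist.h>

static bool example_can_coalesce(struct scatterlist *sg,
				 unsigned long last_pfn,
				 struct page *page,
				 unsigned int max_segment)
{
	/* keep sg->length (an unsigned int) below max_segment and only
	 * merge pages that are physically adjacent to the previous one */
	return sg->length + PAGE_SIZE <= max_segment &&
	       page_to_pfn(page) == last_pfn + 1;
}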


# 465350d0 11-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove self-harming shrink_all on get_pages_gtt fail

When we notice the system under memory pressure, we try to evict some
driver pages before asking the VM to shrink all caches. As a final step
in that process, we tried to evict everything, including active buffers.
This only harms ourselves; we can instead combine shrinking all caches with
shrinking our residual buffers (after the first pass of trying to shrink just
our own buffers).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161011082021.14606-1-chris@chris-wilson.co.uk


# 105f1a65 10-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix conflict resolution from backmerge of v4.8-rc8 to drm-next

The conflict resolution of v4.8-rc8 backmerge to drm-next pulled back in
a few lines of dead code due to the code movement around
i915_gem_reset(), fix that up.

Fixes: ca09fb9f60b5 ("Merge tag 'v4.8-rc8' into drm-next")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161010125017.23911-1-chris@chris-wilson.co.uk


# ec7ce653 21-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only shrink the unbound objects during freeze

At the point of creating the hibernation image, the runtime power management
core is disabled - and using the rpm functions triggers a warn.
i915_gem_shrink_all() tries to unbind objects, which requires device
access and so tries to hold an rpm reference, triggering a warning:

[ 44.235420] ------------[ cut here ]------------
[ 44.235424] WARNING: CPU: 2 PID: 2199 at drivers/gpu/drm/i915/intel_runtime_pm.c:2688 intel_runtime_pm_get_if_in_use+0xe6/0xf0
[ 44.235426] WARN_ON_ONCE(ret < 0)
[ 44.235445] Modules linked in: ctr ccm arc4 rt2800usb rt2x00usb rt2800lib rt2x00lib crc_ccitt mac80211 cmac cfg80211 btusb rfcomm bnep btrtl btbcm btintel bluetooth dcdbas x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper cryptd snd_hda_codec hid_multitouch joydev snd_hda_core binfmt_misc i2c_hid serio_raw snd_pcm acpi_pad snd_timer snd i2c_designware_platform 8250_dw nls_iso8859_1 i2c_designware_core lpc_ich mfd_core soundcore usbhid hid psmouse ahci libahci
[ 44.235447] CPU: 2 PID: 2199 Comm: kworker/u8:8 Not tainted 4.8.0-rc5+ #130
[ 44.235447] Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A07 11/11/2015
[ 44.235450] Workqueue: events_unbound async_run_entry_fn
[ 44.235453] 0000000000000000 ffff8801b2f7fb98 ffffffff81306c2f ffff8801b2f7fbe8
[ 44.235454] 0000000000000000 ffff8801b2f7fbd8 ffffffff81056c01 00000a801f50ecc0
[ 44.235456] ffff88020ce50000 ffff88020ce59b60 ffffffff81a60b5c ffffffff81414840
[ 44.235456] Call Trace:
[ 44.235459] [<ffffffff81306c2f>] dump_stack+0x4d/0x6e
[ 44.235461] [<ffffffff81056c01>] __warn+0xd1/0xf0
[ 44.235464] [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30
[ 44.235465] [<ffffffff81056c6f>] warn_slowpath_fmt+0x4f/0x60
[ 44.235468] [<ffffffff814e73ce>] ? pm_runtime_get_if_in_use+0x6e/0xa0
[ 44.235469] [<ffffffff81433526>] intel_runtime_pm_get_if_in_use+0xe6/0xf0
[ 44.235471] [<ffffffff81458a26>] i915_gem_shrink+0x306/0x360
[ 44.235473] [<ffffffff81343fd4>] ? pci_platform_power_transition+0x24/0x90
[ 44.235475] [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30
[ 44.235476] [<ffffffff81458dfb>] i915_gem_shrink_all+0x1b/0x30
[ 44.235478] [<ffffffff814560b3>] i915_gem_freeze_late+0x33/0x90
[ 44.235479] [<ffffffff81414877>] i915_pm_freeze_late+0x37/0x40
[ 44.235481] [<ffffffff814e9b8e>] dpm_run_callback+0x4e/0x130
[ 44.235483] [<ffffffff814ea5db>] __device_suspend_late+0xdb/0x1f0
[ 44.235484] [<ffffffff814ea70f>] async_suspend_late+0x1f/0xa0
[ 44.235486] [<ffffffff81077557>] async_run_entry_fn+0x37/0x150
[ 44.235488] [<ffffffff8106f518>] process_one_work+0x148/0x3f0
[ 44.235490] [<ffffffff8106f8eb>] worker_thread+0x12b/0x490
[ 44.235491] [<ffffffff8106f7c0>] ? process_one_work+0x3f0/0x3f0
[ 44.235492] [<ffffffff81074d09>] kthread+0xc9/0xe0
[ 44.235495] [<ffffffff816e257f>] ret_from_fork+0x1f/0x40
[ 44.235496] [<ffffffff81074c40>] ? kthread_park+0x60/0x60
[ 44.235497] ---[ end trace e438706b97c7f132 ]---

Alternatively, to actually shrink everything we have to do so slightly
earlier in the hibernation process.

To keep lockdep silent, we need to take struct_mutex for the shrinker
even though we know that we are the only user during the freeze.

Fixes: 7aab2d534e35 ("drm/i915: Shrink objects prior to hibernation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-2-chris@chris-wilson.co.uk
(cherry picked from commit 6a800eabba34945c2986d70114b41d564bad52a8)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# ac756941 21-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restore current RPS state after reset

Following commit 821ed7df6e2a ("drm/i915: Update reset path to fix
incomplete requests") we no longer mark the context as lost on reset as
we keep the requests (and contexts) alive. However, RPS remains reset
and we need to restore the current state to match the in-flight
requests.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97824
Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-1-chris@chris-wilson.co.uk
(cherry picked from commit f2a91d1a6f5960c08f1ca60bd076f4dc020c50c6)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 77c60701 04-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Double check hangcheck.seqno after reset

Check that there was not a late recovery between us declaring the GPU
hung and processing the reset. If the GPU did recover by itself, let the
request remain on the active list and see if it hangs again!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-5-chris@chris-wilson.co.uk


# 9e60ab03 04-Oct-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable irqs across GPU reset

Whilst we reset the GPU, we want to prevent execlists from submitting
new work (which it does via an interrupt handler). To achieve this we
disable the irq (and drain the irq tasklet) around the reset. When we
enable it again afterwards, the interrupt queue should be empty and we can
reinitialise from a known state without fear of the tasklet running
concurrently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161004201132.21801-4-chris@chris-wilson.co.uk


# 4bce9f6e 17-Sep-2016 Al Viro <viro@zeniv.linux.org.uk>

get rid of separate multipage fault-in primitives

* the only remaining callers of "short" fault-ins are just as happy with generic
variants (both in lib/iov_iter.c); switch them to multipage variants, kill the
"short" ones
* rename the multipage variants to now available plain ones.
* get rid of compat macro defining iov_iter_fault_in_multipage_readable by
expanding it in its only user.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 6a800eab 21-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only shrink the unbound objects during freeze

At the point of creating the hibernation image, the runtime power management
core is disabled - and using the rpm functions triggers a warn.
i915_gem_shrink_all() tries to unbind objects, which requires device
access and so tries to hold an rpm reference, triggering a warning:

[ 44.235420] ------------[ cut here ]------------
[ 44.235424] WARNING: CPU: 2 PID: 2199 at drivers/gpu/drm/i915/intel_runtime_pm.c:2688 intel_runtime_pm_get_if_in_use+0xe6/0xf0
[ 44.235426] WARN_ON_ONCE(ret < 0)
[ 44.235445] Modules linked in: ctr ccm arc4 rt2800usb rt2x00usb rt2800lib rt2x00lib crc_ccitt mac80211 cmac cfg80211 btusb rfcomm bnep btrtl btbcm btintel bluetooth dcdbas x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic aesni_intel snd_hda_codec_hdmi aes_x86_64 lrw gf128mul snd_hda_intel glue_helper ablk_helper cryptd snd_hda_codec hid_multitouch joydev snd_hda_core binfmt_misc i2c_hid serio_raw snd_pcm acpi_pad snd_timer snd i2c_designware_platform 8250_dw nls_iso8859_1 i2c_designware_core lpc_ich mfd_core soundcore usbhid hid psmouse ahci libahci
[ 44.235447] CPU: 2 PID: 2199 Comm: kworker/u8:8 Not tainted 4.8.0-rc5+ #130
[ 44.235447] Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A07 11/11/2015
[ 44.235450] Workqueue: events_unbound async_run_entry_fn
[ 44.235453] 0000000000000000 ffff8801b2f7fb98 ffffffff81306c2f ffff8801b2f7fbe8
[ 44.235454] 0000000000000000 ffff8801b2f7fbd8 ffffffff81056c01 00000a801f50ecc0
[ 44.235456] ffff88020ce50000 ffff88020ce59b60 ffffffff81a60b5c ffffffff81414840
[ 44.235456] Call Trace:
[ 44.235459] [<ffffffff81306c2f>] dump_stack+0x4d/0x6e
[ 44.235461] [<ffffffff81056c01>] __warn+0xd1/0xf0
[ 44.235464] [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30
[ 44.235465] [<ffffffff81056c6f>] warn_slowpath_fmt+0x4f/0x60
[ 44.235468] [<ffffffff814e73ce>] ? pm_runtime_get_if_in_use+0x6e/0xa0
[ 44.235469] [<ffffffff81433526>] intel_runtime_pm_get_if_in_use+0xe6/0xf0
[ 44.235471] [<ffffffff81458a26>] i915_gem_shrink+0x306/0x360
[ 44.235473] [<ffffffff81343fd4>] ? pci_platform_power_transition+0x24/0x90
[ 44.235475] [<ffffffff81414840>] ? i915_pm_suspend_late+0x30/0x30
[ 44.235476] [<ffffffff81458dfb>] i915_gem_shrink_all+0x1b/0x30
[ 44.235478] [<ffffffff814560b3>] i915_gem_freeze_late+0x33/0x90
[ 44.235479] [<ffffffff81414877>] i915_pm_freeze_late+0x37/0x40
[ 44.235481] [<ffffffff814e9b8e>] dpm_run_callback+0x4e/0x130
[ 44.235483] [<ffffffff814ea5db>] __device_suspend_late+0xdb/0x1f0
[ 44.235484] [<ffffffff814ea70f>] async_suspend_late+0x1f/0xa0
[ 44.235486] [<ffffffff81077557>] async_run_entry_fn+0x37/0x150
[ 44.235488] [<ffffffff8106f518>] process_one_work+0x148/0x3f0
[ 44.235490] [<ffffffff8106f8eb>] worker_thread+0x12b/0x490
[ 44.235491] [<ffffffff8106f7c0>] ? process_one_work+0x3f0/0x3f0
[ 44.235492] [<ffffffff81074d09>] kthread+0xc9/0xe0
[ 44.235495] [<ffffffff816e257f>] ret_from_fork+0x1f/0x40
[ 44.235496] [<ffffffff81074c40>] ? kthread_park+0x60/0x60
[ 44.235497] ---[ end trace e438706b97c7f132 ]---

Alternatively, to actually shrink everything we have to do so slightly
earlier in the hibernation process.

To keep lockdep silent, we need to take struct_mutex for the shrinker
even though we know that we are the only user during the freeze.

Fixes: 7aab2d534e35 ("drm/i915: Shrink objects prior to hibernation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-2-chris@chris-wilson.co.uk


# f2a91d1a 21-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restore current RPS state after reset

Following commit 821ed7df6e2a ("drm/i915: Update reset path to fix
incomplete requests") we no longer mark the context as lost on reset as
we keep the requests (and contexts) alive. However, RPS remains reset
and we need to restore the current state to match the in-flight
requests.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97824
Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-1-chris@chris-wilson.co.uk


# 7aab2d53 09-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Shrink objects prior to hibernation

In an attempt to keep the hibernation image as same as possible, let's
try and discard any unwanted pages and our own page arrays.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909190218.16831-1-chris@chris-wilson.co.uk


# a2bc4695 09-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prepare object synchronisation for asynchronicity

We are about to specialize object synchronisation to enable nonblocking
execbuf submission. First we make a copy of the current object
synchronisation for execbuffer. The general i915_gem_object_sync() will
be removed following the removal of CS flips in the near future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk


# 5590af3e 09-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drive request submission through fence callbacks

Drive final request submission from a callback from the fence. This way
the request is queued until all dependencies are resolved, at which
point it is handed to the backend for queueing to hardware. At this
point, no dependencies are set on the request, so the callback is
immediate.

A side-effect of imposing a heavier-irqsafe spinlock for execlist
submission is that we lose the softirq enabling after scheduling the
execlists tasklet. To compensate, we manually kickstart the softirq by
disabling and enabling the bh around the fence signaling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-14-chris@chris-wilson.co.uk


# 821ed7df 09-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Update reset path to fix incomplete requests

Update reset path in preparation for engine reset which requires
identification of incomplete requests and associated context and fixing
their state so that engine can resume correctly after reset.

The request that caused the hang will be skipped and head is reset to the
start of breadcrumb. This allows us to resume from where we left-off.
Since this request didn't complete normally we also need to cleanup elsp
queue manually. This is vital if we employ nonblocking request
submission where we may have a web of dependencies upon the hung request
and so advancing the seqno manually is no longer trivial.

ABI: gem_reset_stats / DRM_IOCTL_I915_GET_RESET_STATS

We change the way we count pending batches. Only the active context
involved in the reset is marked as either innocent or guilty, rather than
marking the entire world as pending. By inspection this only affects
igt/gem_reset_stats (which assumes implementation details) and not
piglit.

ARB_robustness gives this guide on how we expect the user of this
interface to behave:

* Provide a mechanism for an OpenGL application to learn about
graphics resets that affect the context. When a graphics reset
occurs, the OpenGL context becomes unusable and the application
must create a new context to continue operation. Detecting a
graphics reset happens through an inexpensive query.

And with regards to the actual meaning of the reset values:

Certain events can result in a reset of the GL context. Such a reset
causes all context state to be lost. Recovery from such events
requires recreation of all objects in the affected context. The
current status of the graphics reset state is returned by

enum GetGraphicsResetStatusARB();

The symbolic constant returned indicates if the GL context has been
in a reset state at any point since the last call to
GetGraphicsResetStatusARB. NO_ERROR indicates that the GL context
has not been in a reset state since the last call.
GUILTY_CONTEXT_RESET_ARB indicates that a reset has been detected
that is attributable to the current GL context.
INNOCENT_CONTEXT_RESET_ARB indicates a reset has been detected that
is not attributable to the current GL context.
UNKNOWN_CONTEXT_RESET_ARB indicates a detected graphics reset whose
cause is unknown.

The language here is explicit in that we must mark up the guilty batch,
but is loose enough for us to relax the innocent (i.e. pending)
accounting as only the active batches are involved with the reset.

In the future, we are looking towards single engine resetting (with
minimal locking), where it seems inappropriate to mark the entire world
as innocent since the reset occurred on a different engine. Reducing the
information available means we only have to encounter the pain once, and
also reduces the information leaking from one context to another.

v2: Legacy ringbuffer submission required a reset following hibernation,
or else we restored stale values to the RING_HEAD and walked over
stolen garbage.

v3: GuC requires replaying the requests after a reset.

v4: Restore engine IRQ after reset (so waiters will be woken!)
Rearm hangcheck if resetting with a waiter.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-13-chris@chris-wilson.co.uk


# 22dd3bb9 09-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up all locked waiters

In the next patch we want to handle reset directly by a locked waiter in
order to avoid issues with returning before the reset is handled. To
handle the reset, we must first know whether we hold the struct_mutex.
If we do not hold the struct_mutex we cannot perform the reset, but we do
not block the reset worker either (and so we can just continue to wait for
request completion) - otherwise we must relinquish the mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-10-chris@chris-wilson.co.uk


# ea746f36 09-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Expand bool interruptible to pass flags to i915_wait_request()

We need finer control over wakeup behaviour during i915_wait_request(),
so expand the current bool interruptible to a bitmask.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-9-chris@chris-wilson.co.uk


# 8af29b0c 09-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Separate out reset flags from the reset counter

In preparation for introducing a per-engine reset, we can first separate
the mixing of the reset state from the global reset counter.

The loss of atomicity in updating the reset state poses a small problem
for handling the waiters. For requests, this is solved by advancing the
seqno so that a waiter waking up after the reset knows the request is
complete. For pending flips, we still rely on the increment of the
global reset epoch (as well as the reset-in-progress flag) to signify
when the hardware was reset.

The advantage, now that we do not inspect the reset state during reset
itself i.e. we no longer emit requests during reset, is that we can use
the atomic updates of the state flags to ensure that only one reset
worker is active.

v2: Mika spotted that I transformed the i915_gem_wait_for_error() wakeup
into a waiter wakeup.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-6-git-send-email-arun.siluvery@linux.intel.com
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-7-chris@chris-wilson.co.uk


# 70c2a24d 09-Sep-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Simplify ELSP queue request tracking

Emulate HW to track and manage the ELSP queue. A set of SW ports is defined
and requests are assigned to these ports before being submitted to HW. This
makes it easier to clean up incomplete requests during reset recovery,
especially after an engine reset, by decoupling elsp queue management. This
will become clearer in the next patch.

In the engine reset case we want to resume where we left-off after skipping
the incomplete batch, which requires checking the elsp queue, removing
elements and fixing elsp_submitted counts in some cases. Instead of directly
manipulating the elsp queue from reset path we can examine these ports, fix
up ringbuffer pointers using the incomplete request and restart submissions
again after reset.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-3-git-send-email-arun.siluvery@linux.intel.com
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-6-chris@chris-wilson.co.uk


# 6f633402 01-Sep-2016 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Use atomic for dev_priv->mm.bsd_engine_dispatch_index

Use atomic type and operands for dev_priv->mm.bsd_engine_dispatch_index
to avoid one struct_mutex locking scenario.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1472731101-21982-1-git-send-email-joonas.lahtinen@linux.intel.com


# 4cc69075 25-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add I915_PARAM_MMAP_GTT_VERSION to advertise unlimited mmaps

Now that we have working partial VMA and faulting support for all
objects, including fence support, advertise to userspace that it can
take advantage of unlimited GGTT mmaps.

v2: Make room in the kerneldoc for a more detailed explanation of the
limitations of the GTT mmap interface.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20160825180519.11341-1-chris@chris-wilson.co.uk


# c58305af 19-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass

Very old numbers indicate this is a 66% improvement when remapping the
entire object for fence contention - due to the elimination of
track_pfn_insert and its strcmp.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_fence_upload/performance
Testcase: igt/gem_mmap_gtt
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-6-chris@chris-wilson.co.uk


# f7bbe788 19-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Embed the io-mapping struct inside drm_i915_private

As io_mapping.h now always allocates the struct, we can avoid that
allocation and extra pointer dance by embedding the struct inside
drm_i915_private

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk


# cd3127d6 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop discarding GTT cache-domain on unbind vma

Since commit 43566dedde54 ("drm/i915: Broaden application of
set-domain(GTT)") we allowed objects to be in the GTT domain, but unbound.
Therefore removing the GTT cache domain when removing the GGTT vma is no
longer semantically correct.

An unfortunate side-effect is we lose the wondrously named
i915_gem_object_finish_gtt(), not to be confused with
i915_gem_gtt_finish_object()!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-30-chris@chris-wilson.co.uk


# 383d5823 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Bump the inactive tracking for all VMA accessed

We track the LRU access for eviction and bump the last access for the
user GGTT on set-to-gtt. When we do so we need to not only bump the
primary GGTT VMA but all partials as well. Similarly we want to
bump the last access tracking for when unpinning an object from the
scanout so that they do not get promptly evicted and hopefully remain
available for reuse on the next frame.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-29-chris@chris-wilson.co.uk


# d8923dcf 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track display alignment on VMA

When using the aliasing ppgtt and pageflipping with the shrinker/eviction
active, we note that we often have to rebind the backbuffer before
flipping onto the scanout because it has an invalid alignment. If we
store the worst-case alignment required for a VMA, we can avoid having
to rebind at critical junctures.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-28-chris@chris-wilson.co.uk


# 2efb813d 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fallback to using unmappable memory for scanout

The existing ABI says that scanouts are pinned into the mappable region
so that legacy clients (e.g. old Xorg or plymouthd) can write directly
into the scanout through a GTT mapping. However if the surface does not
fit into the mappable region, we are better off just trying to fit it
anywhere and hoping for the best. (Any userspace that is capable of
using ginormous scanouts is also likely not to rely on pure GTT
updates.) With the partial vma fault support, we are no longer
restricted to only using scanouts that we can pin (though it is still
preferred for performance reasons and for powersaving features like
FBC).

v2: Skip fence pinning when not mappable.
v3: Add a comment to explain the possible ramifications of not being
able to use fences for unmappable scanouts.
v4: Rebase to skip over some local patches
v5: Rebase to defer until after we have unmappable GTT fault support

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-27-chris@chris-wilson.co.uk


# 82118877 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Choose not to evict faultable objects from the GGTT

Often times we do not want to evict mapped objects from the GGTT as
these are quite expensive to tear down and frequently reused (causing an
equally, if not more so, expensive setup). In particular, when faulting
in a new object we want to avoid evicting an active object, or else we
may trigger a page-fault-of-doom as we ping-pong between evicting two
objects.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-26-chris@chris-wilson.co.uk


# 50349247 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop ORIGIN_GTT for untracked GTT writes

If FBC is set on a framebuffer that is unmapped, all GTT faults will be
from a partial mapping. Writes by the user through the partial VMA are
then untracked by the FBC and so we must use the ORIGIN_CPU when flushing
the I915_GEM_DOMAIN_GTT.

v2: Keep ORIGIN_CPU for set-to-domain(.write=CPU)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-25-chris@chris-wilson.co.uk


# aa136d9d 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert partial ggtt vma to full ggtt if it spans the entire object

If we want to create a partial vma from a chunk that is the same size as
the object, create a normal ggtt vma instead. The benefit is that it
will match future requests for the normal ggtt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-24-chris@chris-wilson.co.uk


# a61007a8 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix partial GGTT faulting

We want to always use the partial VMA as a fallback for a failure to
bind the object into the GGTT. This extends the support for partial objects
in the GGTT to cover everything, not just objects that are too large.

v2: Call the partial view, view not partial.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-23-chris@chris-wilson.co.uk


# 03af84fe 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Choose partial chunksize based on tile row size

In order to support setting up fences for partial mappings of an object,
we have to align those mappings with the fence. The minimum chunksize we
choose is at least the size of a single tile row.

v2: Make minimum chunk size a define for later use

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-22-chris@chris-wilson.co.uk


# 49ef5294 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move fence tracking from object to vma

In order to handle tiled partial GTT mmappings, we need to associate the
fence with an individual vma.

v2: A couple of silly drops replaced spotted by Joonas

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk


# a1e5afbe 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename fence.lru_list to link

Our current practice is to only name the actual list (here
dev_priv->fence_list) using "list", and elements upon that list are
referred to as "link". Further, the lru nature is of the list and not of
the node, and including it in the name does not disambiguate the link from
anything else.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk


# 05a20d09 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move map-and-fenceable tracking to the VMA

By moving map-and-fenceable tracking from the object to the VMA, we gain
fine-grained tracking and the ability to track individual fences on the VMA
(subsequent patch).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk


# b0dc465f 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Tidy up flush cpu/gtt write domains

Since we know the write domain, we can drop the local variable and make
the code look a tiny bit simpler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-12-chris@chris-wilson.co.uk


# 9764951e 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pin the pages first in shmem prepare read/write

There is an improbable, but not impossible, case that if we leave the
pages unpinned as we operate on the object, then somebody via the shrinker
may steal the lock (which lock? right now, it is struct_mutex, THE lock)
and change the cache domains after we have already inspected them.

(Whilst here, avail ourselves of the opportunity to take a couple of
steps to make the two functions look more similar.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-11-chris@chris-wilson.co.uk


# 3b5724d7 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wait for writes through the GTT to land before reading back

If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete its write before
we start our read from the CPU.

The issue can be illustrated in userspace with:

gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);

for (i = 0; i < OBJECT_SIZE / 64; i++) {
	int x = 16*i + (i%16);
	gtt[x] = i;
	clflush(&cpu[x], sizeof(cpu[x]));
	assert(cpu[x] == i);
}

Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.

Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk


# a314d5cb 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Before accessing an object via the cpu, flush GTT writes

If we want to read the pages directly via the CPU, we have to be sure
that the writes via the GTT have been flushed first (as the CPU cannot
see through the address aliasing).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-9-chris@chris-wilson.co.uk


# 43394c7d 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Extract i915_gem_obj_prepare_shmem_write()

This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clflushes are required in
order for the writes to be coherent.

Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.

v2: Add i915_gem_obj_finish_shmem_access() for symmetry

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk


# 18034584 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fallback to single page pwrite/pread if unable to release fence

If we cannot release the fence (for example if someone is inexplicably
trying to write into a tiled framebuffer that is currently pinned to the
display! *cough* kms_frontbuffer_tracking *cough*) fallback to using the
page-by-page pwrite/pread interface, rather than fail the syscall
entirely.

Since this is triggerable by the user (along the pwrite path) we have to remove
the WARN_ON(fence->pin_count).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-6-chris@chris-wilson.co.uk


# d243ad82 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up the GTT flush following WC writes as ORIGIN_CPU

Similarly to invalidating beforehand, if the object is mmapped via
I915_MMAP_WC we cannot track writes through the I915_GEM_DOMAIN_GTT. At
the conclusion of the write, in i915_gem_object_flush_gtt_writes(), we also
need to treat the origin carefully in case the write may have been untracked.

See also commit aeecc9696aa0 ("drm/i915: use ORIGIN_CPU for frontbuffer
invalidation on WC mmaps").

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-5-chris@chris-wilson.co.uk


# b19482d7 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use ORIGIN_CPU for fb invalidation from pwrite

As pwrite does not use the fence for its GTT access, and may even go
through a secondary interface avoiding the main VMA, we cannot treat the
write as automatically invalidated by the hardware and so we require
ORIGIN_CPU frontbuffer invalidate/flushes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-4-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4b30cb23 18-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: vfree() no longer ignores the low bits of the address

Since vfree() now likes to WARN when passed a non-page-aligned pointer,
we need to discard the low bits to comply with it.
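
In rough terms (the cached-map field name here is assumed, and the driver
actually uses small pointer-packing helpers), the fix boils down to masking
the pointer back to page alignment before freeing it:

/* the cached vmap pointer carries the mapping type (WB/WC) in its low bits */
void *vaddr = (void *)((unsigned long)obj->mapping & PAGE_MASK);

if (is_vmalloc_addr(vaddr))
	vfree(vaddr);	/* vfree now WARNs on a non page-aligned pointer */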

Fixes: d31d7cb1460c ("drm/i915: Support for creating write combined type vmaps")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-3-chris@chris-wilson.co.uk


# 1255501d 16-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Embrace the race in busy-ioctl

Daniel Vetter proposed a new challenge to the serialisation inside the
busy-ioctl that exposed a flaw that could result in us reporting the
wrong engine as being busy. If the request is reallocated as we test
its busyness and then reassigned to this object by another thread, we
would not notice that the test itself was incorrect.

We are faced with a choice of using __i915_gem_active_get_request_rcu()
to first acquire a reference to the request preventing the race, or to
acknowledge the race and accept the limitations upon the accuracy of the
busy flags. Note that we guarantee that we never falsely report the
object as idle (providing userspace itself doesn't race), and so the
most important use of the busy-ioctl and its guarantees are fulfilled.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471337440-16777-1-git-send-email-chris@chris-wilson.co.uk


# bde13ebd 15-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce i915_ggtt_offset()

This little helper only exists to safely discard the upper unused 32bits
of the general 64-bit VMA address - as we know that all global GTTs are
currently less than 4GiB in size and so the upper bits must be
zero. In many places, we use a u32 for the global GTT offset and we want
to document where we are discarding the full VMA offset.
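
The helper is essentially just a checked truncation, roughly:

static inline u32 i915_ggtt_offset(const struct i915_vma *vma)
{
	GEM_BUG_ON(upper_32_bits(vma->node.start));
	return lower_32_bits(vma->node.start);
}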

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-28-git-send-email-chris@chris-wilson.co.uk


# 058d88c4 15-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track pinned VMA

Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity in pinning the object
and then searching for the relevant pin later.
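
In rough usage terms (the surrounding code is illustrative and the exact
parameter list of this era may differ), the vma itself becomes the cookie:

struct i915_vma *vma;

vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
if (IS_ERR(vma))
	return PTR_ERR(vma);

/* ... use the binding, e.g. i915_ggtt_offset(vma) ... */

i915_vma_unpin(vma);	/* release exactly the pin we were handed */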

v2: Joonas' stylistic nitpicks, a fun rebase.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk


# 247177dd 15-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always set the vma->pages

Previously, we would only set the vma->pages pointer for GGTT entries.
However, if we always set it, we can use it to prettify some code that
may want to access the backing store associated with the VMA (as
assigned to the VMA).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-8-git-send-email-chris@chris-wilson.co.uk


# 02bef8f9 14-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unbind closed vma for i915_gem_object_unbind()

Closed vma are removed from the obj->vma_list so that they cannot be
found by userspace. However, this means that when forcibly unbinding an
object, we have to wait upon all rendering to that object first in order
for the closed, but active, vma to be reaped and their bindings removed.

Reported-by: Matthew Auld <matthew.auld@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97343
Fixes: aa653a685d81 ("drm/i915: Be more careful when unbinding vma")
Fixes: 8a3b3d576c93 ("drm/i915: Convert non-blocking userptr waits...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471196681-30043-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Tested-by: Matthew Auld <matthew.auld@intel.com>


# 35a9611c 14-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Initialize return value for empty i915_gem_object_unbind()

If the obj->vma_list is empty, we immediately return ret. However, we
are doing so having never set it to any value; it should be zero!

Reported-by: Matthew Auld <matthew.auld@intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=97343
Fixes: aa653a685d81 ("drm/i915: Be more careful when unbinding vma")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471196681-30043-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# d31d7cb1 11-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Support for creating write combined type vmaps

vmap has a provision for controlling the page protection bits, which we
can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.

We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT.
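
A sketch of the resulting usage, assuming obj is the object in question:

void *vaddr;

vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC);
if (IS_ERR(vaddr))
	return PTR_ERR(vaddr);

memset(vaddr, 0, obj->base.size);	/* CPU access through the WC mapping */

i915_gem_object_unpin_map(obj);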

v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)

v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)

v4:
- Rename the enums and clean up the pin_map function. (Chris)

v5: Drop the VM_NO_GUARD, minor cosmetics.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk


# 2ca17b87 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add missing rpm wakelock to GGTT pread

Joonas spotted a discrepancy between the pwrite and pread ioctls, in
that pwrite takes the rpm wakelock around its GGTT access. The wakelock
is required in order for the GTT to function. In disregard for the
current convention, we take the rpm wakelock around the access itself
rather than around the struct_mutex as the nesting is not strictly
required and such ordering will one day be fixed by explicitly noting
the barrier dependencies between the GGTT and rpm.

Fixes: b50a53715f09 ("drm/i915: Support for pread/pwrite ...")
Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1470298193-21765-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit 1dd5b6f2020389e75bb3d269c038497f065e68c9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# fae82e59 16-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Handle ENOSPC after failing to insert a mappable node

Even after adding individual page support for GTT mmaping, we can still
fail to find any space within the mappable region, and
drm_mm_insert_node() will then report ENOSPC. We have to then handle
this error by using the shmem access to the pages.

Fixes: b50a53715f09 ("drm/i915: Support for pread/pwrite ... objects")
Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468690956-23480-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit d1054ee492a89b134fb0ac527b0714c277ae9c0f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# b06bc7ec 13-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush GT idle status upon reset

Upon resetting the GPU, we force the engines to be idle by clearing
their request lists. However, I neglected to clear the GT active status
and so the next request following the reset was not marking the device
as busy again. (We had to wait until any outstanding retire worker
finally ran and cleared the active status.)

Fixes: 67d97da34917 ("drm/i915: Only start retire worker when idle")
Testcase: igt/pm_rps/reset
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
(cherry picked from commit b913b33c43db849778f044d4b9e74b167898a9bc)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 83348ba8 09-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs

In commit 2529d57050af ("drm/i915: Drop racy markup of missed-irqs from
idle-worker") the racy detection of missed interrupts was removed when
we went idle. This however opened up the issue that the stuck waiters
were not being reported, causing a test case failure. If we move the
stuck waiter detection out of hangcheck and into the breadcrumb
mechanism (i.e. the waiter) itself, we can avoid this issue entirely.
This leaves hangcheck looking for a stuck GPU (inspecting for request
advancement and HEAD motion), and breadcrumbs looking for a stuck
waiter - hopefully making both easier to understand by their segregation.

v2: Reduce the error message as we now run independently of hangcheck,
and the hanging batch used by igt also counts as a stuck waiter causing
extra warnings in dmesg.
v3: Move the breadcrumb's hangcheck kickstart to the first missed wait.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104
Fixes: 2529d57050af ("drm/i915: Drop racy markup of missed-irqs...")
Testcase: igt/drv_missed_irq
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-2-git-send-email-chris@chris-wilson.co.uk


# 70cb472c 09-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always mark the writer as also a read for busy ioctl

One of the few guarantees we want the busy ioctl to provide is that the
reported busy writer is included in the set of busy read engines. This
should be provided by the ordering of setting and retiring the active
trackers, but we can do better by explicitly setting the busy read
engine flag for the last writer.

v2: More comments inside __busy_write_id() to explain why both fields
are set.

Fixes: 3fdc13c7a3cb ("drm/i915: Remove (struct_mutex) locking for busy-ioctl")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470762505-12799-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# edf6b76f 09-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add smp_rmb() to busy ioctl's RCU dance

In the debate as to whether the second read of active->request is
ordered after the dependent reads of the first read of active->request,
just give in and throw a smp_rmb() in there so that ordering of loads is
assured.
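
Schematically (a sketch of the ordering concern only, not the exact
busy-ioctl code):

struct drm_i915_gem_request *request;

rcu_read_lock();
request = rcu_dereference(active->request);
if (request) {
	/* ... loads that depend on this first read of active->request ... */

	smp_rmb();	/* order those loads before the re-read below */

	if (request == rcu_access_pointer(active->request)) {
		/* the values sampled above were still current */
	}
}
rcu_read_unlock();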

v2: Explain the manual smp_rmb()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470731014-6894-1-git-send-email-chris@chris-wilson.co.uk


# 87b723a1 09-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't check for idleness before retiring after a GPU hang

When we force the cleanup after a GPU hang, we want to retire all
requests, or else we may leak them if truly wedged (and the GPU never
advances again). Converting to the active request helpers had the issue
of doing the check against busyness before reporting the request, so if
we claim the GPU had hung but this engine hadn't we could potentially skip
the request cleanup - triggering the self-check BUG.

Fixes: dcff85c8443e ("drm/i915: Enable i915_gem_wait_for_idle() ...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470728222-10243-3-git-send-email-chris@chris-wilson.co.uk


# 3e510a8e 05-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Repack fence tiling mode and stride into a single integer

In the previous commit, we moved the obj->tiling_mode out of a bitfield
and into its own integer so that we could safely use READ_ONCE(). Let us
now repair some of that damage by sharing the tiling_mode with its
companion, the fence stride.

v2: New magic

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk


# e883d735 05-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove pinned check from madvise ioctl

We don't need to incur the overhead of checking whether the object is
pinned prior to changing its madvise. If the object is pinned, the
madvise will not take effect until it is unpinned and so we cannot free
the pages being pointed at by hardware. Marking a pinned object with
allocated pages as DONTNEED will not trigger any undue warnings. The check
is therefore superfluous, and by removing it we can remove a linear walk
over all the vma the object has.

Still, despite it being an overzealous check, that error code is part of
the current ABI and so we must proceed with caution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-15-git-send-email-chris@chris-wilson.co.uk


# c21724cc 05-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce locking inside swfinish ioctl

We only need to take the struct_mutex if the object is pinned to the
display engine and so requires checking for clflush. (The race with
userspace pinning the object to a framebuffer is irrelevant.)

v2: Use access once for compiler hints (or not as it is a bitfield)
v3: READ_ONCE, obj->pin_display is not a bitfield anymore
v4: Don't be creative with goto.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-14-git-send-email-chris@chris-wilson.co.uk


# 3fdc13c7 05-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove (struct_mutex) locking for busy-ioctl

By applying the same logic as for wait-ioctl, we can query whether a
request has completed without holding struct_mutex. The biggest impact
system-wide is removing the flush_active and the contention that causes.

Testcase: igt/gem_busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-13-git-send-email-chris@chris-wilson.co.uk


# 033d549b 05-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove (struct_mutex) locking for wait-ioctl

With a bit of care (and leniency) we can iterate over the object and
wait for previous rendering to complete with judicious use of atomic
reference counting. The ABI requires us to ensure that an active object
is eventually flushed (like the busy-ioctl) which is guaranteed by our
management of requests (i.e. everything that is submitted to hardware is
flushed in the same request). All we have to do is ensure that we can
detect when the requests are complete for reporting when the object is
idle (without triggering ETIME), locklessly - this is handled by
i915_gem_active_wait_unlocked().

The impact of this is actually quite small - the return to userspace
following the wait was already lockless and so we don't see much gain in
latency improvement upon completing the wait. What we do achieve here is
completing an already finished wait without hitting the struct_mutex,
our hold is quite short and so we are typically just a victim of
contention rather than a cause - but it is still one less contention
point!

v2: Break up a long line.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-12-git-send-email-chris@chris-wilson.co.uk


# 258a5ede 05-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do a nonblocking wait first in pread/pwrite

If we try and read or write to an active request, we first must wait
upon the GPU completing that request. Let's do that without holding the
mutex (and so allow someone else to access the GPU whilst we wait). Upon
completion, we will acquire the mutex and only then start the operation
(i.e. we do not rely on state from before the initial wait).

v2: Repaint the goto labels
v3: Move the tracepoints back to the start of the ioctls

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-11-git-send-email-chris@chris-wilson.co.uk


# f3f6184c 05-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Tidy generation of the GTT mmap offset

If we make the observation that mmap-offsets are only released when we
free an object, we can then deduce that the shrinker only creates free
space in the mmap arena indirectly by flushing the request list and
freeing expired objects. If we combine this with the lockless
vma-manager and lockless idling, we can avoid taking our big struct_mutex
until we need to actually free the requests.

One side-effect is that we defer the madvise checking until we need the
pages (i.e. the fault handler). This brings us into line with the other
delayed checks (and madvise in general).

v2: s/ret/err/ and use if (!err) rather than if (ret == 0)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-9-git-send-email-chris@chris-wilson.co.uk


# dcff85c8 05-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex

The principal motivation for this was to try and eliminate the
struct_mutex from i915_gem_suspend - but for now we still need to hold the
mutex for i915_gem_context_lost(). (The issue there is that there
may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
struct_mutex via the stop_machine().) For the moment, enabling last
request tracking for the engine allows us to do busyness checking and
waiting without requiring the struct_mutex - which is useful in its own
right.

As a side-effect of having a robust means for tracking engine busyness,
we can replace our other busyness heuristic, that of comparing against
the last submitted seqno. For paranoid reasons, we have a semi-ordered
check of that seqno inside the hangchecker, which we can now improve to
an ordered check of the engine's busyness (removing a locked xchg in the
process).

v2: Pass along "bool interruptible" as being unlocked we cannot rely on
i915->mm.interruptible being stable or even under our control.
v3: Replace the Ironlake i915_gpu_busy() check with the common precalculated value

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk


# 90f4fcd5 05-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove forced stop ring on suspend/unload

Before suspending (or unloading), we would first wait upon all rendering
to be completed and then disable the rings. This latter step is a remnant
from DRI1 days when we did not use request tracking for all operations
upon the ring. Now that we are sure we are waiting upon the very last
operation by the engine, we can forgo clobbering the ring registers,
though we do keep the assert that the engine is indeed idle before
sleeping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-5-git-send-email-chris@chris-wilson.co.uk


# b8f9096d 05-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert non-blocking waits for requests over to using RCU

We can completely avoid taking the struct_mutex around the non-blocking
waits by switching over to the RCU request management (trading the mutex
for a RCU read lock and some complex atomic operations). The improvement
is that we gain further contention reduction, and overall the code
becomes simpler due to the reduced mutex dancing.

v2: Move i915_gem_fault tracepoint back to the start of the function,
before the unlocked wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-2-git-send-email-chris@chris-wilson.co.uk


# 0eafec6d 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Enable lockless lookup of request tracking via RCU

If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.

However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.
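
A sketch of that throttle (the scan entry point and helper names below are
hypothetical; only synchronize_rcu() and the shrinker API shape are the
point):

static unsigned long
i915_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
	unsigned long freed;

	freed = i915_scan_objects(sc->nr_to_scan);	/* hypothetical helper */

	/* wait one grace period so the deferred request frees become free memory */
	synchronize_rcu();

	return freed;
}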

v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)

Paul E. McKenney wrote:

Another approach is synchronize_rcu() after some largish number of
requests. The advantage of this approach is that it throttles the
production of callbacks at the source. The corresponding disadvantage
is that it slows things up.

Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it. Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The
idea is to do something like this:

cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();

You would of course do an initial get_state_synchronize_rcu() to
get things going. This would not block unless there was less than
one grace period's worth of time between invocations. But this
assumes a busy system, where there is almost always a grace period
in flight. But you can make that happen as follows:

cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
call_rcu(&my_rcu_head, noop_function);

Note that you need additional code to make sure that the old callback
has completed before doing a new one. Setting and clearing a flag
with appropriate memory ordering control suffices (e.g., smp_load_acquire()
and smp_store_release()).

v3: More comments on compiler and processor order of operations within
the RCU lookup and discover we can use rcu_access_pointer() here instead.

v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk


# 00e60f26 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move i915_gem_object_wait_rendering()

Just move it earlier so that we can use the companion nonblocking
version in a couple of more callsites without having to add a forward
declaration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-24-git-send-email-chris@chris-wilson.co.uk


# 573adb39 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move obj->active:5 to obj->flags

We are motivated to avoid using a bitfield for obj->active for a couple
of reasons. Firstly, we wish to document our lockless read of obj->active
using READ_ONCE inside i915_gem_busy_ioctl() and that requires an
integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code
when presented with a bitfield and that shows up high on the profiles of
request tracking (mainly due to excess memory traffic as it converts
the bitfield to a register and back and generates frequent AGI in the
process).
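
Illustratively (shift, mask and helper below are a sketch rather than the
exact patch):

#define I915_BO_ACTIVE_SHIFT 0
#define I915_BO_ACTIVE_MASK 0x1f	/* one bit per engine, obj->active was :5 */

static inline unsigned long
i915_gem_object_get_active(const struct drm_i915_gem_object *obj)
{
	/* READ_ONCE() wants an integral type, hence flags rather than a bitfield */
	return (READ_ONCE(obj->flags) >> I915_BO_ACTIVE_SHIFT) & I915_BO_ACTIVE_MASK;
}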

v2: BIT, break up a long line in compute the other engines, new paint
for i915_gem_object_is_active (now i915_gem_object_get_active).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk


# faf5bf0a 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use atomics to manipulate obj->frontbuffer_bits

The individual bits inside obj->frontbuffer_bits are protected by each
plane->mutex, but the whole bitfield may be accessed by multiple KMS
operations simultaneously and so the RMW needs to be done under atomics.
However, for updating the single field we do not need to mandate that it
be under the struct_mutex, one more step towards its removal as the de
facto BKL.
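
A sketch of the resulting access pattern, assuming the field becomes an
atomic_t:

/* set this plane's bits when the object becomes its frontbuffer ... */
atomic_or(frontbuffer_bits, &obj->frontbuffer_bits);

/* ... and clear them again later, without needing struct_mutex */
atomic_andnot(frontbuffer_bits, &obj->frontbuffer_bits);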

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-21-git-send-email-chris@chris-wilson.co.uk


# b5add959 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make fb_tracking.lock a spinlock

We only need a very lightweight mechanism here as the locking is only
used for co-ordinating a bitfield.

v2: Move the cheap unlikely tests into the caller
v3: Move the kerneldoc into the header (now separated out into
intel_frontbuffer.h for better kerneldoc and readability)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-20-git-send-email-chris@chris-wilson.co.uk


# 5d723d7a 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Separate intel_frontbuffer into its own header

In view of adding inline functions into the intel_frontbuffer section,
we first split the header into its own file so that we can integrate it
more easily with kerneldoc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-19-git-send-email-chris@chris-wilson.co.uk


# de895082 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin()

Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for
i915_gem_object_ggtt_pin(), spare us the confusion and remove it.
Removing it now simplifies later patches to change the i915_vma_pin()
(and friends) interface.

v2: Add a redundant GEM_BUG_ON(!view) to
i915_gem_obj_lookup_or_create_ggtt_vma()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk


# 305bc234 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make i915_vma_pin() small and inline

Not only is i915_vma_pin() called for every single object on every single
execbuf, it is usually a simple increment as the VMA is already bound for
execution by the GPU. Rearrange the tests for unbound and pin_count
overflow so that we can do the increment and test very cheaply and
compact enough to inline the operation into execbuf. The trick used is
to note that we can check for an overflow bit (keeping space available
for it inside the flags) at the same time as checking the binding bits.
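
Schematically, with hypothetical flag names rather than the real i915 bits:

#define VMA_PIN_MASK	 0x0f	/* low bits hold the pin count */
#define VMA_PIN_OVERFLOW 0x10	/* set when the count wraps upwards into it */
#define VMA_BIND_MASK	 0x60	/* global/local binding bits */

/* one increment, one test: "already bound?" and overflow checked together */
static inline bool vma_pin_fast(unsigned int *flags, unsigned int bind)
{
	*flags += 1;
	return (*flags & (VMA_BIND_MASK | VMA_PIN_OVERFLOW)) == bind;
}

A false return sends the caller down the existing slow path, which can also
unwind the increment if the overflow bit fired.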

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-17-git-send-email-chris@chris-wilson.co.uk


# 3272db53 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Combine all i915_vma bitfields into a single set of flags

In preparation to perform some magic to speed up i915_vma_pin(), which
is among the hottest of hot paths in execbuf, refactor all the bitfields
accessed by i915_vma_pin() into a single unified set of flags.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk


# 59bfa124 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Start passing around i915_vma from execbuffer

During execbuffer we look up the i915_vma in order to reserve them in
the VM. However, we then do a double lookup of the vma in order to then
pin them, all because we lack the necessary interfaces to operate on
i915_vma - so introduce i915_vma_pin()!

v2: Tidy parameter lists to remove one level of redirection in the hot
path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk


# 20dfbde4 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap vma->pin_count accessors with small inline helpers

In the next few patches, the VMA pinning API is overhauled and to reduce
the churn we pull out the update to the accessors into a prep patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-14-git-send-email-chris@chris-wilson.co.uk


# de180033 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Record allocated vma size

Tracking the size of the VMA as allocated allows us to dramatically
reduce the complexity of later functions (like inserting the VMA in to
the drm_mm range manager).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-13-git-send-email-chris@chris-wilson.co.uk


# a9f1481f 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Update i915_gem_get_ggtt_size/_alignment to use drm_i915_private

For consistency, internal functions should take drm_i915_private rather
than drm_device. Now that we are subclassing drm_device, there are no
more size wins, but being consistent is its own blessing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-12-git-send-email-chris@chris-wilson.co.uk


# ad1a7d20 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Update the GGTT size/alignment query functions

In order to be consistent with other address space functions, we want to
pass around 64-bit sizes, even though all known global GTT are limited
to 4GiB. Similarly, we are trying to be consistent in using the _ggtt_
nomenclature when referring to the special global GTT.

v2: Update docs to consistently state "global GTT".

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-11-git-send-email-chris@chris-wilson.co.uk


# 954c4691 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert 4096 alignment request to 0 for drm_mm allocations

As we always allocate in chunks of 4096 (that being both the PAGE_SIZE
and our own GTT_PAGE_SIZE), we know that all results from the drm_mm are
aligned to at least 4096. The drm_mm allocator itself is optimised for
alignment == 0, and so by converting alignments of 4096 to 0 we can
satisfy our own requirements and still hit the faster path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-10-git-send-email-chris@chris-wilson.co.uk


# 3b16525c 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split insertion/binding of an object into the VM

Split the insertion into the address space's range manager and binding
of that object into the GTT to simplify the code flow when pinning a
VMA.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-9-git-send-email-chris@chris-wilson.co.uk


# 37508589 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce WARN(i915_gem_valid_gtt_space) to a debug-only check

i915_gem_valid_gtt_space() is used after inserting the VMA to double
check the list - the location should have been chosen to pass all the
restrictions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-8-git-send-email-chris@chris-wilson.co.uk


# 91b2db6f 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pad GTT views of exec objects up to user specified size

Our GPUs impose certain requirements upon buffers that depend upon how
exactly they are used. Typically this is expressed as a requirement for
a larger surface than would be naively computed by pitch * height.
Normally such requirements are hidden away in the userspace driver, but
when we accept pointers from strangers and later impose extra conditions
on them, the original client allocator has no idea about the
monstrosities in the GPU and we require the userspace driver to inform
the kernel how many padding pages are required beyond the client
allocation.
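
From the client side, a sketch of what the uapi addition looks like
(bo_handle and the sizes are placeholders, and pad_to_size must be page
aligned):

struct drm_i915_gem_exec_object2 entry = {
	.handle = bo_handle,			/* placeholder handle */
	.flags = EXEC_OBJECT_PAD_TO_SIZE,
	.pad_to_size = bo_size + 2 * 4096,	/* object plus two guard pages */
};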

v2: Long time, no see
v3: Try an anonymous union for uapi struct compatibility

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk


# 2ffffd0f 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix up vma alignment to be u64

This is not the full fix, as we are required to percolate the u64 nature
down through the drm_mm stack, but this is required now to prevent
explosions due to mismatch between execbuf (eb_vma_misplaced) and vma
binding (i915_vma_misplaced) - and reduces the risk of spurious changes
as we adjust the vma interface in the next patches.

v2: long long casts not required for u64 printk (%llx)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-6-git-send-email-chris@chris-wilson.co.uk


# e522ac23 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove surplus drm_device parameter to i915_gem_evict_something()

Eviction is VM local, so we can ignore the significance of the
drm_device in the caller, and leave it to i915_gem_evict_something() to
manage itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-2-git-send-email-chris@chris-wilson.co.uk


# 1dd5b6f2 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add missing rpm wakelock to GGTT pread

Joonas spotted a discrepancy between the pwrite and pread ioctls, in
that pwrite takes the rpm wakelock around its GGTT access. The wakelock
is required in order for the GTT to function. In disregard for the
current convention, we take the rpm wakelock around the access itself
rather than around the struct_mutex as the nesting is not strictly
required and such ordering will one day be fixed by explicitly noting
the barrier dependencies between the GGTT and rpm.

Fixes: b50a53715f09 ("drm/i915: Support for pread/pwrite ...")
Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1470298193-21765-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# df0e9a28 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

Revert "drm/i915: Clean up associated VMAs on context destruction"

This reverts commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae.

The patch was only a stop-gap measure that fixed half the problem - the
leak of the fbcon when restarting X. A complete solution required
releasing the VMA when the object itself was closed rather than relying on
file/process exit. The previous patches add the VMA tracking necessary
to close them along with the object, context or file, and so the time
has come to remove the partial fix.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-28-git-send-email-chris@chris-wilson.co.uk


# 50e046b6 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark the context and address space as closed

When the user closes the context mark it and the dependent address space
as closed. As we use an asynchronous destruct method, this has two
purposes. First it allows us to flag the closed context and detect
internal errors if we try to create any new objects for it (as it is removed
from the user's namespace, these should be internal bugs only). And
secondly, it allows us to immediately reap stale vma.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-27-git-send-email-chris@chris-wilson.co.uk


# b1f788c6 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Release vma when the handle is closed

In order to prevent a leak of the vma on shared objects, we need to
hook into the object_close callback to destroy the vma on the object for
this file. However, if we destroyed that vma immediately we may cause
unexpected application stalls as we try to unbind a busy vma - hence we
defer the unbind to when we retire the vma.

v2: Keep vma allocated until closed. This is useful for a later
optimisation, but it is required now in order to handle potential
recursion of i915_vma_unbind() by retiring itself.
v3: Comments are important.

Testcase: igt/gem_ppggtt/flink-and-close-vma-leak
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-26-git-send-email-chris@chris-wilson.co.uk


# b0decaf7 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track active vma requests

Hook the vma itself into the i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walking when retiring objects after a completed request.

A side-effect is that we get an active vma reference for free. The
consequence of this is shown in the next patch...

v2: Update inline names to be consistent with
i915_gem_object_get_active()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-25-git-send-email-chris@chris-wilson.co.uk


# 5cf3d280 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: i915_vma_move_to_active prep patch

This patch is broken out of the next just to remove the code motion from
that patch and make it more readable. What we do here is move the
i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three
stages (read, write, fenced) together so that future modifications to
active handling are all located in the same spot. The importance of this
is so that we can more simply control the order in which the requests
are placed in the retirement list (i.e. control the order in which we
retire and so control the lifetimes to avoid having to hold onto
references).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-24-git-send-email-chris@chris-wilson.co.uk


# 4b8de8e6 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move request list retirement to i915_gem_request.c

As the list retirement is now clean of implementation details, we can
move it closer to the request management.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-23-git-send-email-chris@chris-wilson.co.uk


# 776f3236 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: s/__i915_wait_request/i915_wait_request/

There is only one wait on request function now, so drop the "expert"
indication of leading __.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-21-git-send-email-chris@chris-wilson.co.uk


# fa545cbf 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor activity tracking for requests

With the introduction of requests, we amplified the number of atomic
refcounted objects we use and update every execbuffer; from none to
several references, and a set of references that need to be changed. We
also introduced interesting side-effects in the order of retiring
requests and objects.

Instead of independently tracking the last request for an object, track
the active objects for each request. The object will reside in the
buffer list of its most recent active request and so we reduce the kref
interchange to a list_move. Now retirements are entirely driven by the
request, dramatically simplifying activity tracking on the objects
themselves, and removing the ambiguity between retiring objects and
retiring requests.

Furthermore with the consolidation of managing the activity tracking
centrally, we can look forward to using RCU to enable lockless lookup of
the current active requests for an object. In the future, we will be
able to query the status or wait upon rendering to an object without
even touching the struct_mutex BKL.

All told, less code, simpler and faster, and more extensible.

v2: Add a typedef for the function pointer for convenience later.
v3: Make the noop retirement callback explicit. Allow passing NULL to
the init_request_active() which is expanded to a common noop function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk


# 21c310f2 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove obsolete i915_gem_object_flush_active()

Since we track requests, and requests are always added to the GPU fully
formed, we never have to flush the incomplete request and know that the
given request will eventually complete without any further action on our
part.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-15-git-send-email-chris@chris-wilson.co.uk


# efdf7c06 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename request->list to link for consistency

We use "list" to denote the list and "link" to denote an element on that
list. Rename request->list to match this idiom.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-14-git-send-email-chris@chris-wilson.co.uk


# 8cac6f6c 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor blocking waits

Tidy up the for loops that handle waiting for read/write vs read-only
access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-13-git-send-email-chris@chris-wilson.co.uk


# d72d908b 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark up i915_gem_active for locking annotation

The future annotations will track the locking used for access to ensure
that it is always sufficient. We make the preparations now to present
the API ahead of time and to make sure that GCC can eliminate the unused
parameter.

Before: 6298417 3619610 696320 10614347 a1f64b vmlinux
After: 6298417 3619610 696320 10614347 a1f64b vmlinux
(with i915 builtin)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-12-git-send-email-chris@chris-wilson.co.uk


# 27c01aae 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prepare i915_gem_active for annotations

In the future, we will want to add annotations to the i915_gem_active
struct. The API is thus expanded to hide direct access to the contents
of i915_gem_active, mediating it instead through a number of helpers.
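
For illustration, the helpers are of roughly this shape (a sketch only; the
in-tree versions also manage the request reference and, later, carry the
locking annotations):

static inline void
i915_gem_active_set(struct i915_gem_active *active,
                    struct drm_i915_gem_request *request)
{
        active->request = request;
}

static inline struct drm_i915_gem_request *
i915_gem_active_peek(const struct i915_gem_active *active)
{
        return active->request;
}

static inline bool
i915_gem_active_isset(const struct i915_gem_active *active)
{
        return active->request != NULL;
}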

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-11-git-send-email-chris@chris-wilson.co.uk


# 381f371b 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce i915_gem_active for request tracking

In the next patch, request tracking is made more generic, and for that we
need a new, expanded struct. To separate the logic changes from the
mechanical churn, we split out the structure renaming into this
patch.

v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
v4: Now with i915_gem_active_set() for attaching to the active request.
v5: Use i915_gem_active_set() from inside the retirement handlers

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk


# 4717ca9e 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Kill drop_pages()

The drop_pages() function is a dangerous trap in that it can release the
passed in object pointer and so unless the caller is aware, it can
easily trick us into using the stale object afterwards. Move it into its
solitary callsite where we know it is safe.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-9-git-send-email-chris@chris-wilson.co.uk


# aa653a68 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Be more careful when unbinding vma

When we call i915_vma_unbind(), we will wait upon outstanding rendering.
This will also trigger a retirement phase, which may update the object
lists. If we extend request tracking to the VMA itself (rather than
keeping it at the encompassing object), then there is a potential for the
obj->vma_list to be modified for other elements upon i915_vma_unbind(). As
a result, if we walk over the object list and call i915_vma_unbind(), we
need to be prepared for that list to change.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-8-git-send-email-chris@chris-wilson.co.uk


# 15717de2 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Count how many VMA are bound for an object

Since we may have VMA allocated for an object, but we interrupted their
binding, there is a disparity between having elements on the obj->vma_list
and being bound. i915_gem_obj_bound_any() does this check, but this is
not rigorously observed - add an explicit count to make it easier.
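
A sketch of the bookkeeping this adds (presentation illustrative only):

/* in struct drm_i915_gem_object */
unsigned int bind_count; /* how many of this object's vma are currently bound */

/* maintained as each vma is bound/unbound */
obj->bind_count++;      /* after a successful bind */
obj->bind_count--;      /* when that vma is unbound again */

/* the cheap check that replaces walking obj->vma_list */
bound = obj->bind_count != 0;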

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-7-git-send-email-chris@chris-wilson.co.uk


# f6b9d5ca 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split early global GTT initialisation

Initialising the global GTT is tricky as we wish to use the drm_mm range
manager during the modesetting initialisation (to capture stolen
allocations from the BIOS) before we actually enable GEM. To overcome
this, we currently set up the drm_mm first and then carefully rebind
them.

v2: Fixup after rebasing
v3: GGTT initialisation needs to be split around kicking out conflicts
v4: Restore an old UMS BUG_ON(mappable > total) as a DRM_ERROR plus
fixup of probe results.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-4-git-send-email-chris@chris-wilson.co.uk


# 97d6d7ab 04-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Update GGTT initialisation functions to take drm_i915_private

Since these are internal functions they operate on drm_i915_private and
not the drm_device being passed in. So pass in the drm_i915_private
instead, and remove one layer of dancing. No space wins here, just
conforming to the norm in function parameters.

v2: Include all the probe functions

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-3-git-send-email-chris@chris-wilson.co.uk


# ddf07be7 02-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Simplify calling engine->sync_to

Since requests can no longer be generated as a side-effect of
intel_ring_begin(), we know that the seqno will be unchanged during
ring-emission. This predictability then means we do not have to check
for the seqno wrapping around whilst emitting the semaphore for
engine->sync_to().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-31-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-22-git-send-email-chris@chris-wilson.co.uk


# 5b043f4e 02-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unify legacy/execlists submit_execbuf callbacks

Now that emitting requests is identical between legacy and execlists, we
can use the same function to build up the ring for submitting to either
engine. (With the exception of i915_switch_contexts(), but in time that
will also be handled gracefully.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-30-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-21-git-send-email-chris@chris-wilson.co.uk


# 8e637178 02-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Simplify request_alloc by returning the allocated request

It is simpler and leads to more readable code through the callstack if
the allocation returns the allocated struct through the return value.

The importance of this is that it no longer looks like we accidentally
allocate requests as a side-effect of calling certain functions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-19-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-9-git-send-email-chris@chris-wilson.co.uk


# 7e37f889 02-Aug-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename struct intel_ringbuffer to struct intel_ring

The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.

s/struct intel_ringbuffer/struct intel_ring/
s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/
s/describe_ctx_ringbuf()/describe_ctx_ring()/
s/intel_ring_get_active_head()/intel_engine_get_active_head()/
s/intel_ring_sync_index()/intel_engine_sync_index()/
s/intel_ring_init_seqno()/intel_engine_init_seqno()/
s/ring_stuck()/engine_stuck()/
s/intel_cleanup_engine()/intel_engine_cleanup()/
s/intel_stop_engine()/intel_engine_stop()/
s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/
s/intel_unpin_ringbuffer()/intel_unpin_ring()/
s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/
s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/
s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/
s/intel_ringbuffer_free()/intel_ring_free()/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk


# 7e21d648 27-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove stray intel_engine_cs ring identifiers from i915_gem.c

A few places we use ring when referring to the struct intel_engine_cs. An
anachronism we are pruning out.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-9-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-4-git-send-email-chris@chris-wilson.co.uk


# c80ff16e 27-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use engine to refer to the user's BSD intel_engine_cs

This patch transitions the execbuf engine selection away from using the
ring nomenclature - though we still refer to the user's incoming
selector as their user_ring_id.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-2-git-send-email-chris@chris-wilson.co.uk


# 15f7bbc7 25-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only clear the client pointer when tearing down the file

Upon release of the file (i.e. the user calls close(fd)), we decouple
all objects from the client list so that we don't chase the dangling
file_priv. As we always inspect file_priv first, we only need to nullify
that pointer and can safely ignore the list_head.
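
Roughly (a sketch; the list and lock names are assumed):

spin_lock(&file_priv->mm.lock);
list_for_each_entry(request, &file_priv->mm.request_list, client_list)
        request->file_priv = NULL;
spin_unlock(&file_priv->mm.lock);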

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469530913-17180-3-git-send-email-chris@chris-wilson.co.uk


# 2529d570 24-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop racy markup of missed-irqs from idle-worker

During the idle-worker we disable the hangcheck and so kick any waiters
that should have been completed (since the GPU is now idle). Unlike the
hangcheck, we do not take any care to avoid the race between the irq
handler and ourselves, and so it is possible for us to declare a missed
interrupt even as the bottom-half is being scheduled to run. Let's
ignore this race to stop a potential false-positive error.

References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469351421-13493-1-git-send-email-chris@chris-wilson.co.uk


# 54b4f68f 21-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

Revert "drm/i915: Enable RC6 immediately"

This reverts commit b12e0ee2080c ("drm/i915: Enable RC6 immediately"),
as it was never meant to be sent anywhere other than the bug report for
experimentation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469132179-4052-1-git-send-email-chris@chris-wilson.co.uk
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b12e0ee2 21-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Enable RC6 immediately

Now that PCU communication is reasonably fast, we do not need to defer
RC6 initialisation to a workqueue.

References: https://bugs.freedesktop.org/show_bug.cgi?id=97017
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 39df9190 20-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert i915_semaphores_is_enabled over to early sanitize

Rather than recomputing whether semaphores are enabled, we can do that
computation once during early initialisation as the i915.semaphores
module parameter is now read-only.

s/i915_semaphores_is_enabled/i915.semaphores/

v2: Add the state to the debug dmesg as well

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-10-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-9-git-send-email-chris@chris-wilson.co.uk


# 34911fd3 20-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename drm_gem_object_unreference_unlocked in preparation for lockless free

Whilst this ultimately wraps kref_put_mutex(), our goal here is the
lockless variant, so keep the _unlocked() suffix until we need it no
more.

s/drm_gem_object_unreference_unlocked/i915_gem_object_put_unlocked/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-6-git-send-email-chris@chris-wilson.co.uk


# f8c417cd 20-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename drm_gem_object_unreference in preparation for lockless free

Ultimately wraps kref_put(), so adopt its nomenclature for consistency
with other subsystems.

s/drm_gem_object_unreference/i915_gem_object_put/
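
The wrapper itself is trivial, roughly (a sketch, not necessarily the exact
in-tree definition):

static inline void
i915_gem_object_put(struct drm_i915_gem_object *obj)
{
        drm_gem_object_unreference(&obj->base);
}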

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-6-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-5-git-send-email-chris@chris-wilson.co.uk


# 25dc556a 20-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap drm_gem_object_reference in i915_gem_object_get

Ultimately wraps kref_get(), so adopt its nomenclature for consistency
with other subsystems.

s/drm_gem_object_reference/i915_gem_object_get/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-5-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-4-git-send-email-chris@chris-wilson.co.uk


# 03ac0642 20-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wrap drm_gem_object_lookup in i915_gem_object_lookup

For symmetry with a forthcoming i915_gem_object_get() and
i915_gem_object_put().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-3-git-send-email-chris@chris-wilson.co.uk


# e8a261ea 20-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename request reference/unreference to get/put

Now that we derive requests from struct fence, swap over to its
nomenclature for references. It's shorter and more idiomatic across the
kernel.

s/i915_gem_request_reference/i915_gem_request_get/
s/i915_gem_request_unreference/i915_gem_request_put/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-2-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-1-git-send-email-chris@chris-wilson.co.uk


# c13d87ea 20-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wait on external rendering for GEM objects

When transitioning to the GTT or CPU domain we wait on all rendering
from i915 to complete (with the optimisation of allowing concurrent read
access by both the GPU and client). We don't yet ensure all rendering
from third parties (tracked by implicit fences on the dma-buf) is
complete. Since implicitly tracked rendering by third parties will
ignore our cache-domain tracking, we have to always wait upon rendering
from third-parties when transitioning to direct access to the backing
store. We still rely on clients notifying us of cache domain changes
(i.e. they need to move to the GTT read or write domain after doing a CPU
access before letting the third party render again).

v2:
This introduces a potential WARN_ON into i915_gem_object_free() as the
current i915_vma_unbind() calls i915_gem_object_wait_rendering(). To
hit this path we first need to render with the GPU, have a dma-buf
attached with an unsignaled fence and then interrupt the wait. It does
get fixed later in the series (when i915_vma_unbind() only waits on the
active VMA and not all, including third-party, rendering).

To offset that risk, use the __i915_vma_unbind_no_wait hack.

Testcase: igt/prime_vgem/basic-fence-read
Testcase: igt/prime_vgem/basic-fence-mmap
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-8-git-send-email-chris@chris-wilson.co.uk


# 197be2ae 20-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable waitboosting for mmioflips/semaphores

Since commit a6f766f39751 ("drm/i915: Limit ring synchronisation (sw
sempahores) RPS boosts") and commit bcafc4e38b6a ("drm/i915: Limit mmio
flip RPS boosts") we have limited the waitboosting for semaphores and
flips. Ideally we do not want to boost in either of these instances as no
userspace consumer is waiting upon the results (though a userspace producer
may be stalled trying to submit an execbuf - but in this case the
producer is being throttled due to the engine being saturated with
work). With the introduction of NO_WAITBOOST in the previous patch, we
can finally disable these needless boosts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-6-git-send-email-chris@chris-wilson.co.uk


# c4b0930b 20-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark all current requests as complete before resetting them

Following a GPU reset upon hang, we retire all the requests and then
mark them all as complete. If we mark them as complete first, we both
keep the normal retirement order (completed first then retired) and
provide a small optimisation for concurrent lookups.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-3-git-send-email-chris@chris-wilson.co.uk


# 05235c53 20-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GEM request routines to i915_gem_request.c

Migrate the request operations out of the main body of i915_gem.c and
into their own C file for easier expansion.

v2: Move __i915_add_request() across as well

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-1-git-send-email-chris@chris-wilson.co.uk


# 62f90b38 15-Jul-2016 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Update missing kerneldoc

Not sure why so much slips through when 0day is catching these. Hopefully
the much faster sphinx toolchain helps in unlazying people.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1468612088-9721-10-git-send-email-daniel.vetter@ffwll.ch


# d1054ee4 16-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Handle ENOSPC after failing to insert a mappable node

Even after adding individual page support for GTT mmaping, we can still
fail to find any space within the mappable region, and
drm_mm_insert_node() will then report ENOSPC. We have to then handle
this error by using the shmem access to the pages.

Fixes: b50a53715f09 ("drm/i915: Support for pread/pwrite ... objects")
Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468690956-23480-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5ab57c70 15-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush logical context image out to memory upon suspend

Before suspend, and especially before building the hibernation image, we
need the context image to be coherent in memory. To do this we require
that we perform a context switch to a disposable context (i.e. the
dev_priv->kernel_context) - when that switch is complete, all other
context images will be complete. This leaves the kernel_context image as
incomplete, but fortunately that is disposable and we can do a quick
fixup of the logical state after resuming.

v2: Share the nearly identical code to switch to the kernel context with
eviction.
v3: Explain why we need the switch and reset.

Testcase: igt/gem_exec_suspend # bsw
References: https://bugs.freedesktop.org/show_bug.cgi?id=96526
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468590980-6186-2-git-send-email-chris@chris-wilson.co.uk


# b7137e0c 13-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defer enabling rc6 til after we submit the first batch/context

Some hardware requires a valid render context before it can initiate
rc6 power gating of the GPU; the default state of the GPU is not
sufficient and may lead to undefined behaviour. The first execution of
any batch will load the "golden render state", at which point it is safe
to enable rc6. As we do not forcibly load the kernel context at resume,
we have to hook into the batch submission to be sure that the render
state is setup before enabling rc6.

However, since we don't enable powersaving until that first batch, we
queue a delayed task in order to guarantee that the batch is indeed
submitted.

v2: Rearrange intel_disable_gt_powersave() to match.
v3: Apply user specified cur_freq (or idle_freq if not set).
v4: Give in, and supply a delayed work to autoenable rc6
v5: Mika suggested a couple of better names for delayed_resume_work
v6: Rebalance rpm_put around the autoenable task

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# b913b33c 13-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush GT idle status upon reset

Upon resetting the GPU, we force the engines to be idle by clearing
their request lists. However, I neglected to clear the GT active status
and so the next request following the reset was not marking the device
as busy again. (We had to wait until any outstanding retire worker
finally ran and cleared the active status.)

Fixes: 67d97da34917 ("drm/i915: Only start retire worker when idle")
Testcase: igt/pm_rps/reset
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 5b585925 09-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/breadcrumbs: Queue hangcheck before sleeping

Never go to sleep waiting on the GPU without first ensuring that we will
get woken up.

We have a choice of queuing the hangcheck before every schedule() or the
first time we wakeup. In order to simply accommodate both the signaler
and the ordinary waiter, move the queuing to the common point of
enabling the irq. We lose the paranoid safety of ensuring that the
hangcheck is active before the sleep, but avoid code duplication (and
redundant hangcheck queuing).

Testcase: igt/prime_busy
Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
(cherry picked from commit 232af392fdb52aa2739dad4e03fed273b3c3f24a)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3fef3a5b 05-Jul-2016 Matthew Auld <matthew.auld@intel.com>

drm/i915: remove superfluous i915_gem_object_free_mmap_offset call

This should already be handled by drm_gem_object_release, which is
called later on.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467720019-31876-1-git-send-email-matthew.auld@intel.com


# 8b3e2d36 13-Jul-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Unify engine init loop

With the unified common engine setup done, and the execlist engine
initialization loop clearly split into two phases, we can eliminate
the separate legacy engine initialization code.

v2: Fix cleanup path for legacy.
v3: Rename constructors. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# c9615613 09-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Kick hangcheck from retire worker

Let's ensure that we cannot run indefinitely without the hangcheck
worker being queued. We removed it from being kicked on every request
because we were kicking it a few million times in every hangcheck
interval and only once is necessary! However, that leaves us with the
issue of what if userspace never waits for a request, or runs out of
resources, what if userspace just issues a request then spins on
BUSY_IOCTL?

Testcase: igt/gem_busy
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-3-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# 232af392 09-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/breadcrumbs: Queue hangcheck before sleeping

Never go to sleep waiting on the GPU without first ensuring that we will
get woken up.

We have a choice of queuing the hangcheck before every schedule() or the
first time we wakeup. In order to simply accommodate both the signaler
and the ordinary waiter, move the queuing to the common point of
enabling the irq. We lose the paranoid safety of ensuring that the
hangcheck is active before the sleep, but avoid code duplication (and
redundant hangcheck queuing).

Testcase: igt/prime_busy
Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# 91c8a326 05-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert dev_priv->dev backpointers to dev_priv->drm

Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.

text data bss dec hex filename
1068757 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko
1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko

Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)

and for good measure the dev_priv->dev backpointer was removed entirely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk


# b1379d49 05-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace lockless_dereference(bool) with READ_ONCE()

After Joonas complained about using READ_ONCE() on the only use of the
variable in the function, where the intent was to simply document that
the read was intentionally racy and unlocked, I switched the READ_ONCE()
over to lockless_dereference(). However, in linux-next that has a
stronger type-check to only allow pointers and is no longer
interchangeable with READ_ONCE(), see commit 331b6d8c7afc
("locking/barriers: Validate lockless_dereference() is used on a pointer
type")

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 67d97da34917 ("drm/i915: Only start retire worker when idle")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467705276-707-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>


# f19ec8cb 04-Jul-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915: convert a few more E->dev_private to to_i915(E)

Also remove some redundant dev and dev_priv locals

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467626365-29871-1-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-2-git-send-email-chris@chris-wilson.co.uk


# fac5e23e 04-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mass convert dev->dev_private to to_i915(dev)

Since we now subclass struct drm_device, we can save pointer dances by
noting the equivalence of struct drm_device and struct drm_i915_private,
i.e. by using to_i915().
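
The equivalence is just the embedded-struct relationship; conceptually (a
sketch, assuming drm_device is the member named drm as used elsewhere in
this series):

static inline struct drm_i915_private *to_i915(const struct drm_device *dev)
{
        return container_of(dev, struct drm_i915_private, drm);
}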

text data bss dec hex filename
1073824 4562 416 1078802 107612 drivers/gpu/drm/i915/i915.ko
1068976 4562 416 1073954 106322 drivers/gpu/drm/i915/i915.ko

Created by the coccinelle script:

@@
expression E;
identifier p;
@@
- struct drm_i915_private *p = E->dev_private;
+ struct drm_i915_private *p = to_i915(E);

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk


# 7b4d3a16 04-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove stop-rings debugfs interface

Now that we have (near) universal GPU recovery code, we can inject a
real hang from userspace and not need any fakery. Not only does this
mean that the testing is far more realistic, but we can simplify the
kernel in the process.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk


# df4ba509 04-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add background commentary to "waitboosting"

Describe the intent of boosting the GPU frequency to maximum before
waiting on the GPU.

RPS waitboosting was introduced with commit b29c19b64528 ("drm/i915:
Boost RPS frequency for CPU stalls") but lacked a concise comment in the
code to explain itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-5-git-send-email-chris@chris-wilson.co.uk


# 0e6883b0 04-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restore waitboost credit to the synchronous waiter

Ideally, we want to automagically have the GPU respond to the
instantaneous load by reclocking itself. However, reclocking occurs
relatively slowly, and to the client waiting for a result from the GPU,
too late. To compensate and reduce the client latency, we allow the
first wait from a client to boost the GPU clocks to maximum. This
overcomes the lag in autoreclocking, at the expense of forcing the GPU
clocks too high. So to offset the excessive power usage, we currently
allow a client to only boost the clocks once before we detect the GPU
is idle again. This works reasonably for say the first frame in a
benchmark, but for many more synchronous workloads (like OpenCL) we find
the GPU clocks remain too low. By noting a wait which would idle the GPU
(i.e. we just waited upon the last known request), we can give that
client the idle boost credit (for their next wait) without the 100ms
delay required for us to detect the GPU idle state. The intention is to
boost clients that are stalling in the process of feeding the GPU more
work (and who in doing so let the GPU idle), without granting boost
credits to clients that are throttling themselves (such as compositors).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-4-git-send-email-chris@chris-wilson.co.uk


# e307d62d 04-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove redundant queue_delayed_work() from throttle ioctl

We know, by design, that whilst the GPU is active (and thus we are
throttling) the retire_worker is queued. Therefore attempting to requeue
it with queue_delayed_work() is a no-op and we can safely remove it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-3-git-send-email-chris@chris-wilson.co.uk


# 1b51bce2 04-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not keep postponing the idle-work

Rather than persistently postponing the idle-work every time somebody
calls i915_gem_retire_requests() (potentially ensuring that we never
reach the idle state), queue the work the first time we detect all
requests are complete. Then if in 100ms, more requests have been queued,
we will abort the idle-worker and wait again until all the new requests
have been completed.

Of course, this does depend upon the idle worker cancelling itself
gracefully from the previous patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-2-git-send-email-chris@chris-wilson.co.uk


# 67d97da3 04-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only start retire worker when idle

The retire worker is a low frequency task that makes sure we retire
outstanding requests if userspace is being lax. We only need to start it
once as it remains active until the GPU is idle, so do a cheap test
before the more expensive queue_work(). A consequence of this is that we
need correct locking in the worker to make the hot path of request
submission cheap. To keep the symmetry and keep hangcheck strictly bound
by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle
worker before dropping the wakelock.
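
The hot path then reduces to something like the following (a sketch; names
assumed and locking elided), so only the first request after idling pays
for queue_delayed_work():

if (dev_priv->gt.awake)
        return;

dev_priv->gt.awake = true;
queue_delayed_work(dev_priv->wq,
                   &dev_priv->gt.retire_work,
                   round_jiffies_up_relative(HZ));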

v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick
the waiter.
v3: Remove the wakeref assertion squelching (now we hold a wakeref for
the hangcheck, any rpm error there is genuine).
v4: To prevent excess work when retiring requests, we split the busy
flag into two, a boolean to denote whether we hold the wakeref and a
bitmask of active engines.
v5: Reorder cancelling hangcheck upon idling to avoid a race where we
might cancel a hangcheck after being preempted by a new task

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.freedesktop.org/show_bug.cgi?id=88437
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-1-git-send-email-chris@chris-wilson.co.uk


# c81d4613 01-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert trace-irq to the breadcrumb waiter

If we convert the tracing over from direct use of ring->irq_get() and
over to the breadcrumb infrastructure, we only have a single user of the
ring->irq_get and so we will be able to simplify the driver routines
(eliminating the redundant validation and irq refcounting).

Process context is preferred over softirq (or even hardirq) for a couple
of reasons:

- we already utilize process context to have fast wakeup of a single
client (i.e. the client waiting for the GPU inspects the seqno for
itself following an interrupt to avoid the overhead of a context
switch before it returns to userspace)

- engine->irq_seqno() is not suitable for use from a softirq/hardirq
context as we may require long waits (100-250us) to ensure the seqno
write is posted before we read it from the CPU

A signaling framework is a requirement for enabling dma-fences.

v2: Move to a signaling framework based upon the waiter.
v3: Track the first-signal to avoid having to walk the rbtree every time.
v4: Mark the signaler thread as RT priority to reduce latency in the
indirect wakeups.
v5: Make failure to allocate the thread fatal.
v6: Rename kthreads to i915/signal:%u

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-16-git-send-email-chris@chris-wilson.co.uk


# 1137fa86 01-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop setting wraparound seqno on initialisation

We have testcases to ensure that seqno wraparound works fine, so we can
forgo forcing everyone to encounter seqno wraparound during early
uptime. seqno wraparound incurs a full GPU stall so not forcing it
will eliminate one jitter from the early system. Using the testcases, we
have very deterministic testing; given how difficult it would be to
debug an issue (GPU hang) stemming from a wraparound using pure
postmortem analysis, I see no value in forcing a wrap during boot.

Advancing the global next_seqno after a GPU reset is equally pointless.

References: https://bugs.freedesktop.org/show_bug.cgi?id=95023
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-15-git-send-email-chris@chris-wilson.co.uk


# f69a02c9 01-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Spin after waking up for an interrupt

When waiting for an interrupt (waiting for the engine to complete some
work), we know we are the only waiter to be woken on this engine. We also
know when the GPU has nearly completed our request (or at least started
processing it), so after being woken and we detect that the GPU is
active and working on our request, allow us the bottom-half (the first
waiter who wakes up to handle checking the seqno after the interrupt) to
spin for a very short while to reduce client latencies.

The impact is minimal, there was an improvement to the realtime-vs-many
clients case, but exporting the function proves useful later. However,
it is tempting to adjust irq_seqno_barrier to include the spin. The
problem is first ensuring that the "start-of-request" seqno is coherent
as we use that as our basis for judging when it is ok to spin. If we
could, spinning there could dramatically shorten some sleeps, and allow
us to make the barriers more conservative to handle missed seqno writes
on more platforms (all gen7+ are known to have the occasional issue, at
least).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-7-git-send-email-chris@chris-wilson.co.uk


# 688e6c72 01-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Slaughter the thundering i915_wait_request herd

One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoiding long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the wait is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk


# 1f15b76f 01-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Separate GPU hang waitqueue from advance

Currently __i915_wait_request uses a per-engine wait_queue_t for the dual
purpose of waking after the GPU advances or for waking after an error.
In the future, we may add even more wake sources and require greater
separation, but for now we can conceptually simplify wakeups by separating
the two sources. In particular, this allows us to use different wait-queues
(e.g. one on the engine advancement, a global one for errors and one on
each requests) without any hassle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-5-git-send-email-chris@chris-wilson.co.uk


# 05535726 01-Jul-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Delay queuing hangcheck to wait-request

We can forgo queuing the hangcheck from the start of every request and
instead defer it until we wait upon a request. This reduces the overhead
of every request, but may increase the latency of detecting a hang.
However, if nothing ever waits upon a hang, did it ever hang? It also improves the
robustness of the wait-request by ensuring that the hangchecker is
indeed running before we sleep indefinitely (and thereby ensuring that
we never actually sleep forever waiting for a dead GPU).

As pointed out by Tvrtko, it is possible for a GPU hang to go unnoticed
for as long as nobody is waiting for the GPU. Though this is rare, during
that time we may be consuming more power than if we had promptly
recovered, and in the most extreme case we may exhaust all memory before
forcing the hangcheck. Something to be wary of in future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-2-git-send-email-chris@chris-wilson.co.uk


# 0c5eed65 29-Jun-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove request->reset_counter

Since commit 2ed53a94d8cb ("drm/i915: On GPU reset, set the HWS
breadcrumb to the last seqno") once a hang is completed, the seqno is
advanced past all current requests. With this we know that if we wake up
from waiting for a request, if a hang has occurred and reset completed,
our request will be considered complete (i.e.
i915_gem_request_completed() returns true). Therefore we only need to
worry about the situation where a hang has occurred, but not yet reset,
where we may need to release our struct_mutex. Since we don't need to
detect the completed reset using the global gpu_error->reset_counter
anymore, we do not need to track the reset_counter epoch inside the
request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467211874-11552-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>


# 6e5a5beb 24-Jun-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split idling from forcing context switch

We only need to force a switch to the kernel context placeholder during
eviction. All other uses of i915_gpu_idle() just want to wait until
existing work on the GPU is idle. Rename i915_gpu_idle() to
i915_gem_wait_for_idle() to avoid any implications about "parking" the
context first.

v2: Tweak an error message if the wait fails for the ilk vtd w/a

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-6-git-send-email-chris@chris-wilson.co.uk


# 62e63007 24-Jun-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip idling an idle engine

During suspend (or module unload), if we have never accessed the engine
(i.e. userspace never submitted a batch to it), the engine is idle. Then
we attempt to idle the engine by forcing it to the default context,
which actually means we submit a render batch to set up the golden
context state and then wait for it to complete. We can skip this
entirely as we know the engine is idle.

v2: Drop incorrect comment.

References: https://bugs.freedesktop.org/show_bug.cgi?id=95634
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-1-git-send-email-chris@chris-wilson.co.uk


# aeecc969 17-Jun-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: use ORIGIN_CPU for frontbuffer invalidation on WC mmaps

... instead of the previous ORIGIN_GTT. This should actually
invalidate FBC once something is written on the frontbuffer using WC
mmaps. The problem with ORIGIN_GTT is that the automatic hardware
tracking is not able to detect the WC writes as it can detect the GTT
writes.

This should help fix the SKL bug where nothing happens when you type
your username/password on lightdm.

This patch was originally pasted on an email by Chris and converted to
an actual git patch by Paulo.

v2 (from Paulo):
- Make it a full variable instead of a bit-field (Daniel)
- Use WRITE_ONCE (Chris)
v3 (from Paulo):
- Remove huge comment since now we have WRITE_ONCE (Chris)
- Remove unneeded new line (Chris)
- Add Chris' Signed-off-by, authorized via IRC

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466185599-26401-1-git-send-email-paulo.r.zanoni@intel.com


# 6eae0059 20-Jun-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: pwrite/pread do not require obj->base.filp, just pages

The idea behind relaxing the restriction for pread/pwrite was to handle
!obj->base.filp, i.e. non-shmemfs backed objects, which only requires
that the object provide struct pages.

v2: Remove excess (). Not enough editing after copy'n'paste.
v3: Use new i915_gem_object_has_struct_page()

Testcase: igt/prime_vgem/read
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466431552-17860-2-git-send-email-chris@chris-wilson.co.uk


# b9bcd14a 20-Jun-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Extract checking for backing struct pages to a helper

Currently to see if an object is backed by struct pages (as opposed to
being a simple pointer to stolen memory, for example) we do a manual
check on the obj->ops->flags. This is quite shouty and before adding
more checks in future, we should make it a bit calmer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466431552-17860-1-git-send-email-chris@chris-wilson.co.uk
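
For illustration, a minimal sketch of what such a helper looks like (the
flag name here is an assumption for illustration, not quoted from the patch):

static inline bool
i915_gem_object_has_struct_page(const struct drm_i915_gem_object *obj)
{
	/* replaces open-coded obj->ops->flags tests at the call sites */
	return obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE;
}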


# b50a5371 10-Jun-2016 Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>

drm/i915: Support for pread/pwrite from/to non shmem backed objects

This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through the
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.

v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)

v3: Rebased to the latest drm-intel-nightly (Ankit)

v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)

v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)

v6: Using pwrite_fast for non-shmem backed objects as well (Chris)

v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)

v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)

v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)

v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)

v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration

v12: Use page-by-page copy for slow user access too (Chris)

v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)

v14: Corrected datatypes/initializations (Tvrtko)

Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com


# 4f1959ee 10-Jun-2016 Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>

drm/i915: Use insert_page for pwrite_fast

In pwrite_fast, map an object page by page if obj_ggtt_pin fails. First,
we try a nonblocking pin for the whole object (since that is fastest if
reused), then failing that we try to grab one page in the mappable
aperture. It also allows us to handle objects larger than the mappable
aperture (e.g. if we need to pwrite with vGPU restricting the aperture
to a measly 8MiB or something like that).

v2: Pin pages before starting pwrite, Combined duplicate loops (Chris)

v3: Combined loops based on local patch by Chris (Chris)

v4: Added i915 wrapper function for drm_mm_insert_node_in_range (Chris)

v5: Renamed wrapper function for drm_mm_insert_node_in_range (Chris)

v5: Added wrapper for drm_mm_remove_node() (Chris)

v6: Added get_pages call before pinning the pages (Tvrtko)
Added remove_mappable_node() wrapper for drm_mm_remove_node() (Chris)

v7: Added size argument for insert_mappable_node (Tvrtko)

v8: Do not put_pages after pwrite, do memset of node in the wrapper
function (insert_mappable_node) (Chris)

v9: Rebase (Ankit)

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# e556f7c1 07-Jun-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915/guc: fix GuC loading/submission check

The last stage of the GuC loader also sanitises the GuC submission
settings, so it should be called unconditionally (even on platforms
without a GuC) to ensure consistent settings; in particular, this
prevents any attempt to use GuC submission on GuCless platforms!

Also fix error path handling and clarify DRM_INFO fallback message.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 14bb2c11 03-Jun-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Fix a bunch of kerneldoc warnings

Just a bunch of stale kerneldocs generating warnings when
building the docs. Mostly function parameters so not very
useful but still.

v2: Tidy.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1464958937-23344-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 93c76a3d 04-Dec-2015 Al Viro <viro@zeniv.linux.org.uk>

file_inode(f)->i_mapping is f->f_mapping

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# e2efd130 24-May-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename struct intel_context

Our goal is to rename the anonymous per-engine struct beneath the
current intel_context. However, after a lively debate revolving around
the confusion between intel_context_engine and intel_engine_context, the
realisation is that the two structs target different users. The outer
struct is API / user facing, and so carries the higher level GEM
information. The inner struct is hw facing. Thus we want to name the
inner struct intel_context and the outer one i915_gem_context. As the
first step, we need to rename the current struct:

s/struct intel_context/struct i915_gem_context/

which fits much better with its constructors already conveying the
i915_gem_context prefix!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464098023-3294-1-git-send-email-chris@chris-wilson.co.uk


# 80a89a5e 23-May-2016 Michal Hocko <mhocko@suse.com>

drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable

i915_gem_mmap_ioctl takes mmap_sem for write. If the waiting task
gets killed by the oom killer, it would block the oom_reaper from
asynchronous address space reclaim and reduce the chances of timely OOM
resolution. Wait for the lock in killable mode and return -EINTR
if the task got killed while waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
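
A minimal sketch of the killable-wait pattern described above (the
surrounding ioctl context and error unwinding are omitted):

	struct mm_struct *mm = current->mm;

	if (down_write_killable(&mm->mmap_sem))
		return -EINTR;	/* killed while waiting; let the oom_reaper make progress */
	/* ... update the vma under mmap_sem ... */
	up_write(&mm->mmap_sem);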


# fce91f22 20-May-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915/guc: add enable_guc_loading parameter

Split the function of "enable_guc_submission" into two separate
options. The new one ("enable_guc_loading") controls only the
*fetching and loading* of the GuC firmware image. The existing
one is redefined to control only the *use* of the GuC for batch
submission once the firmware is loaded.

In addition, the degree of control has been refined from a simple
bool to an integer key, allowing several options:
-1 (default) whatever the platform default is
0 DISABLE don't load/use the GuC
1 BEST EFFORT try to load/use the GuC, fallback if not available
2 REQUIRE must load/use the GuC, else leave the GPU wedged

The new platform default (as coded here) will be to attempt to
load the GuC iff the device has a GuC that requires firmware,
but not yet to use it for submission. A later patch will change
to enable it if appropriate.

v4:
Changed some error-message levels, mostly ERROR->INFO, per
review comments by Tvrtko Ursulin.

v5:
Dropped one more error message, disabled GuC submission on
hypothetical firmware-free devices [Tvrtko Ursulin].

v6:
Logging tidy by Tvrtko Ursulin:
* Do not log falling back to execlists when wedging the GPU.
* Do not log fw load errors when load was disabled by user.
* Pass down some error code from fw load for log message to
make more sense.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v5)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Tested-by: Fiedorowicz, Lukasz <lukasz.fiedorowicz@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com> (v6)


# 1a3d1898 13-May-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915/guc: distinguish HAS_GUC() from HAS_GUC_UCODE/HAS_GUC_SCHED

For now, anything with a GuC requires uCode loading, and then supports
command submission once loaded. But these are logically distinct from
simply "having a GuC", so we need a separate macro for the latter. Then,
various tests should use this new macro rather than HAS_GUC_UCODE() or
testing enable_guc_submission.

v4:
Added a couple more uses of the new macro.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# f09d675f 13-May-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915/guc: rename loader entry points

The GuC initialisation code could do other things apart from loading
firmware, so here we rename the three primary entry points to remove any
specific reference to "ucode" (no functional changes, just renaming).

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 157d2c7f 13-May-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop automatically retiring requests after a GPU hang

Following a GPU hang, we break out of the request loop in order to
unlock the struct_mutex for use by the GPU reset. However, if we retire
all the requests at that moment, we cannot identify the guilty request
after performing the reset.

v2: Not automatically retiring requests forces us to recheck for
available ringspace.

Fixes: f4457ae71fd6 ("drm/i915: Prevent leaking of -EIO from i915_wait_request()")
Testcase: igt/gem_reset_stats/ban-*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463137042-9669-4-git-send-email-chris@chris-wilson.co.uk
(cherry picked from commit e075a32f515becef66dc849f5eca47409ccf5473)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 5fbd0418 06-May-2016 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms

Move the intel_enable_gtt() call to happen before we touch the GTT
during resume. Right now it's done way too late. Before
commit ebb7c78d358b ("agp/intel-gtt: Only register fake agp driver for gen1")
it was actually done earlier on account of also getting called from
the resume hook of the fake agp driver. With the fake agp driver
no longer getting registered we must move the call up.

The symptoms I've seen on my 830 machine include lowmem corruption,
other kinds of memory corruption, and a straight-up hung machine during
or just after resume. Not really sure what causes the memory corruption,
but so far I've not seen any with this fix.

I think we shouldn't really need to call this during init, but we have
been doing that so I've decided to keep the call. However moving that
call earlier could be prudent as well. Doing it right after the
intel-gtt probe seems appropriate.

Also tested this on 946gz,elk,ilk and all seemed quite happy with
this change.

v2: Reorder init_hw vs. enable_hw functions (Chris)

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: ebb7c78d358b ("agp/intel-gtt: Only register fake agp driver for gen1")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462559755-353-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit ac840ae53573d9f435c88c131f6707a79aecb466)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 85d1225e 20-May-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915: Introduce & use new lightweight SGL iterators

The existing for_each_sg_page() iterator is somewhat heavyweight, and is
limiting i915 driver performance in a few benchmarks. So here we
introduce somewhat lighter weight iterators, primarily for use with GEM
objects or other cases where we need only deal with whole aligned pages.

Unlike the old iterator, the new iterators use an internal state
structure which is not intended to be accessed by the caller; instead
each takes as a parameter an output variable which is set before each
iteration. This makes them particularly simple to use :)

One of the new iterators provides the caller with the DMA address of
each page in turn; the other provides the 'struct page' pointer required
by many memory management operations.

Various uses of for_each_sg_page() are then converted to the new macros.

v2: Force inlining of the sg_iter constructor and make the union
anonymous.

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-4-git-send-email-chris@chris-wilson.co.uk
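
As a rough usage sketch (iterator and member names are per the i915 tree
of that era; treat the details as illustrative):

	struct sgt_iter iter;
	struct page *page;

	/* 'page' is the output variable set before each iteration */
	for_each_sgt_page(page, iter, obj->pages)
		set_page_dirty(page);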


# b338fa47 20-May-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915: optimise i915_gem_object_map() for small objects

We're using this function for ringbuffers and other "small" objects, so
it's worth avoiding an extra malloc()/free() cycle if the page array is
small enough to put on the stack. Here we've chosen an arbitrary cutoff
of 32 (4k) pages, which is big enough for a ringbuffer (4 pages) or a
context image (currently up to 22 pages).

v5:
change name of local array [Chris Wilson]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-3-git-send-email-chris@chris-wilson.co.uk
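
The stack-versus-heap choice can be sketched roughly as below (the cutoff
and gfp flags are illustrative, not quoted from the patch):

	struct page *stack_pages[32];
	struct page **pages = stack_pages;

	if (n_pages > ARRAY_SIZE(stack_pages)) {
		/* too large for the on-stack array, fall back to the heap */
		pages = kmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
		if (!pages)
			return NULL;
	}

	/* ... collect the object's pages and vmap() them ... */

	if (pages != stack_pages)
		kfree(pages);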


# dd6034c6 20-May-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915: refactor i915_gem_object_pin_map()

The recently-added i915_gem_object_pin_map() can be further optimised
for "small" objects. To facilitate this, and simplify the error paths
before adding the new code, this patch pulls out the "mapping" part of
the operation (involving local allocations which must be undone before
return) into its own subfunction.

The next patch will then insert the new optimisation into the middle of
the now-separated subfunction.

This reorganisation will probably not affect the generated code, as the
compiler will most likely inline it anyway, but it makes the logical
structure a bit clearer and easier to modify.

v2:
Restructure loop-over-pages & error check [Chris Wilson]

v3:
Add page count to debug messages [Chris Wilson]
Convert WARN_ON() to GEM_BUG_ON()

v4:
Drop the DEBUG messages [Tvrtko Ursulin]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1463741647-15666-2-git-send-email-chris@chris-wilson.co.uk


# 72778cb2 19-May-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/userptr: Convert to drm_i915_private

userptr only uses drm_device directly in a single interface where it is
meant to use drm_i915_private (everywhere else we have to derive it from
the drm_i915_gem_object and so require going via drm_device).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463671036-3235-1-git-send-email-chris@chris-wilson.co.uk


# a8ad0bd8 09-May-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm: Remove unused drm_device from drm_gem_object_lookup()

drm_gem_object_lookup() has never required the drm_device for its file
local translation of the user handle to the GEM object. Let's remove the
unused parameter and save some space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Dave Airlie <airlied@redhat.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Fixup kerneldoc too.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
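
The signature change amounts to dropping the first parameter, roughly:

/* before */
struct drm_gem_object *drm_gem_object_lookup(struct drm_device *dev,
					     struct drm_file *filp, u32 handle);
/* after: the lookup is purely file-local, so dev is not needed */
struct drm_gem_object *drm_gem_object_lookup(struct drm_file *filp, u32 handle);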


# 461fb99c 14-May-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Update domain tracking for GEM objects on hibernation

When creating the hibernation image, the CPU will read the pages of all
objects and thus conflict with our domain tracking. We need to update
our domain tracking to accurately reflect the state on restoration.

v2: Perform the domain tracking inside freeze, before the image is
written, rather than upon restoration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: David Weinehall <david.weinehall@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463207195-22076-2-git-send-email-chris@chris-wilson.co.uk


# e075a32f 13-May-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop automatically retiring requests after a GPU hang

Following a GPU hang, we break out of the request loop in order to
unlock the struct_mutex for use by the GPU reset. However, if we retire
all the requests at that moment, we cannot identify the guilty request
after performing the reset.

v2: Not automatically retiring requests forces us to recheck for
available ringspace.

Fixes: f4457ae71fd6 ("drm/i915: Prevent leaking of -EIO from i915_wait_request()")
Testcase: igt/gem_reset_stats/ban-*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463137042-9669-4-git-send-email-chris@chris-wilson.co.uk


# e6db7469 13-May-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Stop retiring requests from busy/wait ioctls

In order to reduce the workload of the caller, we do not want to
actually have to retire requests of others when checking the busy status
of this object. This applies to both busy/wait ioctls as the wait ioctl
has a semantically equivalent mode to the busy ioctl.

At the present time, this is only a minor improvement to reduce the
workload of the busy ioctl under the struct_mutex which helps to reduce
its impact upon contention of struct_mutex. However, since it is mostly
a victim in highly contended scenarios, the impact is very minor until
we can eliminate the struct_mutex requirement for busy-ioctl in the near
future.

v2: Mention the patch's intended limited impact. It is just paving the
way for greater changes whilst reducing the impact of a bugfix in the
next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463137042-9669-3-git-send-email-chris@chris-wilson.co.uk


# 7e22dbbb 10-May-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Replace "INTEL_INFO->gen == x" checks with IS_GENx

This way the optimization from a previous patch works even better.

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
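
The conversion named in the title is mechanical; for a gen6 check it looks
roughly like this (variable name is illustrative):

	/* before */
	bool is_snb = INTEL_INFO(dev)->gen == 6;
	/* after */
	bool is_snb = IS_GEN6(dev);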


# ac840ae5 06-May-2016 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms

Move the intel_enable_gtt() call to happen before we touch the GTT
during resume. Right now it's done way too late. Before
commit ebb7c78d358b ("agp/intel-gtt: Only register fake agp driver for gen1")
it was actually done earlier on account of also getting called from
the resume hook of the fake agp driver. With the fake agp driver
no longer getting registered we must move the call up.

The symptoms I've seen on my 830 machine include lowmem corruption,
other kinds of memory corruption, and a straight-up hung machine during
or just after resume. Not really sure what causes the memory corruption,
but so far I've not seen any with this fix.

I think we shouldn't really need to call this during init, but we have
been doing that so I've decided to keep the call. However moving that
call earlier could be prudent as well. Doing it right after the
intel-gtt probe seems appropriate.

Also tested this on 946gz,elk,ilk and all seemed quite happy with
this change.

v2: Reorder init_hw vs. enable_hw functions (Chris)

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: ebb7c78d358b ("agp/intel-gtt: Only register fake agp driver for gen1")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462559755-353-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# c033666a 06-May-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store an i915 backpointer from engine, and use it

text data bss dec hex filename
6309351 3578714 696320 10584385 a18141 vmlinux
6308391 3578714 696320 10583425 a17d81 vmlinux

Almost 1KiB of code reduction.

v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions

text data bss dec hex filename
6304579 3578778 696320 10579677 a16edd vmlinux
6303427 3578778 696320 10578525 a16a5d vmlinux

Now over 1KiB!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk


# 7d993739 27-Apr-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Simplify intel_mark_busy/idle

They use dev_priv exclusively so pass it in instead of dev
for smaller source and binary.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461844620-35360-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 3ed605bc 25-Apr-2016 Gustavo Padovan <gustavo.padovan@collabora.co.uk>

kernel.h: add u64_to_user_ptr()

This function had copies in 3 different files. Unify them in kernel.h.

Cc: Joe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@intel.com> [drm/i915/]
Acked-by: Rob Clark <robdclark@gmail.com> [drm/msm/]
Acked-by: Lucas Stach <l.stach@pengutronix.de> [drm/etnaviv/]
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
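
The unified helper is essentially a typechecked cast from u64 to a user
pointer; a sketch of its shape (approximate, as added to kernel.h):

#define u64_to_user_ptr(x) (		\
{					\
	typecheck(u64, x);		\
	(void __user *)(uintptr_t)x;	\
}					\
)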


# 0e4ca100 29-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix ordering of sanitize ppgtt and sanitize execlists

The i915.enable_ppgtt option depends upon the state of
i915.enable_execlists option - so we need to sanitize execlists first.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461932305-14637-2-git-send-email-chris@chris-wilson.co.uk


# e7ae86ba 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unify GPU resets upon shutdown

Both execlists and legacy need to reset the context (and mode) of the
GPU before we lose control of the system. By resetting the GPU, we
revert back to default settings. This simplifies the life of any
subsequent driver (in particular for virtualized setups) as it does not
then have to try and recover from an unknown condition. As both paths
need to reset for the same reason, move the reset to a common point.

This unifies the resets added in a647828afc (drm/i915: Also perform gpu
reset under execlist mode) and 8e96d9c4d9 (drm/i915: reset the GPU on
context fini).

v2: Restrict the reset to "modern" gen (where we enable HW contexts) to
try and avoid leaving the machine in an unusable state with a risky
reset on older GPU. This should keep the status quo as to who performs
resets (i.e. currently only GPUs with HW contexts perform a reset on
shutdown).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
CC: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: "Niu, Bing" <bing.niu@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-25-git-send-email-chris@chris-wilson.co.uk


# e39d42fa 28-Apr-2016 Tvrtko Ursulin <tvrtko@ursulin.net>

drm/i915: Stop tracking execlists retired requests

With the previous patch having extended the pinned lifetime of
contexts by referencing the previous context from the current
request until the latter is retired (completed by the GPU),
we can now remove usage of the execlist retired queue entirely.

This is because the above now guarantees that all execlist
object access requirements are satisfied by this new tracking,
and we can stop taking additional references and stop keeping
requests on the execlists retired queue.

The latter was a source of significant scalability issues in
the driver, causing performance hits on some tests. The most
dramatic of these was igt/gem_close_race, whose run time of tens
of minutes is now reduced to tens of seconds.

Signed-off-by: Tvrtko Ursulin <tvrtko@ursulin.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-24-git-send-email-chris@chris-wilson.co.uk


# a16a4052 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track the previous pinned context inside the request

As the contexts are accessed by the hardware until the switch is completed
to a new context, the hardware may still be writing to the context object
after the breadcrumb is visible. We must not unpin/unbind/prune that
object whilst still active and so we keep the previous context pinned until
the following request. We can generalise the tracking we already do via
the engine->last_context and move it to the request so that it works
equally for execlists and GuC.

v2: Drop the execlists double pin as that exposes a race inside the lrc
irq handler as it tries to access the context after it may be retired.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-22-git-send-email-chris@chris-wilson.co.uk


# 73db04cf 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move releasing of the GEM request from free to retire/cancel

If we move the release of the GEM request (i.e. decoupling it from the
various lists used for client and context tracking) after it is complete
(either by the GPU retiring the request, or by the caller cancelling the
request), we can remove the requirement that the final unreference of
the GEM request needs to be under the struct_mutex.

The careful reader may notice that one or two impossible NULL pointer
tests are dropped for readability. These pointers cannot be NULL since
they are assigned during request construction and never unset.

v2,v3: Rebalance execlists by moving the context unpinning.
v4: Rebase onto -nightly
v5: Avoid trying to rebalance execlist/GuC context pinning, leave that
to the next step

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-21-git-send-email-chris@chris-wilson.co.uk


# 24f1d3cc 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor execlists default context pinning

Refactor pinning and unpinning of contexts, such that the default
context for an engine is pinned during initialisation and unpinned
during teardown (pinning of the context handles the reference counting).
Thus we can eliminate the special case handling of the default context
that was required to mask that it was not being pinned normally.

v2: Rebalance context_queue after rebasing.
v3: Rebase to -nightly (not 40 patches in)
v4: Rebase onto request_alloc unwinding

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-19-git-send-email-chris@chris-wilson.co.uk


# bfa01200 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Manually unwind after a failed request allocation

In the next patches, we want to move the work out of freeing the request
and into its retirement (so that we can free the request without
requiring the struct_mutex). This means that we cannot rely on
unreferencing the request to completely teardown the request any more
and so we need to manually unwind the failed allocation. In doing so, we
reorder the allocation in order to make the unwind simple (and ensure
that we don't try to unwind a partial request that may have modified
global state) and so we end up pushing the initial preallocation down
into the engine request initialisation functions where we have the
requisite control over the state of the request.

Moving the initial preallocation into the engine is less than ideal: it
moves logic to handle a specific problem with request handling out of
the common code. On the other hand, it does allow those backends
significantly more flexibility in performing their allocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-14-git-send-email-chris@chris-wilson.co.uk


# 0251a963 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the identical implementations of request space reservation

Now that we share intel_ring_begin(), reserving space for the tail of
the request is identical between legacy/execlists and so the tautology
can be removed. In the process, we move the reserved space tracking
from the ringbuffer on to the request. This is to enable us to reorder
the reserved space allocation in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-13-git-send-email-chris@chris-wilson.co.uk


# f9326be5 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rearrange switch_context to load the aliasing ppgtt on first use

The switch_mm() call is already handled by i915_switch_context(); the
only difference required to set up the aliasing ppgtt is that we need to
emit the switch_mm() on the first context, i.e. when transitioning from
engine->last_context == NULL.
initialisation of the GPU from early device initialisation to first use,
which should marginally speed up both. The caveat is that we then defer
the context initialisation until first use - i.e. we cannot assume that
the GPU engines are initialised. For example, this means that power
contexts for rc6 (Ironlake) need to be explicitly loaded, as they are.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-11-git-send-email-chris@chris-wilson.co.uk


# d200cda6 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove early l3-remap

Since we do the l3-remap on context switch, we can remove the redundant
early call to set the mapping prior to performing the first context
switch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-10-git-send-email-chris@chris-wilson.co.uk


# b0ebde39 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: L3 cache remapping is part of context switching

Move the i915_gem_l3_remap function so that it sits next to the context
switching, which is where we perform the L3 remap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-8-git-send-email-chris@chris-wilson.co.uk


# b2e862d0 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark the current context as lost on suspend

In order to force a reload of the context image upon resume, we first
need to mark its absence on suspend. Currently we are failing to restore
the golden context state and any context w/a to the default context
after resume.

One oversight corrected is that we had forgotten to reapply the L3
remapping when restoring the lost default context.

v2: Remove deprecated WARN.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-7-git-send-email-chris@chris-wilson.co.uk


# 8ef8561f 28-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move ioremap_wc tracking onto VMA

By tracking the iomapping on the VMA itself, we can share that area
between multiple users. Also by only revoking the iomapping upon
unbinding from the mappable portion of the GGTT, we can keep that iomap
across multiple invocations (e.g. execlists context pinning).

Note that by moving the iounmap tracking to the VMA, we actually end up
fixing a leak of the iomapping in intel_fbdev.

v1.5: Rebase prompted by Tvrtko
v2: Drop dev_priv parameter, we can recover the i915_ggtt from the vma.
v3: Move handling of ioremap space exhaustion to vmap_purge and also
allow vmallocs to recover old iomaps. Add Tvrtko's kerneldoc.
v4: Fix a use-after-free in shrinker and rearrange i915_vma_iomap
v5: Back to i915_vm_to_ggtt
v6: Use i915_vma_pin_iomap and i915_vma_unpin_iomap to mark critical
sections and ensure the VMA cannot be reaped whilst mapped.
v7: Move i915_vma_iounmap so that consumers of the API are not tempted,
and add iomem annotations

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-5-git-send-email-chris@chris-wilson.co.uk


# fe3db79b 25-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate error from drm_gem_object_init()

Propagate the real error from drm_gem_object_init(). Note this also
fixes some confusion in the error return from i915_gem_alloc_object...

v2:
(Matthew Auld)
- updated new users of gem_alloc_object from latest drm-nightly
- replaced occurrences of IS_ERR_OR_NULL() with IS_ERR()
v3:
(Joonas Lahtinen)
- fix double "From:" in commit message
- add goto teardown path
v4:
(Matthew Auld)
- rebase with i915_gem_alloc_object name change

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461587533-8841-1-git-send-email-matthew.auld@intel.com
[Joonas: Removed spurious " = NULL" from _init() function]
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# ff5ec22d 21-Apr-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Simplify i915_gem_obj_ggtt_bound_view

Can use the new vma->is_ggtt to simplify the check and
get rid of the local variables.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461240286-25968-1-git-send-email-tvrtko.ursulin@linux.intel.com
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 8aac2220 21-Apr-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Simplify i915_gem_obj_ggtt_offset_view

Can use the new vma->is_ggtt to simplify the check and
remove the local variables.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>


# 598b9ec8 21-Apr-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Simplify i915_gem_obj_to_ggtt_view

Can use vma->is_ggtt to simplify the check and also switch the
BUG_ON to GEM_BUG_ON which is more appropriate for this.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>


# 8da32727 21-Apr-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Remove i915_gem_obj_size

The only caller is i915_gem_obj_ggtt_size, which only cares about
GGTT, so simplify it and implement it under that name.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# d37cd8a8 22-Apr-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915: rename i915_gem_alloc_object() to i915_gem_object_create()

Because having both i915_gem_object_alloc() and i915_gem_alloc_object()
(with different return conventions) is just too confusing!

(i915_gem_object_alloc() is the low-level memory allocator, and remains
unchanged, whereas i915_gem_alloc_object() is a constructor that ALSO
initialises the newly-allocated object.)

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461348872-4702-1-git-send-email-david.s.gordon@intel.com


# f74418a4 30-Mar-2016 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/vma_manage: Drop has_offset

It's racy, creating mmap offsets is a slowpath, so better to remove it
to avoid drivers doing broken things.

The only user is i915, and it's ok there because everything (well
almost) is protected by dev->struct_mutex in i915-gem.

While at it add a note in the create_mmap_offset kerneldoc that
drivers must release it again. And then I also noticed that
drm_gem_object_release entirely lacks kerneldoc.

Cc: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459330852-27668-14-git-send-email-daniel.vetter@ffwll.ch


# 791bee12 19-Apr-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Remove a couple pointless WARN_ONs

Just two WARN_ONs followed by a pointer dereference that I spotted by accident.

v2: Remove some more of the same.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461080770-14693-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 0ccdacf6 13-Apr-2016 Peter Antoine <peter.antoine@intel.com>

drm/i915/mocs: Program MOCS for all engines on init

Allow for the MOCS to be programmed for all engines.
Currently we program the MOCS when the first render batch
goes through. This works on most platforms but fails on
platforms that do not run a render batch early,
i.e. headless servers. The patch now programs all initialised engines
on init and the RCS is programmed again within the initial batch. This
is done for predictable consistency with regards to the hardware
context.

Hardware context loading sets the values of the MOCS for RCS
and L3CC. Programming them from within the batch makes sure that
the render context is valid, no matter what the previous state of
the saved-context was.

v2: posted correct version to the mailing list.
v3: moved programming to within engine->init_hw() (Chris Wilson)
v4: code formatting and white-space changes. (Chris Wilson)

Testcase: igt/gem_mocs_settings
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460556205-6644-1-git-send-email-chris@chris-wilson.co.uk


# aa9b7810 13-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Late request cancellations are harmful

Conceptually, each request is a record of a hardware transaction - we
build up a list of pending commands and then either commit them to
hardware, or cancel them. However, whilst building up the list of
pending commands, we may modify state outside of the request and make
references to the pending request. If we do so and then cancel that
request, external objects then point to the deleted request leading to
both graphical and memory corruption.

The easiest example is to consider object/VMA tracking. When we mark an
object as active in a request, we store a pointer to this, the most
recent request, in the object. Then we want to free that object, we wait
for the most recent request to be idle before proceeding (otherwise the
hardware will write to pages now owned by the system, or we will attempt
to read from those pages before the hardware is finished writing). If
the request was cancelled instead, that wait completes immediately. As a
result, all requests must be committed and not cancelled if the external
state is unknown.

All that remains of i915_gem_request_cancel() users are just a couple of
extremely unlikely allocation failures, so remove the API entirely.

A consequence of committing all incomplete requests is that we generate
excess breadcrumbs and fill the ring much more often with dummy work. We
have completely undone the outstanding_last_seqno optimisation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-16-git-send-email-chris@chris-wilson.co.uk


# 349f2ccf 13-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the mb() following release-mmap into release-mmap

As paranoia, we want to ensure that the CPU's PTEs have been revoked for
the object before we return from i915_gem_release_mmap(). This allows us
to rely on there being no outstanding memory accesses from userspace
and guarantees serialisation of the code against concurrent access just
by calling i915_gem_release_mmap().

v2: Reduce the mb() into a wmb() following the revoke.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-13-git-send-email-chris@chris-wilson.co.uk


# f4457ae7 13-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent leaking of -EIO from i915_wait_request()

Reporting -EIO from i915_wait_request() has proven very troublesome
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.

If we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance is safe - completely eliminating the source of all spurious
-EIO.

Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.

v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk


# 299259a3 13-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Store the reset counter when constructing a request

As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.

PS: With per-engine resets, we obviously cannot assume a global reset
epoch for the requests - a per-engine epoch makes the most sense. The
challenge then is how to handle checking in the waiter for when to break
the wait, as the fine-grained reset may also want to requeue the
request (i.e. the assumption that just because the epoch changes the
request is completed may be broken - or we just avoid breaking that
assumption with the fine-grained resets).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk


# d98c52cf 13-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Tighten reset_counter for reset status

In the reset_counter, we use two bits to track a GPU hang and reset. The
low bit is a "reset-in-progress" flag that we set to signal when we need
to break waiters in order for the recovery task to grab the mutex. As
soon as the recovery task has the mutex, we can clear that flag (which
we do by incrementing the reset_counter, thereby incrementing the global
reset epoch). By clearing that flag when the recovery task holds the
struct_mutex, we can forgo a second flag that simply tells GEM to ignore
the "reset-in-progress" flag.

The second flag we store in the reset_counter is whether the
reset failed and we consider the GPU terminally wedged. Whilst this flag
is set, all access to the GPU (at least through GEM rather than direct mmio
access) is verboten.

PS: Fun is in store, as in the future we want to move from a global
reset epoch to a per-engine reset engine with request recovery.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-6-git-send-email-chris@chris-wilson.co.uk
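
Sketching the encoding described above (the bit positions and macro names
are assumptions for illustration):

/* low bit: a reset is in progress and waiters must be broken */
#define I915_RESET_IN_PROGRESS_FLAG	(1 << 0)
/* high bit: the reset failed, the GPU is terminally wedged */
#define I915_WEDGED			(1u << 31)

static inline bool reset_in_progress(u32 counter)
{
	return counter & I915_RESET_IN_PROGRESS_FLAG;
}

static inline bool terminally_wedged(u32 counter)
{
	return counter & I915_WEDGED;
}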


# c19ae989 13-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hide the atomic_read(reset_counter) behind a helper

This is principally a little bit of syntactic sugar to hide the
atomic_read()s throughout the code to retrieve the current reset_counter.
It also provides the other utility functions to check the reset state on the
already read reset_counter, so that (in later patches) we can read it once
and do multiple tests rather than risk the value changing between tests.

v2: Be more strict on converting existing i915_reset_in_progress() over to
the more verbose i915_reset_in_progress_or_wedged().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-4-git-send-email-chris@chris-wilson.co.uk


# d501b1d2 13-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add GEM debugging Kconfig option

Currently there is a #define to enable extra BUG_ON for debugging
requests and associated activities. I want to expand its use to cover
all of GEM internals (so that we can saturate the code with asserts).
We can add a Kconfig option to make it easier to enable - with the usual
caveats of not enabling unless explicitly requested.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-3-git-send-email-chris@chris-wilson.co.uk


# 3accaf7e 13-Apr-2016 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Store and use edram capabilities

Store the edram capabilities instead of only the size of
edram. This is a preparatory patch to allow edram size calculation
based on edram capability bits for gen9+. With gen9 the
edram is behind llc and is a separate entity. With hsw/bdw
it was more of a victim cache for LLC so the name 'eLLC' might
be warranted. Regardless, rename all mentions of eLLC to EDRAM to
clear the confusion.

v2: return bytes for edram size (Chris)
s/eLLC/eDRAM in output if we are gen > 8

v3: rebase, INTEL_GEN (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 666fbcf5 13-Apr-2016 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Don't program eLLC IDI hash mask for gen9+

For gen9 onwards, eDRAM is a true memory side cache. So
there is no need to program idi hash mask as it is for eLLC
only.

v2: INTEL_GEN (Chris), s/has/hash (Matthew)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


# fb8621d3 07-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid allocating a vmap arena for a single page

If we want a contiguous mapping of a single page sized object, we can
forgo using vmap() and just use a regular kmap(). Note that this is only
suitable if the desired pgprot_t is compatible.

v2: Use is_vmalloc_addr()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-7-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
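
The fast path amounts to choosing kmap() over vmap() for the single-page
case; roughly (variable names and surrounding context are illustrative):

	void *addr;

	if (n_pages == 1)
		addr = kmap(sg_page(sgt->sgl));	/* no vmap arena entry needed */
	else
		addr = vmap(pages, n_pages, 0, PAGE_KERNEL);

	/* ... later, tear down with the matching primitive ... */
	if (is_vmalloc_addr(addr))
		vunmap(addr);
	else
		kunmap(kmap_to_page(addr));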


# f2a85e19 07-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm,i915: Introduce drm_malloc_gfp()

I have instances where I want to use drm_malloc_ab() but with a custom
gfp mask. And with those, where I want a temporary allocation, I want to
try a high-order kmalloc() before using a vmalloc().

So refactor my usage into drm_malloc_gfp().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-6-git-send-email-chris@chris-wilson.co.uk
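
A simplified sketch of such a helper, assuming the era's three-argument
__vmalloc(); overflow handling and the exact gfp policy are abbreviated:

static inline void *drm_malloc_gfp(size_t nmemb, size_t size, gfp_t gfp)
{
	void *ptr;

	if (size != 0 && nmemb > SIZE_MAX / size)
		return NULL;			/* multiplication would overflow */

	if (nmemb * size <= PAGE_SIZE)
		return kmalloc(nmemb * size, gfp);

	/* try a high-order kmalloc() first, but do not retry or warn on failure */
	ptr = kmalloc(nmemb * size, gfp | __GFP_NOWARN | __GFP_NORETRY);
	if (ptr)
		return ptr;

	return __vmalloc(nmemb * size, gfp | __GFP_HIGHMEM, PAGE_KERNEL);
}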


# 0a798eb9 07-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor duplicate object vmap functions

We now have two implementations for vmapping a whole object, one for
dma-buf and one for the ringbuffer. If we couple the mapping into the
obj->pages lifetime, then we can reuse an obj->mapping for both and at
the same time couple it into the shrinker. There is a third vmapping
routine in the cmdparser that maps only a range within the object, for
the time being that is left alone, but will eventually use these routines
in order to cache the mapping between invocations.

v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala)
v3: Call unpin_vmap from the right dmabuf unmapper

v4: Rename vmap to map as we don't wish to imply the type of mapping
involved, just that it contiguously maps the object into kernel space.
Add kerneldoc and lockdep annotations

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>


# 7c90b7de 07-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply a mb between emitting the request and hangcheck

Seal the request and mark it as pending execution before we submit it to
hardware. We assume that the actual submission cannot fail (that
guarantee is provided by preallocating space in the request for the
submission). As we may inspect this state without holding any locks
during hangcheck we should apply a barrier to ensure that we do
not see a more recent value in the HWS than we are tracking.

Based on a patch by Mika Kuoppala.

Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-8-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


# 29dcb570 07-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the hw semaphore initialisation from GEM to the engine

Since we are setting engine local values that are tied to the hardware,
move it out of i915_gem_init_seqno() into the intel_ring_init_seqno()
backend, next to where the other hw semaphore registers are written.

v2: Make the explanatory comment about always resetting the semaphores to
0 irrespective of the value of the reset seqno.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-4-git-send-email-chris@chris-wilson.co.uk


# 2ed53a94 07-Apr-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: On GPU reset, set the HWS breadcrumb to the last seqno

After the GPU reset, when we discard all of the incomplete requests, mark
the GPU as having advanced to the last_submitted_seqno (as having
completed the requests and being ready for fresh work). The impact of this
is negligible, as all the requests will be considered completed by this
point; it just brings the HWS into line with expectations for external
viewers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-2-git-send-email-chris@chris-wilson.co.uk


# 09cbfeaf 01-Apr-2016 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
time ago with the promise that one day it would be possible to implement
the page cache with chunks bigger than PAGE_SIZE.

This promise never materialized, and it is unlikely it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE, and it is a constant source of confusion over whether the
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are a few places in the code that coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will
also be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 27af5eea 03-Apr-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Move execlists irq handler to a bottom half

Doing a lot of work in the interrupt handler introduces huge
latencies to the system as a whole.

The most dramatic effect can be seen by running an all-engine
stress test like igt/gem_exec_nop/all where, when the kernel
config is lean enough, the whole system can be brought into
multi-second periods of complete non-interactivity. That can
look, for example, like this:

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
Modules linked in: [redacted for brevity]
CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G U L 4.5.0-160321+ #183
Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
Workqueue: i915 gen6_pm_rps_work [i915]
task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
RIP: 0010:[<ffffffff8104a3c2>] [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
RSP: 0000:ffff88014f403f38 EFLAGS: 00000206
RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
FS: 0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
Call Trace:
<IRQ>
[<ffffffff8104a716>] irq_exit+0x86/0x90
[<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
[<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
<EOI>
[<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
[<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
[<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
[<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
[<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
[<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
[<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
[<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
[<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
[<ffffffff8105ab29>] process_one_work+0x139/0x350
[<ffffffff8105b186>] worker_thread+0x126/0x490
[<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
[<ffffffff8105fa64>] kthread+0xc4/0xe0
[<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
[<ffffffff814f351f>] ret_from_fork+0x3f/0x70
[<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170

I could not find a code path which would explain a 20+ second
lockup, but from some instrumentation it was apparent that the
proportion of time spent with interrupts off was between 10-25%
under heavy load, which is quite bad.

When an interrupt "cliff" is reached, which was >~320k irq/s on
my machine, the whole system goes into the terrible state of the
multi-second lockups described above.

By moving the GT interrupt handling to a tasklet in the simplest
possible way, the problem above disappears completely.
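
In its simplest form the conversion looks roughly like the sketch below
(field and function names approximate, body of the handler elided):

/* bottom half: the former hard-irq body, now running as a tasklet */
static void intel_lrc_irq_handler(unsigned long data)
{
	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;

	spin_lock(&engine->execlist_lock);	/* softirq context */
	/* ... read the CSB, complete requests, dequeue the next ones ... */
	spin_unlock(&engine->execlist_lock);
}

/* engine setup */
tasklet_init(&engine->irq_tasklet, intel_lrc_irq_handler,
	     (unsigned long)engine);

/* hard-irq handler: only kick the tasklet and return */
tasklet_hi_schedule(&engine->irq_tasklet);

Anyone queuing work from process context then takes execlist_lock with
spin_lock_bh() so the tasklet cannot run concurrently on the same CPU.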

Testing the effect on system-wide latencies using
igt/gem_syslatency shows the following before this patch:

gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us

This shows that the unrelated process is experiencing huge
delays in its wake-up latency. After the patch the results
look like this:

gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
gem_syslatency: cycles=841683, latency mean=56.914us max=1667us

This shows a huge improvement in the unrelated process's wake-up
latency. It also shows an approximate halving in the total number
of empty batches submitted during the test. This may not be
worrying, since the test puts the driver under a very unrealistic
load, with ncpu threads each doing empty batch submission to all
GPU engines.

Another benefit compared to the hard-irq handling is that work
on all engines can now be dispatched in parallel, since we can
have as many active tasklets as there are CPUs. (Previously a
single hard-irq would serially dispatch on one engine after
another.)

A more interesting scenario with regard to throughput is
"gem_latency -n 100", which shows 25% better throughput and
CPU usage, and 14% better dispatch latencies.

I did not find any gains or regressions with Synmark2 or
GLbench under light testing. More benchmarking is certainly
required.

v2:
* execlists_lock should be taken as spin_lock_bh when
queuing work from userspace now. (Chris Wilson)
* uncore.lock must be taken with spin_lock_irq when
submitting requests since that now runs from either
softirq or process context.

v3:
* Expanded commit message with more testing data;
* converted missed locking sites to _bh;
* added execlist_lock comment. (Chris Wilson)

v4:
* Mention dispatch parallelism in commit. (Chris Wilson)
* Do not hold uncore.lock over MMIO reads since the block
is already serialised per-engine via the tasklet itself.
(Chris Wilson)
* intel_lrc_irq_handler should be static. (Chris Wilson)
* Cancel/sync the tasklet on GPU reset. (Chris Wilson)
* Document and WARN that tasklet cannot be active/pending
on engine cleanup. (Chris Wilson/Imre Deak)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Testcase: igt/gem_exec_nop/all
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 72e96d64 30-Mar-2016 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Refer to GGTT {,VM} consistently

Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
"vm" or indirectly through other variables like "dev_priv->ggtt.base"
to avoid confusion with the i915_ggtt object itself and PPGTT VMs.

Refer to the GGTT as "ggtt" instead of indirectly through chaining.

As a bonus, this gets rid of the long-standing i915_obj_to_ggtt vs.
i915_gem_obj_to_ggtt conflict, thanks to the removal of i915_obj_to_ggtt!

v2:
- Added some more after grepping sources with Chris

v3:
- Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
(Chris)

v4:
- Convert all dev_priv->ggtt->foo accesses to ggtt->foo.

v5:
- Make patch checker happy

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 568a58e5 29-Mar-2016 Borislav Petkov <bp@suse.de>

x86/mm/pat, x86/cpufeature: Remove cpu_has_pat

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1459266123-21878-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# d85489d3 24-Mar-2016 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Rename GGTT init functions

Rename and document the GGTT init functions to give a better
idea of the context where they are called from.

i915_gem_gtt_init => i915_ggtt_init_hw
i915_gem_init_global_gtt => i915_gem_init_ggtt
i915_global_gtt_cleanup => i915_ggtt_cleanup_hw

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458830866-12578-1-git-send-email-joonas.lahtinen@linux.intel.com


# ade7daa1 24-Mar-2016 Matthew Auld <matthew.auld@intel.com>

drm/i915: BUG_ON when ggtt_view is NULL

Let's BUG_ON and not bother with a WARN and returning an error, so we can
remove the need to pollute the code with error handling; after all, it is
a programmer error to provide a NULL view. Also, while we're here, remove
a redundant NULL ggtt_view check.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458834860-7898-1-git-send-email-matthew.auld@intel.com


# b4ac5afc 24-Mar-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915: replace for_each_engine()

Having provided for_each_engine_id() for cases where the third (id)
argument is useful, we can now replace all the remaining instances with
a simpler version that takes only two parameters. In many cases, this
also allows the elimination of the local variable used in the iterator
(usually 'i').
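
A sketch of the simplified two-parameter iterator (approximate; the real
macro also has to skip engines that were never initialised):

#define for_each_engine(engine__, dev_priv__) \
	for ((engine__) = &(dev_priv__)->engine[0]; \
	     (engine__) < &(dev_priv__)->engine[I915_NUM_ENGINES]; \
	     (engine__)++) \
		if (!intel_engine_initialized(engine__)) {} else

Callers simply write for_each_engine(engine, dev_priv) { ... } without
declaring an index variable.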

v2:
s/dev_priv/(dev_priv__)/ in body of for_each_engine_masked() [Chris Wilson]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458757194-17783-2-git-send-email-david.s.gordon@intel.com


# 62106b4f 18-Mar-2016 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Rename dev_priv->gtt to dev_priv->ggtt

Refer to Global GTT consistently as GGTT, thus rename dev_priv->gtt
to dev_priv->ggtt and struct i915_gtt to struct i915_ggtt.

Fix a couple of whitespace problems while at it.

v2:
- Fix a typo in commit message.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>


# 39dabecd 17-Mar-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Use shorter route to dev_private where possible

Where we have a request we can use req->i915 directly instead
of going through the engine and device. Coccinelle script:

@@
function f;
identifier r;
@@
f(..., struct drm_i915_gem_request *r, ...)
{
...
- engine->dev->dev_private
+ r->i915
...
}
@@
struct drm_i915_gem_request *req;
@@
(
req->
- engine->dev->dev_private
+ i915
)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458219850-21007-1-git-send-email-tvrtko.ursulin@linux.intel.com


# a112dbad 17-Mar-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Remove unused variable in i915_gem_request_add_to_client

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 40ae4e16 16-Mar-2016 Imre Deak <imre.deak@intel.com>

drm/i915: Move load time gem_load_init earlier

The only steps requiring device access are the fence and swizzling
initialization, so split these out, keeping them in their current place,
and move the rest of the init steps earlier.

v2-v3:
- unchanged
v4:
- move call to i915_gem_detect_bit_6_swizzle() to
i915_gem_load_init_fences() and preserve the original order of
the detection of HW fence capabilities wrt. swizzling (Chris)

CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458132843-21860-1-git-send-email-imre.deak@intel.com


# ee4b6faf 16-Mar-2016 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Modify reset func to handle per engine resets

In a full gpu reset we prime all engines and reset the domains corresponding
to each engine. Per-engine reset is just a special case of this process
wherein only a single engine is reset. This change modifies the relevant
functions to achieve this. There are some other steps we carry out in the
case of an engine reset which are addressed in later patches.

The reset func now accepts a mask of all engines that need to be reset.
Where per-engine resets are supported, the error handler populates the mask
accordingly; otherwise all engines are specified.
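
Conceptually the interface becomes something like the following sketch
(names approximate; reset_domain_for() is a hypothetical stand-in for the
per-engine reset-domain lookup):

/* full reset */
intel_gpu_reset(dev, ALL_ENGINES);

/* single-engine reset */
intel_gpu_reset(dev, intel_engine_flag(engine));

/* inside the reset function */
for_each_engine_masked(engine, dev_priv, engine_mask)
	hw_domains |= reset_domain_for(engine);	/* hypothetical helper */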

v2: ALL_ENGINES mask fixup, better for_each_ring_masked (Chris)
v3: Whitespace fixes (Chris)
v4: Rebase due to s/ring/engine

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458143640-20563-1-git-send-email-mika.kuoppala@intel.com


# 117897f4 16-Mar-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: More renaming of rings to engines

This time using only sed and a few by hand.

v2: Rename also intel_ring_id and intel_ring_initialized.
v3: Fixed typo in intel_ring_initialized.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1458126040-33105-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 666796da 16-Mar-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: More intel_engine_cs renaming

Some trivial ones, first pass done with Coccinelle:

@@
@@
(
- I915_NUM_RINGS
+ I915_NUM_ENGINES
|
- intel_ring_flag
+ intel_engine_flag
|
- for_each_ring
+ for_each_engine
|
- i915_gem_request_get_ring
+ i915_gem_request_get_engine
|
- intel_ring_idle
+ intel_engine_idle
|
- i915_gem_reset_ring_status
+ i915_gem_reset_engine_status
|
- i915_gem_reset_ring_cleanup
+ i915_gem_reset_engine_cleanup
|
- init_ring_lists
+ init_engine_lists
)

But that didn't fully work so I cleaned it up with:

for f in *.[hc]; do sed -i -e s/I915_NUM_RINGS/I915_NUM_ENGINES/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_request_get_ring/i915_gem_request_get_engine/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_flag/intel_engine_flag/ $f; done
for f in *.[hc]; do sed -i -e s/intel_ring_idle/intel_engine_idle/ $f; done
for f in *.[hc]; do sed -i -e s/init_ring_lists/init_engine_lists/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_cleanup/i915_gem_reset_engine_cleanup/ $f; done
for f in *.[hc]; do sed -i -e s/i915_gem_reset_ring_status/i915_gem_reset_engine_status/ $f; done

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 4a570db5 16-Mar-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Rename intel_engine_cs struct members

Done by the Coccinelle script below and a couple manual fixups.

@@
identifier I, J;
@@
struct I {
...
- struct intel_engine_cs *J;
+ struct intel_engine_cs *engine;
...
}
@@
identifier I, J;
@@
struct I {
...
- struct intel_engine_cs J;
+ struct intel_engine_cs engine;
...
}
@@
struct drm_i915_private *d;
@@
(
- d->ring
+ d->engine
)
@@
struct i915_execbuffer_params *p;
@@
(
- p->ring
+ p->engine
)
@@
struct intel_ringbuffer *r;
@@
(
- r->ring
+ r->engine
)
@@
struct drm_i915_gem_request *req;
@@
(
- req->ring
+ req->engine
)

v2: Script missed the tracepoint code - fixed up by hand.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 0bc40be8 16-Mar-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Rename intel_engine_cs function parameters

@@
identifier func;
@@
func(..., struct intel_engine_cs *
- ring
+ engine
, ...)
{
<...
- ring
+ engine
...>
}
@@
identifier func;
type T;
@@
T func(..., struct intel_engine_cs *
- ring
+ engine
, ...);

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# e2f80391 16-Mar-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Rename local struct intel_engine_cs variables

Done by the Coccinelle script below plus a manual
intervention to GEN8_RING_SEMAPHORE_INIT.

@@
expression E;
@@
- struct intel_engine_cs *ring = E;
+ struct intel_engine_cs *engine = E;
<+...
- ring
+ engine
...+>
@@
@@
- struct intel_engine_cs *ring;
+ struct intel_engine_cs *engine;
<+...
- ring
+ engine
...+>

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# ca377809 01-Mar-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Avoid snooping with userptr where not supported

commit e5756c10d841ddb448293c849392f3d6b809561f
Author: Imre Deak <imre.deak@intel.com>
Date: Fri Aug 14 18:43:30 2015 +0300

drm/i915/bxt: don't allow cached GEM mappings on A stepping

Added an exception disallowing snooping for Broxton A stepping
hardware, but userptr was still enabling it regardless.

Move the check to HAS_SNOOP now that it is used from multiple
call sites and use it.

v2: Userptr cannot be supported when it cannot be coherent and
generalize the code better. (Chris Wilson)

v3: Make has_snoop true only when !has_llc. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456920631-34302-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 596c5923 26-Feb-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce the pointer dance of i915_is_ggtt()

The multiple levels of indirection do nothing but hinder the compiler, and
the pointer chasing turns out to be quite painful, but painless to fix.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456484600-11477-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 1c7f4bca 26-Feb-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename vma->*_list to *_link for consistency

Elsewhere we have adopted the convention of using '_link' to denote
elements in the list (and '_list' for the actual list_head itself), and
of having the name indicate which list the link belongs to (and
preferably not just where the link is being stored).

s/vma_link/obj_link/ (we iterate over obj->vma_list)
s/mm_list/vm_link/ (we iterate over vm->[in]active_list)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>


# 1db6e2e7 09-Feb-2016 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Check for get_pages instead of shmem (filp)

This behavior of checking for a shmem backed GEM object was introduced here:
commit 4c914c0c7c787b8f730128a8cdcca9c50b0784ab
Author: Brad Volkin <bradley.d.volkin@intel.com>
Date: Tue Feb 18 10:15:45 2014 -0800

drm/i915: Refactor shmem pread setup

It is possible for an object not to be a shmem-backed GEM object (for example,
userptr objects). An example of how we hit this failure can be found through
copy_batch() in the command parser, because we allocate a userptr object for the
batch which contains privileged instructions. Userptr calls
drm_gem_private_object_init(), which explicitly sets the filp to none.

NOTE: I manually retyped this from a test machine. So I haven't even compiled
this exact patch.

v2: Use same logic as from a2a4f916c2f (Kristian, Dave Gordon)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1455047053-2644-1-git-send-email-benjamin.widawsky@intel.com


# 3f441b82 11-Feb-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Use appropriate spinlock flavour

We know this never runs from interrupt context, so we
don't need to use the flags variant.
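
I.e. the change is of this shape (illustrative only, with a stand-in lock
name rather than the one touched by the patch):

/* before: safe from any context, but saves/restores the irq state */
spin_lock_irqsave(&dev_priv->some_lock, flags);
/* ... update ... */
spin_unlock_irqrestore(&dev_priv->some_lock, flags);

/* after: this path only ever runs in process context */
spin_lock(&dev_priv->some_lock);
/* ... update ... */
spin_unlock(&dev_priv->some_lock);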

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1ffedc06 15-Feb-2016 Daniel Vetter <daniel.vetter@ffwll.ch>

Revert "drm/i915: fix context/engine cleanup order"

This reverts commit 1b39a917a9e00378c02c50ad86632ed3d872bfad.

Chris retracted his reviewed-by (which I failed to notice) and somehow
it blows up (I did it again!) as reported by Mika with the below
backtrace on module reload:

[ 58.170374] IP: [<ffffffffa00e04d3>]
intel_logical_ring_cleanup+0x83/0x100 [i915]
...
[ 58.170469] Call Trace:
[ 58.170479] [<ffffffffa00d0ed4>] i915_gem_cleanup_engines+0x34/0x60
[i915]
[ 58.170493] [<ffffffffa0154520>] i915_driver_unload+0x140/0x220
[i915]
[ 58.170497] [<ffffffff8154a4f4>] drm_dev_unregister+0x24/0xa0
[ 58.170501] [<ffffffff8154aace>] drm_put_dev+0x1e/0x60
[ 58.170506] [<ffffffffa00912a0>] i915_pci_remove+0x10/0x20 [i915]
[ 58.170510] [<ffffffff814766e4>] pci_device_remove+0x34/0xb0
[ 58.170514] [<ffffffff8156e7d5>] __device_release_driver+0x95/0x140
[ 58.170518] [<ffffffff8156e97c>] driver_detach+0xbc/0xc0
[ 58.170521] [<ffffffff8156d883>] bus_remove_driver+0x53/0xd0
[ 58.170525] [<ffffffff8156f3a7>] driver_unregister+0x27/0x50
[ 58.170528] [<ffffffff81475725>] pci_unregister_driver+0x25/0x70
[ 58.170531] [<ffffffff8154c274>] drm_pci_exit+0x74/0x90
[ 58.170543] [<ffffffffa0154cb0>] i915_exit+0x20/0x1aa [i915]
[ 58.170548] [<ffffffff8111846f>] SyS_delete_module+0x18f/0x1f0

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# 1b39a917 22-Jan-2016 Nick Hoath <nicholas.hoath@intel.com>

drm/i915: fix context/engine cleanup order

Swap the order of context & engine cleanup, so that contexts are cleaned
up first, and *then* engines. This is a more sensible order anyway, but
in particular has become necessary since the 'intel_ring_initialized()
must be simple and inline' patch, which now uses ring->dev as an
'initialised' flag, so it can now be NULL after engine teardown. This
in turn can cause a problem in the context code, which (used to) check
the ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.

Also rename the cleanup function to reflect what it actually does
(cleanup engines, not a ringbuffer), and fix an annoying whitespace issue.
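
In outline, the teardown path now reads (sketch):

/* i915_driver_unload() / i915_load_modeset_init() error path */
i915_gem_context_fini(dev);		/* contexts first ... */
i915_gem_cleanup_engines(dev);		/* ... then the engines */

so that the context code never sees a ring whose dev pointer has already
been cleared by engine teardown.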

v2: Also make the fix in i915_load_modeset_init, not just in
i915_driver_unload (Chris Wilson)
v3: Had extra stuff in it.
v4: Reverted extra stuff (so we're back to v2).
Rebased and updated commentary above (Dave Gordon).

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: David Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1453504211-7982-2-git-send-email-david.s.gordon@intel.com


# 93232aeb 22-Jan-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow i915_gem_object_get_page() on userptr as well

commit 033908aed5a596f6202c848c6bbc8a40fb1a8490
Author: Dave Gordon <david.s.gordon@intel.com>
Date: Thu Dec 10 18:51:23 2015 +0000

drm/i915: mark GEM object pages dirty when mapped & written by the CPU

introduced a check into i915_gem_object_get_dirty_pages() that returned
a NULL pointer when called with a bad object, one that was not backed by
shmemfs. This WARN was too strict as we can work on all struct page
backed objects, and resulted in a WARN + GPF for existing userspace. In
order to differentiate the various types of objects, add a new flags field
to the i915_gem_object_ops struct to describe their capabilities, with
the first flag being whether the object has struct pages.
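
A sketch of the idea (the flag name matches the fix; treat the surrounding
details as illustrative):

struct drm_i915_gem_object_ops {
	unsigned int flags;
#define I915_GEM_OBJECT_HAS_STRUCT_PAGE 0x1

	int (*get_pages)(struct drm_i915_gem_object *);
	void (*put_pages)(struct drm_i915_gem_object *);
	/* ... */
};

/* callers ask a capability question instead of guessing from base.filp */
if (!(obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE))
	return ERR_PTR(-EINVAL);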

v2: Drop silly const before an integer in the structure declaration.

Testcase: igt/gem_userptr_blits/relocations
Reported-and-tested-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Tested-by: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Fixes: 033908aed5a5 ("drm/i915: mark GEM object pages dirty when mapped & written by the CPU")
Link: http://patchwork.freedesktop.org/patch/msgid/1453487551-16799-1-git-send-email-chris@chris-wilson.co.uk
(cherry picked from commit de4726649b6b1d7f3f02b2031ee99e067cb71e2d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# de472664 22-Jan-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow i915_gem_object_get_page() on userptr as well

commit 033908aed5a596f6202c848c6bbc8a40fb1a8490
Author: Dave Gordon <david.s.gordon@intel.com>
Date: Thu Dec 10 18:51:23 2015 +0000

drm/i915: mark GEM object pages dirty when mapped & written by the CPU

introduced a check into i915_gem_object_get_dirty_pages() that returned
a NULL pointer when called with a bad object, one that was not backed by
shmemfs. This WARN was too strict as we can work on all struct page
backed objects, and resulted in a WARN + GPF for existing userspace. In
order to differentiate the various types of objects, add a new flags field
to the i915_gem_object_ops struct to describe their capabilities, with
the first flag being whether the object has struct pages.

v2: Drop silly const before an integer in the structure declaration.

Testcase: igt/gem_userptr_blits/relocations
Reported-and-tested-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Tested-by: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453487551-16799-1-git-send-email-chris@chris-wilson.co.uk


# e5292823 28-Jan-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Make LRC (un)pinning work on context and engine

Previously intel_lr_context_(un)pin were operating on requests
which is in conflict with their names.

If we make them take a context and an engine, it makes the names
make more sense and it also makes future fixes possible.

v2: Rebase for default_context/kernel_context change.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>


# d64aa096 19-Jan-2016 Imre Deak <imre.deak@intel.com>

drm/i915: Sanitize i915_gem_load() init and clean-up

Factor out common clean-up code for the GEM load time init function.
Also rename i915_gem_load() to i915_gem_load_init() to have a better
match with its new clean-up function.

No functional change.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453209992-25995-5-git-send-email-imre.deak@intel.com


# a8a40589 19-Jan-2016 Imre Deak <imre.deak@intel.com>

drm/i915: Sanitize GEM shrinker init and clean-up

Factor out the common GEM shrinker clean-up code and call the shrinker
init function from the same function from where the corresponding
shrinker clean-up function is called. Also add sanity checking to the
shrinker and OOM registration calls.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453209992-25995-4-git-send-email-imre.deak@intel.com


# 9a15a873 27-Jan-2016 Daniel Vetter <daniel.vetter@ffwll.ch>

Revert "drm/i915: Fix context/engine cleanup order"

This reverts commit 1803c035efb88afb9d3e7feb279ac29a83216382.

It seems to blow up on module unload due to a use-after-free hitting a
BUG_ON with CONFIG_DEBUG_SG. Quoting from Tvrtko's mail:

"I've decoded the instructions and it pointed to SG_MAGIC checking:

488b8098010000 mov 0x198(%rax),%rax
ba21436587 mov $0x87654321,%edx
488b00 mov (%rax),%rax *** CRASH

"Grep showed 0x87654321 is SG_MAGIC, so likely candidate for this code
pattern is:

static inline struct page *sg_page(struct scatterlist *sg)
{
BUG_ON(sg->sg_magic != SG_MAGIC);
BUG_ON(sg_is_chain(sg));
return (struct page *)((sg)->page_link & ~0x3);
}

"Which would mean the offender is in intel_logical_ring_cleanup is most
likely:

...
if (ring->status_page.obj) {
kunmap(sg_page(ring->status_page.obj->pages->sgl));
ring->status_page.obj = NULL;
}
...

"I think that the i915_gem_context_fini will do a final unref on
dev_priv->kernel_context and then the ring buff has a copy which is
left dangling because:

lrc_setup_hardware_status_page(ring,
dev_priv->kernel_context->engine[ring->id].state);

and:

ring->status_page.obj = default_ctx_obj;

"Where default_ctx_obj == dev_priv->kernel_context->engine[ring->id].state
So indeed looks like the unload ordering is the trigger. In fact it
is almost the same fragility wrt/ kernel_context hidden dependency I
expressed my worry about in an e-mail yesterday or so. It only shows
if CONFIG_DEBUG_SG is set, otherwise it accesses freed memory and
probably just survives."

This causes serious trouble in our CI system since it took out all
gen8+ machines. Not yet clear why this wasn't caught in pre-merge
testing.

Backtrace from CI, for posterity:

[ 163.737836] general protection fault: 0000 [#1] PREEMPT SMP
[ 163.737849] Modules linked in: ax88179_178a usbnet mii snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei i2c_hid e1000e ptp pps_core [last unloaded: snd_hda_intel]
[ 163.737902] CPU: 0 PID: 5812 Comm: rmmod Tainted: G U W 4.5.0-rc1-gfxbench+ #1
[ 163.737911] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 0505 11/16/2015
[ 163.737920] task: ffff8800bb99cf80 ti: ffff88022ff2c000 task.ti: ffff88022ff2c000
[ 163.737928] RIP: 0010:[<ffffffffa018f723>] [<ffffffffa018f723>] intel_logical_ring_cleanup+0x83/0x100 [i915]
[ 163.737969] RSP: 0018:ffff88022ff2fd30 EFLAGS: 00010282
[ 163.737975] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8800bb2f31b8 RCX: 0000000000000002
[ 163.737982] RDX: 0000000087654321 RSI: 000000000000000d RDI: ffff8800bb2f31f0
[ 163.737989] RBP: ffff88022ff2fd40 R08: 0000000000000000 R09: 0000000000000001
[ 163.737996] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800bb2f0000
[ 163.738003] R13: ffff8800bb2f8fc8 R14: ffff8800bb285668 R15: 000055af1ae55210
[ 163.738010] FS: 00007f187014b700(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
[ 163.738021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 163.738030] CR2: 0000558f84e4cbc8 CR3: 000000022cd55000 CR4: 00000000003406f0
[ 163.738039] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 163.738048] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 163.738057] Stack:
[ 163.738062] ffff8800bb2f31b8 ffff8800bb2f0000 ffff88022ff2fd70 ffffffffa0180414
[ 163.738079] ffff8800bb2f0000 ffff8800bb285668 ffff8800bb2856c8 ffffffffa0242460
[ 163.738094] ffff88022ff2fd98 ffffffffa0202d30 ffff8800bb285668 ffff8800bb285668
[ 163.738109] Call Trace:
[ 163.738140] [<ffffffffa0180414>] i915_gem_cleanup_engines+0x34/0x60 [i915]
[ 163.738185] [<ffffffffa0202d30>] i915_driver_unload+0x150/0x270 [i915]
[ 163.738198] [<ffffffff815100f4>] drm_dev_unregister+0x24/0xa0
[ 163.738208] [<ffffffff815106ce>] drm_put_dev+0x1e/0x60
[ 163.738225] [<ffffffffa01412a0>] i915_pci_remove+0x10/0x20 [i915]
[ 163.738237] [<ffffffff8143d9b4>] pci_device_remove+0x34/0xb0
[ 163.738249] [<ffffffff81533d15>] __device_release_driver+0x95/0x140
[ 163.738259] [<ffffffff81533eb6>] driver_detach+0xb6/0xc0
[ 163.738268] [<ffffffff81532de3>] bus_remove_driver+0x53/0xd0
[ 163.738278] [<ffffffff815348d7>] driver_unregister+0x27/0x50
[ 163.738289] [<ffffffff8143ca15>] pci_unregister_driver+0x25/0x70
[ 163.738299] [<ffffffff81511de4>] drm_pci_exit+0x74/0x90
[ 163.738337] [<ffffffffa02034a9>] i915_exit+0x20/0x1a5 [i915]
[ 163.738349] [<ffffffff8110400f>] SyS_delete_module+0x18f/0x1f0
[ 163.738361] [<ffffffff817b8a9b>] entry_SYSCALL_64_fastpath+0x16/0x73
[ 163.738370] Code: ff d0 48 89 df e8 de a1 fd ff 48 8d 7b 38 e8 25 ab fd ff 48 8b 83 90 00 00 00 48 85 c0 74 25 48 8b 80 98 01 00 00 ba 21 43 65 87 <48> 8b 00 48 39 10 75 3c f6 40 08 01 75 38 48 c7 83 90 00 00 00
[ 163.738459] RIP [<ffffffffa018f723>] intel_logical_ring_cleanup+0x83/0x100 [i915]
[ 163.738498] RSP <ffff88022ff2fd30>
[ 163.738507] ---[ end trace 68f69ce4740fa44f ]---

Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# 1803c035 21-Jan-2016 Nick Hoath <nicholas.hoath@intel.com>

drm/i915: Fix context/engine cleanup order

Swap the order of context & engine cleanup, so that contexts are cleaned
up first, and *then* engines. This is a more sensible order anyway, but
in particular has become necessary since the 'intel_ring_initialized()
must be simple and inline' patch, which now uses ring->dev as an
'initialised' flag, so it can now be NULL after engine teardown. This
in turn can cause a problem in the context code, which (used to) check
the ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.

Also rename the cleanup function to reflect what it actually does
(cleanup engines, not a ringbuffer), and fix an annoying whitespace issue.

v2: Also make the fix in i915_load_modeset_init, not just in
i915_driver_unload (Chris Wilson)
v3: Had extra stuff in it.
v4: Reverted extra stuff (so we're back to v2).
Rebased and updated commentary above (Dave Gordon).

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453405067-32890-3-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 426960be 15-Jan-2016 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids

Tvrtko was looking through the execbuffer-ioctl and noticed that the
uABI was tightly coupled to our internal engine identifiers. Close
inspection also revealed that we leak those internal engine identifiers
through the busy-ioctl, and those internal identifiers already do not
match the user identifiers. Fortuitously, there is only one user of the
set of busy rings from the busy-ioctl, and they only wish to choose
between the RENDER and the BLT engines.

Let's fix the userspace ABI while we still can.

v2: Update the uAPI documentation to explain the identifiers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_busy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452876706-21620-1-git-send-email-chris@chris-wilson.co.uk


# de1add36 15-Jan-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Decouple execbuf uAPI from internal implementation

At the moment execbuf ring selection is fully coupled to
internal ring ids, which is not a good thing on its own.

This dependency is also spread between two source files and
not spelled out on either side, which makes it hidden and
fragile.

This patch decouples this dependency by introducing an explicit
translation table of execbuf uAPI to ring id close to the only
call site (i915_gem_do_execbuffer).

This way we are free to change driver internal implementation
details without breaking userspace. All state relating to the
uAPI is now contained in, or next to, i915_gem_do_execbuffer.

As a side benefit, this patch decreases the compiled size
of i915_gem_do_execbuffer.
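
The translation table amounts to a small lookup right next to the ioctl
entry point, roughly as sketched below (array contents as implied by the
I915_EXEC_* uAPI flags; surrounding details illustrative):

static const unsigned int user_ring_map[] = {
	[I915_EXEC_DEFAULT]	= RCS,
	[I915_EXEC_RENDER]	= RCS,
	[I915_EXEC_BSD]		= VCS,
	[I915_EXEC_BLT]		= BCS,
	[I915_EXEC_VEBOX]	= VECS,
};

/* in eb_select_ring(), called from i915_gem_do_execbuffer() */
unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;

if (user_ring_id > I915_EXEC_VEBOX)
	return -EINVAL;

*ring = &dev_priv->ring[user_ring_map[user_ring_id]];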

v2: Extract ring selection into eb_select_ring. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870770-13981-1-git-send-email-tvrtko.ursulin@linux.intel.com


# e28e404c 19-Jan-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915: tidy up a few leftovers

There are a few bits of code which the transformations implemented by
the previous patch reveal to be suboptimal, once the notion of a per-
ring default context has gone away. So this tidies up the leftovers.

It could have been squashed into the previous patch, but that would have
made that patch less clearly a simple transformation. In particular, any
change which alters the code block structure or indentation has been
deferred into this separate patch, because such things tend to make
diffs more difficult to read.

v4: Rebased

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-4-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ed54c1a1 19-Jan-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915: abolish separate per-ring default_context pointers

Now that we've eliminated a lot of uses of ring->default_context,
we can eliminate the pointer itself.

All the engines share the same default intel_context, so we can just
keep a single reference to it in the dev_priv structure rather than one
in each of the engine[] elements. This make refcounting more sensible
too, as we now have a refcount of one for the one pointer, rather than
a refcount of one but multiple pointers.

From an idea by Chris Wilson.

v2: transform an extra instance of ring->default_context introduced by
42f1cae8c drm/i915: Restore inhibiting the load of the default context
That patch's commentary includes:
v2: Mark the global default context as uninitialized on GPU reset so
that the context-local workarounds are reloaded upon re-enabling
The code implementing that now also benefits from the replacement of
the multiple (per-ring) pointers to the default context with a single
pointer to the unique kernel context.

v4: Rebased, remove underused local (Nick Hoath)

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-3-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 26827088 19-Jan-2016 Dave Gordon <david.s.gordon@intel.com>

drm/i915: simplify allocation of driver-internal requests

There are a number of places where the driver needs a request, but isn't
working on behalf of any specific user or in a specific context. At
present, we associate them with the per-engine default context. A future
patch will abolish those per-engine context pointers; but we can already
eliminate a lot of the references to them, just by making the allocator
allow NULL as a shorthand for "an appropriate context for this ring",
which will mean that the callers don't need to know anything about how
the "appropriate context" is found (e.g. per-ring vs per-device, etc).

So this patch renames the existing i915_gem_request_alloc(), and makes
it local (static inline), and replaces it with a wrapper that provides
a default if the context is NULL, and also has a nicer calling
convention (doesn't require a pointer to an output parameter). Then we
change all callers to use the new convention:
OLD:
err = i915_gem_request_alloc(ring, user_ctx, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, user_ctx);
if (IS_ERR(req)) ...
OLD:
err = i915_gem_request_alloc(ring, ring->default_context, &req);
if (err) ...
NEW:
req = i915_gem_request_alloc(ring, NULL);
if (IS_ERR(req)) ...

v4: Rebased

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-2-git-send-email-david.s.gordon@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e0313db0 15-Jan-2016 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Only grab timestamps when needed

No need to call ktime_get_raw_ns twice per unlimited wait, and we can
also eliminate a local variable.

v2: Added comment about silencing the compiler warning. (Daniel Vetter)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870672-13901-1-git-send-email-tvrtko.ursulin@linux.intel.com


# 48ea1e32 11-Jan-2016 Michel Thierry <michel.thierry@intel.com>

drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1 page

The kernel and userspace are able to handle a 4GB (1<<32) address space
range, but the "A32 Stateless Model" is not. According to the documentation,
A32 accesses are based on the General State Base Address and bounds checking
is in place. Because of the size field limitation (in the State Base Address
instruction), it is not possible to address the full 4GB memory region.

The A32 Stateless Model is used by some libraries, and without this patch
the last page of the 4GB address space is not accessible in 32-bit processes.
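
The fix itself is a one-liner of the following shape (illustrative):

/* A32 stateless addressing cannot reach the very last page below 4GB,
 * so end the zone one page short of the 4GB boundary. */
if (flags & PIN_ZONE_4G)
	end = min_t(u64, end, (1ULL << 32) - PAGE_SIZE);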

Reported-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452512367-23614-1-git-send-email-michel.thierry@intel.com
Cc: drm-intel-fixes@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit 1892faa9ec5d51b07d646cbd5597cd30e049aa51)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 1892faa9 11-Jan-2016 Michel Thierry <michel.thierry@intel.com>

drm/i915/gen9: Set PIN_ZONE_4G end to 4GB - 1 page

The kernel and userspace are able to handle a 4GB (1<<32) address space
range, but the "A32 Stateless Model" is not. According to the documentation,
A32 accesses are based on the General State Base Address and bounds checking
is in place. Because of the size field limitation (in the State Base Address
instruction), it is not possible to address the full 4GB memory region.

The A32 Stateless Model is used by some libraries, and without this patch
the last page of the 4GB address space is not accessible in 32-bit processes.

Reported-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452512367-23614-1-git-send-email-michel.thierry@intel.com
Cc: drm-intel-fixes@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0f0cd472 11-Dec-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only spin whilst waiting on the current request

Limit busywaiting only to the request currently being processed by the
GPU. If the request is not currently being processed by the GPU, there
is a very low likelihood of it being completed within the 2 microsecond
spin timeout and so we will just be wasting CPU cycles.

v2: Check for logical inversion when rebasing - we were incorrectly
checking for this request being active, and instead busywaiting for
when the GPU was not yet processing the request of interest.

v3: Try another colour for the seqno names.
v4: Another colour for the function names.

v5: Remove the forced coherency when checking for the active request. On
reflection and plenty of recent experimentation, the issue is not a
cache coherency problem - but an irq/seqno ordering problem (timing issue).
Here, we do not need the w/a to force ordering of the read with an
interrupt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-4-git-send-email-chris@chris-wilson.co.uk
(cherry picked from commit 821485dc2ad665f136c57ee589bf7a8210160fe2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# f87a780f 11-Dec-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Limit the busy wait on requests to 5us not 10ms!

When waiting for high frequency requests, the finite amount of time
required to set up the irq and wait upon it limits the response rate. By
busywaiting on the request completion for a short while we can service
the high frequency waits as quickly as possible. However, if it is a slow
request, we want to sleep as quickly as possible. The tradeoff between
waiting and sleeping is roughly the time it takes to sleep on a request,
on the order of a microsecond. Based on measurements of synchronous
workloads from across big core and little atom, I have set the limit for
busywaiting as 10 microseconds. In most of the synchronous cases, we can
reduce the limit down to as little as 2 microseconds, but that leaves
quite a few test cases regressing by factors of 3 and more.

The code currently uses the jiffie clock, but that is far too coarse (on
the order of 10 milliseconds) and results in poor interactivity as the
CPU ends up being hogged by slow requests. To get microsecond resolution
we need to use a high resolution timer. The cheapest of which is polling
local_clock(), but that is only valid on the same CPU. If we switch CPUs
because the task was preempted, we can also use that as an indicator that
the system is too busy to waste cycles on spinning and we should sleep
instead.
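
The resulting spin-termination check looks roughly like this (a sketch of
the approach described above, not the literal patch):

static unsigned long local_clock_us(unsigned int *cpu)
{
	unsigned long t;

	/* sample the clock and the cpu id atomically wrt preemption,
	 * since local_clock() is only comparable on the same CPU */
	local_irq_disable();
	t = local_clock() >> 10;	/* ns -> ~us, cheap approximation */
	*cpu = smp_processor_id();
	local_irq_enable();

	return t;
}

static bool busywait_stop(unsigned long timeout, unsigned int cpu)
{
	unsigned int this_cpu;

	if (time_after(local_clock_us(&this_cpu), timeout))
		return true;

	/* we were migrated: the system is busy, stop spinning and sleep */
	return this_cpu != cpu;
}

The caller computes timeout = local_clock_us(&cpu) + 5 before entering the
spin loop and falls back to the irq-driven sleep as soon as busywait_stop()
returns true.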

__i915_spin_request was introduced in
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:41 2015 +0100

drm/i915: Optimistically spin for the request completion

v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe,
so we can use native register sizes on smaller architectures. Mention
the approximate microseconds units for elapsed time and add some extra
comments describing the reason for busywaiting.

v3: Raise the limit to 10us
v4: Now 5us.

Reported-by: Jens Axboe <axboe@kernel.dk>
Link: https://lkml.org/lkml/2015/11/12/621
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-3-git-send-email-chris@chris-wilson.co.uk
(cherry picked from commit ca5b721e238226af1d767103ac852aeb8e4c0764)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# e7571f7f 11-Dec-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Break busywaiting for requests on pending signals

The busywait in __i915_spin_request() does not respect pending signals
and so may consume the entire timeslice for the task instead of
returning to userspace to handle the signal.

In the worst case this could cause a delay in signal processing of 20ms,
which would be a noticeable jitter in cursor tracking. If a higher
resolution signal was being used, for example to provide fairness of
server timeslices between clients, we could expect to detect some
unfairness between clients (i.e. some windows not updating as fast as
others). This issue was noticed when inspecting a report of poor
interactivity resulting from excessively high __i915_spin_request usage.
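
The fix amounts to checking for pending signals inside the spin loop,
roughly as sketched below ('state' is assumed to be the task state the
caller would sleep in; loop structure simplified):

while (!need_resched()) {
	if (i915_gem_request_completed(req, true))
		return 0;

	/* bail out to the ordinary interruptible wait so the signal
	 * can be delivered to userspace without further delay */
	if (signal_pending_state(state, current))
		break;

	cpu_relax();
}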

Fixes regression from
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:41 2015 +0100

drm/i915: Optimistically spin for the request completion

v2: Try to assess the impact of the bug

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc; "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-2-git-send-email-chris@chris-wilson.co.uk
(cherry picked from commit 91b0c352ace9afec1fb51590c7b8bd60e0eb9fbd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 62d622c1 20-Nov-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Set the map-and-fenceable flag for preallocated objects

As we mark the preallocated objects as bound, we should also flag them
correctly as being map-and-fenceable (if appropriate!) so that later
users do not get confused and try and rebind the pinned vma in order to
get a map-and-fenceable binding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1448029000-10616-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit d0710abbcd88b1ff17760e97d74a673e67b49ea1)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 821485dc 11-Dec-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only spin whilst waiting on the current request

Limit busywaiting only to the request currently being processed by the
GPU. If the request is not currently being processed by the GPU, there
is a very low likelihood of it being completed within the 2 microsecond
spin timeout and so we will just be wasting CPU cycles.

v2: Check for logical inversion when rebasing - we were incorrectly
checking for this request being active, and instead busywaiting for
when the GPU was not yet processing the request of interest.

v3: Try another colour for the seqno names.
v4: Another colour for the function names.

v5: Remove the forced coherency when checking for the active request. On
reflection and plenty of recent experimentation, the issue is not a
cache coherency problem - but an irq/seqno ordering problem (timing issue).
Here, we do not need the w/a to force ordering of the read with an
interrupt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-4-git-send-email-chris@chris-wilson.co.uk


# ca5b721e 11-Dec-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Limit the busy wait on requests to 5us not 10ms!

When waiting for high frequency requests, the finite amount of time
required to set up the irq and wait upon it limits the response rate. By
busywaiting on the request completion for a short while we can service
the high frequency waits as quickly as possible. However, if it is a slow
request, we want to sleep as quickly as possible. The tradeoff between
waiting and sleeping is roughly the time it takes to sleep on a request,
on the order of a microsecond. Based on measurements of synchronous
workloads from across big core and little atom, I have set the limit for
busywaiting as 10 microseconds. In most of the synchronous cases, we can
reduce the limit down to as little as 2 microseconds, but that leaves
quite a few test cases regressing by factors of 3 and more.

The code currently uses the jiffy clock, but that is far too coarse (on
the order of 10 milliseconds) and results in poor interactivity as the
CPU ends up being hogged by slow requests. To get microsecond resolution
we need to use a high resolution timer. The cheapest of which is polling
local_clock(), but that is only valid on the same CPU. If we switch CPUs
because the task was preempted, we can also use that as an indicator that
the system is too busy to waste cycles on spinning and we should sleep
instead.
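
As a rough user-space analogue of the scheme (illustrative only - this is
not the driver code, and the budget, helper names and completion flag are
stand-ins), the loop looks like:

#define _GNU_SOURCE
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* Spin for at most budget_us microseconds; give up early if we are migrated
 * off the starting CPU, a hint that the system is too busy to waste cycles. */
static bool spin_for_completion(const atomic_bool *done, unsigned int budget_us)
{
	uint64_t deadline = now_ns() + (uint64_t)budget_us * 1000;
	int cpu = sched_getcpu();

	while (!atomic_load_explicit(done, memory_order_acquire)) {
		if (sched_getcpu() != cpu)
			return false;	/* migrated: stop spinning, sleep instead */
		if (now_ns() > deadline)
			return false;	/* slow request: fall back to an irq wait */
	}
	return true;			/* completed within the busywait budget */
}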

__i915_spin_request was introduced in
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:41 2015 +0100

drm/i915: Optimistically spin for the request completion

v2: Drop full u64 for unsigned long - the timer is 32bit wraparound safe,
so we can use native register sizes on smaller architectures. Mention
the approximate microseconds units for elapsed time and add some extra
comments describing the reason for busywaiting.

v3: Raise the limit to 10us
v4: Now 5us.

Reported-by: Jens Axboe <axboe@kernel.dk>
Link: https://lkml.org/lkml/2015/11/12/621
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-3-git-send-email-chris@chris-wilson.co.uk


# 91b0c352 11-Dec-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Break busywaiting for requests on pending signals

The busywait in __i915_spin_request() does not respect pending signals
and so may consume the entire timeslice for the task instead of
returning to userspace to handle the signal.

In the worst case this could cause a delay in signal processing of 20ms,
which would be a noticeable jitter in cursor tracking. If a higher
resolution signal was being used, for example to provide fairness of a
server's timeslices between clients, we could expect to detect some
unfairness between clients (i.e. some windows not updating as fast as
others). This issue was noticed when inspecting a report of poor
interactivity resulting from excessively high __i915_spin_request usage.
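
The shape of the fix is simply an extra check inside the spin loop. A hedged
sketch follows; demo_request_completed() and local_clock_us() are stand-ins
for the driver's real helpers, and the 5us budget is illustrative:

static int demo_spin_request(struct demo_request *req, int state)
{
	unsigned long timeout = local_clock_us() + 5;	/* stand-in helper */

	while (!demo_request_completed(req)) {		/* stand-in helper */
		if (signal_pending_state(state, current))
			return -ERESTARTSYS;	/* stop spinning, handle the signal */
		if (time_after(local_clock_us(), timeout))
			return -EBUSY;		/* give up, set up the interrupt wait */
		cpu_relax();
	}
	return 0;
}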

Fixes regression from
commit 2def4ad99befa25775dd2f714fdd4d92faec6e34 [v4.2]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:41 2015 +0100

drm/i915: Optimistically spin for the request completion

v2: Try to assess the impact of the bug

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc; "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449833608-22125-2-git-send-email-chris@chris-wilson.co.uk


# d0710abb 20-Nov-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Set the map-and-fenceable flag for preallocated objects

As we mark the preallocated objects as bound, we should also flag them
correctly as being map-and-fenceable (if appropriate!) so that later
users do not get confused and try and rebind the pinned vma in order to
get a map-and-fenceable binding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: drm-intel-fixes@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/1448029000-10616-1-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9e7d18c0 10-Dec-2015 Dave Gordon <david.s.gordon@intel.com>

drm/i915: mark a newly-created GEM object dirty when filled with data

When creating a new (pageable) GEM object and filling it with data, we
must mark it as 'dirty', i.e. backing store is out-of-date w.r.t. the
newly-written content. This ensures that if the object is evicted under
memory pressure, its pages in the pagecache will be written to backing
store rather than discarded.

Based on an original version by Alex Dai.

Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1449773486-30822-3-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 033908ae 10-Dec-2015 Dave Gordon <david.s.gordon@intel.com>

drm/i915: mark GEM object pages dirty when mapped & written by the CPU

In various places, a single page of a (regular) GEM object is mapped into
CPU address space and updated. In each such case, either the page or
the object should be marked dirty, to ensure that the modifications are
not discarded if the object is evicted under memory pressure.

The typical sequence is:
va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
*(va+offset) = ...
kunmap_atomic(va);

Here we introduce i915_gem_object_get_dirty_page(), which performs the
same operation as i915_gem_object_get_page() but with the side-effect
of marking the returned page dirty in the pagecache. This will ensure
that if the object is subsequently evicted (due to memory pressure),
the changes are written to backing store rather than discarded.
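
A hedged sketch of what such a helper boils down to (simplified; the real
driver version adds error handling):

static struct page *
demo_get_dirty_page(struct drm_i915_gem_object *obj, int n)
{
	struct page *page = i915_gem_object_get_page(obj, n);

	if (page)
		set_page_dirty(page);	/* writeback rather than discard on eviction */

	return page;
}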

Note that it works only for regular (shmfs-backed) GEM objects, but (at
least for now) those are the only ones that are updated in this way --
the objects in question are contexts and batchbuffers, which are always
shmfs-backed.

Separate patches deal with the cases where whole objects are (or may
be) dirtied.

v3: Mark two more pages dirty in the page-boundary-crossing
cases of the execbuffer relocation code [Chris Wilson]

Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1449773486-30822-2-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c5baa566 23-Oct-2015 Tomas Elf <tomas.elf@intel.com>

drm/i915: Update to post-reset execlist queue clean-up

When clearing an execlist queue, instead of traversing it and unreferencing all
requests while holding the spinlock (which might lead to the thread sleeping with
IRQs turned off - bad news!), just move all requests to the retire request
list while holding the spinlock, then drop the spinlock and invoke the execlists
request retirement path, which already deals with the intricacies of
purging/dereferencing execlist queue requests.
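
In outline, the pattern is the following (a hedged sketch; the lock, list and
retirement-path names follow the driver of this era but the snippet is
illustrative, not the exact hunk):

spin_lock_irq(&ring->execlist_lock);
list_splice_tail_init(&ring->execlist_queue,
		      &ring->execlist_retired_req_list);
spin_unlock_irq(&ring->execlist_lock);

/* Outside the spinlock it is safe to sleep while unreferencing requests. */
intel_execlists_retire_requests(ring);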

This patch can be considered v3 of:

commit b96db8b81c54ef30485ddb5992d63305d86ea8d3
Author: Tomas Elf <tomas.elf@intel.com>
drm/i915: Grab execlist spinlock to avoid post-reset concurrency issues

This patch assumes v2 of the above patch is part of the baseline, reverts v2
and adds changes on top to turn it into v3.

Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1445619757-19822-1-git-send-email-tomas.elf@intel.com
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <dave.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bf6ce93a 07-Dec-2015 Wayne Boyer <wayne.boyer@intel.com>

drm/i915: Remove VLV A0 hack

Do some further clean up based on the initial review of
drm/i915: Separate cherryview from valleyview.

In this case remove a hack for VLV A0.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Wayne Boyer <wayne.boyer@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449514270-15171-4-git-send-email-wayne.boyer@intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>


# 666a4537 09-Dec-2015 Wayne Boyer <wayne.boyer@intel.com>

drm/i915: Separate cherryview from valleyview

The cherryview device shares many characteristics with the valleyview
device. When support was added to the driver for cherryview, the
corresponding device info structure included .is_valleyview = 1.
This is not correct and leads to some confusion.

This patch changes .is_valleyview to .is_cherryview in the cherryview
device info structure and simplifies the IS_CHERRYVIEW macro.
Then where appropriate, instances of IS_VALLEYVIEW are replaced with
IS_VALLEYVIEW || IS_CHERRYVIEW or equivalent.

v2: Use IS_VALLEYVIEW || IS_CHERRYVIEW instead of defining a new macro.
Also add followup patches to fix issues discovered during the first
review. (Ville)
v3: Fix some style issues and one gen check. Remove CRT related changes
as CRT is not supported on CHV. (Imre, Ville)
v4: Make a few more optimizations. (Ville)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Wayne Boyer <wayne.boyer@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449692975-14803-1-git-send-email-wayne.boyer@intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>


# 506a8e87 08-Dec-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add soft-pinning API for execbuffer

Userspace can pass in an offset that it presumes the object is located
at. The kernel will then do its utmost to fit the object into that
location. The assumption is that userspace is handling its own object
locations (for example along with full-ppgtt) and that the kernel will
rarely have to make space for the user's requests.
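
From userspace the request looks roughly like this (hedged sketch; the handle
and address are placeholders supplied by the caller):

#include <string.h>
#include <drm/i915_drm.h>

static void fill_pinned_entry(struct drm_i915_gem_exec_object2 *eo,
			      __u32 bo_handle, __u64 gtt_addr)
{
	memset(eo, 0, sizeof(*eo));
	eo->handle = bo_handle;			/* GEM handle owned by the caller */
	eo->offset = gtt_addr;			/* address userspace is managing itself */
	eo->flags  = EXEC_OBJECT_PINNED;	/* also set EXEC_OBJECT_SUPPORTS_48B_ADDRESS
						 * when gtt_addr may lie above 4GiB */
}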

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

v2: Fixed incorrect eviction found by Michal Winiarski - fix suggested by Chris
Wilson. Fixed incorrect error paths causing crash found by Michal Winiarski.
(Not published externally)

v3: Rebased because of trivial conflict in object_bind_to_vm. Fixed eviction
to allow eviction of soft-pinned objects when another soft-pinned object used
by a subsequent execbuffer overlaps reported by Michal Winiarski.
(Not published externally)

v4: Moved soft-pinned objects to the front of ordered_vmas so that they are
pinned first after an address conflict happens to avoid repeated conflicts in
rare cases (Suggested by Chris Wilson). Expanded comment on
drm_i915_gem_exec_object2.offset to cover this new API.

v5: Added I915_PARAM_HAS_EXEC_SOFTPIN parameter for detecting this capability
(Kristian). Added check for multiple pinnings on eviction (Akash). Made sure
buffers are not considered misplaced without the user specifying
EXEC_OBJECT_SUPPORTS_48B_ADDRESS. User must assume responsibility for any
addressing workarounds. Updated object2.offset field comment again to clarify
NO_RELOC case (Chris). checkpatch cleanup.

v6: Trivial rebase on latest drm-intel-nightly

v7: Catch attempts to pin above the max virtual address size and return
EINVAL (Tvrtko). Decouple EXEC_OBJECT_SUPPORTS_48B_ADDRESS and
EXEC_OBJECT_PINNED flags, user must pass both flags in any attempt to pin
something at an offset above 4GB (Chris, Daniel Vetter).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Zou Nanhai <nanhai.zou@intel.com>
Cc: Kristian Høgsberg <hoegsberg@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Acked-by: PDT
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1449575707-20933-1-git-send-email-thomas.daniel@intel.com


# 30ecad77 09-Dec-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm: Move drm_display_mode and related docs into kerneldoc

This was in the documentation for modeset helper hooks, where it is a
bit misplaced.

v2: Reindent the drm_mode_status enum, inspired by Ville.

v3: Suggestions from Ville and Thierry.

v4: Small fixup that 0day spotted.

v5: Slight change to avoid accidental headings in kerneldoc output.

Cc: ville.syrjala@linux.intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1449218769-16577-27-git-send-email-daniel.vetter@ffwll.ch
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v3)
Reviewed-by: Thierry Reding <treding@nvidia.com> (v3)


# af3302b9 04-Dec-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

Revert "drm/i915: Extend LRC pinning to cover GPU context writeback"

This reverts commit 6d65ba943a2d1e4292a07ca7ddb6c5138b9efa5d.

Mika Kuoppala traced down a use-after-free crash in module unload to
this commit, because ring->last_context is leaked beyond when the
context gets destroyed. Mika submitted a quick fix to patch that up in
the context destruction code, but that's too much of a hack.

The right fix is instead for the ring to hold a full reference onto
its last context, like we do for legacy contexts.

Since this is causing a regression in BAT it gets reverted before we
can close this.

Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93248
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# b6aa0873 02-Dec-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix RPS pointer passed from wait_ioctl to i915_wait_request

In commit 2e1b873072dfe3bbcc158a9c21acde1ab0d36c55 [v4.2]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Apr 27 13:41:22 2015 +0100

drm/i915: Convert RPS tracking to a intel_rps_client struct

we converted the __i915_wait_request() to take a new intel_rps_client
struct (rather than having to pass fake drm_i915_file_private structs).
However, due to use of passing a void pointer, I didn't spot one
callsite in wait-ioctl was passing the wrong pointer.

Fwiw, the impact of this bug is zero. Along the rps path, we always
first call list_empty(rps) which when we pass in the wrong pointer
always evaluates to false and we return early and never chase the
invalid pointers.

The user visible impact is then wait-ioctl doesn't get the same
waitboosting as the other interfaces (set-domain, throttle), which is a
performance concern for the *very* few users of the wait interface.
There is also a libdrm_intel patch to use the wait-ioctl for
drm_intel_bo_wait_rendering() if anyone feels inclined to review
libdrm_intel patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Add Chris' explanation for why the impact of this is pretty
close to 0.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6d65ba94 01-Dec-2015 Nick Hoath <nicholas.hoath@intel.com>

drm/i915: Extend LRC pinning to cover GPU context writeback

Use the first retired request on a new context to unpin
the old context. This ensures that the hw context remains
bound until it has been written back to by the GPU.
Now that the context is pinned until later in the request/context
lifecycle, it no longer needs to be pinned from context_queue to
retire_requests.
This fixes an issue with GuC submission where the GPU might not
have finished writing back the context before it is unpinned. This
results in a GPU hang.

v2: Moved the new pin to cover GuC submission (Alex Dai)
Moved the new unpin to request_retire to fix coverage leak
v3: Added switch to default context if freeing a still pinned
context just in case the hw was actually still using it
v4: Unwrapped context unpin to allow calling without a request
v5: Only create a switch to idle context if the ring doesn't
already have a request pending on it (Alex Dai)
Rename unsaved to dirty to avoid double negatives (Dave Gordon)
Changed _no_req postfix to __ prefix for consistency (Dave Gordon)
Split out per engine cleanup from context_free as it
was getting unwieldy
Corrected locking (Dave Gordon)
v6: Removed some bikeshedding (Mika Kuoppala)
Added explanation of the GuC hang that this fixes (Daniel Vetter)
v7: Removed extra per request pinning from ring reset code (Alex Dai)
Added forced ring unpin/clean in error case in context free (Alex Dai)

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Issue: VIZ-4277
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bb6d1984 26-Nov-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Check the timeout passed to i915_wait_request

We have relied upon the sole caller (wait_ioctl) validating the timeout
argument. However, when waiting for multiple requests I forgot to ensure
that the timeout was still positive on the later requests. This is more
simply done inside __i915_wait_request.

Fixes regression introduced in
commit b47161858ba13c9c7e03333132230d66e008dd55
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Apr 27 13:41:17 2015 +0100

drm/i915: Implement inter-engine read-read optimisations

The impact of the regression is 1 jiffie for each extra active ring for
a wait_ioctl with a timeout -- I don't think anyone has noticed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1448544702-5594-1-git-send-email-chris@chris-wilson.co.uk
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# f92a9162 04-Nov-2015 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Add functions to emit register offsets to the ring

When register type safety happens, we can't just try to emit the
register itself to the ring. Instead we'll need to extract the
offset from it first. Add some convenience functions that will do
that.

v2: Convert MOCS setup too

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-20-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 6fa1c5f1 04-Nov-2015 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Parametrize L3 error registers

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-15-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# fd0fe6ac 04-Nov-2015 Imre Deak <imre.deak@intel.com>

drm/i915: get runtime PM reference around GEM set_caching IOCTL

After Damien's D3 fix I started to get runtime suspend residency for the
first time and that revealed a breakage on the set_caching IOCTL path
that accesses the HW but doesn't take an RPM ref. Fix this up.
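
The shape of such a fix is to bracket the hardware access with a runtime PM
reference (a hedged sketch, not the exact hunk):

intel_runtime_pm_get(dev_priv);		/* keep the device awake across the HW access */

ret = i915_gem_object_set_cache_level(obj, level);

intel_runtime_pm_put(dev_priv);		/* allow runtime suspend again */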

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: stable@vger.kernel.org
Link: http://patchwork.freedesktop.org/patch/msgid/1446665132-22491-1-git-send-email-imre.deak@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 1b683729 12-Nov-2015 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Remove redundant check in i915_gem_obj_to_vma

No need to verify VMA belongs to GGTT since:

1. The function must return a normal VMA belonging to passed in VM.
2. There can only be one normal VMA for any VM.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447329595-17495-1-git-send-email-tvrtko.ursulin@linux.intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# c62d2555 06-Nov-2015 Michal Hocko <mhocko@suse.com>

mm, fs: introduce mapping_gfp_constraint()

There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.

Let's introduce a helper function which makes the restriction explicit and
easier to track. This patch doesn't introduce any functional changes.
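
The helper is essentially a one-liner that ANDs the mapping's own mask into
the caller's mask:

static inline gfp_t mapping_gfp_constraint(struct address_space *mapping,
					   gfp_t gfp_mask)
{
	return mapping_gfp_mask(mapping) & gfp_mask;
}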

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 71baba4b 06-Nov-2015 Mel Gorman <mgorman@techsingularity.net>

mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM

__GFP_WAIT was used to signal that the caller was in atomic context and
could not sleep. Now it is possible to distinguish between true atomic
context and callers that are not willing to sleep. The latter should
clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing
__GFP_WAIT behaves differently, there is a risk that people will clear the
wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly
indicate what it does -- setting it allows all reclaim activity, clearing
it prevents it.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d0164adc 06-Nov-2015 Mel Gorman <mgorman@techsingularity.net>

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access to one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimistic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and were depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
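
For reference, the recommended helper mentioned above reduces to a check of
the direct-reclaim bit:

static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags)
{
	return !!(gfp_flags & __GFP_DIRECT_RECLAIM);
}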

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7580d774 18-Aug-2015 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

drm/i915: Wait for object idle without locks in atomic_commit, v2.

Make pinning and waiting a separate step, and wait for object idle
without struct_mutex held.

Changes since v1:
- Do not wait when a reset is in progress.
- Remove call to i915_gem_object_wait_rendering for
intel_overlay_do_put_image (Chris Wilson)

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# 9f9e539f 23-Oct-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Shut up GuC errors when it's disabled

Emitting a DRM_ERROR and continuing without any issues isn't allowed since
that causes noise in the CI system. But we absolutely want to have the
DRM_ERROR when we want to run with the GuC.

For simplicity just short-circuit all the loader code when it's not
needed.

v2: Mika&Chris complained that I shouldn't hit send on patches written
before coffee kicks in.

v3: Make it compile at least ...

Cc: Alex Dai <yu.dai@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1445591459-4327-1-git-send-email-daniel.vetter@ffwll.ch
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 608c1a52 03-Sep-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Recover all available ringbuffer space following reset

Having flushed all requests from all queues, we know that all
ringbuffers must now be empty. However, since we do not reclaim
all space when retiring the request (to prevent HEADs colliding
with rapid ringbuffer wraparound) the amount of available space
on each ringbuffer upon reset is less than when we start. Do one
more pass over all the ringbuffers to reset the available space

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>


# 7de1691a 19-Oct-2015 Tomas Elf <tomas.elf@intel.com>

drm/i915: Grab execlist spinlock to avoid post-reset concurrency issues.

Grab execlist lock when cleaning up execlist queues after GPU reset to avoid
concurrency problems between the context event interrupt handler and the reset
path immediately following a GPU reset.

* v2 (Chris Wilson):
Do execlist check and use simpler form of spinlock functions.

Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e87a005d 20-Oct-2015 Jani Nikula <jani.nikula@intel.com>

drm/i915: add helpers for platform specific revision id range checks

Revision checks are almost always accompanied by a platform check. (The
exceptions are platform specific code.) Add helpers to check for a
platform and a revision range: IS_SKL_REVID() and IS_BXT_REVID(). In
most places this simplifies and clarifies the code. It will be obvious
that revid macros are used for the correct platform.

This should make it easier to find all the revision checks for
workarounds for each platform, and make it easier to remove them once we
drop support for early hardware revisions.

This should also make it easier to differentiate between Skylake and
Kabylake revision checks when Kabylake support is added.

v2: rebase

Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1445343722-3312-3-git-send-email-jani.nikula@intel.com


# fffda3f4 20-Oct-2015 Jani Nikula <jani.nikula@intel.com>

drm/i915/bxt: add revision id for A1 stepping and use it

Prefer inclusive ranges for revision checks rather than "below B0". Per
specs A2 is not used, so revid <= A1 matches revid < B0.

Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1445343722-3312-2-git-send-email-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# ef55f92a 09-Oct-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop i915_gem_obj_is_pinned() from set-cache-level

Since the removal of the pin-ioctl, we only care about not changing the
cache level on buffers pinned to the hardware as indicated by
obj->pin_display. Knowing that only objects pinned to the hardware
will have an elevated vma->pin_count, we can coalesce many of the
linear walks over the obj->vma_list.

v2: Try and retrospectively add comments explaining the steps in
rebinding the active VMA.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d39398f5 07-Oct-2015 Jani Nikula <jani.nikula@intel.com>

drm/i915/snb: remove pre-production hardware workaround

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e9f24d5f 05-Oct-2015 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Clean up associated VMAs on context destruction

Prevent leaking VMAs and PPGTT VMs when objects are imported
via flink.

Scenario is that any VMAs created by the importer will be left
dangling after the importer exits, or destroys the PPGTT context
with which they are associated.

This is caused by object destruction not running when the
importer closes the buffer object handle due to the reference held
by the exporter. This also leaks the VM since the VMA has a
reference on it.

In practice these leaks can be observed by stopping and starting
the X server on a kernel with fbcon compiled in. Every time
X server exits another VMA will be leaked against the fbcon's
frame buffer object.

Also on systems where flink buffer sharing is used extensively,
like Android, this leak has even more serious consequences.

This version takes a general approach from the earlier work
by Rafael Barbalho (drm/i915: Clean-up PPGTT on context
destruction) and tries to incorporate the subsequent discussion
between Chris Wilson and Daniel Vetter.

v2:

Removed immediate cleanup on object retire - it was causing a
recursive VMA unbind via i915_gem_object_wait_rendering. And
it is in fact not even needed since by definition context
cleanup worker runs only after the last context reference has
been dropped, hence all VMAs against the VM belonging to the
context are already on the inactive list.

v3:

Previous version could deadlock since VMA unbind waits on any
rendering on an object to complete. Objects can be busy in a
different VM which would mean that the cleanup loop would do
the wait with the struct mutex held.

This is an even simpler approach where we just unbind VMAs
without waiting since we know all VMAs belonging to this VM
are idle, and there is nothing in flight, at the point
context destructor runs.

v4:

Double underscore prefix for __i915_vma_unbind_no_wait and a
commit message typo fix. (Michel Thierry)

Note that this is just a partial/interim fix since we have a bit of a
fundamental issue with cleaning up, e.g.

https://bugs.freedesktop.org/show_bug.cgi?id=87729

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_ppgtt.c/flink-and-exit-vma-leak
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rafael Barbalho <rafael.barbalho@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
[danvet: Add a note that this isn't everything.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 101b506a 01-Oct-2015 Michel Thierry <michel.thierry@intel.com>

drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset

There are some allocations that must be only referenced by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags

In specific, any resource used with flat/heapless (0x00000000-0xfffff000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32-bits.

Objects must have EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate if
they can be allocated above the 32-bit address range. To limit the
chances of having the first 4GB already full, objects will use
DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.

The libdrm user of the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag is here:
http://lists.freedesktop.org/archives/intel-gfx/2015-September/075836.html

v2: Changed flag logic from needs_32b to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
to use last PIN_ defined instead of hard-coded value; use correct limit
check in eb_vma_misplaced. (Chris)
v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
v6: Apply pin-high for ggtt too (Chris)
v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
Fix check for entries currently using +4GB addresses, use min_t and
other polish in object_bind_to_vm (Chris)
v8: Commit message updated to point to libdrm patch.
v9: vmas are allocated in the correct zone, so only check the flag when the
vma has not been allocated. (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a2cad9df 16-Sep-2015 Michał Winiarski <michal.winiarski@intel.com>

drm/i915/gtt: Do not initialize drm_mm twice.

It would be initialized just moments later by i915_init_vm.

Rearrange the code such that i915_init_vm() is next to its callers
inside i915_gem_gtt (and so we can make it static). After removing the
dance around the files, it is clear that we are repeating some work
inside the initializers (such as calling drm_mm_init() multiple times),
so take advantage of the refactor to also remove some redundant code and
clean up the interface.

v2: Commit msg update,
s/i915_init_vm/i915_address_space_init, move to i915_gem_gtt.c,
init address_space during i915_gem_setup_global_gtt for ggtt.
v3: Do not init global_link - we are adding it to vm_list moments later,
make i915_address_space_init static, use OOP style parameter order.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d9072a3e 15-Sep-2015 Geliang Tang <geliangtang@163.com>

drm/i915: fix kernel-doc warnings in i915_gem.c

Fix the following 'make htmldocs' warnings:

.//drivers/gpu/drm/i915/i915_gem.c:1729: warning: No description found for parameter 'vma'
.//drivers/gpu/drm/i915/i915_gem.c:1729: warning: No description found for parameter 'vmf'

.//drivers/gpu/drm/i915/i915_gem.c:4962: warning: No description found for parameter 'old'
.//drivers/gpu/drm/i915/i915_gem.c:4962: warning: No description found for parameter 'new'
.//drivers/gpu/drm/i915/i915_gem.c:4962: warning: No description found for parameter 'frontbuffer_bits'

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# e84fe803 10-Sep-2015 Nick Hoath <nicholas.hoath@intel.com>

drm/i915: Split alloc from init for lrc

Extend init/init_hw split to context init.
- Move context initialisation in to i915_gem_init_hw
- Move one off initialisation for render ring to
i915_gem_validate_context
- Move default context initialisation to logical_ring_init

Rename intel_lr_context_deferred_create to
intel_lr_context_deferred_alloc, to reflect reduced functionality &
alloc/init split.

This patch is intended to split out the allocation of resources &
initialisation to allow easier reuse of code for resume/gpu reset.

v2: Removed function ptr wrapping of do_switch_context (Daniel Vetter)
Left ->init_context in intel_lr_context_deferred_alloc
(Daniel Vetter)
Remove unnecessary init flag & ring type test. (Daniel Vetter)
Improve commit message (Daniel Vetter)
v3: On init/reinit, set the hw next sequence number to the sw next
sequence number. This is set to 1 at driver load time. This prevents
the seqno being reset on reinit (Chris Wilson)
v4: Set seqno back to ~0 - 0x1000 at start-of-day, and increment by 0x100
on reset.
This makes it obvious which bbs are which after a reset. (David Gordon
& John Harrison)
Rebase.
v5: Rebase. Fixed rebase breakage. Put context pinning in separate
function. Removed code churn. (Thomas Daniel)
v6: Cleanup up issues introduced in v2 & v5 (Thomas Daniel)

Issue: VIZ-4798
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: David Gordon <david.s.gordon@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 87bcdd2e 10-Sep-2015 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: don't try to load GuC fw on pre-gen9

This avoids some bad register writes and generally feels more correct
than unconditionally trying to redirect interrupts and such.

References: https://bugs.freedesktop.org/show_bug.cgi?id=91777
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f0fea8dd 07-Sep-2015 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Remove one very outdated comment

Comment disagrees with the code which has changed a lot since
it was documented.

Note that the logic to remove -EIO handling was dropped in

commit 1488fc08c1706288616c602416654fd38c773deb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 24 15:47:31 2012 +0100

drm/i915: Remove the deferred-free list

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e5756c10 14-Aug-2015 Imre Deak <imre.deak@intel.com>

drm/i915/bxt: don't allow cached GEM mappings on A stepping

Due to a coherency issue on BXT A steppings we can't guarantee a
coherent view of cached (CPU snooped) GPU mappings, so fail such
requests. User space is supposed to fall back to uncached mappings in
this case.

v2:
- limit the WA to A steppings, on later stepping this HW issue is fixed
v3:
- return error instead of trying to work around the issue in kernel,
since that could confuse user space (Chris)

Testcase: igt/gem_store_dword_batches_loop/cached-mapping
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 33a732f4 12-Aug-2015 Alex Dai <yu.dai@intel.com>

drm/i915: GuC-specific firmware loader

This fetches the required firmware image from the filesystem,
then loads it into the GuC's memory via a dedicated DMA engine.

This patch is derived from GuC loading work originally done by
Vinit Azad and Ben Widawsky.

v2:
Various improvements per review comments by Chris Wilson

v3:
Removed 'wait' parameter to intel_guc_ucode_load() as firmware
prefetch is no longer supported in the common firmware loader,
per Daniel Vetter's request.
Firmware checker callback fn now returns errno rather than bool.

v4:
Squash uC-independent code into GuC-specifc loader [Daniel Vetter]
Don't keep the driver working (by falling back to execlist mode)
if GuC firmware loading fails [Daniel Vetter]

v5:
Clarify WOPCM-related #defines [Tom O'Rourke]
Delete obsolete code no longer required with current h/w & f/w
[Tom O'Rourke]
Move the call to intel_guc_ucode_init() later, so that it can
allocate GEM objects, and have it fetch the firmware; then
intel_guc_ucode_load() doesn't need to fetch it later.
[Daniel Vetter].

v6:
Update comment describing intel_guc_ucode_load() [Tom O'Rourke]

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ed75a55b 11-Aug-2015 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: clflush on pin_to_display after pwrite to UC bo in LLC

Currently we don't clflush on pin_to_display if the bo is already
UC/WT and is not in the CPU write domain. This causes problems with
pwrite since pwrite doesn't change the write domain, and it avoids
clflushing on UC/WT buffers on LLC platforms unless the buffer is
currently being scanned out.

Fix the problem by marking the cache dirty and adjusting
i915_gem_object_set_cache_level() to clflush when the cache is dirty
even if the cache_level doesn't change.
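
A hedged sketch of the resulting check (field and helper names follow the
i915 code of this era; the snippet is illustrative, not the exact hunk):

if (obj->cache_dirty ||
    obj->base.write_domain == I915_GEM_DOMAIN_CPU) {
	/* CPU caches may hold data the display engine cannot snoop. */
	i915_gem_clflush_object(obj, true);
	obj->cache_dirty = false;
}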

My last attempt [1] at fixing this via write domain frobbing was shot
down, but now with the cache_dirty flag we can do things in a nicer way.

[1] http://lists.freedesktop.org/archives/intel-gfx/2014-November/055390.html

v2: Drop the I915_CACHE_NONE/WT checks from pwrite

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86422
Testcase: igt/kms_pwrite_crc
Testcase: igt/gem_pwrite_snooped
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 088e0df4 07-Aug-2015 Michel Thierry <michel.thierry@intel.com>

drm/i915/gtt: Allow >= 4GB offsets in X86_32

Similar to commit c44ef60e437019b8ca1dab8b4d2e8761fd4ce1e9 ("drm/i915/gtt:
Allow >= 4GB sizes for vm"), i915_gem_obj_offset and i915_gem_obj_ggtt_offset
return an unsigned long, which is only 4 bytes long in 32-bit kernels.

Change return type (and other related offset variables) to u64.

Since Global GTT is always limited to 4GB, this change would not be required
in i915_gem_obj_ggtt_offset, but this is done for consistency.

v2: Remove unnecessary offset variable in do_pin, as we already have
vma->node.start (Chris).
Update GGTT offset too (Tvrtko).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 65bd342f 29-Jul-2015 Michel Thierry <michel.thierry@intel.com>

drm/i915: object size needs to be u64

In a 48b world, users can try to allocate buffers bigger than 4GB; in
these cases it is important that size is a 64b variable.

v2: Drop the warning about bind with size 0, it shouldn't happen anyway.
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6c246959 27-Jul-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep the mm.bound_list in rough LRU order

When we shrink our working sets, we want to avoid stealing pages from
objects that are likely to be reused in the near future. We first look at
inactive objects before processing active objects - but what about a
recently active object that is about to be used again? That object's
position in the bound_list is ordered by the time of binding, not the
time of last use, so the most recently used inactive object could well
be at the head of the shrink list. To compensate, give the object a bump
to MRU when it becomes inactive (thus transitioning to the end of the
first pass in shrink lists). Conversely, bumping on inactive makes
bumping on active useless, since when we do have to reap from the active
working set, everything is going to become inactive very quickly and the
order pretty much random - just hope for the best at that point, as once
we start stalling on active objects, we can hope that the rebinding
neatly orders vital objects.
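
Mechanically this is just a list move at the active-to-inactive transition
(a hedged sketch; member names as in the driver of this era):

struct drm_i915_private *dev_priv = obj->base.dev->dev_private;

/* Treat the newly-inactive object as most recently used so the shrinker
 * considers it last. */
list_move_tail(&obj->global_list, &dev_priv->mm.bound_list);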

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Resolve merge conflict.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 805de8f4 23-Apr-2015 Peter Zijlstra <peterz@infradead.org>

atomic: Replace atomic_{set,clear}_mask() usage

Replace the deprecated atomic_{set,clear}_mask() usage with the now
ubiquitous atomic_{or,andnot}() functions.
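
The conversion is mechanical; for example:

atomic_t flags = ATOMIC_INIT(0);

/* Deprecated: atomic_set_mask(BIT(0), &flags); atomic_clear_mask(BIT(0), &flags); */
atomic_or(BIT(0), &flags);	/* set bit 0 */
atomic_andnot(BIT(0), &flags);	/* clear bit 0 */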

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 41a36b73 24-Jul-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Extract i915_gem_fence.c

No code changes, just moving all the fence related code into a
separate file (and avoiding a bunch of forward declarations while at
it).

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# ea70299d 09-Jul-2015 Dave Gordon <david.s.gordon@intel.com>

drm/i915: Add i915_gem_object_create_from_data()

i915_gem_object_create_from_data() is a generic function to save data
from a plain linear buffer in a new pageable gem object that can later
be accessed by the CPU and/or GPU.

We will need this for the microcontroller firmware loading support code.

Derived from i915_gem_object_write(), originally by Alex Dai

v2:
Change of function: now allocates & fills a new object, rather than
writing to an existing object
New name courtesy of Chris Wilson
Explicit domain-setting and other improvements per review comments
by Chris Wilson & Daniel Vetter

v4:
Rebased

Issue: VIZ-4884
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5ec5b516 08-Jul-2015 Imre Deak <imre.deak@intel.com>

drm/i915: remove unused has_dma_mapping flag

After the previous patch this flag will always be clear, as it's
never set for shmem backed and userptr objects, so we can remove it.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Yeah this isn't really fixes but it's a nice cleanup to
clarify the code but not really worth the hassle of backmerging. So
just add to -fixes, we're still early in -rc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e2273302 08-Jul-2015 Imre Deak <imre.deak@intel.com>

drm/i915: avoid leaking DMA mappings

We have 3 types of DMA mappings for GEM objects:
1. physically contiguous for stolen and for objects needing contiguous
memory
2. DMA-buf mappings imported via a DMA-buf attach operation
3. SG DMA mappings for shmem backed and userptr objects

For 1. and 2. the lifetime of the DMA mapping matches the lifetime of the
corresponding backing pages and so in practice we create/release the
mapping in the object's get_pages/put_pages callback.

For 3. the lifetime of the mapping matches that of any existing GPU binding
of the object, so we'll create the mapping when the object is bound to
the first vma and release the mapping when the object is unbound from its
last vma.

Since the object can be bound to multiple vmas, we can end up creating a
new DMA mapping in the 3. case even if the object already had one. This
is not allowed by the DMA API and can lead to leaked mapping data and
IOMMU memory space starvation in certain cases. For example HW IOMMU
drivers (intel_iommu) allocate a new range from their memory space
whenever a mapping is created, silently overriding a pre-existing
mapping.

Fix this by moving the creation/removal of DMA mappings to the object's
get_pages/put_pages callbacks. These callbacks already check for and do
an early return in case of any nested calls. This way objects of the 3.
case also become more like the other object types.

I noticed this issue by enabling DMA debugging, which got disabled after
a while due to its internal mapping tables getting full. It also reported
errors in connection to random other drivers that did a DMA mapping for
an address that was previously mapped by i915 but was never released.
Besides these diagnostic messages and the memory space starvation
problem for IOMMUs, I'm not aware of this causing a real issue.

The fix is based on a patch from Chris.

v2:
- move the DMA mapping create/remove calls to the get_pages/put_pages
callbacks instead of adding new callbacks for these (Chris)
v3:
- also fix the get_page cache logic on the userptr async path (Chris)

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 94f7bbe1 09-Jul-2015 Tomas Elf <tomas.elf@intel.com>

drm/i915: Snapshot seqno of most recently submitted request.

The hang checker needs to inspect whether or not the ring request list is empty
as well as if the given engine has reached or passed the most recently
submitted request. The problem with this is that the hang checker cannot grab
the struct_mutex, which is required in order to safely inspect requests since
requests might be deallocated during inspection. In the past we've had kernel
panics due to this very unsynchronized access in the hang checker.

One solution to this problem is to not inspect the requests directly since
we're only interested in the seqno of the most recently submitted request - not
the request itself. Instead the seqno of the most recently submitted request is
stored separately, which the hang checker then inspects, circumventing the
issue of synchronization from the hang checker entirely.

This fixes a regression introduced in

commit 44cdd6d219bc64f6810b8ed0023a4d4db9e0fe68
Author: John Harrison <John.C.Harrison@Intel.com>
Date: Mon Nov 24 18:49:40 2014 +0000

drm/i915: Convert 'ring_idle()' to use requests not seqnos

v2 (Chris Wilson):
- Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather
than compute it over again.
- Remove extra whitespace.

Issue: VIZ-5998
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add regressing commit citation provided by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# de152b62 07-Jul-2015 Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: Add origin to frontbuffer tracking flush

This will be useful to PSR and FBC once we start making
dirty fb calls to also flush frontbuffer.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8ba319da 03-Jul-2015 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Convert intel_lr_context_pin() for requests

Pass around requests to carry context deeper in callchain.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a647828a 03-Jul-2015 Niu,Bing <bing.niu@intel.com>

drm/i915: Also perform gpu reset under execlist mode.

It is found that i915 will not reset the GPU under execlist mode when
unloading the module. That will lead to some issues when unloading/loading
the module with a different submission mode, e.g. from execlist mode to ring
buffer mode via loading/unloading i915, because the HW is not in a reset
state and registers are not clean under such a condition.

Signed-off-by: Niu,Bing <bing.niu@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ca1543be 01-Jul-2015 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Report correct GGTT space usage

Currently only normal views were accounted, which under-accounts
the usage as reported in debugfs.

Introduce a new helper, i915_gem_obj_total_ggtt_size, and use it
from call sites which want to know how much GGTT space objects
are using.
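
A hedged sketch of what such a helper sums (simplified; member names as in
the driver of this era):

static u64 demo_obj_total_ggtt_size(struct drm_i915_gem_object *obj)
{
	struct i915_vma *vma;
	u64 size = 0;

	list_for_each_entry(vma, &obj->vma_list, vma_link)
		if (i915_is_ggtt(vma->vm) && drm_mm_node_allocated(&vma->node))
			size += vma->node.size;

	return size;
}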

v2: Single loop in i915_gem_get_aperture_ioctl. (Chris Wilson)

v3: Walk GGTT active/inactive lists in i915_gem_get_aperture_ioctl
for better efficiency. (Chris Wilson, Daniel Vetter)

v4: Make i915_gem_obj_total_ggtt_size private to debugfs. (Chris Wilson)

v5: Change unsigned long to u64. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 031b698a 26-Jun-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Unconditionally do fb tracking invalidate in set_domain

We can't elide the fb tracking invalidate if the buffer is already in
the right domain since that would lead to missed screen updates. I'm
pretty sure I've written this already before but it must have gotten lost
unfortunately :(

v2: Chris observed that all internal set_domain users already
correctly do the fb invalidate on their own, hence we can move this
just into the set_domain ioctl instead.

v3: I screwed up setting the invalidate ORIGIN_* correctly (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c44ef60e 25-Jun-2015 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915/gtt: Allow >= 4GB sizes for vm.

We can have exactly 4GB sized ppgtt with 32bit system.
size_t is inadequate for this.

v2: Convert a lot more places (Daniel)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a5ac0f90 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Remove the now obsolete 'i915_gem_check_olr()'

As there is no OLR to check, the check_olr() function is now a no-op and can be
removed.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# fcfa423c 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Move the request/file and request/pid association to creation time

In _i915_add_request(), the request is associated with a userland client.
Specifically it is linked to the 'file' structure and the current user process
is recorded. One problem here is that the current user process is not
necessarily the same as when the request was submitted to the driver. This is
especially true when the GPU scheduler arrives and decouples driver submission
from hardware submission. Note also that a client only needs to be associated
when the add request comes from an execbuff call; any other add request call
is kernel only and so does not need to do it.

This patch moves the client association into a separate function. This is then
called from the execbuffer code path itself at a sensible time. It also removes
the now redundant 'file' pointer from the add request parameter list.

An extra cleanup of the client association is also added to the request clean up
code for the eventuality where the request is killed after association but
before being submitted (e.g. due to out of memory error somewhere). Once the
submission has happened, the request is on the request list and the regular
request list removal will clear the association. Note that this still needs to
happen at this point in time because the request might be kept floating around
much longer (due to someone holding a reference count) and the client should not
be worrying about this request after it has been retired.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bccca494 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Remove the now obsolete 'outstanding_lazy_request'

The outstanding_lazy_request is no longer used anywhere in the driver.
Everything that was looking at it now has a request explicitly passed in from on
high. Everything that was relying upon it behind the scenes is now explicitly
creating/passing/submitting its own private request. Thus the OLR can be
removed.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ccd98fe4 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Add *_ring_begin() to request allocation

Now that the *_ring_begin() functions no longer call the request allocation
code, it is finally safe for the request allocation code to call *_ring_begin().
This is important to guarantee that the space reserved for the subsequent
i915_add_request() call does actually get reserved.

v2: Renamed functions according to review feedback (Tomas Elf).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5fb9de1a 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update intel_ring_begin() to take a request structure

Now that everything above has been converted to use requests, intel_ring_begin()
can be updated to take a request instead of a ring. This also means that it no
longer needs to lazily allocate a request if no-one happens to have done it
earlier.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 599d924c 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update ring->sync_to() to take a request structure

Updated the ring->sync_to() implementations to take a request instead of a ring.
Also updated the tracer to include the request id.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
[danvet: Rebase since I didn't merge the patch which added ->uniq.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c4e76638 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update ring->emit_request() to take a request structure

Updated the ring->emit_request() implementation to take a request instead of a
ringbuf/request pair. Also removed its use of the OLR for obtaining the
request's seqno.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ee044a88 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update ring->add_request() to take a request structure

Updated the various ring->add_request() implementations to take a request
instead of a ring. This removes their reliance on the OLR to obtain the seqno
value that the request should be tagged with.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4866d729 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update flush_all_caches() to take request structures

Updated the *_ring_flush_all_caches() functions to take requests instead of
rings or ringbuf/context pairs.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6909a666 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update l3_remap to take a request structure

Converted i915_gem_l3_remap() to take a request structure instead of a ring.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b2af0376 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update [vma|object]_move_to_active() to take request structures

Now that everything above has been converted to use request structures, it is
possible to update the lower level move_to_active() functions to be request
based as well.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 75289874 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update add_request() to take a request structure

Now that all callers of i915_add_request() have a request pointer to hand, it is
possible to update the add request function to take a request pointer rather
than pulling it out of the OLR.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 91af127f 18-Jun-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update i915_gem_object_sync() to take a request structure

The plan is to pass requests around as the basic submission tracking structure
rather than rings and contexts. This patch updates the i915_gem_object_sync()
code path.

v2: Much more complex patch to share a single request between the sync and the
page flip. The _sync() function now supports lazy allocation of the request
structure. That is, if one is passed in then that will be used. If one is not,
then a request will be allocated and passed back out. Note that the _sync() code
does not necessarily require a request. Thus one will only be created in
certain situations. The reason the lazy allocation must be done within the
_sync() code itself is because the decision to need one or not is not really
something that code above can second guess (except in the case where one is
definitely not required because no ring is passed in).

The call chains above _sync() now support passing a request through, with most
callers passing in NULL and assuming that no request will be required (because
they also pass in NULL for the ring and therefore can't be generating any ring
code).

The exception is intel_crtc_page_flip(), which now supports having a request
returned from _sync(). If one is, then that request is shared by the page flip
(if the page flip is of a type to need a request). If _sync() does not generate
a request but the page flip does need one, then the page flip path will create
its own request.

v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
Elf review request). Rebased onto newer tree that significantly changed the
synchronisation code.

v4: Updated comments from review feedback (Tomas Elf)

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
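
A hedged sketch of the lazy-allocation contract described above: if the caller
already holds a request it is reused, otherwise one is allocated and handed
back so the caller (e.g. the page flip path) can share it. The helper name
request_alloc_for() and the exact signature are placeholders, not the driver's
API.

static int object_sync_sketch(struct drm_i915_gem_object *obj,
			      struct intel_engine_cs *to,
			      struct drm_i915_gem_request **to_req)
{
	if (to == NULL)
		return 0;	/* CPU wait: no request needed at all */

	if (*to_req == NULL) {
		/* lazily allocate and pass the request back to the caller */
		int ret = request_alloc_for(to, to_req);	/* placeholder helper */
		if (ret)
			return ret;
	}

	/* ... emit the inter-ring synchronisation using *to_req ... */
	return 0;
}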


# ba01cc93 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update i915_switch_context() to take a request structure

Now that the request is guaranteed to specify the context, it is possible to
update the context switch code to use requests rather than ring and context
pairs. This patch updates i915_switch_context() accordingly.

Also removed the warning that the request's context must match the last context
switch's context. As the context switch now gets the context object from the
request structure, there is no longer any scope for the two to become out of
step.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b3dd6b96 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update ppgtt_init_ring() & context_enable() to take requests

The final step in removing the OLR from i915_gem_init_hw() is to pass the newly
allocated request structure in to each step rather than passing a ring
structure. This patch updates both i915_ppgtt_init_ring() and
i915_gem_context_enable() to take request pointers.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# dc4be6071 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Add explicit request management to i915_gem_init_hw()

Now that a single per ring loop is being done for all the different
initialisation steps in i915_gem_init_hw(), it is possible to add proper request
management as well. The last remaining issue is that the context enable call
eventually ends up within *_render_state_init() and this does its own private
_i915_add_request() call.

This patch adds explicit request creation and submission to the top level loop
and removes the add_request() from deep within the sub-functions.

v2: Updated for removal of batch_obj from add_request call in previous patch.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 90638cc1 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Moved the for_each_ring loop outside of i915_gem_context_enable()

The start of day context initialisation code in i915_gem_context_enable() loops
over each ring and calls the legacy switch context or the execlist init context
code as appropriate.

This patch moves the ring looping out of that function into the top level
caller i915_gem_init_hw(). This means that a single pass can be made over all
rings doing the PPGTT, L3 remap and context initialisation of each ring
altogether.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4ad2fd88 18-Jun-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Split i915_ppgtt_init_hw() in half - generic and per ring

The i915_gem_init_hw() function calls a bunch of smaller initialisation
functions, several of which have both generic sections and per-ring sections. This
means multiple passes are done over the rings. Each pass writes data to the ring
which floats around in that ring's OLR until some random point in the future
when an add_request() is done by some random other piece of code.

This patch breaks i915_ppgtt_init_hw() in two with the per ring initialisation
now being done in i915_ppgtt_init_ring(). The ring looping is now done at the
top level in i915_gem_init_hw().

v2: Fix dumb loop variable re-use.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 73cfa865 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update i915_gpu_idle() to manage its own request

Added explicit request creation and submission to the GPU idle code path.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5b4a60c2 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Add flag to i915_add_request() to skip the cache flush

In order to explicitly track all GPU work (and completely remove the outstanding
lazy request), it is necessary to add extra i915_add_request() calls to various
places. Some of these do not need the implicit cache flush done as part of the
standard batch buffer submission process.

This patch adds a flag to _add_request() to specify whether the flush is
required or not.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 217e46b5 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Update alloc_request to return the allocated request

The alloc_request() function does not actually return the newly allocated
request. Instead, it must be pulled from ring->outstanding_lazy_request. This
patch fixes this so that code can create a request and start using it knowing
exactly which request it actually owns.

v2: Updated for new i915_gem_request_alloc() scheme.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 40e895ce 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Set context in request from creation even in legacy mode

In execlist mode, the context object pointer is written in to the request
structure (and reference counted) at the point of request creation. In legacy
mode, this only happens inside i915_add_request().

This patch updates the legacy code path to match the execlist version. This
allows all the intermediate code between request creation and request submission
to get at the context object given only a request structure. Thus negating the
need to pass context pointers here, there and everywhere.

v2: Moved the context reference so it does not need to be undone if the
get_seqno() fails.

v3: Fixed execlist mode always hitting a warning about invalid last_contexts
(which don't exist in execlist mode).

v4: Updated for new i915_gem_request_alloc() scheme.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bf7dc5b7 29-May-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: i915_add_request must not fail

The i915_add_request() function is called to keep track of work that has been
written to the ring buffer. It adds epilogue commands to track progress (seqno
updates and such), moves the request structure onto the right list and other
such house keeping tasks. However, the work itself has already been written to
the ring and will get executed whether or not the add request call succeeds. So
no matter what goes wrong, there isn't a whole lot of point in failing the call.

At the moment, this is fine(ish). If the add request does bail early on and not
do the housekeeping, the request will still float around in the
ring->outstanding_lazy_request field and be picked up next time. It means
multiple pieces of work will be tagged as the same request and the driver can't
actually wait for the first piece of work until something else has been
submitted. But it all sort of hangs together.

This patch series is all about removing the OLR and guaranteeing that each piece
of work gets its own personal request. That means that there is no more
'hoovering up of forgotten requests'. If the request does not get tracked then
it will be leaked. Thus the add request call _must_ not fail. The previous patch
should have already ensured that it _will_ not fail by removing the potential
for running out of ring space. This patch enforces the rule by actually removing
the early exit paths and the return code.

Note that if something does manage to fail and the epilogue commands don't get
written to the ring, the driver will still hang together. The request will be
added to the tracking lists. And as in the old case, any subsequent work will
generate a new seqno which will suffice for marking the old one as complete.

v2: Improved WARNings (Tomas Elf review request).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 29b1b415 18-Jun-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Reserve ring buffer space for i915_add_request() commands

It is a bad idea for i915_add_request() to fail. The work will already have been
sent to the ring and will be processed, but there will not be any tracking or
management of that work.

The only way the add request call can fail is if it can't write its epilogue
commands to the ring (cache flushing, seqno updates, interrupt signalling). The
reasons for that are mostly down to running out of ring buffer space and the
problems associated with trying to get some more. This patch prevents that
situation from happening in the first place.

When a request is created, it marks sufficient space as reserved for the
epilogue commands. Thus guaranteeing that by the time the epilogue is written,
there will be plenty of space for it. Note that a ring_begin() call is required
to actually reserve the space (and do any potential waiting). However, that is
not currently done at request creation time. This is because the ring_begin()
code can allocate a request. Hence calling begin() from the request allocation
code would lead to infinite recursion! Later patches in this series remove the
need for begin() to do the allocate. At that point, it becomes safe for the
allocate to call begin() and really reserve the space.

Until then, there is a potential for insufficient space to be available at the
point of calling i915_add_request(). However, that would only be in the case
where the request was created and immediately submitted without ever calling
ring_begin() and adding any work to that request. Which should never happen. And
even if it does, and if that request happens to fall down the tiny window of
opportunity for failing due to being out of ring space then does it really
matter because the request wasn't doing anything in the first place?

v2: Updated the 'reserved space too small' warning to include the offending
sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
re-initialisation of tracking state after a buffer wrap to keep the sanity
checks accurate.

v3: Incremented the reserved size to accommodate Ironlake (after finally
managing to run on an ILK system). Also fixed missing wrap code in LRC mode.

v4: Added extra comment and removed duplicate WARN (feedback from Tomas).

For: VIZ-5115
CC: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
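
A minimal sketch of the reservation idea, using an invented ring structure
rather than the driver's: the epilogue size is set aside when the request is
created, so the wait-for-space happens at ring_begin() time and the later
add_request() can no longer run out of room.

struct ring_sketch {
	u32 size;		/* power-of-two ring size, in bytes */
	u32 head, tail;
	u32 reserved;		/* bytes promised to the request epilogue */
};

static u32 ring_space_sketch(const struct ring_sketch *r)
{
	return (r->head - r->tail - 1) & (r->size - 1);
}

static bool ring_begin_ok_sketch(const struct ring_sketch *r, u32 nwords)
{
	/* the payload must fit alongside the already-reserved epilogue */
	return ring_space_sketch(r) >= nwords * 4 + r->reserved;
}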


# 77a0d1ca 18-Jun-2015 Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: Remove unused ring argument from frontbuffer invalidate and busy functions.

This patch doesn't have any functional change, but organizes the frontbuffer
invalidate and busy paths by removing the unnecessary ring argument from their
signatures.

It was unused in mark_fb_busy and only used in fb_obj_invalidate for the
same ORIGIN_CS usage. So let's clean it up a bit.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 245ec9d8 14-Jun-2015 Jani Nikula <jani.nikula@intel.com>

Revert "drm/i915: Don't skip request retirement if the active list is empty"

This reverts commit 0aedb1626566efd72b369c01992ee7413c82a0c5.

I messed things up while applying [1] to drm-intel-fixes. Rectify.

[1] http://mid.gmane.org/1432827156-9605-1-git-send-email-ville.syrjala@linux.intel.com

Fixes: 0aedb1626566 ("drm/i915: Don't skip request retirement if the active list is empty")
Cc: stable@vger.kernel.org
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 11ee9615 28-May-2015 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Don't skip request retirement if the active list is empty

Apparently we can have requests even though the active list is empty,
so do the request retirement regardless of whether there's anything
on the active list.

The way it happened here is that during suspend intel_ring_idle()
notices the olr hanging around and then proceeds to get rid of it by
adding a request. However since there was nothing on the active lists
i915_gem_retire_requests() didn't clean those up, and so the idle work
never runs, and we leave the GPU "busy" during suspend resulting in a
WARN later.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 016a65a3 11-Jun-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always reset vma->ggtt_view.pages cache on unbinding

With the introduction of multiple views of an obj in the same vm, each
vma was taught to cache its copy of the pages (so that different views
could have different page arrangements). However, this missed decoupling
those vma->ggtt_view.pages when the vma released its reference on the
obj->pages. As we don't always free the vma, this leads to a possible
scenario (e.g. execbuffer interrupted by the shrinker) where the vma
points to a stale obj->pages, and explodes.

Fixes regression from commit fe14d5f4e5468c5b80a24f1a64abcbe116143670
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Wed Dec 10 17:27:58 2014 +0000

drm/i915: Infrastructure for supporting different GGTT views per object

Tvrtko says, if someone else will be confused how this can happen, key
is the reservation execbuffer path. That puts the VMA on the exec_list
which prevents i915_vma_unbind and i915_gem_vma_destroy from fully
destroying the VMA. So the VMA is left existing as an empty object in
the list - unbound and disassociated with the backing store. Kind of a
cached memory object. And then re-using it needs to clear the cached
pages pointer which is fixed above.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1227892
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[Jani: Added Tvrtko's explanation to commit message.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 0aedb162 28-May-2015 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Don't skip request retirement if the active list is empty

Apparently we can have requests even though the active list is empty,
so do the request retirement regardless of whether there's anything
on the active list.

The way it happened here is that during suspend intel_ring_idle()
notices the olr hanging around and then proceeds to get rid of it by
adding a request. However since there was nothing on the active lists
i915_gem_retire_requests() didn't clean those up, and so the idle work
never runs, and we leave the GPU "busy" during suspend resulting in a
WARN later.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 8d3afd7d 21-May-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use spinlocks for checking when to waitboost

In commit 1854d5ca0dd7a9fc11243ff220a3e93fce2b4d3e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:32 2015 +0100

drm/i915: Deminish contribution of wait-boosting from clients

we removed an atomic timer based check for allowing waitboosting and
moved it below the mutex taken during RPS. However, that mutex can be
held for long periods of time on Valleyview/Cherryview as communication
with the PCU is slow. As clients may frequently wait for results (e.g.
for transform feedback) we introduced contention between the client
and the RPS worker. We can take advantage of the RPS worker, by
switching the wait boost decision to use spin locks and defer the
actual reclocking to the worker.

Fixes a regression of up to 45% on Baytrail and Braswell!

v2 (Daniel):
- Use max_freq_softlimit instead of the not-yet-merged boost
frequency.
- Don't inject a fake irq into the boost work, instead treat
client_boost as just another legit waker.

v3: Drop the now unused mask (Chris).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
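
A rough sketch of the split described above: the boost decision is taken under
a spinlock, and the actual reclocking (the slow PCU conversation) is deferred
to the RPS worker. The structure layout and field names are illustrative only.

struct rps_client_sketch { bool boosted; };

struct rps_sketch {
	spinlock_t client_lock;
	unsigned int boosts;
	struct workqueue_struct *wq;
	struct work_struct work;
};

static void client_boost_sketch(struct rps_sketch *rps,
				struct rps_client_sketch *client)
{
	bool boost = false;

	spin_lock(&rps->client_lock);
	if (!client->boosted) {		/* one boost per busy/idle cycle */
		client->boosted = true;
		rps->boosts++;
		boost = true;
	}
	spin_unlock(&rps->client_lock);

	if (boost)
		queue_work(rps->wq, &rps->work);  /* the worker talks to the PCU */
}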


# d0bc54f2 21-May-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce DRM_I915_THROTTLE_JIFFIES

As Daniel commented on

commit b7ffe1362c5f468b853223acc9268804aa92afc8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Apr 27 13:41:24 2015 +0100

drm/i915: Free RPS boosts for all laggards

it is better to be explicit when sharing hardcoded values such as
throttle/boost timeouts. Make it so!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9a0c1e27 21-May-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use the correct destructor for freeing requests on error

After allocating from the slab cache, we then need to free the request
back into the slab cache upon error (and not call kfree as that leads
to eventual memory corruption).

Fixes regression from
commit efab6d8dd158fdccbe6a030f89fbf9ca0a9564e4
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 7 16:20:57 2015 +0100

drm/i915: Use a separate slab for requests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
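
The rule being fixed here is generic slab API behaviour: memory obtained from
a kmem_cache must be returned with kmem_cache_free(), not kfree(). A small
hedged sketch follows; the request type and the failing setup step are
placeholders, and request_cache is assumed to have been created once with
kmem_cache_create() at driver init.

struct request_sketch { u32 seqno; };		/* placeholder type */
static struct kmem_cache *request_cache;	/* from kmem_cache_create() */

static struct request_sketch *request_alloc_sketch(void)
{
	struct request_sketch *req;

	req = kmem_cache_zalloc(request_cache, GFP_KERNEL);
	if (req == NULL)
		return NULL;

	if (request_init_sketch(req)) {		/* placeholder setup step */
		/* free back into the slab cache -- kfree(req) here would
		 * eventually corrupt memory, which is the bug being fixed */
		kmem_cache_free(request_cache, req);
		return NULL;
	}
	return req;
}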


# e61b9958 27-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Free RPS boosts for all laggards

If the client stalls on a congested request, chosen to be 20ms old to
match throttling, allow the client a free RPS boost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
[danvet: s/0/NULL/ reported by 0-day build]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2e1b8730 27-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert RPS tracking to a intel_rps_client struct

Now that we have internal clients, rather than faking a whole
drm_i915_file_private just for tracking RPS boosts, create a new struct
intel_rps_client and pass it along when waiting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a6f766f3 27-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts

Ring switches can occur many times per frame, and are often out of
control, causing frequent RPS boosting for no practical benefit. Treat
the sw semaphore synchronisation as a separate client and only allow it
to boost once per busy/idle cycle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b4716185 27-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Implement inter-engine read-read optimisations

Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read read requests (as well as better optimise
certain CPU waits).

v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-contiguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.

Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k: 275.794µs
After: Time to read-read 1024k: 123.260µs

hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k: 230.433µs
After: Time to read-read 1024k: 124.593µs

bdw-u (w/o semaphores): Before After
Time to read-read 1x1: 26.274µs 10.350µs
Time to read-read 128x128: 40.097µs 21.366µs
Time to read-read 256x256: 77.087µs 42.608µs
Time to read-read 512x512: 281.999µs 181.155µs
Time to read-read 1024x1024: 1196.141µs 1118.223µs
Time to read-read 2048x2048: 5639.072µs 5225.837µs
Time to read-read 4096x4096: 22401.662µs 21137.067µs
Time to read-read 8192x8192: 89617.735µs 85637.681µs

Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# eed29a5b 21-May-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: s/\<rq\>/req/g

The merged seqno->request conversion from John called request
variables req, but some (not all) of Chris' recent patches changed
those to just rq. We've had a lengthy (and inconclusive) discussion on
irc about which is the more meaningful name, with maybe at most a slight
bias towards req.

Given that the "don't change names without good reason to avoid
conflicts" rule applies, let's go back to req everywhere for
consistency. I'll sed any patches for which this will cause conflicts
before applying.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
[danvet: s/origina/merged/ as pointed out by Chris - the first
mass-conversion patch was from Chris, the merged one from John.]
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# 2e2f351d 27-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu()

We no longer interpolate domains in the same manner, and even if we did,
we should trust that setting either of the other write domains would trigger
an invalidation rather than force it. Remove the tweaking of the
read_domains since it serves no purpose and use
i915_gem_object_wait_rendering() directly.

Note that this goes back to

commit a8198eea156df47e0e843ac5c7d4c8774e121c42
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Apr 13 22:04:09 2011 +0100

drm/i915: Introduce i915_gem_object_finish_gpu()

and gpu domain tracking died in

commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jun 13 20:45:19 2012 +0200

drm/i915: disable flushing_list/gpu_write_list

which came more than a year later.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add notes with information dug out of git history.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5e024f31 08-May-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Remove unused variable from i915_gem_mmap_gtt

Lost in

commit c5ad54cf7dd8922bd1cee2d5871aebf73dc9638e
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date: Wed May 6 14:36:09 2015 +0300

drm/i915: Use partial view in mmap fault handler

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# e7ded2d7 08-May-2015 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Reject huge tiled objects

We do not yet support tiled objects bigger than the mappable
aperture size so reject them.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Rework the check a bit to avoid warnings.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c5ad54cf 06-May-2015 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Use partial view in mmap fault handler

Use partial view for huge BOs (bigger than half the mappable aperture)
in the fault handler so that they can be accessed without trying to make
room for them by evicting other objects.

v2:
- Only use partial views in the case where early rejection was
previously done.
- Account variable type changes from previous reroll.
v3:
- Add a comment about overwriting existing page entries.
(Tvrtko Ursulin)
- Whitespace fixes.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a6631ae1 06-May-2015 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Consider object pinned if any VMA is pinned

Do not skip special GGTT views when considering whether an object
is pinned or not.

The wrong behaviour was introduced in:

commit ec7adb6ee79c8c9fe64d63ad638a31cd62e55515
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date: Mon Mar 16 14:11:13 2015 +0200

drm/i915: Do not use ggtt_view with (aliasing) PPGTT

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 91e6711e 06-May-2015 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Do not make assumptions on GGTT VMA sizes

GGTT VMA sizes might be smaller than the whole object size due to
different GGTT views.

v2:
- Separate GGTT view constraint calculations from normal view
constraint calculations (Chris Wilson)
v3:
- Do not bother with debug wording. (Tvrtko Ursulin)
v4:
- Clearer logic for calculating map_and_fenceable (Tvrtko Ursulin)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[danvet: Drop BUG_ON, it's redundant.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 432be69d 06-May-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove locking for get-caching query

When reading a single value from the object, the locking only provides futile
protection against userspace races. The locking is useless, so remove it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5e562f1d 30-Apr-2015 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Clear vma->bound on unbinding

Unbinding doesn't always lead to unconditional destruction
of vma. This destruction avoidance happens if vma is part of
execbuffer relocation list or if vma is being considered for
eviction in i915_gem_evict_something().

For those other users, mark the vma unbound so that
the correct state of this vma is preserved.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 53292cdb 15-Apr-2015 Michel Thierry <michel.thierry@intel.com>

drm/i915: Workaround to avoid lite restore with HEAD==TAIL

WaIdleLiteRestore is an execlists-only workaround, and requires the driver
to ensure that any context always has HEAD!=TAIL when attempting lite
restore.

Add two extra MI_NOOP instructions at the end of each request, but keep
the request's tail pointing before the MI_NOOPs. We may not need to
execute them, which is why request->tail is sampled before adding
these extra instructions.

If we submit a context to the ELSP which has previously been submitted,
move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL.

v2: Move overallocation to gen8_emit_request, and added note about
sampling request->tail in commit message (Chris).

v3: Remove redundant request->tail assignment in __i915_add_request, in
lrc mode this is already set in execlists_context_queue.
Do not add wa implementation details inside gem (Chris).

v4: Apply the wa whenever the req has been resubmitted and update
comment (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 0875546c 20-Apr-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Fix up the vma aliasing ppgtt binding

Currently we have the problem that the decision whether ptes need to
be (re)written is splattered all over the codebase. Move all that into
i915_vma_bind. This needs a few changes:
- Just reuse the PIN_* flags for i915_vma_bind and do the conversion
to vma->bound in there to avoid duplicating the conversion code all
over.
- We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
around) explicit, add PIN_USER for that.
- Two callers want to update ptes, give them a PIN_UPDATE for that.

Of course we still want to avoid double-binding, but that should be
taken care of:
- A ppgtt vma will only ever see PIN_USER, so no issue with
double-binding.
- A ggtt vma with aliasing ppgtt needs both types of binding, and we
track that properly now.
- A ggtt vma without aliasing ppgtt could be bound twice. In the
lower-level ->bind_vma functions hence unconditionally set
GLOBAL_BIND when writing the ggtt ptes.

There's still a bit room for cleanup, but that's for follow-up
patches.

v2: Fixup fumbles.

v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# cd102a68 14-Apr-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Remove misleading comment around bind_to_vm

It's true that we might need to context switch, but both the signalling
and implementation of the same are a few source files away. Remove it.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 777dc5bb 14-Apr-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Move vma vfuns to adddress_space

They change with the address space and not with each vma, so move them
into the right pile of vfuncs. This saves 2 pointers per vma and clarifies
the code.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# 8a0c39b1 13-Apr-2015 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Simplify and fix object to display tracking

Purpose of this tracking is to know when to flush the cache between
the CPU and the non-coherent display engine. Prior to:

commit 121920faf2ccce9aa66a7e2588415c9647b66104
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Mon Mar 23 11:10:37 2015 +0000

drm/i915/skl: Query display address through a wrapper

This worked by a mix of direct flag manipulation and checking for
existence of a pinned GGTT VMA.

With the introduction of rotated display mappings this approach is
no longer correct.

The new, simpler approach is to just keep a pin count across the calls which pin
and unpin objects to and from display, at the slight cost of extra
space in every bo.

(Inspired and extracted code from a larger rework by Chris Wilson.)

v2: Remove the limit since it is not well defined. (Chris Wilson, Ville Syrjälä)
v3: Commit message corrections. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5678ad73 17-Mar-2015 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Fix view type in warning message

One month passed between posting a patch and it getting merged, and
unfortunately even though it still applies, it needs fixing to account
for changes in function parameters since:

commit d385612e15b8b6eb3db328d83f1872ef8a381788
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Tue Mar 17 14:45:29 2015 +0000

drm/i915: Log view type when printing warnings

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Squash in fixup from Tvrtko to fix the rebase conflict.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 30154650 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove obj->pin_mappable

The obj->pin_mappable flag only exists for debug purposes and is a
hindrance that is mistreated with rotated GGTT views. For debug
purposes, it suffices to mark objects with pin_display as being of note.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2def4ad9 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Optimistically spin for the request completion

This provides a nice boost to mesa in swap bound scenarios (as mesa
throttles itself to the previous frame and given the scenario that will
complete shortly). It will also provide a good boost to systems running
with semaphores disabled and so frequently waiting on the GPU as it
switches rings. In the most favourable of microbenchmarks, this can
increase performance by around 15% - though in practice improvements
will be marginal and rarely noticeable.

v2: Account for user timeouts
v3: Limit the spinning to a single jiffie (~1us) at most. On an
otherwise idle system, there is no scheduler contention and so without a
limit we would spin until the GPU is ready.
v4: Drop forcewake - the lazy coherent access doesn't require it, and we
have no reason to believe that the forcewake itself improves seqno
coherency - it only adds delay.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
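
A hedged sketch of the bounded optimistic spin: poll for completion for at
most one jiffy before falling back to sleeping on the interrupt. The
completion check is a placeholder, not the driver's helper.

static int spin_for_completion_sketch(struct drm_i915_gem_request *req)
{
	unsigned long timeout = jiffies + 1;	/* spin for ~one jiffy at most */

	while (!request_completed_sketch(req)) {	/* placeholder check */
		if (time_after(jiffies, timeout))
			return -EBUSY;		/* give up and sleep instead */
		cpu_relax();
	}
	return 0;	/* completed without ever sleeping */
}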


# 1d335d1b 10-Apr-2015 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Move vm page allocation in proper place

Move to i915_vma_bind as it is part of the binding.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d7b9ca2f 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove request->uniq

We already assign a unique identifier to every request: seqno. That
someone felt like adding a second one, without even mentioning why, while
also tweaking ABI, smells very fishy.

Fixes regression from
commit b3a38998f042b862f5ba4d7f2268f3a8dfb4883a
Author: Nick Hoath <nicholas.hoath@intel.com>
Date: Thu Feb 19 16:30:47 2015 +0000

drm/i915: Fix a use after free, and unbalanced refcounting

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
[danvet: Fixup because different merge order.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 423795cb 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prefer to check for idleness in worker rather than sync-flush

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e20d2ab7 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use a separate slab for vmas

vma are more frequently allocated than objects and so should equally
benefit from having a dedicated slab.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# efab6d8d 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use a separate slab for requests

requests are even more frequently allocated than objects and equally
benefit from having a dedicated slab.

v2: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4bb1bedb 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists

When we submit a request to the GPU, we first take the rpm wakelock, and
only release it once the GPU has been idle for a small period of time
after all requests have been complete. This means that we are sure no
new interrupt can arrive whilst we do not hold the rpm wakelock and so
can drop the individual get/put around every single request inside
execlists.

Note: to close one potential issue we should mark the GPU as busy
earlier in __i915_add_request.

To elaborate: The issue is that we emit the irq signalling sequence
before we grab the rpm reference, which means we could miss the
resulting interrupt (since that's not set up when suspended). The only
bad side effect is a missed interrupt, gt mmio writes automatically
wake up the hw itself. But otoh we have an umbrella rpm reference for
the entirety of execbuf, as long as that's there we're covered.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Explain a bit more about the add_request issue, which after
some irc chatting with Chris turns out to not be an issue really.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8d9d5744 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split batch pool into size buckets

Now with the trimmed memcpy before the command parser, we try to
allocate many different sizes of batches, predominantly one or two
pages. We can therefore speed up searching for a good sized batch by
keeping the objects of buckets of roughly the same size.

v2: Add a comment about bucket sizes

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
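
A sketch of how a size-bucketed pool lookup might pick its bucket from an
object's page count, so that searches only walk similarly sized batches. The
four-bucket split is an assumption for illustration, not the exact driver
heuristic.

static int batch_bucket_sketch(unsigned long npages)
{
	int n;

	if (npages <= 1)
		return 0;		/* single-page batches: first bucket */

	n = fls_long(npages) - 1;	/* log2 of the page count */
	if (n > 3)
		n = 3;			/* 8+ pages share the last bucket */
	return n;
}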


# 35c94185 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Free batch pool when idle

At runtime, this helps ensure that the batch pools are kept trim and
fast. Then at suspend, this releases memory that we do not need to
restore. It also ties into the oom-notifier to ensure that we recover as
much kernel memory as possible during OOM.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 06fbca71 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split the batch pool by engine

I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.

One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.

v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7c27f525 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Re-enable RPS wait-boosting for all engines

This reverts commit ec5cc0f9b019af95e4571a9fa162d94294c8d90b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 12 10:28:55 2014 +0100

drm/i915: Restrict GPU boost to the RCS engine

The premise that media/blitter workloads are not affected by boosting is
patently false with a trip through igt. The question that remains is
what exactly is going wrong with the media workload that prompted this?
Hopefully that would be fixed by the missing aggressive downclocking, in
addition to the extra restrictions imposed on how frequently a process is
allowed to boost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1854d5ca 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Deminish contribution of wait-boosting from clients

With boosting for missed pageflips, we have a much stronger indication
of when we need to (temporarily) boost GPU frequency to ensure smooth
delivery of frames. So now only allow each client to perform one RPS boost
in each period of GPU activity due to stalling on results.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ee286370 07-Apr-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Cache last obj->pages location for i915_gem_object_get_page()

The biggest user of i915_gem_object_get_page() is the relocation
processing during execbuffer. Typically userspace passes in a set of
relocations in sorted order. Sadly, we alternate between relocations
increasing from the start of the buffers, and relocations decreasing
from the end. However the majority of consecutive lookups will still be
in the same page. We could cache the start of the last sg chain, however
for most callers, the entire sgl is inside a single chain and so we see
no improvement from the extra layer of caching.

v2: Avoid the double increment inside unlikely()

References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
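
A sketch of the lookup cache described above: remember the sg entry and the
page index it starts at, so the common "next page is in the same (or the
following) sg chunk" case avoids rescanning the whole list. The cached fields
under obj->get_page and their names are assumptions for illustration.

static struct page *obj_get_page_cached_sketch(struct drm_i915_gem_object *obj,
					       unsigned long n)
{
	struct scatterlist *sg = obj->get_page.sg;	/* cached position */
	unsigned long idx = obj->get_page.last;		/* first page in sg */

	if (n < idx) {			/* walked backwards: restart the scan */
		sg = obj->pages->sgl;
		idx = 0;
	}

	while (idx + (sg->length >> PAGE_SHIFT) <= n) {
		idx += sg->length >> PAGE_SHIFT;
		sg = sg_next(sg);
	}

	obj->get_page.sg = sg;		/* remember where this lookup ended */
	obj->get_page.last = idx;

	return nth_page(sg_page(sg), n - idx);
}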


# 6689cb2b 18-Mar-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Move common request allocation code into a common function

The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.

This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f3dc74c0 18-Mar-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Rename 'do_execbuf' to 'execbuf_submit'

The submission portion of the execbuffer code path was abstracted into a
function pointer indirection as part of the legacy vs execlist work. The two
implementation functions are called 'i915_gem_ringbuffer_submission' and
'intel_execlists_submission' but the pointer was called 'do_execbuf'. There is
already a 'i915_gem_do_execbuffer' function (which is what calls the pointer
indirection). The name of the pointer is therefore considered to be backwards
and should be changed.

This patch renames it to 'execbuf_submit' which is hopefully a bit clearer.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 41037f9f 27-Mar-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add i915_gem_request_unreference__unlocked

We were missing a convenience stub to acquire the right mutex whilst
dropping the request, so add it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
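
A minimal sketch of what such an __unlocked convenience wrapper does: take the
struct_mutex, drop the reference, release the mutex. The path to the mutex and
the name of the locked helper are assumptions, not the driver's exact code.

static void request_unreference__unlocked_sketch(struct drm_i915_gem_request *req)
{
	struct drm_device *dev;

	if (req == NULL)
		return;

	dev = req->ring->dev;			/* assumed field path */
	mutex_lock(&dev->struct_mutex);
	request_unreference_sketch(req);	/* placeholder locked helper */
	mutex_unlock(&dev->struct_mutex);
}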


# 9abc4648 27-Mar-2015 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Compare GGTT view structs instead of types

To allow for views which are not defined by the view type alone,
as is the case for stereo or rotated 90 degree views, change the semantics to
require the whole view structure for comparison when we match a GGTT view.

This allows including parameters like offset to be included in the view which
is useful for eg. partial views.

v3:
- Rely on ggtt_view type being 0 for non-GGTT vma's, which equals to
I915_GGTT_VIEW_NORMAL. (Daniel Vetter)
- Do not use potentially slower comparison when we only want to know if
something is or is not a normal view.
- Rebase on top of rotated view patches. Add rotated view singleton.
- If one view is missing in comparison they're equal only if both are missing.

v4:
- Use comparison helper in obj_to_ggtt_view too. (Tvrtko Ursulin)
- Do WARN_ON if one view is NULL. (Tvrtko Ursulin)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
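
A hedged sketch of comparing whole view structs instead of just the type, so
that parameters such as a partial view's offset take part in the match; as
described above, a missing view is only considered equal when both are
missing, and the normal view needs no parameter comparison.

static bool ggtt_view_equal_sketch(const struct i915_ggtt_view *a,
				   const struct i915_ggtt_view *b)
{
	if (WARN_ON(a == NULL || b == NULL))
		return a == b;		/* equal only if both are missing */

	if (a->type != b->type)
		return false;
	if (a->type == I915_GGTT_VIEW_NORMAL)
		return true;		/* normal views carry no parameters */

	/* parameterised views: compare the whole struct, offsets included */
	return memcmp(a, b, sizeof(*a)) == 0;
}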


# 72744cb1 24-Mar-2015 Michel Thierry <michel.thierry@intel.com>

drm/i915: Add dynamic page trace events

Traces for page directories and tables allocation and map.

v2: Removed references to teardown.
v3: bitmap_scnprintf has been deprecated.
v4: Replace bitmap_scnprintf with scnprintf correctly, and get right
range lengths. (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 832a3aad 18-Mar-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Keep ring->active_list and ring->requests_list consistent

If we retire requests last, we may use a later seqno and so clear
the requests lists without clearing the active list, leading to
confusion. Hence we should retire requests first for consistency with
the early return. The order used to be important as the lifecycle for
the object on the active list was determined by request->seqno. However,
the requests themselves are now reference counted removing the
constraint from the order of retirement.

Fixes regression from

commit 1b5a433a4dd967b125131da42b89b5cc0d5b1f57
Author: John Harrison <John.C.Harrison@Intel.com>
Date: Mon Nov 24 18:49:42 2014 +0000

drm/i915: Convert 'i915_seqno_passed' calls into 'i915_gem_request_completed
'

and a

WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140()
WARN_ON(!list_empty(&vm->active_list))

Identified by updating WATCH_LISTS:

[drm:i915_verify_lists] *ERROR* blitter ring: active list not empty, but no requests
WARNING: CPU: 0 PID: 681 at drivers/gpu/drm/i915/i915_gem.c:2751 i915_gem_retire_requests_ring+0x149/0x230()
WARN_ON(i915_verify_lists(ring->dev))

Note that this is only a problem in evict_vm where the following happens
after a retire_request has cleaned out all requests, but not all active
bo:
- intel_ring_idle called from i915_gpu_idle notices that no requests are
outstanding and immediately returns.
- i915_gem_retire_requests_ring called from i915_gem_retire_requests also
immediately returns when there's no request, still leaving the bo on the
active list.
- evict_vm hits the WARN_ON(!list_empty(&vm->active_list)) because, after
evicting all active objects, there is still stuff left on the list that
shouldn't be there.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 50470bb0 23-Mar-2015 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915/skl: Support secondary (rotated) frame buffer mapping

90/270 rotated scanout needs a rotated GTT view of the framebuffer.

This is put in a separate VMA with a dedicated ggtt view and wired such that
it is created when a framebuffer is pinned to a 90/270 rotated plane.

Rotation is only possible with Yb/Yf buffers and an error is propagated to
user space in case of a mismatch.

The special rotated page view is constructed at VMA creation time by
borrowing the DMA addresses from obj->pages.

v2:
* Do not bother with pages for rotated sg list, just populate the DMA
addresses. (Daniel Vetter)
* Checkpatch cleanup.

v3:
* Rebased on top of new plane handling (create rotated mapping when
setting the rotation property).
* Unpin rotated VMA on unpinning from display plane.
* Simplify rotation check using bitwise AND. (Chris Wilson)

v4:
* Fix unpinning of optional rotated mapping so it is really considered
to be optional.

v5:
* Rebased for fb modifier changes.
* Rebased for atomic commit.
* Only pin needed view for display. (Ville Syrjälä, Daniel Vetter)

v6:
* Rebased after preparatory work has been extracted out. (Daniel Vetter)

v7:
* Slightly simplified tiling geometry calculation.
* Moved rotated GGTT view implementation into i915_gem_gtt.c (Daniel Vetter)

v8:
* Do not use i915_gem_obj_size to get object size since that actually
returns the size of a VMA which may not exist.
* Rebased for ggtt view changes.

v9:
* Rebased after code review changes on the preceding patches.
* Tidy function definitions. (Joonas Lahtinen)

For: VIZ-4726
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com> (v4)
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e6617330 23-Mar-2015 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Use GGTT view when (un)pinning objects to planes

To support frame buffer rotation we need to be able to pass on the information
on what kind of GGTT view is required for display.

This patch just adds the parameter and makes all the callers default to the
normal view.

v2: Rebased for ggtt view changes.
v3: Don't limit PIN_MAPPABLE to normal views just yet. (Joonas Lahtinen)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v3)
[danvet: s/BUG/WARN/ in the patch hunk, at least where the
BUG_ON isn't fatal right away.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 563222a7 18-Mar-2015 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Track page table reload need

This patch was formerly known as, "Force pd restore when PDEs change,
gen6-7." I had to change the name because it is needed for GEN8 too.

The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.

GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.

Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
lets the user know.

It's not just for gen8. If the mappings of the current context change, we
need a context reload to switch to the new page tables.

v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.

v3: Invalidate PPGTT TLBs inside alloc_va_range.

v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes when
neither ctx->ppgtt and aliasing_ppgtt exist.

v5: Removed references to teardown_va_range.

v6: Updated needs_pd_load_pre/post.

v7: Fix pd_dirty_rings check in needs_pd_load_post, and update/move
comment about updated PDEs to object_pin/bind (Mika).

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
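
A rough sketch of the bookkeeping pattern described above (the variable and
ring names are illustrative, not the kernel's): mark every ring dirty when new
PDEs are mapped, then test-and-clear the bit for a ring right before
submitting to it and emit a page-directory reload if it was set.

#include <stdbool.h>
#include <stdio.h>

enum ring_id { RCS, VCS, BCS, NUM_RINGS };

static unsigned int pd_dirty_rings;   /* one bit per ring */

/* Called when new PDEs are mapped into the address space. */
static void mark_tlbs_dirty(void)
{
	pd_dirty_rings = (1u << NUM_RINGS) - 1;
}

/* Called just before submitting a batch on a given ring. */
static bool needs_pd_load(enum ring_id ring)
{
	bool dirty = pd_dirty_rings & (1u << ring);

	pd_dirty_rings &= ~(1u << ring);   /* reload is emitted exactly once */
	return dirty;
}

int main(void)
{
	mark_tlbs_dirty();
	printf("%d %d\n", needs_pd_load(RCS), needs_pd_load(RCS)); /* 1 0 */
	return 0;
}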


# 678d96fb 16-Mar-2015 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Track GEN6 page table usage

Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.

With the current patch there isn't much in the way of making a gen
agnostic range allocation function. However, in the next patch we'll add
more specificity which makes having separate functions a bit easier to
manage.

One important change introduced here is that DMA mappings are
created/destroyed at the same time page directories/tables are
allocated/deallocated.

Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.

v2: s/pdp.page_directory/pdp.page_directories
Make a scratch page allocation helper

v3: Rebase and expand commit message.

v4: Allocate required pagetables only when it is needed, _bind_to_vm
instead of bind_vma (Daniel).

v5: Rebased to remove the unnecessary noise in the diff, also:
- PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
- Removed unnecessary checks in gen6_alloc_va_range.
- Changed map/unmap_px_single macros to use dma functions directly and
be part of a static inline function instead.
- Moved drm_device plumbing through page tables operation to its own
patch.
- Moved allocate/teardown_va_range calls until they are fully
implemented (in subsequent patch).
- Merged pt and scratch_pt unmap_and_free path.
- Moved scratch page allocator helper to the patch that will use it.

v6: Reduce complexity by not tearing down pagetables dynamically, the
same can be achieved while freeing empty vms. (Daniel)

v7: s/i915_dma_map_px_single/i915_dma_map_single
s/gen6_write_pdes/gen6_write_pde
Prevent a NULL case when only GGTT is available. (Mika)

v8: Rebased after s/page_tables/page_table/.

v9: Reworked i915_pte_index and i915_pte_count.
Also exercise bitmap allocation here (gen6_alloc_va_range) and fix
incorrect write_page_range in i915_gem_restore_gtt_mappings (Mika).

Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
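
A simplified sketch of the tracking idea: with 4 KiB pages and 1024 PTEs per
page table (the gen6 layout), record which entries of a table a
virtual-address range uses. The helper names and the single-table bitmap are
illustrative only, and the range is assumed to stay within one table.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT   12
#define PTES_PER_PT  1024u   /* gen6: 1024 * 4 KiB = 4 MiB per page table */

static uint32_t pte_index(uint64_t addr)
{
	return (addr >> PAGE_SHIFT) & (PTES_PER_PT - 1);
}

/* Number of page-table entries a [start, start+length) range touches,
 * assuming it does not cross a page-table boundary. */
static uint32_t pte_count(uint64_t start, uint64_t length)
{
	return (uint32_t)(((start + length + (1ull << PAGE_SHIFT) - 1) >> PAGE_SHIFT)
			  - (start >> PAGE_SHIFT));
}

int main(void)
{
	uint64_t start = 0x5000, length = 0x3000;  /* 3 pages starting at page 5 */
	uint32_t first = pte_index(start), n = pte_count(start, length);
	uint32_t used[PTES_PER_PT / 32] = { 0 };   /* per-table usage bitmap */

	for (uint32_t i = first; i < first + n; i++)
		used[i / 32] |= 1u << (i % 32);

	printf("first pte %u, count %u\n", first, n); /* first pte 5, count 3 */
	return 0;
}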


# be6a0376 18-Mar-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Extract i915_gem_shrinker.c

Two code changes:
- Extract i915_gem_shrinker_init.
- Inline i915_gem_object_is_purgeable since we open-code it everywhere
else too.

This already has the benefit of pulling all the shrinker code
together, next patch adds a bit of kerneldoc.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6fafab76 17-Mar-2015 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Turn on PIN_GLOBAL in i915_gem_object_ggtt_pin

This makes the interface consistent with the old i915_gem_obj_ggtt_pin.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ec7adb6e 16-Mar-2015 Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

drm/i915: Do not use ggtt_view with (aliasing) PPGTT

GGTT views are only applicable when dealing with GGTT. Change the code to
reject ggtt_view where it should not be used and require it when it should
be.

v2:
- Dropped _ppgtt_ infixes, allow both types to be passed
- Disregard other but normal views when no view is specified
- More checks that valid parameters are passed
- More readable error checking

v3:
- Prefer WARN_ONCE over BUG_ON when there is code path for failure

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[danvet: Drop unnecessary forward decl from earlier patch iterations.]
[danvet: Remove unused variable spotted by Tvrtko.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 063e4e6b 13-Feb-2015 Paulo Zanoni <paulo.r.zanoni@intel.com>

drm/i915: also do frontbuffer tracking on pwrites

We need this for FBC, and possibly for PSR too.

v2: Don't only flush: invalidate too (Daniel).

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a4001f1b 13-Feb-2015 Paulo Zanoni <paulo.r.zanoni@intel.com>

drm/i915: pass which operation triggered the frontbuffer tracking

We want to port FBC to the frontbuffer tracking infrastructure, but
for that we need to know what caused the object invalidation so
we can react accordingly: CPU mmaps need manual handling, GTT mmaps and
flips don't need any handling, and ring rendering needs nukes.

v2: - s/ORIGIN_RENDER/ORIGIN_CS/ (Daniel, Rodrigo)
- Fix copy/pasted wrong documentation
- Rebase
v3: - Rebase
v4: - Don't pass the operation to flushes (Daniel).

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
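
A sketch of the dispatch described above; the enum mirrors the origins named
in the commit (ORIGIN_CS being the ring-rendering case), but the handler
bodies are placeholders, not the real FBC code.

#include <stdio.h>

/* Who touched the frontbuffer? */
enum fb_op_origin { ORIGIN_GTT, ORIGIN_CPU, ORIGIN_CS, ORIGIN_FLIP };

static void fbc_invalidate(enum fb_op_origin origin)
{
	switch (origin) {
	case ORIGIN_CPU:
		printf("CPU mmap write: needs manual handling\n");
		break;
	case ORIGIN_CS:
		printf("render via command streamer: nuke FBC\n");
		break;
	case ORIGIN_GTT:
	case ORIGIN_FLIP:
		printf("GTT mmap or flip: nothing to do\n");
		break;
	}
}

int main(void)
{
	fbc_invalidate(ORIGIN_CPU);
	fbc_invalidate(ORIGIN_CS);
	return 0;
}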


# 5e4f5189 13-Feb-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent TLB error on first execution on SNB

Long ago I found that I was getting sporadic errors when booting SNB,
with the symptom being that the first batch died with IPEHR != *ACTHD,
typically caused by the TLB being invalid. These magically disappeared
if I held the forcewake during the entire ring initialisation sequence.
(It can probably be shortened to a short critical section, but the whole
initialisation is full of register writes and so we would be taking and
releasing forcewake almost continually, and so holding it over the
entire sequence will probably be a net win!)

Note some of the kernels on which I encountered the issue already had the
deferred forcewake release, so it is still relevant.

I know that there have been a few other reports with similar failure
conditions on SNB, such as
References: https://bugs.freedesktop.org/show_bug.cgi?id=80913

v2: Wrap i915_gem_init_hw() with its own security blanket as we take
that path following resume and reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 762e4583 04-Mar-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make WAIT_IOCTL negative timeouts be indefinite again

This fixes a regression from

commit 5ed0bdf21a85d78e04f89f15ccf227562177cbd9
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Jul 16 21:05:06 2014 +0000

drm: i915: Use nsec based interfaces

that made a negative timeout return immediately rather than the
previously defined behaviour of waiting indefinitely.

Testcase: igt/gem_wait
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89494
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[Jani: fixed a checkpatch complaint about whitespace.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
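
A small userspace analog of the semantics being restored: a negative timeout
means wait indefinitely, a non-negative one is an upper bound. The helper
below is illustrative only; the real ioctl path waits on GPU requests, not a
flag.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Poll 'done' until it becomes true; timeout_ns < 0 means wait forever. */
static int wait_for(volatile bool *done, int64_t timeout_ns)
{
	int64_t deadline = timeout_ns < 0 ? 0 : now_ns() + timeout_ns;

	while (!*done) {
		if (timeout_ns >= 0 && now_ns() >= deadline)
			return -1;   /* timed out (analogous to -ETIME) */
	}
	return 0;
}

int main(void)
{
	volatile bool done = true;           /* already signalled */

	printf("%d\n", wait_for(&done, -1)); /* 0: a negative timeout still returns once done */
	return 0;
}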


# e8dec1dd 27-Feb-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clarify obj->map_and_fenceable

For an object right on the boundary of mappable space, as the fenceable
size is strictly greater than the actual size, its fence region may extend
out of mappable space.

Note that only pnv/g33 has fence_size > obj.size and an unmappable
range in the gtt, and there the alignment constraints prevent bad things
from happening.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Clarify why this shouldn't change anything as per the
discussion on intel-gfx.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1d03184c 22-Feb-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Remove DRIVER_MODESET checks from gem code

Hooray!

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 98e1bd4a 13-Feb-2015 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Cache ringbuf pointer in request structure

In execlist mode, the ringbuf is a function of the ring and context whereas in
legacy mode, it is derived from the ring alone. Thus the calculation required to
determine the ringbuf pointer from the ring (and context) also needs to test
whether it is in execlist mode or not. This is messy.

Further, the request structure holds a pointer to both the ring and the context
for which it was created. Thus, given a request, it is possible to derive the
ringbuf in either legacy or execlist mode. Hence it is necessary to pass just
the request in to all the low level functions rather than some combination of
request, ring, context and ringbuf. However, rather than recalculating it each
time, it is much simpler to just cache the ringbuf pointer in the request
structure itself.

Caching the pointer means the calculation is done once at request creation time
and all further code can simply read it directly from the request structure.

OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Drop contentless comment in lrc alloc request entirely. And
spelling fix in the commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b3a38998 19-Feb-2015 Nick Hoath <nicholas.hoath@intel.com>

drm/i915: Fix a use after free, and unbalanced refcounting

When converting from implicitly tracked execlist queue items to ref counted
requests, not all frees of requests were replaced with unrefs, and extraneous
refs/unrefs of contexts were added.
Correct the unbalanced refcount & replace the frees.
Remove a noisy warning when hitting the request creation path.

drm_i915_gem_request and intel_context are both kref reference counted
structures. Upon allocation, drm_i915_gem_request's ref count should be
bumped using kref_init. When a context is assigned to the request,
the context's reference count should be bumped using i915_gem_context_reference.
i915_gem_request_unreference will reduce the context reference count when
the request is freed.

Problem introduced in
commit 6d3d8274bc45de4babb62d64562d92af984dd238
Author: Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000

drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request

v2: Added comments explaining how the ctx pointer and the request object should
be ref-counted. Removed noisy warning.

v3: Cleaned up the language used in the commit & the header
description (Thanks David Gordon)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 071c92de 12-Feb-2015 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Add process identifier to requests

We use the pid of the process which opened our device when
we track which was the culprit of the gpu hang. But as that
file descriptor might get inherited, we might blame the
wrong process when we record the error state.

Track process identifiers in requests to always find
the correct offender.

v2: Track only user processes (Chris)

Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: drop NULL check before put_pid as suggested by Chris.]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# eb82289a 10-Feb-2015 Yu Zhang <yu.c.zhang@linux.intel.com>

drm/i915: Partition the fence registers for vGPU in i915 driver

With Intel GVT-g, the fence registers are partitioned between multiple
vGPU instances in different VMs. Routine i915_gem_load() is modified
to reset num_fence_regs when the driver detects it's running in
a VM. Accesses to the fence registers from a vGPU will be trapped and
remapped by the host side, and the allocated fence number is provided
in the PV INFO page structure. For now, the number of fences is fixed,
but in the future we can relax this limitation and allocate the fence
registers dynamically from the host side.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Jike Song <jike.song@intel.com>
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a5094051 26-Jan-2015 Chris Wilson <chris@chris-wilson.co.uk>

Revert "drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES"

The core fix was applied in

commit a63b03e2d2477586440741677ecac45bcf28d7b1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Jan 6 10:29:35 2015 +0000

mutex: Always clear owner field upon mutex_unlock()

(note the absence of stable@ tag)

so we can now revert our band-aid commit 226e5ae9e5f910 for -next.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 737b1506 26-Jan-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert hangcheck from a timer into a delayed work item

When run as a timer, i915_hangcheck_elapsed() must adhere to all the
rules of running in a softirq context. This is advantageous to us as we
want to minimise the risk that a driver bug will prevent us from
detecting a hung GPU. However, that is irrelevant if the driver bug
prevents us from resetting and recovering. Still it is prudent not to
rely on mutexes inside the checker, but given the coarseness of
dev->struct_mutex doing so is extremely hard.

Give in and run from a work queue, i.e. outside of softirq.

v2: Use own workqueue to avoid deadlocks (Daniel)
Cleanup commit msg and add comment to i915_queue_hangcheck() (Chris)

Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Remove accidental kerneldoc comment starter, to appease the 0
day builder.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e62b59e4 21-Jan-2015 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Simplify flush_cpu_write_domain

We can push down the decision whether to force flushing into the
implementation since in all places that matter obj->pin_display is
accurate already. The only place where the optimization really matters
is the sw_finish_ioctl, and that already checks for obj->pin_display
on its own.

I suspect that this was simply an artifact of how

commit 2c22569bba8af6c2976d5f9479fe54a53a39966b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Aug 9 12:26:45 2013 +0100

drm/i915: Update rules for writing through the LLC with the cpu

evolved - only v2 added the pin_display tracking.

Note that we still retain the gist of this logic from the above commit
with the explicit force argument for the low-level clflush function.

Ville noted in his review that there's a slight behavioural change in
the set_to_gtt_domain function, which now also will flush display
plane data. This opens up the potential for userspace to start doing
buggy things by omitting the sw_finish_ioctl, which is why I've
rejected a functionally equivalent patch from Ville a while ago:

http://lists.freedesktop.org/archives/intel-gfx/2013-November/036421.html

But on second consideration it's not that evil, and in any case the
justification here is more clarity, not allowing crazy userspace.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0f71979a 13-Jan-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Perform deferred clflush inside set-cache-level

Currently we are hitting the WARN inside
i915_gem_object_set_cache_level() as we can now have an unbound object
in the GTT write domain (due to 43566dedde54f9 "drm/i915: Broaden
application of set-domain(GTT)"). To avoid the warning, we need to track
when we elided the clflush on a cacheable object and then evict the
cache for the object when we move the object out of a cacheable domain.

Reported-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Jani Nikula <jani.nikula@intel.com>
Testcase: igt/gem_mmap_wc/set-cache-level
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88607
Tested-by: huax.lu@intel.com
[danvet: Split if into nested if as discussion on the m-l.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1197b4f2 13-Jan-2015 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Balance context pinning on reset cleanup

We pin when we submit to execlist queue. Balance
the pinning when the submitted queue is cleaned on reset.

Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6d3d8274 15-Jan-2015 Nick Hoath <nicholas.hoath@intel.com>

drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request

Move all remaining elements that were unique to execlists queue items
in to the associated request.

Issue: VIZ-4274

v2: Rebase. Fixed issue of overzealous freeing of request.
v3: Removed re-addition of cleanup work queue (found by Daniel Vetter)
v4: Rebase.
v5: Actual removal of intel_ctx_submit_request. Update both tail and postfix
pointer in __i915_add_request (found by Thomas Daniel)
v6: Removed unrelated changes

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
[danvet: Reformat comment with strange linebreaks.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 21076372 15-Jan-2015 Nick Hoath <nicholas.hoath@intel.com>

drm/i915: Remove FIXME_lrc_ctx backpointer

The first pass implementation of execlists required a backpointer to the context to be held
in the intel_ringbuffer. However the context pointer is available higher in the call stack.
Remove the backpointer from the ring buffer structure and instead pass it down through the
call stack.

v2: Integrate this changeset with the removal of duplicate request/execlist queue item members.
v3: Rebase
v4: Rebase. Remove passing of context when the request is passed.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 72f95afa 15-Jan-2015 Nick Hoath <nicholas.hoath@intel.com>

drm/i915: Removed duplicate members from submit_request

Where there were duplicate variables for the tail, context and ring (engine)
in the gem request and the execlist queue item, use the one from the request
and remove the duplicate from the execlist queue item.

Issue: VIZ-4274

v1: Rebase
v2: Fixed build issues. Keep separate postfix & tail pointers as these are
used in different ways. Reinserted missing full tail pointer update.

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# af1a7301 18-Dec-2014 Bob Paauwe <bob.j.paauwe@intel.com>

drm/i915: Only fence tiled region of object.

When creating a fence for a tiled object, only fence the area that
makes up the actual tiles. The object may be larger than the tiled
area and if we allow those extra addresses to be fenced, they'll
get converted to addresses beyond where the object is mapped. This
opens up the possibility of writes beyond the end of the object.

To prevent this, we adjust the size of the fence to only encompass
the area that makes up the actual tiles. The extra space is considered
un-tiled and now behaves as if it was a linear object.

Testcase: igt/gem_tiled_fence_overflow
Reported-by: Dan Hettena <danh@ghs.com>
Signed-off-by: Bob Paauwe <bob.j.paauwe@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# f48a0165 20-Jan-2015 David Woodhouse <dwmw2@infradead.org>

drm/i915: Init PPGTT before context enable

Commit 82460d972 ("drm/i915: Rework ppgtt init to no require an aliasing
ppgtt") introduced a regression on Broadwell, triggering the following
IOMMU fault at startup:

vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
dmar: DRHD: handling fault status reg 2
dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 880000
DMAR:[fault reason 23] Unknown
fbcon: inteldrmfb (fb0) is primary device

Further commentary from Daniel:

I suggested this change to David after staring at the offending patch
for a while. I have no idea or theory whatsoever why this would upset
the gpu less than the other way round. But it seems to work. David
promised to chase hw people a bit more to get a more meaningful answer.

Wrt the comment that this deletes: I've done some digging and afaict
loading the context before ppgtt enable was once required before our recent
restructuring of the context/ppgtt init code: Before that, context sw
setup (i.e. allocating the default context) and hw setup were smashed
together. Also the setup of the default context was the bit that
actually allocated the aliasing ppgtt structures. Which is the reason
for the context-before-ppgtt dependency.

Or was, since with all the untangling there's no real dependency any
more (functional, who knows what the hw is doing), so the comment is
just stale.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 226e5ae9 02-Jan-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix mutex->owner inspection race under DEBUG_MUTEXES

If CONFIG_DEBUG_MUTEXES is set, the mutex->owner field is only cleared
if the mutex debugging is enabled which introduces a race in our
mutex_is_locked_by() - i.e. we may inspect the old owner value before it
is acquired by the new task.

This is the root cause of this error:

diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 5cf6731..3ef3736 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -80,13 +80,13 @@ void debug_mutex_unlock(struct mutex *lock)
DEBUG_LOCKS_WARN_ON(lock->owner != current);

DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
- mutex_clear_owner(lock);
}

/*
* __mutex_slowpath_needs_to_unlock() is explicitly 0 for debug
* mutexes so that we can do it here after we've verified state.
*/
+ mutex_clear_owner(lock);
atomic_set(&lock->count, 1);
}

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87955
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 676fa572 24-Dec-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the ban period onto the context

This will allow us to set per-file, or even per-context, periods in the
future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1816f923 02-Jan-2015 Akash Goel <akash.goel@intel.com>

drm/i915: Support creation of unbound wc user mappings for objects

This patch provides support to create write-combining virtual mappings of a
GEM object. It intends to provide the same functionality as the 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients to handle the linear to tile conversion on their
own. This is for improving CPU write performance, as with such a
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, and unlike the regular CPU mmapping, it avoids the
cache flush after an update from the CPU side when the object is passed
onto the GPU. This type of mapping is especially useful for sub-region
updates, i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.

To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.

The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.

v2: Fix error handling, invalid flag detection, renaming (ickle)

v3: Rebase to latest drm-intel-nightly codebase

The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.

Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
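
A rough userspace sketch of how a client might use this extension, assuming
the uapi definitions from the i915 header (struct drm_i915_gem_mmap with its
flags field, I915_MMAP_WC, I915_PARAM_MMAP_VERSION), an already-open DRM fd
and an existing GEM handle; the include path and error handling are
simplified.

#include <drm/i915_drm.h>
#include <sys/ioctl.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Query the mmap ioctl version; >= 1 means the flags field is understood. */
static int gem_mmap_version(int fd)
{
	int value = 0;
	struct drm_i915_getparam gp;

	memset(&gp, 0, sizeof(gp));
	gp.param = I915_PARAM_MMAP_VERSION;
	gp.value = &value;
	if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
		return -errno;
	return value;
}

/* Map a GEM object with a write-combining CPU mapping, or return NULL so the
 * caller can fall back to mmap_gtt / a plain CPU mmap on older kernels. */
static void *gem_mmap_wc(int fd, uint32_t handle, uint64_t size)
{
	struct drm_i915_gem_mmap arg;

	if (gem_mmap_version(fd) < 1)
		return NULL;

	memset(&arg, 0, sizeof(arg));
	arg.handle = handle;
	arg.offset = 0;
	arg.size = size;
	arg.flags = I915_MMAP_WC;   /* request the write-combining mapping */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_MMAP, &arg))
		return NULL;
	return (void *)(uintptr_t)arg.addr_ptr;
}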


# 43566ded 02-Jan-2015 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Broaden application of set-domain(GTT)

Previously, this was restricted to only operate on bound objects - to
make pointer access through the GTT to the object coherent with writes
to and from the GPU. A second usecase is drm_intel_bo_wait_rendering()
which at present does not function unless the object also happens to
be bound into the GGTT (on current systems that is becoming increasingly
rare, especially for the typical requests from mesa). A third usecase is
a future patch wishing to extend the coverage of the GTT domain to
include objects not bound into the GGTT but still in its coherent cache
domain. For the latter pair of requests, we need to operate on the
object regardless of its bind state.

v2: After discussion with Akash, we came to the conclusion that the
get-pages was required in order for accurate domain tracking in the
corner cases (like the shrinker) and also useful for ensuring memory
coherency with earlier cached CPU mmaps in case userspace uses exotic
cache bypass (non-temporal) instructions.

v3: Fix the inactive object check.

v4: Rebase to latest drm-intel-nightly codebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# da6b51d0 23-Dec-2014 Dave Airlie <airlied@redhat.com>

Revert "drm/gem: Warn on illegal use of the dumb buffer interface v2"

This reverts commit 355a70183848f21198e9f6296bd646df3478a26d.

This had some bad side effects under normal operation, and should
have been dropped earlier.

Signed-off-by: Dave Airlie <airlied@redhat.com>


# 5d77d9c5 12-Nov-2014 Imre Deak <imre.deak@intel.com>

drm/i915: add missing rpm ref to i915_gem_pwrite_ioctl

Without this RPM ref we can hit the device suspended WARN via:
i915_gem_object_pin()->ggtt_bind_vma->gen6_ggtt_insert_entries(). I
noticed this on my BYT while keeping the i915 device in runtime
suspended state for a while. I chose this place to take the ref to
avoid the possible deadlock via the mutex_lock taken both later in this
function and in the runtime suspend handler. This can happen if an RPM
suspend event is queued and needs to be flushed before taking the RPM
ref.

Testcase: igt/pm_rpm/gem-evict-pwrite
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87363
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# ce38ab05 04-Dec-2014 Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: Organize Fence registers for future enablement.

Let's be optimistic that for future platforms this will remain the same
and reorg a bit.
Reorganizing into if blocks instead of a switch makes life easier for future
platform support additions.

v2: Jani pointed out I was missing reg_830 for some gen3 platforms. So let's make
these platforms subcases of the Gen checks.

Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 78a42377 11-Dec-2014 Brad Volkin <bradley.d.volkin@intel.com>

drm/i915: Use batch pools with the command parser

This patch sets up all of the tracking and copying necessary to
use batch pools with the command parser and dispatches the copied
(shadow) batch to the hardware.

After this patch, the parser is in 'enabling' mode.

Note that performance takes a hit from the copy in some cases
and will likely need some work. At a rough pass, the memcpy
appears to be the bottleneck. Without having done a deeper
analysis, two ideas that come to mind are:
1) Copy sections of the batch at a time, as they are reached
by parsing. Might improve cache locality.
2) Copy only up to the userspace-supplied batch length and
memset the rest of the buffer. Reduces the number of reads.

v2:
- Remove setting the capacity of the pool
- One global pool instead of per-ring pools
- Replace batch_obj with shadow_batch_obj and hook into eb->vmas
- Memset any space in the shadow batch beyond what gets copied
- Rebased on execlist prep refactoring

v3:
- Rebase on chained batch handling
- Squash in setting the secure dispatch flag
- Add a note about the interaction w/secure dispatch pinning
- Check for request->batch_obj == NULL in i915_gem_free_request

v4:
- Fix read domains for shadow_batch_obj
- Remove the set_to_gtt_domain call from i915_parse_cmds
- ggtt_pin/unpin in the parser block to simplify error handling
- Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
- Remove i915_gem_batch_pool_put calls

v5:
- Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after
the parser (danvet, from v4 0/7 feedback)

Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 493018dc 11-Dec-2014 Brad Volkin <bradley.d.volkin@intel.com>

drm/i915: Implement a framework for batch buffer pools

This adds a small module for managing a pool of batch buffers.
The only current use case is for the command parser, as described
in the kerneldoc in the patch. The code is simple, but separating
it out makes it easier to change the underlying algorithms and to
extend to future use cases should they arise.

The interface is simple: init to create an empty pool, fini to
clean it up, get to obtain a new buffer. Note that all buffers are
expected to be inactive before cleaning up the pool.

Locking is currently based on the caller holding the struct_mutex.
We already do that in the places where we will use the batch pool
for the command parser.

v2:
- s/BUG_ON/WARN_ON/ for locking assertions
- Remove the cap on pool size
- Switch from alloc/free to init/fini

v3:
- Idiomatic looping structure in _fini
- Correct handling of purged objects
- Don't return a buffer that's too much larger than needed

v4:
- Rebased to latest -nightly

v5:
- Remove _put() function and clean up comments to match

v6:
- Move purged check inside the loop (danvet, from v4 1/7 feedback)

v7:
- Use single list instead of two. (Chris W)
- s/active_list/cache_list
- Squashed in debug patches (Chris W)
drm/i915: Add a batch pool debugfs file

It provides some useful information about the buffers in
the global command parser batch pool.

v2: rebase on global pool instead of per-ring pools
v3: rebase

drm/i915: Add batch pool details to i915_gem_objects debugfs

To better account for the potentially large memory consumption
of the batch pool.

v8:
- Keep cache in LRU order (danvet, from v6 1/5 feedback)

Issue: VIZ-4719
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
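
A toy sketch of the pool idea described above (plain malloc-backed buffers
standing in for GEM objects, single-threaded, names invented): get() reuses
the first cached buffer that is idle and not wastefully larger than the
request, otherwise it allocates a new one, and fini() expects everything to
be idle.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct pool_buf {
	struct pool_buf *next;
	size_t size;
	bool busy;          /* stands in for "still active on the GPU" */
};

struct batch_pool {
	struct pool_buf *cache;
};

static void pool_init(struct batch_pool *pool)
{
	pool->cache = NULL;
}

static struct pool_buf *pool_get(struct batch_pool *pool, size_t size)
{
	struct pool_buf *buf;

	for (buf = pool->cache; buf; buf = buf->next) {
		/* Reuse only idle buffers that aren't much larger than needed. */
		if (!buf->busy && buf->size >= size && buf->size <= 2 * size) {
			buf->busy = true;
			return buf;
		}
	}

	buf = calloc(1, sizeof(*buf));
	buf->size = size;
	buf->busy = true;
	buf->next = pool->cache;
	pool->cache = buf;
	return buf;
}

static void pool_fini(struct batch_pool *pool)
{
	while (pool->cache) {
		struct pool_buf *buf = pool->cache;

		assert(!buf->busy);   /* all buffers must be idle by now */
		pool->cache = buf->next;
		free(buf);
	}
}

int main(void)
{
	struct batch_pool pool;
	struct pool_buf *a, *b;

	pool_init(&pool);
	a = pool_get(&pool, 4096);
	a->busy = false;              /* pretend the GPU finished with it */
	b = pool_get(&pool, 4096);    /* reuses the cached buffer */
	printf("reused: %d\n", a == b);
	b->busy = false;
	pool_fini(&pool);
	return 0;
}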


# fe14d5f4 10-Dec-2014 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Infrastructure for supporting different GGTT views per object

Things like reliable GGTT mappings and mirrored 2d-on-3d display will need
to map objects into the same address space multiple times.

Added a GGTT view concept and linked it with the VMA to distinguish between
multiple instances per address space.

New objects and GEM functions which do not take this new view as a parameter
assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the
previous behaviour.

This now means that objects can have multiple VMA entries so the code which
assumed there will only be one also had to be modified.

Alternative GGTT views are supposed to borrow DMA addresses from obj->pages
which is DMA mapped on first VMA instantiation and unmapped on the last one
going away.

v2:
* Removed per view special casing in i915_gem_ggtt_prepare /
finish_object in favour of creating and destroying DMA mappings
on first VMA instantiation and last VMA destruction. (Daniel Vetter)
* Simplified i915_vma_unbind which does not need to count the GGTT views.
(Daniel Vetter)
* Also moved obj->map_and_fenceable reset under the same check.
* Checkpatch cleanups.

v3:
* Only retire objects once the last VMA is unbound.

v4:
* Keep scatter-gather table for alternative views persistent for the
lifetime of the VMA.
* Propagate binding errors to callers and handle appropriately.

v5:
* Explicitly look for normal GGTT view in i915_gem_obj_bound to align
usage in i915_gem_object_ggtt_unpin. (Michel Thierry)
* Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry)
* Removed stray semi-colon in i915_gem_object_set_cache_level.

For: VIZ-4544
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
[danvet: Drop hunk from i915_gem_shrink since it's just prettification
but upsets a __must_check warning.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7bcc3777 05-Dec-2014 Jani Nikula <jani.nikula@intel.com>

drm/i915: release struct_mutex on the i915_gem_init_hw fail path

Release struct_mutex if init_rings() fails.

This is a regression introduced in
commit 35a57ffbb10840af219eeaf64718434242bb7c76
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Nov 20 00:33:07 2014 +0100

drm/i915: Only init engines once

Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9cca3068 28-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Handle inaccurate time conversion issues

So apparently jiffies<->nsec<->ktime isn't accurate or something. At
least if we timeout there's occasionally still a few hundred us left
(in a 2 second timeout).

Stuff I've tried and thrown out again:
- Sampling the before timestamp before jiffies. Doesn't improve the test
pass rate at all.
- Using jiffies. Way too inaccurate, which means way too much drift
with signals plus automatic ioctl restarting in userspace. In
hindsight we should have used an absolute timeout, but hey we need
something for v3 of the i915 gem wait interfaces ;-)
- Trying to figure out where accuracy gets lost. gl testcases really
don't care all that much about this (as long as it's not massively
off), it's just that the testcase gets a bit upset if it receives an
ETIME with timeout > 0.

So as long as we're in the ballpark it's good enough. So patch
everything up if we're at most one jiffy off. It gets me a solid
test again.

This regression is probably introduced in

commit 5ed0bdf21a85d78e04f89f15ccf227562177cbd9
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Jul 16 21:05:06 2014 +0000

drm: i915: Use nsec based interfaces

Use ktime_get_raw_ns() and get rid of the back and forth timespec
conversions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: John Stultz <john.stultz@linaro.org>

Probably because I'm too lazy to confirm myself and still waiting for
QA ;-)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
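
An illustrative userspace analog of the patch-up described above (names and
tick rate made up): if a timed wait expired but the bookkeeping says slightly
less than one tick is left over, report the timeout as fully consumed rather
than returning a positive remainder together with "timed out".

#include <stdint.h>
#include <stdio.h>

#define TICK_NS 10000000LL   /* assume a 100 Hz tick, i.e. 10 ms */

static int64_t residual_after_timeout(int64_t remaining_ns)
{
	/* Tolerate up to one tick of conversion slack. */
	if (remaining_ns > 0 && remaining_ns < TICK_NS)
		return 0;
	return remaining_ns;
}

int main(void)
{
	printf("%lld\n", (long long)residual_after_timeout(300000));   /* -> 0 */
	printf("%lld\n", (long long)residual_after_timeout(25000000)); /* -> 25000000 */
	return 0;
}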


# 7bd0e226 04-Dec-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: compute wait_ioctl timeout correctly

We've lost the +1 required for correct timeouts in

commit 5ed0bdf21a85d78e04f89f15ccf227562177cbd9
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Jul 16 21:05:06 2014 +0000

drm: i915: Use nsec based interfaces

Use ktime_get_raw_ns() and get rid of the back and forth timespec
conversions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: John Stultz <john.stultz@linaro.org>

So fix this up by reinstating our handrolled _timeout function. While
at it bother with handling MAX_JIFFIES.

v2: Convert to usecs (we don't care about the accuracy anyway) first
to avoid overflow issues Dave Gordon spotted.

v3: Drop the explicit MAX_JIFFY_OFFSET check, usecs_to_jiffies should
take care of that already. It might be a bit too enthusiastic about it
though.

v4: Chris has a much nicer color, so use his implementation.

This requires to export nsec_to_jiffies from time.c.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
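
A rough analog of the reinstated helper: convert a nanosecond timeout to
scheduler ticks, rounding up and adding one extra tick so the effective wait
can never be shorter than what userspace asked for. The tick rate and helper
name below are made up for illustration.

#include <stdint.h>
#include <stdio.h>

#define HZ 250   /* assumed tick rate for the example */

/* Smallest number of ticks guaranteed to cover 'ns' nanoseconds. */
static uint64_t nsecs_to_ticks_timeout(uint64_t ns)
{
	uint64_t ticks = (ns * HZ + 999999999ull) / 1000000000ull; /* round up */

	return ticks + 1;   /* +1: the current tick may already be partly over */
}

int main(void)
{
	/* 1 ms at 250 Hz is a fraction of a tick; we still wait at least 2 ticks. */
	printf("%llu\n", (unsigned long long)nsecs_to_ticks_timeout(1000000));
	return 0;
}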


# f7635669 03-Dec-2014 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Stop putting GGTT VMA at the head of the list

Multiple GGTT VMAs per object will be introduced in the near future, which will
make it impossible to guarantee that the normal GGTT view is at the head of the
list.

The purpose of this patch is to break this assumption straight away so any
potential hidden assumptions in the code base can be bisected to this
simple patch.

For: VIZ-4544
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d5abdfda 20-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Move init_unused_rings to gem_init_hw

We need to do that every time we resume the rings, not just at load.
I've overlooked this in my untangling of the ring init code.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 35a57ffb 19-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Only init engines once

We can do this.

And now there's finally the clean split between software setup and
hardware setup I kinda wanted since multi-ring support was merged
aeons ago. It only took almost 5 years.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 581c26e8 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Convert 'trace_irq' to use requests rather than seqnos

Updated the trace_irq code to use requests instead of seqnos. This includes
reference counting the request object to ensure it sticks around when required.
Note that getting access to the reference counting functions means moving the
inline i915_trace_irq_get() function from intel_ringbuffer.h to i915_drv.h.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Resolve conflict due to shuffled merge order.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 41c52415 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Remove the now redundant 'obj->ring'

The ring member of the object structure was always updated with the
last_read_seqno member. Thus with the conversion to last_read_req, obj->ring is
now a direct copy of obj->last_read_req->ring. This makes it somewhat redundant
and potentially misleading (especially as there was no comment to explain its
purpose).

This checkin removes the redundant field. Many uses were simply testing for
non-null to see if the object is active on the GPU. Some of these have been
converted to check 'obj->active' instead. Others (where the last_read_req is
about to be used anyway) have been changed to check obj->last_read_req. The rest
simply pull the ring out from the request structure and proceed as before.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1b5a433a 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Convert 'i915_seqno_passed' calls into 'i915_gem_request_completed'

Almost everywhere that called i915_seqno_passed() was really asking 'has the
given seqno popped out of the hardware yet?'. Thus it had to query the current
hardware seqno and then do a signed delta comparison (which copes with wrapping
around zero but not with seqno values more than 2GB apart, although the latter
is unlikely!).

Now that the majority of seqno instances have been replaced with request
structures, it is possible to convert this test to be request based as well.
There is now an 'i915_gem_request_completed()' function which takes a request and
returns true or false as appropriate. Note that this currently just wraps up the
original _passed() test but a later patch in the series will reduce this to
simply returning a cached internal value, i.e.:
'_completed(req) { return req->completed; }'

This checkin converts almost all _seqno_passed() calls. The only one left is in
the semaphore code which still requires seqnos not request structures.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Drop hunk touching the trace_irq code since I've dropped the
patch which converts that, and resolve resulting conflict.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
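
For reference, the underlying seqno test is a classic wrap-safe
sequence-number comparison. A sketch (the request/ring types are invented
here) of what _completed() wraps today and may later replace with a cached
flag:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Wrap-safe: true iff 'seqno' is at or before the last seqno the HW reported.
 * Breaks down only if the two values are more than 2^31 apart. */
static bool seqno_passed(uint32_t hw_seqno, uint32_t seqno)
{
	return (int32_t)(hw_seqno - seqno) >= 0;
}

struct ring    { uint32_t hw_seqno; };               /* illustrative */
struct request { struct ring *ring; uint32_t seqno; };

static bool request_completed(const struct request *req)
{
	return seqno_passed(req->ring->hw_seqno, req->seqno);
}

int main(void)
{
	struct ring ring = { .hw_seqno = 5 };
	struct request req = { .ring = &ring, .seqno = 7 };

	printf("%d\n", request_completed(&req));   /* 0: not completed yet */
	ring.hw_seqno = 7;
	printf("%d\n", request_completed(&req));   /* 1 */
	return 0;
}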


# ff79e857 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Connect requests to rings at creation not submission

It makes a lot more sense (and makes future seqno -> request conversion patches
simpler) to fill in the 'ring' field of the request structure at the point of
creation rather than submission. Given that the request structure is assigned by
ring specific code and thus is locked to a ring from the start, there really is
no reason to defer this assignment.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 74328ee5 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Convert trace functions from seqno to request

All the code above is now using requests not seqnos so it is possible to convert
the trace functions across. Note that rather than get into problematic reference
counting issues, the trace code only saves the seqno and ring values from the
request structure not the structure pointer itself.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9400ae5c 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Remove obsolete seqno parameter from 'i915_add_request'

There is no longer any need to retrieve a seqno value from an i915_add_request()
call. The calling code already knows which request structure is being processed
(it can only be ring->OLR). And as the request itself is now used in preference
to the basic seqno value, the latter is now redundant in this situation.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9c654818 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Convert __wait_seqno() to __wait_request()

Now that all code above is using request structures instead of seqno values, it
is possible to convert __wait_seqno() itself. Internally, it is still calling
i915_seqno_passed(), this will be updated later in the series. This step is just
changing the parameter list and function name.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a4b3a571 26-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Convert i915_wait_seqno to i915_wait_request

Updated i915_wait_seqno() to take a request structure instead of a seqno value
and renamed it accordingly. Internally, it just pulls the seqno out of the
request and calls on to __wait_seqno() as before. However, all the code further
up the stack is now simplified as it can just pass the request object straight
through without having to peek inside.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Squash in hunk from an earlier patch which was rebased
wrongly.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b6660d59 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Make 'i915_gem_check_olr' actually check by request not seqno

Updated the _check_olr() function to actually take a request object and compare
it to the OLR rather than extracting seqnos and comparing those.

Note that there is one use case where the request object being processed is no
longer available at that point in the call stack. Hence a temporary copy of the
original function is still present (but called _check_ols() instead). This will
be removed in a subsequent patch.

Also, downgraded a BUG_ON to a WARN_ON as apparently the former is frowned upon
for shipping code.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6259cead 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Remove 'outstanding_lazy_seqno'

The OLS value is now obsolete. Exactly the same value is guaranteed to be always
available as PLR->seqno. Thus it is safe to remove the OLS completely. And also
to rename the PLR to OLR to keep the 'outstanding lazy ...' naming convention
valid.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ff865885 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Ensure requests stick around during waits

Added reference counting of the request structure around __wait_seqno() calls.
This is a precursor to updating the wait code itself to take the request rather
than a seqno. At that point, it would be a Bad Idea for a request object to be
retired and freed while the wait code is still using it.

v3:

Note that even though the mutex lock is held during a call to i915_wait_seqno(),
it is still necessary to explicitly bump the reference count. It appears that
the shrinker can asynchronously retire items even though the mutex is locked.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
[danvet: Remove wrongly squashed hunk which breaks the build.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 54fb2411 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Convert i915_gem_ring_throttle to use requests

Convert the throttle code to use the request structure rather than extracting a
ring/seqno pair from it and using those. This is in preparation for
__wait_seqno() becoming __wait_request().

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 97b2a6a1 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Replace last_[rwf]_seqno with last_[rwf]_req

The object structure contains the last read, write and fenced seqno values for
use in synchronisation operations. These have now been replaced with their
request structure counterparts.

Note that to ensure that objects do not end up with dangling pointers, the
assignments of last_*_req include reference count updates. Thus a request cannot
be freed if an object is still hanging on to it for any reason.

v2: Corrected 'last_rendering_' to 'last_read_' in a number of comments that did
not get updated when 'last_rendering_seqno' became 'last_read|write_seqno'
several millennia ago.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# abfe262a 24-Nov-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Add reference count to request structure

The plan is to use request structures everywhere that seqno values were
previously used. This means saving pointers to structures in places that used to
be simple integers. In turn, that means that the target structure now needs much
more stringent lifetime tracking. That is, it must not be freed while some other
random object still holds a pointer to it.

To achieve this tracking, a reference count needs to be added. Whenever a
pointer to the structure is saved away, the count must be incremented and the
free must only occur when all references have been released.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
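
A minimal sketch of the reference-counting pattern described above, built on
the kernel's generic kref helpers; the structure and function names are
illustrative assumptions, not the driver's actual request API.

#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/types.h>

struct example_request {
	struct kref ref;
	u32 seqno;
};

static void example_request_free(struct kref *ref)
{
	kfree(container_of(ref, struct example_request, ref));
}

/* Take a reference whenever a pointer to the request is stored away ... */
static inline void example_request_get(struct example_request *rq)
{
	kref_get(&rq->ref);
}

/* ... and drop it when that pointer is released; the last put frees it. */
static inline void example_request_put(struct example_request *rq)
{
	kref_put(&rq->ref, example_request_free);
}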


# bdcf120b 25-Nov-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Assert that we successfully downclock the GPU before suspend

Before suspending, we wait upon the outstanding GPU requests and flush
our pending idle handlers. This should downclock the GPU to its lowest
power state. Add a WARN to check that the delayed tasks were run and did
their job properly.

Suggested-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4feb7659 24-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Remove user pinning code

Now unused.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# 0794aed3 25-Nov-2014 Thomas Daniel <thomas.daniel@intel.com>

drm/i915: Fix context object leak for legacy contexts

Dynamic context pinning for LRCs introduced a leak in legacy mode.
Reinstate context unreference in i915_gem_free_request for legacy contexts.

Leak reported by i-g-t/drv_module_reload fixed by this patch.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86507
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d472fcc8 24-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Disallow pin ioctl completely for kms drivers

The problem here is that SNA pins batchbuffers to eke out a bit more
performance. Iirc it started out as a w/a for i830M (which we've
implemented in the kernel for a long time already). The problem is
that the pin ioctl wasn't added in

commit d23db88c3ab233daed18709e3a24d6c95344117f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri May 23 08:48:08 2014 +0200

drm/i915: Prevent negative relocation deltas from wrapping

Fix this by simply disallowing pinning from userspace so that the
kernel is in full control of batch placement again. Especially since
distros are moving towards running X as non-root, so most users won't
even be able to see any benefits.

UMS support is dead now, but we need this minimal patch for
backporting. Follow-up patch will remove the pin ioctl code
completely.

Note to backporters: You must have both

commit b45305fce5bb1abec263fcff9d81ebecd6306ede
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Dec 17 16:21:27 2012 +0100

drm/i915: Implement workaround for broken CS tlb on i830/845

which landed in 3.8 and

commit c4d69da167fa967749aeb70bc0e94a457e5d00c1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Sep 8 14:25:41 2014 +0100

drm/i915: Evict CS TLBs between batches

which is also marked cc: stable. Otherwise this could introduce a
regression by disabling the userspace w/a without the kernel w/a being
fully functional on i830/45.

References: https://bugs.freedesktop.org/show_bug.cgi?id=76554#c116
Cc: stable@vger.kernel.org # requires c4d69da167fa967749a and v3.8
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
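
A hedged sketch of what rejecting the ioctl for KMS drivers typically looks
like; the handler name and the exact error code here are illustrative
assumptions rather than the patch's literal code.

#include <drm/drmP.h>

static int example_pin_ioctl(struct drm_device *dev, void *data,
			     struct drm_file *file)
{
	/* With KMS the kernel owns batch placement, so refuse userspace pins. */
	if (drm_core_check_feature(dev, DRIVER_MODESET))
		return -ENODEV;

	/* (legacy UMS path would continue here) */
	return 0;
}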


# 355a7018 20-Nov-2014 Thomas Hellstrom <thellstrom@vmware.com>

drm/gem: Warn on illegal use of the dumb buffer interface v2

It happens on occasion that developers of generic user-space applications
abuse the dumb buffer API to get hold of drm buffers that they can both
mmap() and use for GPU acceleration, under the assumptions that dumb buffers
and buffers available for GPU are
a) the same type and can be arbitrarily type-cast, and
b) fully coherent.

This patch makes the most widely used drivers warn nicely when that happens;
the next step will be to fail.

v2: Move drmP.h changes to drm_gem.h. Fix Radeon dumb mmap breakage.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 656bfa3a 20-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Pin tiled objects for L-shaped configs

Let's just throw in the towel on this one and take the cheap way out.

Based on a patch from Chris Wilson, but checking for a different bit.
Chris' patch checked for even bank layout, this one here for a magic
bit. Given the evidence we've gathered (not much) both work I think,
but checking for the magic bit might be more accurate.

Anyway, works on my gm45 here.

For paranoia, restrict to gen4 (and mobile), since we've only ever seen
this on gm45 and i965gm.

Also add some debugfs output so that we can skip the tiled swapping
tests properly in these cases.

v2: Clean up the quirk'ed pin count in free_object to avoid upsetting
the WARN_ON. Spotted by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28813
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45092
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f548c0e9 19-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Can i915_gem_init_ioctl

Found one more!

With this we can clear up the ggtt init code a bit, yay!

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# 377e91b2 19-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Sanitize ->lastclose

With this all the ums nonsense around gem setup/teardown has
disappeared, yay!

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# 87255483 19-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Ditch dev_priv->ums.mm_suspend

Again just complicates gem init functions and makes a general mess out
of everything.

Good riddance!

v2: In my enthusiasm to start removing dri1/ums crud I went overboard a
bit and killed parts of hangcheck. Resurrect it.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# 71b14ab6 19-Nov-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: No-Op enter/leave vt gem ioctl

We've killed ums support by now; it's time to reap the benefits. This
one here is getting in the way of doing some ring init cleanup.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# 5c6c6003 06-Sep-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove DRI1 ring accessors and API

With the deprecation of UMS, and by association DRI1, we have a tough
choice when updating the ring access routines. We either rewrite the
DRI1 routines blindly without testing (so likely to be broken) or take
the liberty of declaring them no longer supported and remove them
entirely. This takes the latter approach.

v2: Also remove the DRI1 sarea updates

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Fix rebase conflicts.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# dcb4c12a 13-Nov-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915/bdw: Pin the context backing objects to GGTT on-demand

Up until now, we have pinned every logical ring context backing object
during creation, and left it pinned until destruction. This made my life
easier, but it's a harmful thing to do, because we cause fragmentation
of the GGTT (and, eventually, we would run out of space).

This patch makes the pinning on-demand: the backing objects of the two
contexts that are written to the ELSP are pinned right before submission
and unpinned once the hardware is done with them. The only context that
is still pinned regardless is the global default one, so that the HWS can
still be accessed in the same way (ring->status_page).

v2: In the early version of this patch, we were pinning the context as
we put it into the ELSP: on the one hand, this is very efficient because
only a maximum of two contexts are pinned at any given time, but on the other
hand, we cannot really pin at interrupt time :(

v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
Do not unpin default context in free_request.

v4: Break out pin and unpin into functions. Fix style problems reported
by checkpatch

v5: Remove unpin_lock as all pinning and unpinning is done with the struct
mutex already locked. Add WARN_ONs to make sure this is the case in future.

Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
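
A rough sketch of the on-demand pin/unpin pattern with a pin count protected
by the struct mutex (per v5 above); every name below is an illustrative
assumption, and the backing-object helpers are hypothetical.

#include <linux/bug.h>
#include <linux/mutex.h>
#include <drm/drmP.h>

struct example_lrc {
	struct drm_device *dev;
	int pin_count;
};

static int example_pin_backing_object(struct example_lrc *lrc);
static void example_unpin_backing_object(struct example_lrc *lrc);

static int example_lrc_pin(struct example_lrc *lrc)
{
	WARN_ON(!mutex_is_locked(&lrc->dev->struct_mutex));
	if (lrc->pin_count++ == 0)
		return example_pin_backing_object(lrc);	/* hypothetical helper */
	return 0;
}

static void example_lrc_unpin(struct example_lrc *lrc)
{
	WARN_ON(!mutex_is_locked(&lrc->dev->struct_mutex));
	if (WARN_ON(lrc->pin_count == 0))
		return;
	if (--lrc->pin_count == 0)
		example_unpin_backing_object(lrc);	/* hypothetical helper */
}

Only the global default context would never drop to a pin count of zero, so
its backing object (and hence the HWS) stays mapped.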


# c86ee3a9 13-Nov-2014 Thomas Daniel <thomas.daniel@intel.com>

drm/i915/bdw: Clean up execlist queue items in retire_work

No longer create a work item to clean each execlist queue item.
Instead, move retired execlist requests to a queue and clean up the
items during retire_requests.

v2: Fix legacy ring path broken during overzealous cleanup

v3: Update idle detection to take execlists queue into account

v4: Grab execlist lock when checking queue state

v5: Fix leaking requests by freeing in execlists_retire_requests.

Issue: VIZ-4274
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6a2c4232 04-Nov-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make the physical object coherent with GTT

Currently objects for which the hardware needs a contiguous physical
address are allocated a shadow backing storage to satisfy the constraint.
This shadow buffer is not wired into the normal obj->pages and so the
physical object is incoherent with accesses via the GPU, GTT and CPU. By
setting up the appropriate scatter-gather table, we can allow userspace
to access the physical object either via a GTT mmapping of, or by rendering
into, the GEM bo. However, keeping the CPU mmap of the shmemfs backing
storage coherent with the contiguous shadow is not yet possible.
Fortuitously, CPU mmaps of objects requiring physical addresses are not
expected to be coherent anyway.

This allows the physical constraint of the GEM object to be transparent
to userspace and allows it to efficiently render into or update the object
via the GTT and GPU.

v2: Fix leak of pci handle spotted by Ville
v3: Remove the now duplicate call to detach_phys_object during free.
v4: Wait for rendering before pwrite. As this patch makes it possible to
render into the phys object, we should make it correct as well!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 16e9a21f 06-Nov-2014 Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>

drm/i915: Make __wait_seqno non-static and rename to __i915_wait_seqno

So that it can be used by the flip code to wait for rendering without
holding any locks.

Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c826c449 31-Oct-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Request PIN_GLOBAL when pinning a vma for GTT relocations

Always require PIN_GLOBAL when we want a mappable offset (PIN_MAPPABLE).
This causes the pin to fix up the global binding in cases where the vma
was already bound (and, due to the preceding bug, we considered it to be
already mappable).

References: https://bugs.freedesktop.org/show_bug.cgi?id=85671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add WARN_ON to check that PIN_MAP implies PIN_GLOBAL as
discussed on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ef79e17c 31-Oct-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only mark as map-and-fenceable when bound into the GGTT

We use the obj->map_and_fenceable hint for when we already have a
valid mapping of this object in the aperture. This hint can only apply
to the GGTT and not to the aliasing-ppGTT. One user of the hint is
execbuffer relocation, which began to fail when it tried to follow the
hint and perform the relocate through the non-existent GGTT mapping.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8e639549 30-Oct-2014 John Harrison <John.C.Harrison@Intel.com>

drm/i915: Remove redundant parameter to i915_gem_object_wait_rendering__tail()

An earlier commit (c8725f3dc0911d4354315a65150aecd8b7d0d74a: Do not call
retire_requests from wait_for_rendering) removed the use of the ring parameter
within wait_rendering__tail() but did not remove the parameter itself. As the
plan is to remove obj->ring which is where this parameter comes from, it is
simpler to just remove the parameter completely than to update it with a new
source.

For: VIZ-4377
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# aff43766 23-Oct-2014 Tvrtko Ursulin <tvrtko.ursulin@intel.com>

drm/i915: Move flags describing VMA mappings into the VMA

If these flags are on the object level it will be more difficult to allow
for multiple VMAs per object.

v2: Simplification and cleanup after code review comments (Chris Wilson).

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 11b5d511 29-Sep-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Correctly reject invalid flags for wait_ioctl

Not having checks for this isn't good.

I've checked igt and libdrm and they all already clear flags properly.
So we're lucky and should be able to sneak this ABI clarification in.

Testcase: igt/gem_wait/invalid-flags
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85280
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
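
A minimal sketch of the kind of flags validation being added; the args
structure shown here is schematic, not the real uapi definition.

#include <linux/types.h>

struct example_wait_args {
	__u32 bo_handle;
	__u32 flags;		/* no flags defined yet, must be zero */
	__s64 timeout_ns;
};

static int example_check_wait_flags(const struct example_wait_args *args)
{
	/* Reject any flag bits the kernel does not yet understand, so new
	 * flags can be added later without breaking old userspace. */
	if (args->flags != 0)
		return -EINVAL;
	return 0;
}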


# 34367381 15-Oct-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Document that mmap forwarding is discouraged

Too many new drm driver writers seem to look at i915 for inspiration.
But we have two ways to do mmap, so discourage readers from the old,
ugly version. In a new driver we'd just expose two mmap offsets per
object, one for the gtt map and the other for the cpu map.

v2: Make it clear that i915 does cpu mmaps this way for past
cluelessness^W^W historical reasons. Asked for by Jani.

Cc: "Cheng, Yao" <yao.cheng@intel.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bb9059d3 08-Oct-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Suppress no action noise from oom shrinker

If we are not able to free anything (the shrinker leaves nothing on the
global object lists), do not log anything. This is useful when other
subsystems are being stress-tested for their oom behaviour and i915.ko
is shouting into the logs about doing nothing.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 005445c5 08-Oct-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Report the current number of bytes freed during oom

The shrinker reports the number of pages freed, but we try to log the
number of bytes - which leads to some nonsense values being reported as
freed during oom.

Reported-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 60a53727 03-Oct-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the duplicated logic between the two shrink phases

We can use the same logic to walk the different bound/unbound lists
during shrinker (as the unbound list is a degenerate case of the bound
list), slightly compacting the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 01209dd5 13-Feb-2013 Damien Lespiau <damien.lespiau@intel.com>

drm/i915/skl: Fence registers on SKL are the same as SNB

v2: Rebased on top of the i915_gpu_error.c extraction.

Reviewed-by: Thomas Wood <thomas.wood@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b680c37a 19-Sep-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: DocBook integration for frontbuffer tracking

I shouldn't ask everyone to do this and fail myself ...

This extracts all the frontbuffer tracking functions into
intel_frontbuffer.c, adds a DOC overview section and also adds the
missing kerneldoc for i915_gem_track_fb and also pulls it into the
same section for convenience.

v2: Don't forget about the header files.

v3: Oops, might check compilation next time around. To make my life
easier drop the increase_pllclock from set_base_atomic since really,
it doesn't matter if you see your Oops or kgdb with a tiny bit of lag.

v4: Try to better explain how to actually use this, requested by Paulo
on irc.

v5: Explain invalidate/flush a bit clearer.

v6: s/business/busyness/

Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Vandana Kannan <vandana.kannan@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>


# be2d599b 10-Sep-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove dead code, i915_gem_verify_gtt

The data structure it was supposed to be sanity checking is long gone.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4144f9b5 11-Sep-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Match GTT space sanity checker with implementation

If we believe that the device can cross cache domains in its prefetcher
(i.e. we allow neighbouring pages in different domains), we don't supply
a color_adjust callback. Use the presence of this callback to better
determine when we should be verifying that the GTT space we just
used is valid.

v2: Remove the superfluous struct drm_device function param as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Also adjust the comment per irc discussion with Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1d1ef21d 09-Sep-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop any active reference before unbinding

Before we process the final unbind on an object and move it to the
unbound list, it is semantically cleaner if there are no more active
references to the object. (An active reference would imply that it was
still being accessed by the GPU after it became inaccessible.) The
caveat is that all callsites must be prepared for the object to
disappear during the unbind - i.e. they must hold their own reference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 21ab4e74 09-Sep-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Objects on the unbound list may still have an active reference

Due to the lazy retirement semantics, even though we have unbound an
object, it may still hold onto an active reference. So in the debug code,
play safe.

v2: Export i915_gem_shrink() rather than opencoding it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2232f031 04-Sep-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Fix EIO/wedged handling in gem fault handler

In

commit 1f83fee08d625f8d0130f9fe5ef7b17c2e022f3c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Nov 15 17:17:22 2012 +0100

drm/i915: clear up wedged transitions

I've accidentally inverted the EIO/wedged handling in the fault
handler: We want to return the EIO as a SIGBUS only if it's not
because of the gpu having died, to prevent userspace from unduly
dying.

In my defence the comment right above is completely misleading, so fix
both.

v2: Drop the WARN_ON, it's not actually a bug to e.g. receive an -EIO
when swap-in fails.

v3: Don't remove too much ... oops.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 81e7f200 14-Aug-2014 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Idle unused rings on gen2/3 during init/resume

gen2/3 platforms have a boatload of rings we're not using. On my 830
the BIOS/hw can leave some of those "active" after resume which will
prevent c3 entry. The ring is apparently considered active whenever
head != tail even if the ring is disabled.

Disable and clear all such unused ringbuffers on init/resume.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ecdb5fd8 20-Aug-2014 Thomas Daniel <thomas.daniel@intel.com>

drm/i915/bdw: Don't execute context reset and switch with Execlists

These two functions make no sense in a Logical Ring Context & Execlists
world.

v2: We got rid of lrc_enabled and centralized everything in the sanitized
i915.enable_execlists instead.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>

v3: Rebased. Corrected a typo in comment for i915_switch_context and
added a comment that it should not be called in execlist mode. Added
WARN_ON if i915_switch_context is called in execlist mode. Moved check
for execlist mode out of i915_switch_context and into callers. Added
comment in context_reset explaining why nothing is done in execlist
mode.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
[danvet: Simplify the patch subject so I can understand it.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6689c167 15-Aug-2014 McAulay, Alistair <alistair.mcaulay@intel.com>

drm/i915: Rework GPU reset sequence to match driver load & thaw

This patch is to address Daniel's concerns over different code during reset:

http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html

"The reason for aiming as hard as possible to use the exact same code for
driver load, gpu reset and runtime pm/system resume is that we've simply
seen too many bugs due to slight variations and unintended omissions."

Tested using igt drv_hangman.

V2: Cleaner way of preventing check_wedge returning -EAGAIN
V3: Clean the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review.
Signed-off-by: McAulay, Alistair <alistair.mcaulay@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Rebase over ctx->ppgtt rework and extend the comment in
check_wedge a bit.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# cc9130be 24-Jul-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915/bdw: Make sure gpu reset still works with Execlists

If we reset a ring after a hang, we have to make sure that we clear
out all queued Execlists requests.

v2: The ring is, at this point, already being correctly re-programmed
for Execlists, and the hangcheck counters cleared.

v3: Daniel suggests to drop the "if (execlists)" because the Execlists
queue should be empty in legacy mode (which is true, if we do the
INIT_LIST_HEAD).

v4: Do the pending intel_runtime_pm_put

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 48e29f55 24-Jul-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915/bdw: Emission of requests with logical rings

On a previous iteration of this patch, I created an Execlists
version of __i915_add_request and abstracted it away as a
vfunc. Daniel Vetter wondered then why that was needed:

"with the clean split in command submission I expect every
function to know whether it'll submit to an lrc (everything in
intel_lrc.c) or whether it'll submit to a legacy ring (existing
code), so I don't see a need for an add_request vfunc."

The honest, hairy truth is that this patch is the glue keeping
the whole logical ring puzzle together:

- i915_add_request is used by intel_ring_idle, which in turn is
used by i915_gpu_idle, which in turn is used in several places
inside the eviction and gtt codes.
- Also, it is used by i915_gem_check_olr, which is littered all
over i915_gem.c
- ...

If I were to duplicate all the code that directly or indirectly
uses __i915_add_request, I'd end up creating a separate driver.

To show the differences between the existing legacy version and
the new Execlists one, this time I have special-cased
__i915_add_request instead of adding an add_request vfunc. I
hope this helps to untangle this Gordian knot.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with
Thomas Daniel.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 82460d97 06-Aug-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Rework ppgtt init to not require an aliasing ppgtt

Currently we abuse the aliasing ppgtt to set up the ppgtt support in
general. Which is a bit backwards since with full ppgtt we don't ever
need the aliasing ppgtt.

So untangle this and separate the ppgtt init from the aliasing
ppgtt. While at it drag it out of the context enabling (which just
does a switch to the default context).

Note that we still have the differentiation between synchronous and
asynchronous ppgtt setup, but that will soon vanish. So also correctly
wire up the return value handling to be prepared for when ->switch_mm
drops the synchronous parameter and could start to fail.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 896ab1a5 06-Aug-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Fix up checks for aliasing ppgtt

A subsequent patch will no longer initialize the aliasing ppgtt if we
have full ppgtt enabled, since we simply don't need that any more.

Unfortunately a few places check for the aliasing ppgtt instead of
checking for ppgtt in general. Fix them up.

One special case are the gtt offset and size macros, which have some
code to remap the aliasing ppgtt to the global gtt. The aliasing ppgtt
is _not_ a logical address space, so passing that in as the vm is
plain and simple a bug. So just WARN about it and carry on - we have a
graceful fall-through anyway if we can't find the vma.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6c5566a8 06-Aug-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Allow i915_gem_setup_global_gtt to fail

We already need this just as a safety check in case the preallocation
reservation dance fails. But we definitely need this to be able to
move the aliasing ppgtt setup back out of the context code to this
place, where it belongs.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5dc383b0 06-Aug-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Add proper prefix to obj_to_ggtt

Stuff in headers really ought to have this.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 841cd773 06-Aug-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Only refcount ppgtt if it actually is one

This essentially unbreaks non-ppgtt operation where we'd scribble over
random memory.

While at it give the vm_to_ppgtt function a proper prefix and make it
a bit more paranoid.

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ee960be7 06-Aug-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Some cleanups for the ppgtt lifetime handling

So when reviewing Michel's patch I've noticed a few things and cleaned
them up:
- The early checks in ppgtt_release are now redundant: The inactive
list should always be empty now, so we can ditch these checks. Even
for the aliasing ppgtt (though that's a different confusion) since
we tear that down after all the objects are gone.
- The ppgtt handling functions are splattered all over. Consolidate
them in i915_gem_gtt.c, give them OCD prefixes and add wrappers for
get/put.
- There was a bit of confusion in ppgtt_release about whether it cares
about the active or inactive list. It should care about them both,
so augment the WARNINGs to check for both.

There's still create_vm_for_ctx left to do, but that is blocked on the
removal of ppgtt->ctx. Once that's done we can rename it to
i915_ppgtt_create and move it to its siblings for handling ppgtts.

v2: Move the ppgtt checks into the inline get/put functions as
suggested by Chris.

v3: Inline the now redundant ppgtt local variable.

Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b9d06dd9 06-Aug-2014 Michel Thierry <michel.thierry@intel.com>

drm/i915: vma/ppgtt lifetime rules

VMAs should take a reference of the address space they use.

Now, when the fd is closed, it will release the ref that the context was
holding, but it will still be referenced by any vmas that are still
active.

ppgtt_release() should then only be called when the last thing referencing
it releases the ref, and it can just call the base cleanup and free the
ppgtt.

Note that with this we will extend the lifetime of ppgtts which
contain shared objects. But all the non-shared objects will get
removed as soon as they drop off the active list, and for the shared
ones the shrinker can eventually reap them. Since we currently can't
evict ppgtt pagetables either I don't think that temporary leak is
important.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
[danvet: Add note about potential ppgtt leak with this approach.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 454afebd 24-Jul-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915/bdw: Skeleton for the new logical rings submission path

Execlists are indeed a brave new world with respect to workload
submission to the GPU.

In previous versions of this series, I have tried to impact the
legacy ringbuffer submission path as little as possible (mostly,
passing the context around and using the correct ringbuffer when I
needed one) but Daniel is afraid (probably with a reason) that
these changes and, especially, future ones, will end up breaking
older gens.

This commit and some others coming next will try to limit the
damage by creating an alternative path for workload submission.
The first step is here: laying out a new ring init/fini.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a83014d3 24-Jul-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915: Abstract the legacy workload submission mechanism away

As suggested by Daniel Vetter. The idea, in subsequent patches, is to
provide an alternative to these vfuncs for the Execlists submission
mechanism.

v2: Split into two and reordered to illustrate our intentions, instead
of showing it off. Also, removed the add_request vfunc and added the
stop_ring one.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
[danvet:
- Make checkpatch happy.
- Be grumpy about the excessive vtable.
- Ditch gt->is_ring_initialized.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 127f1003 24-Jul-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915/bdw: Macro for LRCs and module option for Execlists

GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
These expanded contexts enable a number of new abilities, especially
"Execlists".

The macro is defined to off until we have enough things in place for
Execlists to have a hope of working.

v2: Rename "advanced contexts" to the more correct "logical ring
contexts".

v3: Add a module parameter to enable execlists. Execlists are relatively
new, and so it'd be wise to be able to switch back to ring submission
to debug subtle problems that will inevitably arise.

v4: Add an intel_enable_execlists function.

v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5)
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 82b6b6d7 09-Aug-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove fenced_gpu_access and pending_fenced_gpu_access

This migrates the fence tracking onto the existing seqno
infrastructure so that the later conversion to tracking via requests is
simplified.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e6a84468 10-Aug-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Force CPU relocations if not GTT mapped

Move the decision on whether we need to have a mappable object during
execbuffer to the fore and then reuse that decision by propagating the
flag through to reservation. As a corollary, before doing the actual
relocation through the GTT, we can make sure that we do have a GTT
mapping through which to operate.

Note that the key to make this work is to ditch the
obj->map_and_fenceable unbind optimization - with full ppgtt it
doesn't make a lot of sense any more anyway.

v2: Revamp and resend to ease future patches.
v3: Refresh patch rationale

References: https://bugs.freedesktop.org/show_bug.cgi?id=81094
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: Explain why obj->map_and_fenceable is key and split out the
secure batch fix.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# dc8cd1e7 09-Aug-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only perform set-to-gtt domain for objects bound to the global gtt

If an object is not bound into the global GTT, then it cannot be
accessed via the GTT. This restores the original code that was muddled
by ppGTT. In the process, we remove a WARN that had long outlived its
usefulness and was simply being coded around instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 274fa1c1 05-Aug-2014 Deepak S <deepak.s@linux.intel.com>

drm/i915: Bring GPU Freq to min while suspending.

We might be leaving the GPU frequency (and thus Vnn) high during suspend.
Flushing the delayed work queue should take care of this.

Signed-off-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5ed0bdf2 16-Jul-2014 Thomas Gleixner <tglx@linutronix.de>

drm: i915: Use nsec based interfaces

Use ktime_get_raw_ns() and get rid of the back and forth timespec
conversions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: John Stultz <john.stultz@linaro.org>
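
A hedged illustration of what the conversion buys: a single raw, monotonic
nanosecond counter replaces timespec bookkeeping. The surrounding helper and
its name are hypothetical.

#include <linux/ktime.h>
#include <linux/types.h>

static bool example_deadline_passed(u64 start_ns, u64 timeout_ns)
{
	/* ktime_get_raw_ns() returns raw monotonic time in ns, so deadline
	 * checks are plain integer arithmetic with no timespec conversions. */
	return ktime_get_raw_ns() - start_ns > timeout_ns;
}

Usage would be along the lines of: u64 start = ktime_get_raw_ns(); ...
if (example_deadline_passed(start, timeout_ns)) bail out.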


# eedd10f4 16-Jun-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Simplify i915_gem_release_all_mmaps()

An object can only have an active gtt mapping if it is currently bound
into the global gtt. Therefore we can simply walk the list of all bound
objects and check the flag upon those for an active gtt mapping.

From commit 48018a57a8f5900e7e53ffaa0adeb784095accfb
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Fri Dec 13 15:22:31 2013 -0200

drm/i915: release the GTT mmaps when going into D3

Also note that the WARN is inappropriate for this function as GPU
activity is orthogonal to GTT mmap status. Rather it is the caller that
relies upon this condition and so it should assert that the GPU is idle
itself.

References: https://bugs.freedesktop.org/show_bug.cgi?id=80081
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
[danvet: cherry-pick from -next to -fixes.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9490edb5 11-Jul-2014 Armin Reese <armin.c.reese@intel.com>

drm/i915: Do not unmap object unless no other VMAs reference it

When using an IOMMU, GEM objects are mapped by their DMA address as the
physical address is unknown. This depends on the underlying IOMMU
driver to map and unmap the physical pages properly as defined in
intel_iommu.c.

The current code will tell the IOMMU to unmap the GEM BO's pages on the
destruction of the first VMA that "maps" that BO. This is clearly wrong
as there may be other VMAs "mapping" that BO (using flink). The scanout
is one such example.

The patch fixes this issue by only unmapping the DMA maps when there are
no more VMAs mapping that object. This is equivalent to when an object
is considered unbound as can be seen by the code. On the first VMA that
again becomes bound, we will remap.

An alternate solution would be to move the dma mapping to object
creation and destruction. I am not sure if this is considered an
unfriendly thing to do.

Some notes to backporters trying to backport full PPGTT:

The bug can never be hit without enabling the IOMMU. The existing code
will also do the right thing when the object is shared via dmabuf. The
failure should be demonstrable with flink. In cases when not using
intel_iommu_strict it is likely (likely, as defined by: off the top of
my head) on current workloads to *not* hit this bug since we often
teardown all VMAs for an object shared across multiple VMs. We also
finish access to that object before the first dma_unmapping.
intel_iommu_strict with flinked buffers is likely to hit this issue.

Signed-off-by: Armin Reese <armin.c.reese@intel.com>
[danvet: Add the excellent commit message provided by Ben.]
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9df7575f 20-Jun-2014 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: add helper for checking whether IRQs are enabled

Now that we use the runtime IRQ enable/disable functions in our suspend
path, we can simply check the pm._irqs_disabled flag everywhere. So
rename it to catch the users, and add an inline for it to make the
checks clear everywhere.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a1db2fa7 11-Jul-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Abandon oom quickly if killed by a signal

Whilst waiting to obtain our locks for the last resort shrinking before
an oom, we check whether or not a fatal signal was pending. If there was,
we do not need to keep waiting as the oom will be aborted.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
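
A rough sketch of bailing out of the lock acquisition once a fatal signal is
pending; the loop shape and the helper name are illustrative assumptions.

#include <linux/mutex.h>
#include <linux/sched.h>

static bool example_shrinker_lock(struct mutex *lock)
{
	while (!mutex_trylock(lock)) {
		/* The oom will be aborted anyway, so give up immediately. */
		if (fatal_signal_pending(current))
			return false;
		schedule();
	}
	return true;
}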


# 1b5d063f 03-Jul-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915: Generalize intel_ring_get_tail to take a ringbuf

Again, it's low-level enough to simply take a ringbuf and nothing
else.

Trivial change.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ec5cc0f9 12-Jun-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restrict GPU boost to the RCS engine

Make the assumption that media workloads are not as latency sensitive
for __wait_seqno, and that upclocking the GPU does not affect the BLT
engine. Under that assumption, we only want to forcibly upclock the GPU
when we are stalling for results from the render pipeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ddd4dbc6 30-Jun-2014 Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: Updating comments.

The ring index calculation table was out of date after other rings were added,
although the formula is flexible and scales when adding new rings.

So this patch just updates the comments and adds a brief explanation of
why sync_seqno[ring index] is used.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f99d7069 19-Jun-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Track frontbuffer invalidation/flushing

So these are the guts of the new beast. This tracks when a frontbuffer
gets invalidated (due to frontbuffer rendering) and hence should be
constantly scaned out, and when it's flushed again and can be
compressed/one-shot-upload.

Rules for flushing are simple: The frontbuffer needs one more full
upload starting from the next vblank. Which means that the flushing
can _only_ be called once the frontbuffer update has been latched.

But this poses a problem for pageflips: We can't just delay the
flushing until the pageflip is latched, since that would pose the risk
that we override frontbuffer rendering that has been scheduled
in-between the pageflip ioctl and the actual latching.

To handle this track asynchronous invalidations (and also pageflip)
state per-ring and delay any in-between flushing until the rendering
has completed. And also cancel any delayed flushing if we get a new
invalidation request (whether delayed or not).

Also call intel_mark_fb_busy in all cases to make sure that we keep the
screen at the highest refresh rate on flips, on synchronous plane updates
and for frontbuffer rendering.

v2: Lots of improvements

Suggestions from Chris:
- Move invalidate/flush in flush_*_domain and set_to_*_domain.
- Drop the flush in busy_ioctl since it's redundant. Was a leftover
from an earlier concept to track flips/delayed flushes.
- Don't forget about the initial modeset enable/final disable.
Suggested by Chris.

Track flips accurately, too. Since flips complete independently of
rendering we need to track pending flips in a separate mask. Again if
an invalidate happens we need to cancel the eventual flush to avoid
races.

v3:
Provide correct header declarations for flip functions. Currently not
needed outside of intel_display.c, but part of the proper interface.

v4: Add proper domain management to fbcon so that the fbcon buffer is
also tracked correctly.

v5: Fixup locking around the fbcon set_to_gtt_domain call.

v6: More comments from Chris:
- Split out fbcon changes.
- Drop superfluous checks for potential scanout before calling intel_fb
functions - we can micro-optimize this later.
- s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
object. We already have precedence for fb_obj in the pin_and_fence
functions.

v7: Clarify the semantics of the flip flush handling by renaming
things a bit:
- Don't go through a gem object but take the relevant frontbuffer bits
directly. These functions center on the plane, the actual object is
irrelevant - even a flip to the same object as already active should
cause a flush.
- Add a new intel_frontbuffer_flip for synchronous plane updates. It
currently just calls intel_frontbuffer_flush since the implementation
differs.

This way we achieve a clear split between one-shot update events on
one side and frontbuffer rendering with potentially a very long delay
between the invalidate and flush.

Chris and I also had some discussions about mark_busy and whether it
is appropriate to call from flush. But mark busy is a state which
should be derived from the 3 events (invalidate, flush, flip) we now
have by the users, like psr does by tracking relevant information in
psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
frontbuffer) needs to have similar logic. With that the overall
mark_busy in the core could be removed.

v8: Only when retiring gpu buffers only flush frontbuffer bits we
actually invalidated in a batch. Just for safety since before any
additional usage/invalidate we should always retire current rendering.
Suggested by Chris Wilson.

v9: Actually use intel_frontbuffer_flip in all appropriate places.
Spotted by Chris.

v10: Address more comments from Chris:
- Don't call _flip in set_base when the crtc is inactive, avoids redundancy
in the modeset case with the initial enabling of all planes.
- Add comments explaining that the initial/final plane enable/disable
still has work left to do before it's fully generic.

v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.

v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a071fa00 18-Jun-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Introduce accurate frontbuffer tracking

So from just a quick look we seem to have enough information to
accurately figure out whether a given gem bo is used as a frontbuffer
and where exactly: We have obj->pin_count as a first check with no
false negatives and only negligible false positives. And then we can
just walk the modeset objects and figure out where exactly a buffer is
used as scanout.

Except that we can't due to locking order: If we already hold
dev->struct_mutex we can't acquire any modeset locks, so could
potential chase freed pointers and other evil stuff.

So we need something else. For that introduce a new set of bits
obj->frontbuffer_bits to track where a buffer object is used. That we
can then chase without grabbing any modeset locks.

Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be
able to do their magic both when called from modeset and from gem
code. But that can be easily achieved by adding locks for these
specific subsystems which always nest within either kms or gem
locking.

This patch just adds the relevant update code to all places.

Note that if we ever support multi-planar scanout targets then we need
one frontbuffer tracking bit per attachment point that we expose to
userspace.

v2:
- Fix more oopsen. Oops.
- WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix
the bugs this brought to light.
- s/update_frontbuffer_bits/update_fb_bits/. More consistent with the
fb tracking functions (fb for gem object, frontbuffer for raw bits).
And the function name was way too long.

v3: Size obj->frontbuffer_bits correctly so that all pipes fit in.

v4: Don't update fb bits in set_base on failure. Noticed by Chris.

v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few
local enum pipe variables which are now no longer needed to keep the
function arguments from going over the 80 char limit.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
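
A simplified sketch of per-plane frontbuffer bits and the tracking update on
scanout changes; the macro names, bit layout and struct are illustrative
assumptions, not the driver's actual definitions.

#include <linux/bug.h>

#define EXAMPLE_FRONTBUFFER_BITS_PER_PIPE 4
#define EXAMPLE_FRONTBUFFER_PRIMARY(pipe) \
	(1u << (EXAMPLE_FRONTBUFFER_BITS_PER_PIPE * (pipe)))

struct example_obj {
	unsigned int frontbuffer_bits;
};

/* Move the given frontbuffer bits from the old scanout object to the new
 * one; this can be chased without grabbing any modeset locks. */
static void example_track_fb(struct example_obj *old, struct example_obj *new,
			     unsigned int frontbuffer_bits)
{
	if (old) {
		WARN_ON(!(old->frontbuffer_bits & frontbuffer_bits));
		old->frontbuffer_bits &= ~frontbuffer_bits;
	}
	if (new) {
		WARN_ON(new->frontbuffer_bits & frontbuffer_bits);
		new->frontbuffer_bits |= frontbuffer_bits;
	}
}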


# 3108e99e 18-Jun-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Drop schedule_back from psr_exit

It doesn't make sense to never again schedule the work, since by the
time we might want to re-enable psr the world might have changed and
we can do it again.

The only exception is when we shut down the pipe, but that's an
entirely different thing and needs to be handled in psr_disable.

Note that later patch will again split psr_exit into psr_invalidate
and psr_flush. But the split is different and this simplification
helps with the transition.

v2: Improve the commit message a bit.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f25748ea 17-Jun-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Don't BUG_ON in i915_gem_obj_offset

A WARN_ON is perfectly fine.

The BUG in here seems to be the cause behind hard-hangs when I cat the
i915_gem_pageflip debugfs file (which calls this from an irq
spinlock). But only while running a full igt run after a while. I
still need to root cause the underlying issue.

I'll also start reject patches which add new BUG_ON but don't come
with a really good justification for it. The general rule really
should be to just WARN and hope the driver survives for long enough.

v2: Make the WARN a bit more useful per Chris' suggestion.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
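
A tiny illustration of the policy change: warn and take a survivable fallback
instead of halting the machine. The structure, field and fallback value are
made up for illustration.

#include <linux/bug.h>

struct example_vma {
	unsigned long node_start;
};

static unsigned long example_obj_offset(struct example_vma *vma)
{
	/* A BUG_ON() here (possibly under an irq spinlock) would hard-hang
	 * the box; WARN_ON lets the driver limp along instead. */
	if (WARN_ON(!vma))
		return 0;	/* illustrative fallback, not the real one */
	return vma->node_start;
}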


# beff0d0f 17-Jun-2014 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Don't prefault the entire obj if the vma is smaller

Take the minimum of the object size and the vma size and prefault
only that much. Avoids a SIGBUS when mmapping only a portion of the
object.

Prefaulting was introduced here:
commit b90b91d87038f6b257b40a02b42ed4f9705e06f0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Jun 10 12:14:40 2014 +0100

drm/i915: Prefault the entire object on first page fault

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Testcase: igt/gem_mmap/short-mmap
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
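
A hedged sketch of bounding the prefault loop by the smaller of the mapping
and object sizes; the function and its parameters are illustrative, and
vm_insert_pfn() is used as the page-insertion primitive of that era.

#include <linux/kernel.h>
#include <linux/mm.h>

static int example_prefault(struct vm_area_struct *vma,
			    unsigned long obj_size, unsigned long base_pfn)
{
	unsigned long n = min_t(unsigned long,
				vma->vm_end - vma->vm_start, obj_size) >> PAGE_SHIFT;
	unsigned long i;
	int ret = 0;

	/* Insert PTEs only for what is actually mapped, so a short mmap of a
	 * large object no longer runs off the end of the vma. */
	for (i = 0; i < n && ret == 0; i++)
		ret = vm_insert_pfn(vma, vma->vm_start + i * PAGE_SIZE,
				    base_pfn + i);
	return ret;
}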


# 46612707 24-May-2014 David Herrmann <dh.herrmann@gmail.com>

drm/i915: use shmem helpers if possible

Instead of shuffling gfp-masks all the time, use the
shmem_read_mapping_page() helper. Note that __GFP_IO and __GFP_WAIT are
set in mapping_gfp_mask() for i915, so the behavior is still the same.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
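
A small sketch of the helper in use; the surrounding function is hypothetical
and error handling beyond the IS_ERR() check is elided.

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/shmem_fs.h>

static struct page *example_get_backing_page(struct file *filp, pgoff_t index)
{
	struct address_space *mapping = file_inode(filp)->i_mapping;

	/* shmem_read_mapping_page() picks up mapping_gfp_mask() internally,
	 * so there is no per-call gfp-mask shuffling left to do. */
	return shmem_read_mapping_page(mapping, index);
}

Callers check the result with IS_ERR() and release the page again once the
object's pages are dropped.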


# 84c33a64 02-Jun-2014 Sourab Gupta <sourab.gupta@intel.com>

drm/i915: Replaced Blitter ring based flips with MMIO flips

This patch enables the framework for using MMIO based flip calls,
in contrast with the CS based flip calls which are being used currently.

MMIO based flip calls can be enabled on architectures where
Render and Blitter engines reside in different power wells. The
decision to use MMIO flips can be made based on workloads to give
100% residency for Media power well.

v2: The MMIO flips now use the interrupt driven mechanism for issuing the
flips when target seqno is reached. (Incorporating Ville's idea)

v3: Rebasing on latest code. Code restructuring after incorporating
Damien's comments

v4: Addressing Ville's review comments
-general cleanup
-updating only base addr instead of calling update_primary_plane
-extending patch for gen5+ platforms

v5: Addressed Ville's review comments
-Making mmio flip vs cs flip selection based on module parameter
-Adding check for DRIVER_MODESET feature in notify_ring before calling
notify mmio flip.
-Other changes mostly in function arguments

v6: -Having a separate function to check condition for using mmio flips (Ville)
-propagating error code from i915_gem_check_olr (Ville)

v7: -Adding __must_check with i915_gem_check_olr (Chris)
-Renaming mmio_flip_data to mmio_flip (Chris)
-Rebasing on latest nightly

v8: -Rebasing on latest code
-squash 3rd patch in series(mmio setbase vs page flip race) with this patch
-Added new tiling mode update in intel_do_mmio_flip (Chris)

v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in
intel_postpone_flip, as this is a more restrictive condition (Chris)

v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch.
These patches make the selection of CS vs MMIO flip at the page flip time, and
make the module parameter for using mmio flips as tristate, the states being
'force CS flips', 'force mmio flips', 'driver discretion'.
Changed the logic for driver discretion (Chris)

v11: Minor code cleanup(better readability, fixing whitespace errors, using
lockdep to check mutex locked status in postpone_flip, removal of __must_check
in function definition) (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb
[danvet: Fix up parameter alignment checkpatch spotted.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6254b204 16-Jun-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Simplify i915_gem_release_all_mmaps()

An object can only have an active gtt mapping if it is currently bound
into the global gtt. Therefore we can simply walk the list of all bound
objects and check the flag upon those for an active gtt mapping.

From commit 48018a57a8f5900e7e53ffaa0adeb784095accfb
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Fri Dec 13 15:22:31 2013 -0200

drm/i915: release the GTT mmaps when going into D3

Also note that the WARN is inappropriate for this function as GPU
activity is orthogonal to GTT mmap status. Rather it is the caller that
relies upon this condition and so it should assert that the GPU is idle
itself.

References: https://bugs.freedesktop.org/show_bug.cgi?id=80081
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7c8f8a70 13-Jun-2014 Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: Force PSR exit by inactivating it.

The perfect solution for psr_exit is the hardware tracking the changes and
doing the psr exit by itself. This scenario works for HSW and BDW with some
environments like Gnome and Wayland.

However there are many other scenarios where this isn't true. The main one
right now is KDE users on HSW and BDW with PSR on. Users would miss many screen
updates. For instance, any key typed could be seen only when the mouse cursor is
moved. So this patch introduces the ability to trigger PSR exit on the kernel
side for some common cases.

Most of the cases are covered by psr_exit at set_domain. The remaining cases
are covered by triggering it at set_domain, busy_ioctl, sw_finish and
mark_busy.

The downside here might be reducing the residency time in the cases where this
already works very well, like the Gnome environment. But for now let's focus
on fixing the issues so PSR could be used by everybody, and we could even
get it enabled by default. Later we can add some alternatives to choose the
level of PSR efficiency via a boot flag or even a crtc property.

v2: remove exit from connector_dpms. Daniel pointed this is the wrong way and
also this isn't needed for BDW and HSW anyway.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b90b91d8 09-Jun-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prefault the entire object on first page fault

Inserting additional PTEs has no side-effect for us as the pfns are fixed
for the entire time the object is resident in the global GTT. The
downside is that we pay the entire cost of faulting the object upon the
first hit, for which we in return receive the benefit of removing the
per-page faulting overhead.

On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
Upload rate for 2 linear surfaces: 8127MiB/s -> 8134MiB/s
Upload rate for 2 tiled surfaces: 8607MiB/s -> 8625MiB/s
Upload rate for 4 linear surfaces: 8127MiB/s -> 8127MiB/s
Upload rate for 4 tiled surfaces: 8611MiB/s -> 8602MiB/s
Upload rate for 8 linear surfaces: 8114MiB/s -> 8124MiB/s
Upload rate for 8 tiled surfaces: 8601MiB/s -> 8603MiB/s
Upload rate for 16 linear surfaces: 8110MiB/s -> 8123MiB/s
Upload rate for 16 tiled surfaces: 8595MiB/s -> 8606MiB/s
Upload rate for 32 linear surfaces: 8104MiB/s -> 8121MiB/s
Upload rate for 32 tiled surfaces: 8589MiB/s -> 8605MiB/s
Upload rate for 64 linear surfaces: 8107MiB/s -> 8121MiB/s
Upload rate for 64 tiled surfaces: 2013MiB/s -> 3017MiB/s

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Goel, Akash" <akash.goel@intel.com>
Testcase: igt/gem_fence_upload/performance
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ef0cf27c 06-Jun-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use the .release hook to drop the stolen drm_mm tracking

Now that we have a release hook into i915_gem_object_free, we can move
the explicit call to the internal stolen function and hook it up
through the callback instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f461d1be2 25-May-2014 David Herrmann <dh.herrmann@gmail.com>

drm/i915: use shmem helpers if possible

Instead of shuffling gfp-masks all the time, use the
shmem_read_mapping_page() helper. Note that __GFP_IO and __GFP_WAIT are
set in mapping_gfp_mask() for i915, so the behavior is still the same.
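
A minimal sketch of the simplification (object and index names are
illustrative), relying on the mapping's gfp mask already carrying the
right bits:

    struct address_space *mapping = file_inode(obj->base.filp)->i_mapping;
    struct page *page;

    /* Replaces the manual gfp-mask shuffling with the shmem helper. */
    page = shmem_read_mapping_page(mapping, i);
    if (IS_ERR(page))
        return PTR_ERR(page);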

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ddeff6ee 28-May-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Silence the WARN if the user tries to GTT mmap an incoherent object

If the user tries to mmap through the GTT an object that is marked as
snooped, we report an error rather than allow the GPU to hang the
machine. The choice of EINVAL, however, was unfortunate as we turn that
into a WARN rather than a quiet SIGBUS.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# dbb42748 25-Feb-2014 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Move the C3 LP write bit setup to gen3_init_clock_gating() for KMS

Move the MI_ARB_STATE MI_ARB_C3_LP_WRITE_ENABLE setup to
gen3_init_clock_gating() from i915_gem_load() when KMS is enabled. Leave
it in i915_gem_load() for the UMS case, but add an explicit check, just
to make it easier to spot it when we eventually rip out UMS support.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d23db88c 23-May-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent negative relocation deltas from wrapping

This is pure evil. Userspace, I'm looking at you SNA, repacks batch
buffers on the fly after generation as they are being passed to the
kernel for execution. These batches also contain self-referenced
relocations as a single buffer encompasses the state commands, kernels,
vertices and sampler. During generation the buffers are placed at known
offsets within the full batch, and then the relocation deltas (as passed
to the kernel) are tweaked as the batch is repacked into a smaller buffer.
This means that userspace is passing negative relocation deltas, which
subsequently wrap to large values if the batch is at a low address. The
GPU hangs when it then tries to use the large value as a base for its
address offsets, rather than wrapping back to the real value (as one
would hope). As the GPU uses positive offsets from the base, we can
treat the relocation address as the minimum address read by the GPU.
For the upper bound, we trust that userspace will not read beyond the
end of the buffer.

So, how do we fix negative relocations from wrapping? We can either
check that every relocation looks valid when we write it, and then
position each object such that we prevent the offset wraparound, or we
just special-case the self-referential behaviour of SNA and force all
batches to be above 256k. Daniel prefers the latter approach.
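
A minimal sketch of the second option (flag and constant names illustrate the
approach rather than the final patch):

    #define BATCH_OFFSET_BIAS (256 * 1024)

    /* When pinning the batch buffer, request a minimum GTT offset so a
     * small negative relocation delta can no longer wrap below zero. */
    flags |= PIN_OFFSET_BIAS | BATCH_OFFSET_BIAS;

    /* ...and teach the "is this vma misplaced?" check about the bias: */
    if ((flags & PIN_OFFSET_BIAS) &&
        vma->node.start < (flags & PIN_OFFSET_MASK))
        return true;    /* rebind above the bias */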

This fixes a GPU hang when it tries to use an address (relocation +
offset) greater than the GTT size. The issue would occur quite easily
with full-ppgtt as each fd gets its own VM space, so low offsets would
often be handed out. However, with the rearrangement of the low GTT due
to capturing the BIOS framebuffer, it is already affecting kernels 3.15
onwards. I think only IVB+ is susceptible to this bug, but the workaround
should only kick in rarely, so it seems sensible to always apply it.

v3: Use a bias for batch buffers to prevent small negative delta relocations
from wrapping.

v4 from Daniel:
- s/BIAS/BATCH_OFFSET_BIAS/
- Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
were growing rather cumbersome.
- Add a comment to eb_get_batch explaining why we do this.
- Apply the batch offset bias everywhere but mention that we've only
observed it on gen7 gpus.
- Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.

v5: Add static to eb_get_batch, spotted by 0-day tester.

Testcase: igt/gem_bad_reloc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 00731155 20-May-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix dynamic allocation of physical handles

A single object may be referenced by multiple registers fundamentally
breaking the static allotment of ids in the current design. When the
object is used the second time, the physical address of the first
assignment is relinquished and a second one granted. However, the
hardware is still reading (and possibly writing) to the old physical
address now returned to the system. Eventually hilarity will ensue, but
in the short term, it just means that cursors are broken when using more
than one pipe.

v2: Fix up leak of pci handle when handling an error during attachment,
and avoid a double kmap/kunmap. (Ville)
Rebase against -fixes.

v3: And fix the error handling added in v2 (Ville)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77351
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 273497e5 22-May-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915: s/i915_hw_context/intel_context

Up until now, contexts had one (and only one) backing object that was
used by the hardware to save/restore render ring contexts (via the
MI_SET_CONTEXT command). Other rings did not have or need this, so
our i915_hw_context struct had a 1:1 relationship with a real HW
context.

With Logical Ring Contexts and Execlists, this is not possible anymore:
all rings need a backing object, and it cannot be reused. To prepare
for that, rename our contexts to the more generic term intel_context.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ee1b1e5e 22-May-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915: Split the ringbuffers from the rings (2/3)

This refactoring has been performed using the following Coccinelle
semantic script:

@@
struct intel_engine_cs r;
@@
(
- (r).obj
+ r.buffer->obj
|
- (r).virtual_start
+ r.buffer->virtual_start
|
- (r).head
+ r.buffer->head
|
- (r).tail
+ r.buffer->tail
|
- (r).space
+ r.buffer->space
|
- (r).size
+ r.buffer->size
|
- (r).effective_size
+ r.buffer->effective_size
|
- (r).last_retired_head
+ r.buffer->last_retired_head
)

@@
struct intel_engine_cs *r;
@@
(
- (r)->obj
+ r->buffer->obj
|
- (r)->virtual_start
+ r->buffer->virtual_start
|
- (r)->head
+ r->buffer->head
|
- (r)->tail
+ r->buffer->tail
|
- (r)->space
+ r->buffer->space
|
- (r)->size
+ r->buffer->size
|
- (r)->effective_size
+ r->buffer->effective_size
|
- (r)->last_retired_head
+ r->buffer->last_retired_head
)

@@
expression E;
@@
(
- LP_RING(E)->obj
+ LP_RING(E)->buffer->obj
|
- LP_RING(E)->virtual_start
+ LP_RING(E)->buffer->virtual_start
|
- LP_RING(E)->head
+ LP_RING(E)->buffer->head
|
- LP_RING(E)->tail
+ LP_RING(E)->buffer->tail
|
- LP_RING(E)->space
+ LP_RING(E)->buffer->space
|
- LP_RING(E)->size
+ LP_RING(E)->buffer->size
|
- LP_RING(E)->effective_size
+ LP_RING(E)->buffer->effective_size
|
- LP_RING(E)->last_retired_head
+ LP_RING(E)->buffer->last_retired_head
)

Note: On top of this, the patch also removes the now unused ringbuffer
fields in intel_engine_cs.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
[danvet: Add note about fixup patch included here.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a4872ba6 22-May-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915: s/intel_ring_buffer/intel_engine_cs

In the upcoming patches we plan to break the correlation between
engine command streamers (a.k.a. rings) and ringbuffers, so it
makes sense to refactor the code and make the change obvious.

No functional changes.

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 340fbd8c 22-May-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only discard backing storage on releasing the last ref

Before purging our pages (as opposed to copying back the contents from
the GPU), make sure that there is not an exposed CPU mmapping through
which the user can inspect the results.

Regression from

commit 5537252b6b6d71fb1a8ed7395a8e5babf91953fd
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Mar 25 13:23:06 2014 +0000

drm/i915: Invalidate our pages under memory pressure

Testcase: igt/gem_mmap/new-object
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79005
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Guo Jinxian <jinxianx.guo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2cfcd32a 20-May-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Implement an oom-notifier for last resort shrinking

Before the process killer is invoked, oom-notifiers are executed for one
last try at recovering pages. We can hook into this callback to be sure
that everything that can be is purged from our page lists, and to give a
summary of how much memory is still pinned by the GPU in the case of an
oom. This should be really valuable for debugging OOM issues.

Note that the last-ditch effort call to shrink_all, which we previously
made from our normal shrinker when we could not free as much as the vm
demanded, is moved into the oom notifier. Since the shrinker accounting
races against bind/unbind operations we might have called shrink_all
prematurely, which this approach with an oom notifier avoids.
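
A hedged sketch of the hook using the generic oom-notifier API (the i915-side
helper and the notifier wiring shown here are illustrative):

    #include <linux/oom.h>

    static int i915_oom_notify(struct notifier_block *nb,
                               unsigned long event, void *ptr)
    {
        struct drm_i915_private *dev_priv =
            container_of(nb, struct drm_i915_private, mm.oom_notifier);
        unsigned long *freed = ptr;

        /* Purge everything we possibly can, report how much we freed,
         * and (in the real patch) summarise what the GPU still pins. */
        *freed += i915_gem_shrink_all(dev_priv);
        return NOTIFY_DONE;
    }

    /* at init time */
    dev_priv->mm.oom_notifier.notifier_call = i915_oom_notify;
    register_oom_notifier(&dev_priv->mm.oom_notifier);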

References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: lu hua <huax.lu@intel.com>
[danvet: Bikeshed logical | into || and pimp commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5537252b 25-Mar-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Invalidate our pages under memory pressure

Try to flush out dirty pages into the swapcache (and from there into the
swapfile) when under memory pressure and forced to drop GEM objects from
memory. In effect, this should just allow us to discard unused pages for
memory reclaim and to start writeback earlier.

v2: Hugh Dickins warned that explicitly starting writeback from
shrink_slab was prone to deadlocks within shmemfs.

Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Robert Beckett <robert.beckett@intel.com>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b453c4db 25-Mar-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor common lock handling between shrinker count/scan

We can share a few lines of tricky lock handling we need to use for both
shrinker routines and in the process fix the return value for count()
when reporting a deadlock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Robert Beckett <robert.beckett@intel.com>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ceabbba5 25-Mar-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Include bound and active pages in the count of shrinkable objects

When the machine is under a lot of memory pressure and being stressed by
multiple GPU threads, we quite often report fewer than shrinker->batch
(i.e. SHRINK_BATCH) pages to be freed. This causes the shrink_control to
skip calling into i915.ko to release pages, despite the GPU holding onto
most of the physical pages in its active lists.

References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Robert Beckett <robert.beckett@intel.com>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0820baf3 25-Mar-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Translate ENOSPC from shmem_get_page() to ENOMEM

shmemfs first checks if there is enough memory to allocate the page
and reports ENOSPC should there be insufficient, along with
the usual ENOMEM for a genuine allocation failure.

We use ENOSPC in our driver to mean that we have run out of aperture
space and so want to translate the error from shmemfs back to
our usual understanding of ENOMEM. None of the other GEM users
appear to distinguish between ENOMEM and ENOSPC in their error handling,
hence it is easiest to do the fixup in i915.ko.
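
A small sketch of the fixup at the point where the shmem page is fetched
(variable names are illustrative):

    page = shmem_read_mapping_page_gfp(mapping, i, gfp);
    if (IS_ERR(page)) {
        ret = PTR_ERR(page);
        /* shmemfs says ENOSPC for "not enough memory to allocate";
         * translate it so callers don't mistake it for "out of aperture". */
        if (ret == -ENOSPC)
            ret = -ENOMEM;
        return ret;
    }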

Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Robert Beckett <robert.beckett@intel.com>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5cc9ed4b 16-May-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

By exporting the ability to map user address and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications, from
faster rendering of client-side software rasterisers (chromium) and
mitigation of stalls due to read back (firefox) to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).
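
For reference, the new ioctl argument looks roughly like the following (a
sketch from memory; the authoritative layout lives in
include/uapi/drm/i915_drm.h):

    struct drm_i915_gem_userptr {
        __u64 user_ptr;    /* page-aligned user address, see v9 below */
        __u64 user_size;   /* length of the range to map */
        __u32 flags;       /* read-only bit reserved but refused, see v20 */
        __u32 handle;      /* returned GEM handle */
    };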

v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forward-porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
rearrange error path to destroy the mmu_notifier locklessly.
Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
allocations of the same userptr range - and notice that
struct_mutex was presumed to be held during creation when it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
be used to fix up the kernel thread's current->mm for use with
copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
userptr details elsewhere, and clean up comments based on
suggestions by Bradley.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 19656430 16-May-2014 Oscar Mateo <oscar.mateo@intel.com>

drm/i915: Gracefully handle obj not bound to GGTT in is_pin_display

Otherwise, we do a NULL pointer dereference.

I've seen this happen while handling an error in
i915_gem_object_pin_to_display_plane():

If i915_gem_object_set_cache_level() fails, we call is_pin_display()
to handle the error. At this point, the object is still not pinned
to GGTT and maybe not even bound, so we have to check before we
dereference its GGTT vma.

The IGT kms_flip/bo-too-big test checks for this bug.

v2: Chris Wilson says restoring the old value is easier, but that
is_pin_display is useful as a theory of operation. Take the solomonic
decision: at least this way is_pin_display is a little more robust
(until Chris can kill it off).

v3: Chris suggests the WARN in i915_gem_obj_to_ggtt has outlived its
usefulness: add a reminder to remove it.

Issue: VIZ-3772
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Testcase: igt/kms_flip/bo-too-big
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8b1bc9b4 14-Feb-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Only do gtt cleanup in vma_unbind for the global vma

Otherwise we end up tearing down fences when e.g. the client quits
way too early. Might or might not fix a fence pin_count BUG Ville has
reported.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# aff10b30 14-Feb-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Don't drop pinned fences

Userspace can currently provoke this when e.g. trying to use a pinned
scanout as a cursor or overlay target. Later on that might lead to
some fun fence pin count mayhem.

Spurred by Ville's report that something goes wrong here and
originally I've thought that this might slip through the pwrite gtt
fastpath. But that one checks for obj tiling, so it should be ok.

But one thing that _does_ blow up is the vma unbinding with more than
one address space. The next patch will fix this.

v2: Use a WARN_ON - Chris pointed out that we already catch all cases
so userspace can't provoke this like I've originally feared.

While reviewing relevant code I've noticed a pile of DRM_ERROR in the
overlay&cursor code which are all triggerable by userspace. Tune them
down while at it.

v3: Split out the DRM_ERROR->DRM_DEBUG_KMS change into a separate patch,
as requested by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d8ffa60b 12-May-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: WARN_ON fence pin leaks

The fence pin count should always be <= the bo pin count. If that's
not the case then we have a funny problem and are leaking references
somewhere.

Which means we can catch fence pin leaks by checking for the same
upper limit as we do for the bo pin count. Inspired by a discussion
with Ville about a fence leak igt testcase.
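
A compact sketch of the checks (helper name is illustrative; the real patch
de-inlines them, as noted in v2 below):

    static void assert_fence_pin_count(struct drm_i915_fence_reg *fence)
    {
        WARN_ON(fence->pin_count < 0);

        if (fence->obj) {
            struct i915_vma *ggtt_vma = i915_gem_obj_to_ggtt(fence->obj);

            /* A fence can only be pinned through a pinned bo. */
            WARN_ON(ggtt_vma && fence->pin_count > ggtt_vma->pin_count);
        }
    }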

v2: Also check for fence->pin_count <= ggtt_vma->pin_count, since that
might catch a leak even quicker. Also de-inline them, they're getting
too big.

v3: Don't separately check for MAX_PIN_COUNT since the > vma->pin_count
check will catch that already (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1cf0ba14 05-May-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush request queue when waiting for ring space

During the review of

commit 1f70999f9052f5a1b0ce1a55aff3808f2ec9fe42
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jan 27 22:43:07 2014 +0000

drm/i915: Prevent recursion by retiring requests when the ring is full

Ville raised the point that our interaction with request->tail was
likely to foul up other uses elsewhere (such as hang check comparing
ACTHD against requests).

However, we also need to restore the implicit retire requests that certain
test cases depend upon (e.g. igt/gem_exec_lut_handle), this raises the
spectre that the ppgtt will randomly call i915_gpu_idle() and recurse
back into intel_ring_begin().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78023
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Remove now unused 'tail' variable as spotted by Brad.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6e7186af 06-May-2014 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Make aliasing a 2nd class VM

There is a good debate to be had about how best to fit the aliasing
PPGTT into the code. However, as it stands right now, getting aliasing
PPGTT bindings is a hack, and done through implicit arguments. To make
this absolutely clear, WARN and return an error if a driver writer tries
to do something they shouldn't.

I have no issue with an eventual revert of this patch. It makes sense
for what we have today.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ebc348b2 29-Apr-2014 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Move semaphore specific ring members to struct

This will be helpful in abstracting some of the code in preparation for
gen8 semaphores.

v2: Move mbox stuff to a separate struct

v3: Rebased over VCS2 work

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c8725f3d 16-Mar-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not call retire_requests from wait_for_rendering

A common issue we have is that retiring requests causes recursion
through GTT manipulation or page table manipulation which we can only
handle at very specific points. However, to maintain internal
consistency (enforced through our sanity checks on write_domain at
various points in the GEM object lifecycle) we do need to retire the
object prior to marking it with a new write_domain, and also clear the
write_domain for the implicit flush following a batch.

Note that this then allows the unbound objects to still be on the active
lists, and so care must be taken when removing objects from unbound lists
(similar to the caveats we face processing the bound lists).

v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules,
by refactoring it to call into __i915_gem_shrink().

v3: Missed an object-retire prior to changing cache domains in
i915_gem_object_set_cache_leve()

v4: Rebase

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 981a5aea 14-Apr-2014 Imre Deak <imre.deak@intel.com>

drm/i915: vlv: clean up GTLC wake control/status register macros

These will be needed by the upcoming VLV RPM helpers.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 845f74a7 16-Apr-2014 Zhao Yakui <yakui.zhao@intel.com>

drm/i915:Initialize the second BSD ring on BDW GT3 machine

Based on the hardware spec, the BDW GT3 machine has two independent
BSD rings that can be used to dispatch the video commands.
So just initialize it.

V3->V4: Follow Imre's comment to do some minor updates. For example:
more comments are added to describe the semaphore between rings.

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
[danvet: Fix up checkpatch error.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 60990320 09-Apr-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow the module to load even if we fail to setup rings

Even without enabling the ringbuffers to allow command execution, we can
still control the display engines to enable modesetting. So make the
ringbuffer initialization failure soft, and mark the GPU as wedged
instead.

v2: Only treat an EIO from ring initialisation as a soft failure, and
abort module load for any other failure, such as allocation failures.

v3: Add an *ERROR* prior to declaring the GPU wedged so that it stands
out like a sore thumb in the logs

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e3efda49 09-Apr-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Preserve ring buffers objects across resume

Tearing down the ring buffers across resume is overkill, risks
unnecessary failure and increases fragmentation.

After failure, since the device is still active we may end up trying to
write into the dangling iomapping and trigger an oops.

v2: stop_ringbuffers() was meant to call stop(ring) not
cleanup(ring) during resume!

Reported-by: Jae-hyeon Park <jhyeon@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=72351
References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
[danvet: s/ring->obj == NULL/!intel_ring_initialized(ring)/ as
suggested by Oscar.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bb0f1b5c 03-Nov-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm: pass the irq explicitly to drm_irq_install

Unfortunately this requires a drm-wide change, and I didn't see a sane
way around that. Luckily it's fairly simple, we just need to inline
the respective get_irq implementation from either drm_pci.c or
drm_platform.c.

With that we can now also remove drm_dev_to_irq from drm_irq.c.
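
For a PCI driver like i915 the call site then becomes roughly (sketch):

    /* the irq is now passed in explicitly, inlined from the old
     * drm_pci get_irq() callback */
    ret = drm_irq_install(dev, dev->pdev->irq);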

Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e090c53b 03-Nov-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/irq: remove cargo-culted locking from irq_install/uninstall

The dev->struct_mutex locking in drm_irq.c only protects
dev->irq_enabled. Which isn't really much at all and only prevents
especially nasty ums userspace from concurrently installing the
interrupt handling a few times. Or at least trying.

There are tons of unlocked readers of dev->irqs_enabled in the vblank
wait code (and by extension also in the pageflip code since that uses
the same vblank timestamp engine).

Real modesetting drivers should ensure that nothing can go haywire
with a sane setup teardown sequence. So we only really need this for
the drm_control ioctl, everywhere else this will just paper over
nastiness.

Note that drm/i915 is a bit special due to the gem+ums combination.
So there we also need to properly protect the entervt and leavevt
ioctls. But it's definitely saner to do everything in one go than to
drop the lock in-between.

Finally there's the gpu reset code in drm/i915. That one's just racy
(concurrent userspace calls for vblank waits or pageflips could
spuriously fail). So wrap it up with a nice comment since fixing
this is more involved.

v2: Rebase and fix commit message (Thierry)

Reviewed-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 691e6415 09-Apr-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always use kref tracking for all contexts.

If we always initialize kref for the context, even if we are using fake
contexts for hangstats when there is no hw support, we can forgo the
dance to dereference the ctx->obj and inspect whether we are permitted
to use kref inside i915_gem_context_reference() and _unreference().

My ulterior motive here is to improve the debugging of a use-after-free
of ctx->obj. This patch avoids the dereference here and instead forces
the assertion checks associated with kref.

v2: Refactor the fake contexts to being even more like the real
contexts, so that there is much less duplicated and special case code.

v3: Tweaks.
v4: Tweaks, minor.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: lu hua <huax.lu@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[Jani: tiny change to backport to drm-intel-fixes.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 88b4aa87 28-Mar-2014 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: add flags to i915_ring_stop

Piglit runner and QA are both looking at the dmesg for
DRM_ERRORs with test cases. Add a flag to control those
when they are expected from related test cases.

Also add a flag to control whether contexts that introduced
the hang should be banned. Hangcheck is timer based and
preventing bans by adding sleeps to testcases makes
testing slower.

v2: intel_ring_stopped(), readable comment (Chris)
v3: keep compatibility (Daniel)

References: https://bugs.freedesktop.org/show_bug.cgi?id=75876
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 62347f9e 02-Apr-2014 Lauri Kasanen <cand@gmx.com>

drm: Add support for two-ended allocation, v3

Clients like i915 need to segregate cache domains within the GTT which
can lead to small amounts of fragmentation. By allocating the uncached
buffers from the bottom and the cacheable buffers from the top, we can
reduce the amount of wasted space and also optimize allocation of the
mappable portion of the GTT to only those buffers that require CPU
access through the GTT.

For other drivers, allocating small bos from one end and large ones
from the other helps improve the quality of fragmentation.

Based on drm_mm work by Chris Wilson.

v3: Changed to use a TTM placement flag
v2: Updated kerneldoc

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Christian König <deathsimple@vodafone.de>
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: David Airlie <airlied@redhat.com>


# 3e31c6c0 31-Mar-2014 Jani Nikula <jani.nikula@intel.com>

drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t

No functional changes.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# df6f783a 21-Mar-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix unsafe loop iteration over vma whilst unbinding them

On non-LLC platforms, when changing the cache level of an object, we may
need to unbind it so that prefetching across page boundaries does not
cross into a different memory domain. This requires us to unbind
conflicting vma, but we did so iterating over the object's vma in an
unsafe manner (as the list was being modified as we iterated).
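
The fix follows the usual pattern of switching to the _safe list iterator so
that unbinding (which unlinks the vma) cannot corrupt the walk; a sketch with
an illustrative compatibility check:

    struct i915_vma *vma, *next;

    list_for_each_entry_safe(vma, next, &obj->vma_list, vma_link) {
        if (!cache_level_conflicts(vma, cache_level))  /* illustrative helper */
            continue;

        ret = i915_vma_unbind(vma);
        if (ret)
            return ret;
    }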

The regression was introduced in
commit 3089c6f239d7d2c4cb2dd5c353e8984cf79af1d7
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Jul 31 17:00:03 2013 -0700

drm/i915: make caching operate on all address spaces

apparently as far back as v3.12-rc1, but it has only just begun to
trigger real world bug reports.

Reported-and-tested-by: Nikolay Martynov <mar.kolya@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76384
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5d584b2e 07-Mar-2014 Paulo Zanoni <paulo.r.zanoni@intel.com>

drm/i915: move pc8.irqs_disabled to pm.irqs_disabled

When other platforms add runtime PM support they will also need to
disable interrupts, so move the variable to the runtime PM struct.

Also notice that the longer-term goal is to completely kill the
regsave struct, and I even have patches for that.

v2: - Rebase.
v3: - Rebase.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6796cb16 03-Jan-2014 David Herrmann <dh.herrmann@gmail.com>

drm: use anon-inode instead of relying on cdevs

DRM drivers share a common address_space across all character-devices of a
single DRM device. This allows simple buffer eviction and mapping-control.
However, DRM core currently waits for the first ->open() on any char-dev
to mark the underlying inode as backing inode of the device. This delayed
initialization causes ugly conditions all over the place:
    if (dev->dev_mapping)
        do_sth();

To avoid delayed initialization and to stop reusing the inode of the
char-dev, we allocate an anonymous inode for each DRM device and reset
filp->f_mapping to it on ->open().

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>


# 3ddffb7b 12-Mar-2014 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Unbind all vmas whose new cache_level doesn't agree with the neighbours

When we change the cache_level for an object we need to make sure
we don't put differing types of snoopable memory too close to each
other on non-LLC machines.

Currently i915_gem_object_set_cache_level() will stop looking when
it finds just one vma that has such a conflict. Drop the bogus break
statement to make sure it will unbind all vmas which need to be moved
around to avoid the conflict.

I suppose this is a theoretical issue as currently we don't enable
ppgtt on non-LLC machines, so each object can only have one vma.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c2831a94 07-Mar-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not force non-caching copies for pwrite along shmem path

We don't always want to write into main memory with pwrite. The shmem
fast path in particular is used for memory that is cacheable - under
such circumstances forcing the cache eviction is undesirable. As we will
always flush the cache when targeting incoherent buffers, we can rely on
that second pass to apply the cache coherency rules and so benefit from
in-cache copies otherwise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 17793c9a 07-Mar-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Process page flags once rather than per pwrite/pread

We used to lock individual pages inside the buffer object and so needed
to update the page flags every time. However, we now pin the pages into
the object for the duration of the pwrite/pread (and hopefully much
longer) and so we can forgo the flag updates until we release all the
pages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4c914c0c 18-Feb-2014 Brad Volkin <bradley.d.volkin@intel.com>

drm/i915: Refactor shmem pread setup

The command parser is going to need the same synchronization and
setup logic, so factor it out for reuse.

v2: Add a check that the object is backed by shmem

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# cb216aa8 03-Mar-2014 Damien Lespiau <damien.lespiau@intel.com>

drm/i915: Make i915_gem_retire_requests_ring() static

Its last usage outside of i915_gem.c was removed in:

commit 1f70999f9052f5a1b0ce1a55aff3808f2ec9fe42
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jan 27 22:43:07 2014 +0000

drm/i915: Prevent recursion by retiring requests when the ring is full

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ab0e7ff9 25-Feb-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Record pid/comm of hanging task

After finding the guilty batch and request, we can use it to find the
process that submitted the batch and then add the culprit into the error
state.

This is a slightly different approach from Ben's in that instead of
adding the extra information into the struct i915_hw_context, we use the
information already captured in struct drm_file which is then referenced
from the request.
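
A sketch of how the culprit ends up in the error state (the struct pid here
stands in for the one recorded via the submitting drm_file; exact field names
differ):

    struct task_struct *task;

    rcu_read_lock();
    task = pid_task(submitter_pid, PIDTYPE_PID);  /* pid from the drm_file */
    if (task) {
        strcpy(error_ring->comm, task->comm);
        error_ring->pid = task->pid;
    }
    rcu_read_unlock();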

v2: Also capture the workaround buffer for gen2, so that we can compare
its contents against the intended batch for the active request.

v3: Rebase (Mika)
v4: Check for null context (Chris)
checkpatch warnings fixed

Link: http://lists.freedesktop.org/archives/intel-gfx/2013-August/032280.html
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v4)
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8d9fc7fd 25-Feb-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rely on accurate request tracking for finding hung batches

In the past, it was possible to have multiple batches per request due to
a stray signal or ENOMEM. As a result we had to scan each active object
(filtered by those having the COMMAND domain) for the one that contained
the ACTHD pointer. This was then made more complicated by the
introduction of ppgtt, whereby ACTHD then pointed into the address space
of the context and so also needed to be taken into account.

This is a fairly robust approach (though the implementation is a little
fragile and depends upon the per-generation setup, registers and
parameters). However, due to the requirements for hangstats, we needed a
robust method for associating batches with a particular request and
having that we can rely upon it for finding the associated batch object
for error capture.

If the batch buffer tracking is not robust enough, that should become
apparent quite quickly through an erroneous error capture. That should
also help to make sure that the runtime reporting to userspace is
robust. It also means that we then report the oldest incomplete batch on
each ring, which can be useful for determining the state of userspace at
the time of a hang.

v2: Use i915_gem_find_active_request (Mika)

v3: remove check for ring->get_seqno, split long lines (Ben)

v4: check that context is available (Chris)
checkpatch warnings fixed

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v3)
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v3)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 64bf9303 25-Feb-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reset vma->mm_list after unbinding

In place of true activity counting, we walk the list of vma associated
with an object managing each on the vm's active/inactive list everytime
we call move-to-inactive. This depends upon the vma->mm_list being
cleared after unbinding, or else we run into difficulty when tracking
the object in multiple vm's - we see a use-after free and corruption of
the mm_list.
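
The fix is essentially to clear the list head when unbinding, so that later
move-to-inactive passes see an unlinked vma instead of stale pointers; a
one-line sketch inside i915_vma_unbind():

    list_del_init(&vma->mm_list);  /* instead of leaving mm_list dangling */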

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ccc7bed0 21-Feb-2014 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Don't ban default context when stop_rings!=0

If we've explicitly stopped the rings for testing purposes, don't ban
the default context. Fixes kms_flip hang tests.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f62a0076 21-Feb-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Accurately track when we mark the hardware as idle/busy

We currently call intel_mark_idle() too often, as we do so as a
side-effect of processing the request queue. However, the calls to
intel_mark_idle() are expected to be paired with a call to
intel_mark_busy() (or else we try to idle the hardware by accessing
registers that are already disabled). Make the idle/busy tracking
explicit to prevent the multiple calls.

v2: We can drop some of the complexity in __i915_add_request() as
queue_delayed_work() already behaves as we want (not requeuing the item
if it is already in the queue) and mark_busy/mark_idle imply that the
idle task is inactive.

v3: We do still need to cancel the pending idle task so that it is sent
again after the current busy load completes (not in the middle of it).

Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8ea99c92 14-Feb-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Only bind each object rather than for every execbuffer

One side-effect of the introduction of ppgtt was that we needed to
rebind the object into the appropriate vm (and global gtt in some
peculiar cases). For simplicity this was done twice for every object on
every call to execbuffer. However, that adds a tremendous amount of CPU
overhead (rewriting all the PTE for all objects into WC memory) per
draw. The fix is to push all the decision about which vm to bind into
and when down into the low-level bind routines through hints rather than
performing the bind unconditionally in the execbuffer routine.

Note that this is a regression introduced in the full ppgtt feature
branch, before this we've only done re-bound objects when the relevant
has_(aliasing_ppgtt|global_gtt)_mapping flag was clear. But since
that's per-object and not per-vma that optimization broke.

v2: Split out prep work and unrelated changes.

v3: Bring back functional change around PIN_GLOBAL that I've
accidentally split out.

v4: Remove the temporary hack for the old binding logic to avoid
bisection issues.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72906
Tested-by: jianx.zhou@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 262de145 14-Feb-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Directly return the vma from bind_to_vm

This is prep work for reworking the object_pin logic. Atm
it still does a (now redundant) lookup of the vma. The next
patch will fix this.

Split out from Chris vma-bind rework.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b287110e 14-Feb-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Simplify i915_gem_object_ggtt_unpin

Split out from Chris vma-bind rework.

Jani wondered why this is safe, and the reason is that i915_vma_unbind
does all these checks, too. So they're redundant.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bf3d149b 14-Feb-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: split PIN_GLOBAL out from PIN_MAPPABLE

With arbitrary pin flags it makes sense to split out a "please bind
this into global gtt" from the "please allocate in the mappable
range".

Use this unconditionally in our global gtt pin helper since this is
what its callers want. Later patches will drop PIN_MAPPABLE where it's
not strictly needed.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1ec9e26d 14-Feb-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Consolidate binding parameters into flags

Anything more than just one bool parameter is just a pain to read,
symbolic constants are much better.

Split out from Chris' vma-binding rework patch.

v2: Undo the behaviour change in object_pin that Chris spotted.

v3: Split out misplaced hunk to handle set_cache_level errors,
spotted by Jani.

v4: Keep the current over-zealous binding logic in the execbuffer code
working with a quick hack while the overall binding code gets shuffled
around.

v5: Reorder the PIN_ flags for more natural patch splitup.

v6: Pull out the PIN_GLOBAL split-up again.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6e4930f6 07-Feb-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush GPU rendering with a lockless wait during a pagefault

Arjan van de Ven reported that on his test machine he was seeing
stalls of greater than 1 frame, greatly impacting the user experience. He
tracked this down to being the locked flush during a pagefault as being
the culprit hogging the struct_mutex and so blocking any other user from
proceeding. Stalling on a pagefault is bad behaviour on userspace's
part, for one it means that they are ignoring the coherency rules on
pointer access through the GTT, but fortunately we can apply the same
trick as the set-to-domain ioctl to do a lightweight, nonblocking flush
of outstanding rendering first.
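
A sketch of the shape of the fix in the fault handler, reusing the nonblocking
wait that the set-to-domain ioctl already has (the helper shows up in the
traces below; argument details are approximate):

    /* Before doing the heavyweight work under struct_mutex, do a
     * lockless wait for outstanding rendering, so other clients aren't
     * blocked behind a stalled pagefault. */
    ret = i915_gem_object_wait_rendering__nonblocking(obj, NULL, !write);
    if (ret)
        return ret;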

"Prior to the patch it looks like this
(this one testrun does not show the 20ms+ I've seen occasionally)

4.99 ms 2.36 ms 31360 __wait_seqno i915_wait_seqno i915_gem_object_wait_rendering i915_gem_object_set_to_gtt_domain i915_gem_fault __do_fault handle_
+pte_fault handle_mm_fault __do_page_fault do_page_fault page_fault
4.99 ms 2.75 ms 107751 __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
4.99 ms 1.63 ms 1666 i915_mutex_lock_interruptible i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_fault do_page_fault page_fa
+ult
4.93 ms 2.45 ms 980 i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_
+sysret
4.89 ms 2.20 ms 3283 i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
4.34 ms 1.66 ms 1715 i915_mutex_lock_interruptible i915_gem_pwrite_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
3.73 ms 3.73 ms 49 i915_mutex_lock_interruptible i915_gem_set_domain_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
3.17 ms 0.33 ms 931 i915_mutex_lock_interruptible i915_gem_madvise_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
2.97 ms 0.43 ms 1029 i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
2.55 ms 0.51 ms 735 i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret

After the patch it looks like this:

4.99 ms 2.14 ms 22212 __wait_seqno i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
4.86 ms 0.99 ms 14170 __wait_seqno i915_gem_object_wait_rendering__nonblocking i915_gem_fault __do_fault handle_pte_fault handle_mm_fault __do_page_
+fault do_page_fault page_fault
3.59 ms 1.31 ms 325 i915_gem_get_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
3.37 ms 3.37 ms 65 i915_mutex_lock_interruptible i915_gem_wait_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
2.58 ms 2.58 ms 65 i915_mutex_lock_interruptible i915_gem_do_execbuffer.isra.23 i915_gem_execbuffer2 drm_ioctl i915_compat_ioctl compat_sys_ioctl
+ia32_sysret
2.19 ms 2.19 ms 65 i915_mutex_lock_interruptible intel_crtc_page_flip drm_mode_page_flip_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_
+sysret
2.18 ms 2.18 ms 65 i915_mutex_lock_interruptible i915_gem_busy_ioctl drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret
1.66 ms 1.66 ms 65 i915_gem_set_tiling drm_ioctl i915_compat_ioctl compat_sys_ioctl ia32_sysret

It may not look like it, but this is quite a large difference, and I've
been unable to reproduce > 5 msec delays at all, while before they do
happen (just not in the trace above)."

gem_gtt_hog on an old Pineview (GMA3150),
before: 4969.119ms
after: 4122.749ms

Reported-by: Arjan van de Ven <arjan.van.de.ven@intel.com>
Testcase: igt/gem_gtt_hog
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bd9b6a4e 10-Feb-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Downgrade *ERROR* message for invalid user input

When we detect that the user passed along an invalid handle or object,
we emit a warning as an aide for debugging. Since these are indeed only
for debugging user triggerable errors (and the errors are reported back
to userspace by the errno), the messages should only be at the debug
level and not claiming that there is a catastrophic error in the
driver/hardware.

References: https://bugs.freedesktop.org/show_bug.cgi?id=74704
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3d13ef2e 07-Feb-2014 Damien Lespiau <damien.lespiau@intel.com>

drm/i915: Always use INTEL_INFO() to access the device_info structure

If we make sure that all the dev_priv->info usages are wrapped by
INTEL_INFO(), we can easily modify the ->info field to be structure and
not a pointer while keeping the const protection in the INTEL_INFO()
macro.
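
A sketch of the end state (exact cast details may differ): with every access
funnelled through the macro, the field can become an embedded struct while the
macro keeps handing out a const pointer:

    #define INTEL_INFO(dev) \
        (&((const struct drm_i915_private *)(dev)->dev_private)->info)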

v2: Rebased onto latest drm-nightly

Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8c99e57d 31-Jan-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Treat using a purged buffer as a source of EFAULT

Since a purged buffer is one without any associated pages, attempting to
use it should generate EFAULT rather than EINVAL, as it is not strictly
an invalid parameter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 45d67817 31-Jan-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert EFAULT into a silent SIGBUS

EFAULT will be a possible return code where backing storage is
transient, such after it is purged by madvise. As such it is to be
expected and so should not trigger a WARN inside i915_gem_fault() but be
converted silently to SIGBUS.
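
In the fault handler's error translation this becomes one more silent case
(a sketch, not the full switch):

    switch (ret) {
    case -EFAULT:  /* backing storage transient, e.g. purged via madvise */
        return VM_FAULT_SIGBUS;  /* silently, without the WARN */
    default:
        WARN_ONCE(1, "unhandled error in i915_gem_fault: %i\n", ret);
        return VM_FAULT_SIGBUS;
    }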

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e3848694 31-Jan-2014 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: release mutex in i915_gem_init()'s error path

Found with smatch.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 939fd762 30-Jan-2014 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Get rid of acthd based guilty batch search

As we seek the guilty batch using request and hangcheck
score, this code is not needed anymore.

v2: Rebase. Passing dev_priv instead of getting it from last_ring

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b6b0fac0 30-Jan-2014 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Use hangcheck score to find guilty context

With full ppgtt using acthd is not enough to find guilty
batch buffer. We get multiple false positives as acthd is
per vm.

Instead of scanning which vm was running on a ring
to find the corresponding context, use a different, simpler
strategy for finding the batches that caused the gpu hang:

If hangcheck has declared ring to be hung, find first non complete
request on that ring and claim it was guilty.
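
A sketch of the "first incomplete request on the hung ring is guilty" search
(names approximate the code of that era):

    struct drm_i915_gem_request *request;
    const u32 completed = ring->get_seqno(ring, false);

    list_for_each_entry(request, &ring->request_list, list) {
        if (i915_seqno_passed(completed, request->seqno))
            continue;  /* already retired by the GPU, not the culprit */

        return request;  /* oldest request the GPU never finished */
    }
    return NULL;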

v2: Rebase

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73652
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3fac8978 30-Jan-2014 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Tune down debug output when context is banned

If we have stopped the rings then we know that a test is running,
so there is no need for spam. In addition, only spam when the default
context gets banned.

v2: - make sure default context ban gets shown (Chris)
- use helper for checking for default context, everywhere (Chris)

v3: - dont be quiet when debug is set (Ben, Daniel)

Reference: https://bugs.freedesktop.org/show_bug.cgi?id=73652
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 44e2c070 30-Jan-2014 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Use i915_hw_context to set reset stats

With full ppgtt support drm_i915_file_private gained knowledge
about the default context. Also reset stats are now inside
i915_hw_context so we can use proper abstraction.

v2: Move BUG_ON and WARN_ON to more proper locations (Ben)

v3: Pass dev directly to i915_context_is_banned to avoid the need to
dereference ctx->last_ring. Spotted by Mika when checking my
s/BUG/WARN/ change, I've missed this ->last_ring dereference.

Suggested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> (v2)
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v2)
[danvet: s/BUG/WARN/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6ba844b0 22-Jan-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: GEN7_MSG_CONTROL is ivb-only

At least I couldn't find it in the Haswell Bspec any more and we've
tried to test-boot a Haswell machine with num_pipes forced to 0 (i.e.
hit the PCH_NOP path) and the unclaimed register logic complained.

So restrict this dance to just ivb platforms.

v2: Art pointed out that the bits simply moved on hsw+

v3: Buy code terseness with a notch of subtlety as suggested by
Chris.

v4: Frob the right bit, spotted by Art.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arthur Ranyan <arthur.j.runyan@intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Reviewed-by: Art Runyan <arthur.j.runyan@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d330a953 21-Jan-2014 Jani Nikula <jani.nikula@intel.com>

drm/i915: move module parameters into a struct, in a new file

With 20+ module parameters, I think referring to them via a struct
improves clarity over just having a bunch of globals. While at it, move
the parameter initialization and definitions into a new file
i915_params.c to reduce clutter in i915_drv.c.

Apart from the ill-named i915_enable_rc6, i915_enable_fbc and
i915_enable_ppgtt parameters, for which we lose the "i915_" prefix
internally, the module parameters now look the same both on the kernel
command line and in code. For example, "i915.modeset".

The downsides of the change are losing static on a couple of variables
and not having the initialization and module_param_named() right next to
each other. On the other hand, all module parameters are now defined in
one place at i915_params.c. Plus you can do this to find all module
parameter references:

$ git grep "i915\." -- drivers/gpu/drm/i915

v2:
- move the definitions into a new file
- s/i915_params/i915/
- make i915_try_reset i915.reset, for consistency

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
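
For readers unfamiliar with the pattern, a minimal sketch of grouping
module parameters into one struct via module_param_named(); the struct
and parameter names here are placeholders, not the real i915_params
layout:

#include <linux/module.h>
#include <linux/moduleparam.h>

static struct example_params {
        int modeset;
        bool enable_foo;
} example = {
        .modeset = -1,
        .enable_foo = true,
};

/* appears as <module>.modeset on the kernel command line */
module_param_named(modeset, example.modeset, int, 0400);
MODULE_PARM_DESC(modeset, "Use kernel modesetting (-1 = auto)");

module_param_named(enable_foo, example.enable_foo, bool, 0600);
MODULE_PARM_DESC(enable_foo, "Enable the (hypothetical) foo feature");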


# f72d21ed 09-Jan-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Place the Global GTT VM first in the list of VM

This is useful for debugging as we then know that the first entry is
always the global GTT, and all later entries are the per-process GTT VMs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5dce5b93 20-Jan-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wait for completion of pending flips when starved of fences

On older generations (gen2, gen3) the GPU requires fences for many
operations, such as blits. The display hardware also requires fences for
scanouts and this leads to a situation where an arbitrary number of
fences may be pinned by old scanouts following a pageflip but before we
have executed the unpin workqueue. This is unpredictable by userspace
and leads to random EDEADLK when submitting an otherwise benign
execbuffer. However, we can detect when we have an outstanding flip and
so cause userspace to wait upon their completion before finally
declaring that the system is starved of fences. This is really no worse
than forcing the GPU to stall waiting for older execbuffers to retire and
release their fences before we can reallocate them for the next
execbuffer.

v2: move the test for a pending fb unpin to a common routine for
later reuse during eviction

Reported-and-tested-by: dimon@gmx.net
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1d62beea 01-Jan-2014 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915/ppgtt: Defer request freeing on reset

We need to defer freeing the request until the object/vma is capable of
being freed - or else we have a problem when we try to destroy the
context.

The exact same issue is described and fixed here:
commit e20780439b26ba95aeb29d3e27cd8cc32bc82a4c
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Fri Dec 6 14:11:22 2013 -0800

drm/i915: Defer request freeing

I had this fix previously, but decided not to keep it for some reason I
can no longer remember.

gem_reset_stats is a really good test for hitting the problem.

For the inquisitive:
[ 170.516392] ------------[ cut here ]------------
[ 170.517227] WARNING: CPU: 1 PID: 105 at drivers/gpu/drm/drm_mm.c:578 drm_mm_takedown+0x2e/0x30 [drm]()
[ 170.518064] Memory manager not clean during takedown.
[ 170.518941] CPU: 1 PID: 105 Comm: kworker/1:1 Not tainted 3.13.0-rc4-BEN+ #28
[ 170.519787] Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012
[ 170.520662] Call Trace:
[ 170.521517] [<ffffffff814f0589>] dump_stack+0x4e/0x7a
[ 170.522373] [<ffffffff81049e6d>] warn_slowpath_common+0x7d/0xa0
[ 170.523227] [<ffffffff81049edc>] warn_slowpath_fmt+0x4c/0x50
[ 170.524079] [<ffffffffa06c414e>] drm_mm_takedown+0x2e/0x30 [drm]
[ 170.524934] [<ffffffffa07213f3>] gen6_ppgtt_cleanup+0x23/0x110
[i915]
[ 170.525777] [<ffffffffa07837ed>] ppgtt_release.part.5+0x24/0x29
[i915]
[ 170.526603] [<ffffffffa071aaa5>] i915_gem_context_free+0x195/0x1a0
[i915]
[ 170.527423] [<ffffffffa071189d>] i915_gem_free_request+0x9d/0xb0
[i915]
[ 170.528247] [<ffffffffa0718af9>] i915_gem_reset+0x1f9/0x3f0 [i915]
[ 170.529065] [<ffffffffa0700cce>] i915_reset+0x4e/0x180 [i915]
[ 170.529870] [<ffffffffa070829d>] i915_error_work_func+0xcd/0x120
[i915]
[ 170.530666] [<ffffffff8106c13a>] process_one_work+0x1fa/0x6d0
[ 170.531453] [<ffffffff8106c0d8>] ? process_one_work+0x198/0x6d0
[ 170.532230] [<ffffffff8106c72b>] worker_thread+0x11b/0x3a0
[ 170.532996] [<ffffffff8106c610>] ? process_one_work+0x6d0/0x6d0
[ 170.533771] [<ffffffff810743ef>] kthread+0xff/0x120
[ 170.534548] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80
[ 170.535322] [<ffffffff814f97ac>] ret_from_fork+0x7c/0xb0
[ 170.536089] [<ffffffff810742f0>] ? insert_kthread_work+0x80/0x80
[ 170.536847] ---[ end trace 3d4c12892e42d58f ]---

v2: Whitespace fix. (Chris)

Note: This is a bug that only hits the ppgtt topic branch but I've
figured that doing the request cleanup in this order is generally the
right thing to do.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Add a code comment to clarify what's actually going on since
the lifetime rules around ppgtt cleanup are ... fuzzy at best atm. Also
add a note about why we need this.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 86648500 14-Jan-2014 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Tune down reset_stat output from ERROR to debug

This is user-triggerable and hence we should not allow it to spam
dmesg. Also, it upsets the nice dmesg tracking piglit does.

Note that this is just extra debugging information, mostly
unwanted, in case of a hang and that there is a separate message to the
user giving instructions on how to report a bug for a GPU hang.

v2: Add note as suggested in Chris' reply.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72740
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e9103038 07-Jan-2014 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Free requests after object release when retiring requests

Freeing a request triggers the destruction of the context. This needs to
occur after all objects are themselves unbound from the context, and so
the free request needs to occur after the object release during retire.

This tidies up

commit e20780439b26ba95aeb29d3e27cd8cc32bc82a4c
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Fri Dec 6 14:11:22 2013 -0800

drm/i915: Defer request freeing

by simply swapping the order of operations rather than introducing
further complexity - as noted during review.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bfca0527 18-Dec-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

Revert "drm/i915: Do not allow buffers at offset 0"

This reverts commit 4fe9adbc36097317864bfec3c32047da7c45a2fa.

The patch completely lacks a detailed explanation of what exactly
blows up and how, so is insufficiently justified as a band-aid.

Otoh the justification as a safety measure against userspace botching
up relocations is also fairly weak: If we want real protection we need to
at least make the gap big enough that the gpu doesn't scribble over
more important stuff. With 4k screens that would be 32MB.

Also I think this would be much better in conjunction with a (debug)
switch to disable our use of the scratch page.

Hence revert this.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 02f6bccc 18-Dec-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Reject the pin ioctl on gen6+

Especially with ppgtt this kinda stopped making sense. And if we
indeed need this to hack around an issue, we need something that also
works for non-root.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7e0d96bc 06-Dec-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Use multiple VMs -- the point of no return

As with processes which run on the CPU, the goal of multiple VMs is to
provide process isolation. Specific to GEN, there is also the ability to
map more objects per process (2GB each instead of 2Gb-2k total).

For the most part, all the pipes have been laid, and all we need to do
is remove asserts and actually start changing address spaces with the
context switch. Since prior to this we've converted the setting of the
page tables to a streamed version, this is quite easy.

One important thing to point out (since it'd been hotly contested) is
that with this patch, every context created will have its own address
space (provided the HW can do it).

v2: Disable BDW on rebase

NOTE: I tried to make this commit as small as possible. I needed one
place where I could "turn everything on" and that is here. It could be
split into finer commits, but I didn't really see much point.

Cc: Eric Anholt <eric@anholt.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4fe9adbc 06-Dec-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Do not allow buffers at offset 0

This is primarily a band-aid for an unexplainable error in
gem_reloc_vs_gpu/forked-faulting-reloc-thrashing. Essentially as soon as
a relocated buffer (which had a non-zero presumed offset) moved to
offset 0, something goes bad. Since I have been unable to solve this,
and this is potentially a good thing to do anyway, since many things can
accidentally write to offset 0, why not?

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e2078043 06-Dec-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Defer request freeing

With context destruction, we always want to be able to tear down the
underlying address space. This is invoked on the last unreference to the
context, which could happen before we've moved all objects to the
inactive list. To enable a clean teardown of the address space, make sure
to process the request free last.

Without this change, we cannot guarantee that we don't still have active
objects in the VM.

As an example of a failing case:
CTX-A is created, count=1
CTX-A is used during execbuf
does a context switch count = 2
and add_request count = 3
CTX B runs, switches, CTX-A count = 2
CTX-A is destroyed, count = 1
retire requests is called
free_request from CTX-A, count = 0 <--- free context with active object

As mentioned above, by doing the free request after processing the
active list, we can avoid this case.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 41bde553 06-Dec-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Get context early in execbuf

We need to have the address space when reserving space for the objects.
Since the address space and context are tied together, and reserve
occurs before the context switch (for good reason), we must look up our
context earlier in the process.

This leaves some room for optimizations where we no longer need to use
ctx_id in certain places. This will be addressed in a subsequent patch.

Important tricky bit:
Because slow relocations during execbuffer drop struct_mutex, the context
we looked up can be closed (and its refcount dropped) before execbuf
resumes.

Perhaps it would be best to acquire the reference when we get the
context, but I'll save that for another day (note I have written the
patch before, and I found the changes required to be uglier than this).

Note that since we currently access everything via context id, and not
the data structure this is fine, though not desirable. The next change
attempts to get the context only once via the context ID idr lookup, and
as such, the following can happen:

CTX-A is created, refcount = 1
CTX-A execbuf, mutex dropped
close IOCTL called on CTX-A, refcount = 0
CTX-A resumes in execbuf.

v2: Rebased on top of
commit b6359918b885da7c7b58c050674278dbd06020ab
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Wed Oct 30 15:44:16 2013 +0200

drm/i915: add i915_get_reset_stats_ioctl

v3: Rebased on top of
commit 25b3dfc87bff80317d67ddd2cd4cfb91e6fe7d79
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Tue Nov 12 11:57:30 2013 +0200

Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Nov 26 16:14:33 2013 +0200

drm/i915: check context reset stats before relocations

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c482972a 06-Dec-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Piggy back hangstats off of contexts

To simplify the codepaths somewhat, we can simply always create a
context. Contexts already keep hangstat information. This prevents us
from having to differentiate at other parts in the code.

There is allocation overhead, but it should not be measurable.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bdf4fd7e 06-Dec-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Do aliasing PPGTT init with contexts

We have a default context which suits the aliasing PPGTT well. Tie them
together so it looks like any other context/PPGTT pair. This makes the
code cleaner as it won't have to special case aliasing as often.

The patch has one slightly tricky part in the default context creation
function. In the future (and on aliased setup) we create a new VM for a
context (potentially). However, if we have aliasing PPGTT, which occurs
at this point in time for all platforms GEN6+, we can simply manage the
refcounting to allow things to behave as normal. Now is a good time to
recall that the aliasing_ppgtt doesn't have a real VM, it uses the GGTT
drm_mm.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a3d67d23 06-Dec-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: PPGTT vfuncs should take a ppgtt argument

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2fa48d8d 06-Dec-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Split context enabling from init

We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.

To achieve that, we must be able to initialize contexts after the GTT is
set up (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is set up. If we split out the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT.

Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.

v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)

v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)

v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# acce9ffa 06-Dec-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Better reset handling for contexts

This patch adds the following changes for contexts on reset:
Sets last context to default - this will prevent the context switch
happening after a reset. That switch is not possible because the
rings are hung during reset and context switch requires reset. This
behavior will need to be reworked in the future, but this is what we
want for now.

In the future, we'll also want to reset the guilty context to
uninitialized. We should wait for ARB_Robustness related code to land
for that.

This is somewhat for paranoia. Because we really don't know what the
GPU was doing when it hung, or the state it was in (mid context write,
for example), later restoring the context is a bad idea. By setting the
flag to not initialized, the next load of that context will not restore
the state, and thus on the subsequent switch away from the context will
overwrite the old data.

NOTE: This code needs a fixup when we actually have multiple VMs. The
issue that can occur is inactive objects in a VM will need to be
destroyed before the last context unref. This can now happen via the
fake switch introduced in this patch (and in other ways in the future).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e422b888 06-Dec-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Add a context open function

We'll be doing a bit more stuff with each file, so having our own open
function should make things clean.

This also allows us to easily add conditionals for stuff we don't want
to do when we don't have HW contexts.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6f65e29a 06-Dec-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Create bind/unbind abstraction for VMAs

To sum up what goes on here, we abstract the vma binding, similarly to
the previous object binding. This helps for distinguishing legacy
binding versus modern binding. To keep the code churn as minimal as
possible, I am leaving in insert_entries(). It basically serves as the
per-platform PTE writer. bind_vma and insert_entries do share a
lot of similarities, and I did have designs to combine the two, but as
mentioned already... too much churn in an already massive patchset.

What follows are the 3 commits which existed discretely in the original
submissions. Upon rebasing on Broadwell support, it became clear that
separation was not good, and only made for more error prone code. Below
are the 3 commit messages with all their history.

drm/i915: Add bind/unbind object functions to VMA
drm/i915: Use the new vm [un]bind functions
drm/i915: reduce vm->insert_entries() usage

drm/i915: Add bind/unbind object functions to VMA

As we plumb the code with more VM information, it has become more
obvious that the easiest way to deal with bind and unbind is to simply
put the function pointers in the vm, and let those choose the correct
way to handle the page table updates. This change allows many places in
the code to simply be vm->bind, and not have to worry about
distinguishing PPGTT vs GGTT.

Notice that this patch has no impact on functionality. I've decided to
save the actual change until the next patch because I think it's easier
to review that way. I'm happy to squash the two, or let Daniel do it on
merge.

v2:
Make ggtt handle the quirky aliasing ppgtt
Add flags to bind object to support above
Don't ever call bind/unbind directly for PPGTT until we have real, full
PPGTT (use NULLs to assert this)
Make sure we rebind the ggtt if there already is a ggtt binding. This
happens on set cache levels.
Use VMA for bind/unbind (Daniel, Ben)

v3: Reorganize ggtt_vma_bind to be more concise and easier to read
(Ville). Change logic in unbind to only unbind ggtt when there is a
global mapping, and to remove a redundant check if the aliasing ppgtt
exists.

v4: Make the bind function a bit smarter about the cache levels to avoid
unnecessary multiple remaps. "I accept it is a wart, I think unifying
the pin_vma / bind_vma could be unified later" (Chris)
Removed the git notes, and put version info here. (Daniel)

v5: Update the comment to not suck (Chris)

v6:
Move bind/unbind to the VMA. It makes more sense in the VMA structure
(always has, but I was previously lazy). With this change, it will allow
us to keep a distinct insert_entries.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: Use the new vm [un]bind functions

Building on the last patch which created the new function pointers in
the VM for bind/unbind, here we actually put those new function pointers
to use.

Split out as a separate patch to aid in review. I'm fine with squashing
into the previous patch if people request it.

v2: Updated to address the smart ggtt which can do aliasing as needed
Make sure we bind to the global gtt when mappable and fenceable. I thought
we could get away without this initially, but we cannot.

v3: Make the global GTT binding explicitly use the ggtt VM for
bind_vma(). While at it, use the new ggtt_vma helper (Chris)

At this point the original mailing list thread diverges. ie.

v4^:
use target_obj instead of obj for gen6 relocate_entry
vma->bind_vma() can be called safely during pin. So simply do that
instead of the complicated conditionals.
Don't restore PPGTT bound objects on resume path
Bug fix in resume path for globally bound BOs
Properly handle secure dispatch
Rebased on vma bind/unbind conversion

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

drm/i915: reduce vm->insert_entries() usage

FKA: drm/i915: eliminate vm->insert_entries()

With bind/unbind function pointers in place, we no longer need
insert_entries. We could, and want, to remove clear_range, however it's
not totally easy at this point. Since it's used in a couple of place
still that don't only deal in objects: setup, ppgtt init, and restore
gtt mappings.

v2: Don't actually remove insert_entries, just limit its usage. It will
be useful when we introduce gen8. It will always be called from the vma
bind/unbind.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d7f46fc4 06-Dec-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Make pin count per VMA

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# feb822cf 06-Dec-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Handle inactivating objects for all VMAs

This came from a patch called, "drm/i915: Move active to vma"

When moving an object to the inactive list, we do it for all VMs for
which the object is bound.

The primary difference from that patch is that this time around we do not
track 'active' per vma, but rather per object. Therefore, we only need
one unref.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c39538a8 06-Dec-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Takedown drm_mm on failed gtt setup

This was found by code inspection. If the GTT setup fails then we are
left without properly tearing down the drm_mm.

Hopefully this never happens.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6e164c33 06-Dec-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Allow ggtt lookups to not WARN

To be able to effectively use the GGTT object lookup function, we don't
want to warn when there is no GGTT mapping. Let the caller deal with it
instead.

Originally, I had intended to have this behavior, and had not
introduced the WARN. It was introduced during review with the addition
of the following commit

commit 5c2abbeab798154166d42fce4f71790caa6dd9bc
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Tue Sep 24 09:57:57 2013 -0700

drm/i915: Provide a cheap ggtt vma lookup

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6f425321 06-Dec-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Don't unconditionally try to deref aliasing ppgtt

Since the beginning, the functions which try to properly reference the
aliasing PPGTT have dereferenced a potentially NULL aliasing_ppgtt member.
Since the accessors are meant to be global, this will not do.

Introduced originally in:
commit a70a3148b0c61cb7c588ea650db785b261b378a3
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Jul 31 16:59:56 2013 -0700

drm/i915: Make proper functions for VMs

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 48018a57 13-Dec-2013 Paulo Zanoni <paulo.r.zanoni@intel.com>

drm/i915: release the GTT mmaps when going into D3

So we'll get a fault when someone tries to access the mmap, then we'll
wake up from D3.

v2: - Rebase
v3: - Use gtt active/inactive

Testcase: igt/pm_pc8/gem-mmap-gtt
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Add comment + WARN as discussed with Paulo on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 168c3f21 12-Dec-2013 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: don't call irq_put when irq test is on

If the test is running, irq_get was not called, so we keep things
balanced by not doing irq_put.

"So the rule is: if you access unlocked values, you use ACCESS_ONCE().
You don't say "but it can't matter". Because you simply don't know."
-- Linus

v2: use local variable so it can't change during test (Chris)

v3: update commit msg and use ACCESS_ONCE (Ville)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
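
The fix amounts to sampling the unlocked flag once into a local, so the
irq_get and irq_put decisions cannot disagree if the flag changes
mid-wait. A hedged sketch (the flag and ring field names are
illustrative, not the exact driver code):

bool irq_test_in_progress = ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings);

if (!irq_test_in_progress)
        ring->irq_get(ring);

/* ... sleep waiting for the seqno ... */

if (!irq_test_in_progress)
        ring->irq_put(ring);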


# 47e9766d 10-Dec-2013 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Fix timeout with missed interrupts in __wait_seqno

Commit 094f9a54e355 ("drm/i915: Fix __wait_seqno to use true infinite
timeouts") added support for __wait_seqno to detect missing interrupts and
go around them by polling. As there is also timeout detection in
__wait_seqno, the polling and timeout detection were done with the same
timer.

When there have been missed interrupts and polling is needed, the timer is
set to trigger in (now + 1) jiffies, instead of at the caller-specified
timeout.

Now when io_schedule() returns, we calculate the jiffies left to timeout
using the timer expiration value. As the current jiffies is now bound to be
always equal to or greater than the expiration value, the timeout_jiffies
will become zero or negative and we return -ETIME to the caller even though
the timeout was never reached.

Fix this by decoupling timeout calculation from timer expiration.

v2: Commit message with some sense in it (Chris Wilson)

v3: add parenthesis on timeout_expire calculation

v4: don't read jiffies without timeout (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
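
Put differently, the remaining time has to be derived from the caller's
own deadline, recorded up front, not from the polling timer's expiry. A
rough sketch of that idea (seqno_completed() and the surrounding
structure are illustrative placeholders, not the exact __wait_seqno code):

unsigned long timeout_expire = jiffies + timeout_jiffies;   /* caller's deadline */
int ret = 0;

while (!seqno_completed()) {
        /* if an interrupt was missed, arm a short polling timer and
         * io_schedule(); that timer's expiry is deliberately NOT the deadline */

        if (time_after_eq(jiffies, timeout_expire)) {
                ret = -ETIME;               /* only a genuine timeout reports -ETIME */
                break;
        }
}

if (ret == 0)
        timeout_jiffies = timeout_expire - jiffies;  /* remaining caller budget */
else
        timeout_jiffies = 0;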


# 4db080f9 04-Dec-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix erroneous dereference of batch_obj inside reset_status

As the rings may be processed and their requests deallocated in a
different order to the natural retirement during a reset,

/* Whilst this request exists, batch_obj will be on the
* active_list, and so will hold the active reference. Only when this
* request is retired will the batch_obj be moved onto the
* inactive_list and lose its active reference. Hence we do not need
* to explicitly hold another reference here.
*/

is violated, and the batch_obj may be dereferenced after it had been
freed on another ring. This can be simply avoided by processing the
status update prior to deallocating any requests.

Fixes regression (a possible OOPS following a GPU hang) from
commit aa60c664e6df502578454621c3a9b1f087ff8d25
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Wed Jun 12 15:13:20 2013 +0300

drm/i915: find guilty batch buffer on ring resets

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: Add the code comment Chris supplied.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f65c9168 27-Nov-2013 Paulo Zanoni <paulo.r.zanoni@intel.com>

drm/i915: add runtime put/get calls at the basic places

If I add code to enable runtime PM on my Haswell machine, start a
desktop environment, then enable runtime PM, these functions will
complain that they're trying to read/write registers while the
graphics card is suspended.

v2: - Simplify i915_gem_fault changes.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[danvet: Drop the hunk in i915_hangcheck_elapsed, it's the wrong thing
to do.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 70903c3b 04-Dec-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix ordering of unbind vs unpin pages

It is useful to assert that if the object is bound, then it must have
its pages pinned to prevent the shrinker from reaping its backing store.
This is even more useful with the introduction of real-ppgtt whereupon
we may have the object bound into several vma, with each instance
pinning the backing store. This assertion breaks down during unbind,
where we unpinned the backing store before decoupling the vma binding.
This can be fixed with a trivial reordering of the unbind sequence, which
reinforces the

pin pages
bind to vma
...
unbind from vma
unpin pages

concept.

v2: Bonus comment

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0bf21347 29-Nov-2013 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: MI_PREDICATE_RESULT_2 is HSW only

The MI_PREDICATE_RESULT_2 register exists only on HSW. On other
platforms the same offset is either reserved, or contains some
other register. So write the register only on HSW.

This regression has been introduced in

commit 9435373ef8870e0a84b6fec0ad89b952bf3097fa
Author: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Date: Wed Aug 28 16:45:46 2013 -0300

drm/i915: Report enabled slices on Haswell GT3

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Add regression notice.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 31a5336e 02-Nov-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915/bdw: Swizzling support

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5ab31333 02-Nov-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915/bdw: Fences on gen8 look just like gen7

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8245be31 06-Nov-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Require HW contexts (when possible)

v2: Fixed the botched locking on init_hw failure in i915_reset (Ville)
Call cleanup_ringbuffer on failed context create in init_hw (Ville)

v3: Add dev argument to cleanup_ringbuffer

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# de45eaf7 18-Oct-2013 Paulo Zanoni <paulo.r.zanoni@intel.com>

drm/i915: fix open-coded DIV_ROUND_UP

Use the nice Kernel macro, it makes the code much more readable.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
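
For reference, the kernel macro wraps the usual rounding-up idiom; a small
self-contained sketch of the equivalence:

#include <assert.h>

#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))   /* same shape as the kernel macro */

int main(void)
{
        unsigned int bytes = 1000, block = 64;

        /* the open-coded form and the macro agree */
        assert((bytes + block - 1) / block == DIV_ROUND_UP(bytes, block));
        assert(DIV_ROUND_UP(bytes, block) == 16);     /* 1000/64 rounded up */
        return 0;
}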


# aa5f8021 10-Oct-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Use unsigned long for obj->user_pin_count

At least on Linux sizeof(long) == sizeof(void*), and the thinking
is that you can grab about as many references as there's memory.

Doesn't really matter, just a bit of OCD since a fixed-size data
type in a pure in-kernel data structure looks off.

v2: Ville asked for an overflow check since no one prevents userspace
from incrementing the pin count forever.

v3: s/INT/LONG/, noticed by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
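
The overflow check mentioned in v2 amounts to refusing the increment once
the counter is saturated; a self-contained sketch (the saturation limit
and error code here are illustrative, not necessarily what the driver
uses):

#include <errno.h>
#include <limits.h>
#include <stdio.h>

struct obj {
        unsigned long user_pin_count;   /* sizeof(long) == sizeof(void *) on Linux */
};

static int pin_obj(struct obj *o)
{
        if (o->user_pin_count == ULONG_MAX)     /* refuse to wrap the counter */
                return -EBUSY;
        o->user_pin_count++;
        return 0;
}

int main(void)
{
        struct obj o = { .user_pin_count = ULONG_MAX };

        printf("pin: %d\n", pin_obj(&o));        /* prints a negative errno */
        return 0;
}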


# 45c5f202 16-Oct-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable all GEM timers and work on unload

We have two once very similar functions, i915_gpu_idle() and
i915_gem_idle(). The former is used as the lower level operation to
flush work on the GPU, whereas the latter is the high level interface to
flush the GEM bookkeeping in addition to flushing the GPU. As such
i915_gem_idle() also clears out the request and activity lists and
cancels the delayed work. This is what we need for unloading the driver;
unfortunately we called i915_gpu_idle() instead.

In the process, make sure that when cancelling the delayed work and
timer, which is synchronous, we do not hold any locks, to prevent a
deadlock if the work item is already waiting upon the mutex. This
requires us to push the mutex down from the caller to i915_gem_idle().

v2: s/i915_gem_idle/i915_gem_suspend/

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70334
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: xunx.fang@intel.com
[danvet: Only set ums.suspended for !kms as discussed earlier. Chris
noticed that this slipped through.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3d57e5bd 14-Oct-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Do a fuller init after reset

I had this lying around from the original PPGTT series, and thought we
might try to get it in by itself.

It's convenient to just call i915_gem_init_hw at reset because we'll be
adding new things to that function, and having just one function to call
instead of reimplementing it in two places is nice.

In order to accommodate this, we clean up the ringbuffers so we can bring
them back up cleanly. Optionally, we could also teardown/re-initialize the
default context but this was causing some problems on reset which I
wasn't able to fully debug, and is unnecessary with the previous context
init/enable split.

This essentially reverts:
commit 8e88a2bd5987178d16d53686197404e149e996d9
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jun 19 18:40:00 2012 +0200

drm/i915: don't call modeset_init_hw in i915_reset

It seems to work for me on ILK now. Perhaps it's due to:
commit 8a5c2ae753c588bcb2a4e38d1c6a39865dbf1ff3
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Thu Mar 28 13:57:19 2013 -0700

drm/i915: fix ILK GPU reset for render

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3bbbe706 07-Oct-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: check that the i965g/gm 4G limit is really obeyed

In truly crazy circumstances shmem might give us the wrong type of
page. So be a bit paranoid and double check this.

Reviewer: Damien Lespiau <damien.lespiau@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
References: http://lkml.org/lkml/2011/7/11/238
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d9973b43 04-Oct-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix type mismatch and accounting in i915_gem_shrink

The interface uses an unsigned long, and we can use the unsigned counter
throughout our code, so do so. In the process, we notice one instance
where the shrink count is based on a heuristic rather than the result,
and another where we ask for too many pages to be purged.

v2: nr_to_scan needs to be promoted to a long as well, so just use
sc->nr_to_scan directly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5035c275 04-Oct-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Call io_schedule() whilst waiting for the GPU

Since we are waiting upon IO completion, inform the kernel through use
of the io_schedule() call rather than the regular schedule(). This
should allow the kernel to make better decisions regarding scheduling
and power management.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
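
A hedged fragment of the idea: the only change in the wait loop is calling
io_schedule() instead of schedule(), so the sleep is accounted as I/O wait
(the wait-queue name is illustrative):

DEFINE_WAIT(wait);

prepare_to_wait(&ring->irq_queue, &wait, TASK_UNINTERRUPTIBLE);
io_schedule();          /* was schedule(): we are blocked on the GPU, i.e. on I/O */
finish_wait(&ring->irq_queue, &wait);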


# 16eb5f43 02-Oct-2013 David Herrmann <dh.herrmann@gmail.com>

drm: kill ->gem_init_object() and friends

All drivers embed gem-objects into their own buffer objects. There is no
reason to keep drm_gem_object_alloc(), gem->driver_private and
->gem_init_object() anymore.

New drivers are highly encouraged to do the same. There is no benefit in
allocating gem-objects separately.

Cc: Dave Airlie <airlied@gmail.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Ben Skeggs <skeggsb@gmail.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# b29c19b6 25-Sep-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Boost RPS frequency for CPU stalls

If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its frequency.

This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.

In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)

Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client receives.

Unfortunately, this technique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.

v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.

v3: Cater for front-buffer rendering with manual throttling.

v4: Tidy up.

v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.

Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 094f9a54 25-Sep-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix __wait_seqno to use true infinite timeouts

When we switched to always using a timeout in conjunction with
wait_seqno, we lost the ability to detect missed interrupts. Since we
have had issues with interrupts on a number of generations, and they are
required to be delivered in a timely fashion for a smooth UX, it is
important that we do log errors found in the wild and prevent the
display stalling for upwards of 1s every time the seqno interrupt is
missed.

Rather than continue to fix up the timeouts to work around the interface
impedance in wait_event_*(), open code the combination of
wait_event[_interruptible][_timeout], and use the exposed timer to
poll for seqno should we detect a lost interrupt.

v2: In order to satisfy the debug requirement of logging missed
interrupts with the real-world requirement of making machines work even
if interrupts are hosed, we revert to polling after detecting a missed
interrupt.

v3: Throw in a debugfs interface to simulate broken hw not reporting
interrupts.

v4: s/EGAIN/EAGAIN/ (Imre)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Don't use the struct typedef in new code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b52b89da 25-Sep-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add a tracepoint for using a semaphore

So that we can find the callers who introduce a ring stall. A single
ring stall is not too unwelcome; the real issue arises when they start
to interlock and prevent any concurrent work. That, however, is a little
trickier to detect with a mere tracepoint!

v2: Rebrand it as a ring event, rather than an object event.
v3: Include the seqno in the tracepoint for posterity or something.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e2d05a8b 24-Sep-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Convert active API to VMA

Even though we track object activity and not VMA, because the
active_list is based on the VM, it makes the most sense to use VMAs in
the APIs.

NOTE: Daniel intends to eventually rip out active/inactive LRUs, but for
now, leave them be.

v2: Remove leftover hunk from the previous patch which didn't keep
i915_gem_object_move_to_active. That patch had to rely on the ring to
get the dev instead of the obj. (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5c2abbea 24-Sep-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Provide a cheap ggtt vma lookup

"We do fairly often lookup the ggtt vma for an obj." - Chris Wilson. As
such, provide a function to offer slightly cheaper access to the vma.
Not performance tested. By my quick estimation it saves at least 3
pointer dereferences from the existing mechanism.

This patch mostly matches code from Chris in
<20130911221430.GB7825@nuc-i3427.alporthouse.com>

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f7403347 13-Sep-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not unlock upon error in i915_gem_idle()

We never took the lock ourselves, and all callers expect the struct_mutex
to be locked upon return (be it success or error); therefore dropping the
lock along the error paths looks to be a vestigial error from

commit db1b76ca6a79c774074ae87bee7afc0825a478f5
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jul 9 16:51:37 2013 +0200

drm/i915: don't frob mm.suspended when not using ums

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b14c5679 18-Sep-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: use pointer = k[cmz...]alloc(sizeof(*pointer), ...) pattern

Done while reviewing all our allocations for fubar. Also a few errant
cases of lacking () for the sizeof operator - just a bit of OCD.

I've left out all the conversions that also should use kcalloc from
this patch (it's only 2).

Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
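
The pattern in question keeps the allocation size tied to the pointee's
type, so it stays correct if the variable's type ever changes; a minimal
kernel-style sketch (the struct name is a placeholder):

struct foo_state *state;

/* preferred: the size follows the variable's type automatically */
state = kzalloc(sizeof(*state), GFP_KERNEL);
if (!state)
        return -ENOMEM;

/* discouraged: silently wrong if 'state' ever changes type */
/* state = kzalloc(sizeof(struct foo_state), GFP_KERNEL); */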


# d3227046 25-Sep-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Fix up usage of SHRINK_STOP

In

commit 81e49f811404f428a9d9a63295a0c267e802fa12
Author: Glauber Costa <glommer@openvz.org>
Date: Wed Aug 28 10:18:13 2013 +1000

i915: bail out earlier when shrinker cannot acquire mutex

SHRINK_STOP was added to tell the core shrinker code to bail out and
go to the next shrinker since the i915 shrinker couldn't acquire
required locks. But the SHRINK_STOP return code was added to the
->count_objects callback and not the ->scan_objects callback as it
should have been, resulting in tons of dmesg noise like

shrink_slab: i915_gem_inactive_scan+0x0/0x9c negative objects to delete nr=-xxxxxxxxx

Fix discussed with Dave Chinner.

References: http://www.spinics.net/lists/intel-gfx/msg33597.html
Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Cc: Knut Petersen <Knut_Petersen@t-online.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
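
For context, with the count/scan shrinker API the bail-out belongs in
->scan_objects; a hedged sketch of the corrected shape (the lock, helpers,
and counters are placeholders, not the i915 implementation):

static unsigned long example_count(struct shrinker *shrinker,
                                   struct shrink_control *sc)
{
        /* only report how much could be freed; never return SHRINK_STOP here */
        return count_freeable_objects();
}

static unsigned long example_scan(struct shrinker *shrinker,
                                  struct shrink_control *sc)
{
        unsigned long freed;

        if (!mutex_trylock(&dev->struct_mutex))
                return SHRINK_STOP;     /* cannot make progress; tell the core to move on */

        freed = free_some_objects(sc->nr_to_scan);
        mutex_unlock(&dev->struct_mutex);
        return freed;
}

static struct shrinker example_shrinker = {
        .count_objects = example_count,
        .scan_objects  = example_scan,
        .seeks         = DEFAULT_SEEKS,
};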


# 040d2baa 19-Sep-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF

We'd only ever used this define to denote whether or not we have the
dynamic parity feature (DPF) and never to determine whether or not L3
exists. Baytrail is a good example of where L3 exists, and not DPF.

This patch provides clarity in the code for future use cases which might
want to actually query whether or not L3 exists.

v2: Add /* DPF == dynamic parity feature */

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a33afea5 17-Sep-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Keep a list of all contexts

I have implemented this patch before without creating a separate list
(I'm having trouble finding the links, but the messages ids are:
<1364942743-6041-2-git-send-email-ben@bwidawsk.net>
<1365118914-15753-9-git-send-email-ben@bwidawsk.net>)

However, the code is much simpler to just use a list and it makes the
code from the next patch a lot more pretty.

As you'll see in the next patch, the reason for this is to be able to
specify when a context needs to get L3 remapping. More details there.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c3787e2e 17-Sep-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Make l3 remapping use the ring

Using LRI for setting the remapping registers allows us to stream l3
remapping information. This is necessary to handle per context remaps as
we'll see implemented in an upcoming patch.

Using the ring also means we don't need to frob the DOP clock gating
bits.

v2: Add comment about lack of worry for concurrent register access
(Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Bikeshed the comment a bit by doing a s/XXX/Note - there's
nothing to fix.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 35a85ac6 19-Sep-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Add second slice l3 remapping

Certain HSW SKUs have a second bank of L3. This L3 remapping has a
separate register set, and interrupt from the first "slice". A slice is
simply a term to define some subset of the GPU's l3 cache. This patch
implements both the interrupt handler, and ability to communicate with
userspace about this second slice.

v2: Remove redundant check about non-existent slice.
Change warning about interrupts of unknown slices to WARN_ON_ONCE
Handle the case where we get 2 slice interrupts concurrently, and switch
the tracking of interrupts to be non-destructive (all Ville)
Don't enable/mask the second slice parity interrupt for ivb/vlv (even
though all docs I can find claim it's rsvd) (Ville + Bryan)
Keep BYT excluded from L3 parity

v3: Fix the slice = ffs to be decremented by one (found by Ville). When
I initially did my testing on the series, I was using 1-based slice
counting, so this code was correct. Not sure why my simpler tests that
I've been running since then didn't pick it up sooner.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 571c608d 12-Sep-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: kill set_need_resched

This is just a remnant from the old days when our reset handling was
horribly racy, suffered from terrible locking issues and often happily
live-locked. Those days are now gone so we can drop the hacks and just
rip the reschedule-point out.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 23f54483 11-Sep-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Synchronize pread/pwrite with wait_rendering

lifted from Daniel:
pread/pwrite isn't about the object's domain at all, but purely about
synchronizing for outstanding rendering. Replacing the call to
set_to_gtt_domain with a wait_rendering would imo improve code
readability. Furthermore we could pimp pread to only block for
outstanding writes and not for reads.

Since you're not the first one to trip over this: Can I volunteer you
for a follow-up patch to fix this?

v2: Switch the pwrite patch to use !read_only. This was a typo in the
original code. (Chris, Daniel)

Recommended-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Fix up the logic fumble - wait_rendering has a bool readonly
parameter, set_to_gtt_domain otoh has bool write. Breakage reported by
Jani Nikula, I've double-checked that igt/gem_concurrent_blt/prw-*
would have caught this.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 81e49f81 27-Aug-2013 Glauber Costa <glommer@openvz.org>

i915: bail out earlier when shrinker cannot acquire mutex

The main shrinker driver will keep trying for a while to free objects if
the returned value from the shrink scan procedure is 0. That means "no
objects now", but a retry could very well succeed.

But what we should say here is a different thing: that it is impossible to
shrink, and we had better bail out soon. We find this behavior more
appropriate for the case where the lock cannot be taken. Especially given
the hammer behavior of the i915: if another thread is already shrinking,
we are likely not to be able to shrink anything anyway when we finally
acquire the mutex.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 7dc19d5a 27-Aug-2013 Dave Chinner <dchinner@redhat.com>

drivers: convert shrinkers to new count/scan API

Convert the driver shrinkers to the new API. Most changes are compile
tested only because I either don't have the hardware or it's staging
stuff.

FWIW, the md and android code is pretty good, but the rest of it makes me
want to claw my eyes out. The amount of broken code I just encountered is
mind boggling. I've added comments explaining what is broken, but I fear
that some of the code would be best dealt with by being dragged behind the
bike shed, buried in mud up to its neck and then run over repeatedly
with a blunt lawn mower.

Special mention goes to the zcache/zcache2 drivers. They can't co-exist
in the build at the same time, they are under different menu options in
menuconfig, they only show up when you've got the right set of mm
subsystem options configured and so even compile testing is an exercise in
pulling teeth. And that doesn't even take into account the horrible,
broken code...

[glommer@openvz.org: fixes for i915, android lowmem, zcache, bcache]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 5a1d5eb0 10-Sep-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the double-list iteration from bound_any()

The purpose of the function is to find out whether the object is still
bound in any address space. This can be easily checked by looking at the
vma currently associated with the object, rather than asking if any of
the global address spaces have an active vma on the object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
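
The resulting check is roughly the following (a sketch based on the description
above; member names such as vma_list follow the i915 VMA code of that period):

static bool i915_gem_obj_bound_any(struct drm_i915_gem_object *obj)
{
        struct i915_vma *vma;

        /* The object is bound somewhere iff one of its vmas owns a drm_mm node. */
        list_for_each_entry(vma, &obj->vma_list, vma_link)
                if (drm_mm_node_allocated(&vma->node))
                        return true;

        return false;
}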


# be62acb4 30-Aug-2013 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: ban badly behaving contexts

Now that we have a mechanism in place to track which context
was guilty of hanging the gpu, it is possible to punish
for bad behaviour.

If a context has recently submitted a faulty batchbuffer guilty of a
gpu hang and submits another batch which hangs the gpu in quick
succession, ban it permanently. If a ctx is banned, no more
batchbuffers will be queued for execution.

There is no need for global wedge machinery anymore and
it would be unwise to wedge the whole gpu if we have multiple
hanging batches queued for execution. Instead just ban
the guilty ones and carry on.

v2: Store guilty ban status bool in gpu_error instead of pointers
that might become dangling before the hang is declared.

v3: Use return value for banned status instead of stashing state
into gpu_error (Chris Wilson)

v4: - rebase on top of fixed hang stats api
- add define for ban period
- rename commit and improve commit msg

v5: - rely on context banning instead of wedging the gpu
- beautification and fix for ban calculation (Chris)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
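
The ban decision amounts to a check of the kind sketched below (illustrative
only; the hang-stats member and constant names are placeholders for what this
series introduces):

static bool context_is_banned(const struct i915_ctx_hang_stats *hs)
{
        if (hs->banned)
                return true;

        /* A second hang within the ban period bans the context for good;
         * its batchbuffers will no longer be queued for execution. */
        if (get_seconds() - hs->guilty_ts <= CTX_BAN_PERIOD_SECS)
                return true;

        return false;
}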


# 57094f82 04-Sep-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Hold an object reference whilst we shrink it

Whilst running the shrinker, we need to hold a reference as we unbind
the objects, or else we may end up waiting for and retiring requests,
which in turn may result in this object being freed.

This is very similar to the eviction code which also has to be very
careful to keep a reference to its objects as it retires and unbinds
them.

Another similarity, that Ben pointed out, is that as we may call
retire-requests, the unbound_list is outside of our control. We must
only process a single element of that list at a time, that is we can not
rely on the "safe" next pointer being valid after a call to
i915_vma_unbind().

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
PGD 758d3067 PUD ac0d6067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: dm_mod snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support pcspkr snd_hda_intel i2c_i801 snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd lpc_ich mfd_core soundcore battery ac option usb_wwan usbserial uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev i915 video button drm_kms_helper drm acpi_cpufreq mperf freq_table
CPU: 1 PID: 16835 Comm: fbo-maxsize Not tainted 3.11.0-rc7_nightlytop_8fdad4_20130902_+ #7977
task: ffff8800712106d0 ti: ffff880028e4a000 task.ti: ffff880028e4a000
RIP: 0010:[<ffffffffa0082892>] [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
RSP: 0018:ffff880028e4b9e8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880145734000 RCX: ffff880145735328
RDX: ffff8801457353fc RSI: 0000000000000000 RDI: ffff88007597cc00
RBP: ffff88007597cc00 R08: 0000000000000001 R09: ffff88014f257f00
R10: ffffea0001d65f00 R11: 0000000000bba60b R12: ffff880149e5b000
R13: ffff880145734001 R14: ffff88007597ccc8 R15: ffff88007597cc00
FS: 00007ff5bc919740(0000) GS:ffff88014f240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000028f4c000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
0000000000000000 ffff88007597cc00 ffff8801440d6840 0000000000000000
ffff880145734000 ffffffffa007c854 0000000000000010 ffff88007597c900
0000000000018000 00000000004a1201 ffff88007597cc60 ffffffffa007d183
Call Trace:
[<ffffffffa007c854>] ? i915_vma_unbind+0xe2/0x1d1 [i915]
[<ffffffffa007d183>] ? __i915_gem_shrink+0xf1/0x162 [i915]
[<ffffffffa007d2ee>] ? i915_gem_object_get_pages_gtt+0xfa/0x303 [i915]
[<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915]
[<ffffffffa007cbda>] ? i915_gem_object_pin+0x238/0x5ce [i915]
[<ffffffff812cba5f>] ? __sg_page_iter_next+0x2b/0x58
[<ffffffffa0082056>] ? gen6_ppgtt_insert_entries+0xf2/0x114 [i915]
[<ffffffffa007fe4b>] ? i915_gem_execbuffer_reserve_vma.isra.13+0x79/0x18d [i915]
[<ffffffffa008017c>] ? i915_gem_execbuffer_reserve+0x21d/0x347 [i915]
[<ffffffffa0080bfb>] ? i915_gem_do_execbuffer.isra.17+0x4f3/0xe61 [i915]
[<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915]
[<ffffffffa007e405>] ? i915_gem_pwrite_ioctl+0x743/0x7a5 [i915]
[<ffffffffa0081a46>] ? i915_gem_execbuffer2+0x15e/0x1e4 [i915]
[<ffffffffa000e20d>] ? drm_ioctl+0x2a5/0x3c4 [drm]
[<ffffffffa00818e8>] ? i915_gem_execbuffer+0x37f/0x37f [i915]
[<ffffffff816f64c0>] ? __do_page_fault+0x3ab/0x449
[<ffffffff810be3da>] ? do_mmap_pgoff+0x2b2/0x341
[<ffffffff810e49be>] ? vfs_ioctl+0x1e/0x31
[<ffffffff810e5194>] ? do_vfs_ioctl+0x3ad/0x3ef
[<ffffffff810e5224>] ? SyS_ioctl+0x4e/0x7e
[<ffffffff816f88d2>] ? system_call_fastpath+0x16/0x1b
Code: 52 0c a0 48 c7 c6 22 30 0d a0 31 c0 e8 ef 00 f9 ff bf c6 a7 00 00 e8 90 5d 24 e1 f6 85 13 01 00 00 10 75 44 48 8b 85 18 01 00 00 <8b> 50 08 48 8b 30 49 8b 84 24 88 02 00 00 48 89 c7 48 81 c7 98
RIP [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
RSP <ffff880028e4b9e8>
CR2: 0000000000000008

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
[danvet: Bikeshed the comments a bit as discussed with Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
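
The shrinker then takes one object at a time under its own reference, roughly
as below (a sketch of the unbound-list pass; list and member names follow the
i915 code of that period):

while (count < target && !list_empty(&dev_priv->mm.unbound_list)) {
        struct drm_i915_gem_object *obj =
                list_first_entry(&dev_priv->mm.unbound_list,
                                 typeof(*obj), global_list);

        /* Dropping pages may retire requests and free other objects, so
         * hold a reference and never trust a cached "next" pointer. */
        drm_gem_object_reference(&obj->base);

        if (i915_gem_object_put_pages(obj) == 0)
                count += obj->base.size >> PAGE_SHIFT;

        drm_gem_object_unreference(&obj->base);
}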


# 3c0e234c 04-Sep-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Preallocate the lazy request

It is possible for us to be forced to perform an allocation for the lazy
request whilst running the shrinker. This allocation may fail, leaving
us unable to reclaim any memory leading to premature OOM. A neat
solution to the problem is to preallocate the request at the same time
as acquiring the seqno for the ring transaction. This means that we can
report ENOMEM prior to touching the rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1823521d 04-Sep-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename ring->outstanding_lazy_request

Prior to preallocating a request for lazy emission, rename the existing
field to make way (and differentiate the seqno from the request struct).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9a7e0c2a 26-Aug-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rearrange the comments in i915_add_request()

The comments were a little out-of-sequence with the code, forcing the
reader to jump around whilst reading. Whilst moving the comments around,
add one to explain the context reference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c0321e2c 26-Aug-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not add an interrupt for a context switch

We use the request to ensure we hold a reference to the context for the
duration that it remains in use by the ring. Each request only holds a
reference to the current context, hence we emit a request after
switching contexts with the final reference to the old context. However,
the extra interrupt caused by that request is not useful (no timing
critical function will wait for the context object), instead the overhead
of servicing the IRQ shows up in some (lightweight) benchmarks. In order
to keep the useful property of using the request to manage the context
lifetime, we want to add a dummy request that is associated with the
interrupt from the subsequent real request following the batch.

The extra interrupt was added as a side-effect of using
i915_add_request() in

commit 112522f6789581824903f6f72082b5b841a7f0f9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu May 2 16:48:07 2013 +0300

drm/i915: put context upon switching

v2: Daniel convinced me that the request here was solely for context
lifetime tracking and that we have the active ref to keep the object
alive whilst the MI_SET_CONTEXT. So the only concern then is which
context should get the blame for MI_SET_CONTEXT failing. The old scheme
added a request for the old context so that any hang up to and including
the switch away would mark the old context as guilty. Now any hang here
implicates the new context. However since we have already gone through a
complete flush with the last context in its last request, and all that
lies in no-man's-land is an invalidate flush and the MI_SET_CONTEXT, we
should be safe in not unduly placing blame on the new context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0ff501cb 29-Aug-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Fix list corruption in vma_unbind

The saga around the breadcrumb vmas used by execbuf continues ...

This time around we've managed to unconditionally move the object to
the unbound list on the last vma unbind even though it might never
have been on either the bound or unbound list. Hilarity ensued.

Chris Wilson tracked this one down but compared to his patches I've
simply opted to completely separate the unbound case for not-yet bound
vmas. Otherwise we imo end up with semantically hard to parse checks
around the list_move_tail(global_list, ...).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68462
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9435373e 28-Aug-2013 Rodrigo Vivi <rodrigo.vivi@gmail.com>

drm/i915: Report enabled slices on Haswell GT3

Batchbuffers constructed by userspace can conditionalise their URB
allocations through the use of the MI_SET_PREDICATE command. This
command can read the MI_PREDICATE_RESULT_2 register to see how many
slices are enabled on GT3, and by virtue of the result, scale their
memory allocations to fit enabled memory.

Of course, this only works if the kernel sets the appropriate bit in the
register first.

v2: Better commit subject and message by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Credits-to: Yejun Guo <yejun.guo@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b93dab6e 26-Aug-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: More vma fixups around unbind/destroy

The important bugfix here is that we must not unlink the vma when
we keep it around as a placeholder for the execbuf code. Since then we
won't find it again when execbuf gets interrupted and restarted and
creates a 2nd vma. And since the code as-is isn't fit yet to deal with
more than one vma, hilarity ensues.

Specifically the dma map/unmap of the sg table isn't adjusted for
multiple vmas yet and will blow up like this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
PGD 56bb5067 PUD ad3dd067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button drm_kms_helper drm mperf freq_table
CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
task: ffff8800563b3f00 ti: ffff88004bdf4000 task.ti: ffff88004bdf4000
RIP: 0010:[<ffffffffa008fb37>] [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
RSP: 0018:ffff88004bdf5958 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8801135e0000 RCX: ffff8800ad3bf8e0
RDX: ffff8800ad3bf8e0 RSI: 0000000000000000 RDI: ffff8801007ee780
RBP: ffff88004bdf5978 R08: ffff8800ad3bf8e0 R09: 0000000000000000
R10: ffffffff86ca1810 R11: ffff880036a17101 R12: ffff8801007ee780
R13: 0000000000018001 R14: ffff880118c4e000 R15: ffff8801007ee780
FS: 00007f401a0ce740(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000005635c000 CR4: 00000000001407e0
Stack:
ffff8801007ee780 ffff88005c253180 0000000000018000 ffff8801135e0000
ffff88004bdf59a8 ffffffffa0088e55 0000000000000011 ffff8801007eec00
0000000000018000 ffff880036a17101 ffff88004bdf5a08 ffffffffa0089026
Call Trace:
[<ffffffffa0088e55>] i915_vma_unbind+0xdf/0x1ab [i915]
[<ffffffffa0089026>] __i915_gem_shrink+0x105/0x177 [i915]
[<ffffffffa0089452>] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
[<ffffffffa0085ba9>] i915_gem_object_get_pages+0x61/0x90 [i915]
[<ffffffffa008f22b>] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
[<ffffffffa008a113>] i915_gem_object_pin+0x1fa/0x5df [i915]
[<ffffffffa008cdfe>] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc [i915]
[<ffffffffa008d156>] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
[<ffffffffa008dbf6>] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
[<ffffffff810fc823>] ? might_fault+0x40/0x90
[<ffffffffa008eb89>] i915_gem_execbuffer2+0x187/0x222 [i915]
[<ffffffffa000971c>] drm_ioctl+0x308/0x442 [drm]
[<ffffffffa008ea02>] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
[<ffffffff817db156>] ? __do_page_fault+0x3dd/0x481
[<ffffffff8112fdba>] vfs_ioctl+0x26/0x39
[<ffffffff811306a2>] do_vfs_ioctl+0x40e/0x451
[<ffffffff817deda7>] ? sysret_check+0x1b/0x56
[<ffffffff8113073c>] SyS_ioctl+0x57/0x87
[<ffffffff8135bbfe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff817ded82>] system_call_fastpath+0x16/0x1b
Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 <8b> 50 08 48 8b 30 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
RIP [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
RSP <ffff88004bdf5958>
CR2: 0000000000000008

As a consequence we need to change the "only one vma for now" check in
vma_unbind - since vma_destroy isn't always called the obj->vma_list
might not be empty. Instead check that the vma list is singular at the
beginning of vma_unbind. This is also more symmetric with bind_to_vm.

This fixes the igt/gem_evict_everything|alignment testcases.

v2:
- Add a paranoid WARN to mark_free in the eviction code to make sure
we never try to evict a vma used by the execbuf code right now.
- Move the check for a temporary execbuf vma into vma_destroy -
otherwise the failure path cleanup in bind_to_vm will blow up.

Our first attempting at fixing this was

commit 1be81a2f2cfd8789a627401d470423358fba2d76
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Aug 20 12:56:40 2013 +0100

drm/i915: Don't destroy the vma placeholder during execbuffer reservation

Squash with this when merging!

v3: Improvements suggested in Chris' review:
- Move the WARN_ON in vma_destroy that checks for vmas with an drm_mm
allocation before the early return.
- Bail out if we hit the WARN in mark_free to hopefully make the
kernel survive for long enough to capture it.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
Tested-by: lu hua <huax.lu@intel.com> (v2)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# aaa05667 19-Aug-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't destroy the vma placeholder during execbuffer reservation

The execbuffer handle and exec_link were moved from the object into the
vma. As the vma may be unbound and destroyed whilst attempting to
reserve the execbuffer objects (either through a forced unbind to fix up
a misalignment or through an evict-everything call) we need to prevent
the free of the i915_vma itself. Otherwise not only is the list of
objects to reserve corrupt, but we continue to reference stale vma
entries.

Fixes kernel crash with i-g-t/gem_evict_everything

This regression has been introduced in

commit 04038a515d6eda6dd0857c0ade0b3950d372f4c0
Author: Ben Widawsky <ben@bwidawsk.net>
AuthorDate: Wed Aug 14 11:38:36 2013 +0200

drm/i915: Convert execbuf code to use vmas

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
References: http://www.spinics.net/lists/intel-gfx/msg32038.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e656a6cb 14-Aug-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: inline vma_create into lookup_or_create_vma

In the execbuf code we don't clean up any vmas which ended up not
getting bound for code simplicity. To make sure that we don't end up
creating multiple vma for the same vm kill the somewhat dangerous
vma_create function and inline it into lookup_or_create.

This is just a safety measure to prevent surprises in the future.

Also update the somewhat confused comment in the execbuf code and
clarify what kind of magic is going on with a new one.

v2: Keep the function separate as requested by Chris. But give it a __
prefix for paranoia and move it tighter together with the other vma
stuff.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
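
The resulting helper is essentially the lookup-or-create pattern below (a
sketch; the __i915_gem_vma_create name follows the double-underscore convention
mentioned in v2):

struct i915_vma *
i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
                                  struct i915_address_space *vm)
{
        struct i915_vma *vma;

        /* Reuse an existing vma for this (obj, vm) pair if there is one, so
         * we never create two vmas for the same address space. */
        vma = i915_gem_obj_to_vma(obj, vm);
        if (vma == NULL)
                vma = __i915_gem_vma_create(obj, vm);

        return vma;
}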


# 27173f1f 14-Aug-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Convert execbuf code to use vmas

In order to transition more of our code over to using a VMA instead of
an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
until now, we've only had a VMA when actually binding an object.

The previous patch helped handle the distinction on bound vs. unbound.
This patch will help us catch leaks, and other issues before we actually
shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since
the execbuf code is very self contained it was a nice isolated
conversion.

The meat of the code is about turning eb_objects into eb_vma, and then
wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj
structure. This list is reused in the eviction code, so we must also
modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is
unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and do a 2-phase vma lookup to avoid
having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569d366beb4be72cacfde41adee2c25e1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Aug 7 18:30:54 2013 +0100

drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out prep patches. Also remove a FIXME comment which is
now taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d2933a5b 30-Aug-2013 Damien Lespiau <damien.lespiau@intel.com>

drm/i915: Don't call sg_free_table() if sg_alloc_table() fails

One needs to call __sg_free_table() if __sg_alloc_table() fails, but
sg_alloc_table() does that for us already.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
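
In other words, the correct error handling is simply (sketch):

st = kmalloc(sizeof(*st), GFP_KERNEL);
if (st == NULL)
        return -ENOMEM;

if (sg_alloc_table(st, page_count, GFP_KERNEL)) {
        /* sg_alloc_table() has already cleaned up via __sg_free_table(),
         * so only the sg_table wrapper itself needs freeing. */
        kfree(st);
        return -ENOMEM;
}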


# fac15c10 29-Aug-2013 Joe Perches <joe@perches.com>

i915_gem: Convert kmem_cache_alloc(...GFP_ZERO) to kmem_cache_zalloc

The helper exists, might as well use it instead of __GFP_ZERO.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
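
The conversion is a one-liner of this form (illustrative; the cache pointer is
a placeholder):

/* before */
obj = kmem_cache_alloc(cache, GFP_KERNEL | __GFP_ZERO);

/* after */
obj = kmem_cache_zalloc(cache, GFP_KERNEL);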


# c67a470b 19-Aug-2013 Paulo Zanoni <paulo.r.zanoni@intel.com>

drm/i915: allow package C8+ states on Haswell (disabled)

This patch allows PC8+ states on Haswell. These states can only be
reached when all the display outputs are disabled, and they allow some
more power savings.

The fact that the graphics device is allowing PC8+ doesn't mean that
the machine will actually enter PC8+: all the other devices also need
to allow PC8+.

For now this option is disabled by default. You need i915.allow_pc8=1
if you want it.

This patch adds a big comment inside i915_drv.h explaining how it
works and how it tracks things. Read it.

v2: (this is not really v2, many previous versions were already sent,
but they had different names)
- Use the new functions to enable/disable GTIMR and GEN6_PMIMR
- Rename almost all variables and functions to names suggested by
Chris
- More WARNs on the IRQ handling code
- Also disable PC8 when there's GPU work to do (thanks to Ben for
the help on this), so apps can run faster
- Enable PC8 on a delayed work function that is delayed for 5
seconds. This makes sure we only enable PC8+ if we're really
idle
- Make sure we're not in PC8+ when suspending
v3: - WARN if IRQs are disabled on __wait_seqno
- Replace some DRM_ERRORs with WARNs
- Fix calls to restore GT and PM interrupts
- Use intel_mark_busy instead of intel_ring_advance to disable PC8
v4: - Use the force_wake, Luke!
v5: - Remove the "IIR is not zero" WARNs
- Move the force_wake chunk to its own patch
- Only restore what's missing from RC6, not everything

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# accfef2e 14-Aug-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: prepare bind_to_vm for preallocated vma

In the new execbuf code we want to track buffers using the vmas even
before they're all properly mapped. Which means that bind_to_vm needs
to deal with buffers which have preallocated vmas which aren't yet
bound.

This patch implements this prep work and adjusts our WARN/BUG checks.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big execbuf patch. Also move one BUG
back to its original place to deflate the diff a notch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 82a55ad1 14-Aug-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Switch eviction code to use vmas

The execbuf wants to do relocations using vmas, so we need a
vma->exec_list. The eviction code also uses the old obj execbuf list
for its own book-keeping, but would really prefer to deal in vmas
only. So switch it over to the new list.

Again this is just a prep patch for the big execbuf vma conversion.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big execbuf vma patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b25cb2f8 14-Aug-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: s/obj->exec_list/obj->obj_exec_link in debugfs

To convert the execbuf code over to use vmas natively we need to
shuffle the exec_list a bit. This patch here just prepares things with
the debugfs code, which also uses the old exec_list list_head, newly
called obj_exec_link.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4b6d846e 12-Aug-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop the overzealous warning from i915_gem_set_cache_level

By our earlier reckoning, move from a snooped/llc setting to an uncached
setting, leaves the CPU cache in a consistent state irrespective of our
domain tracking - so we can forgo the warning about the lack of
invalidation. Similarly for any writes posted to the snooped CPU domain,
we know will be safely clflushed to the uncached PTEs after forcing the
domain change.

This WARN started to pop up with

commit d46f1c3f1372e3a72fab97c60480aa4a1084387f
Author: Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Thu Aug 8 14:41:06 2013 +0100

drm/i915: Allow the GPU to cache stolen memory

Ville brought up a scenario where the interaction of a set_caching
ioctl call from userspace on a scanout buffer (i.e. obj->pin_display
is set) resulted in the code getting confused and not properly
flushing stale cpu cachelines. Luckily we already prevent this by
rejecting caching changes when obj->pin_count is set.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68040
Tested-by: cancan,feng <cancan.feng@intel.com>
[danvet: Add buglink, bisect result and explain why Ville's scenario
is already taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 49987099 14-Aug-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: use vma->node directly and rewrap map&fence in bind

Use () to make for neater alignment of the split lines, too. With this
we ditch another jump through the obj_gtt_size/offset indirection
maze.

Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4bd561b3 13-Aug-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: cleanup map&fence in bind

Cleanup the map and fenceable setting during bind to make more sense,
and avoid checking i915_is_ggtt() twice unnecessarily

v2: Move the bools into the if block (Chris) - There are ways to tidy
this function (fence calculations for instance) even further, but they
are quite invasive, so I am punting on those unless specifically asked.

v3: Add newline between variable declaration and logic (Chris)

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 433544bd 13-Aug-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Remove node only when allocated

VMAs can be created and not bound. One may think of it as lazy cleanup,
and safely gloss over the conditions which manufacture it. In either
case, when the object backing the i915 vma is destroyed, we must cleanup
the vma without stumbling into a bunch of pitfalls that assume the vma
is bound.

NOTE: I was pretty certain the above condition could only happen when we
introduced the use of VMAs being looked up at execbuf, and already
existing. Paulo has hit this though, so I must be missing something. As
I believe the patch is correct anyway, therefore I won't scratch my head
too hard.

v2: use goto destroy as a compromise (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
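
The unbind path therefore gains an early-out along these lines (a sketch;
drm_mm_node_allocated() is the generic drm_mm test for whether a node actually
owns address space):

if (!drm_mm_node_allocated(&vma->node)) {
        /* The vma exists but was never bound: skip the unmap/unbind work
         * and go straight to tearing the vma down. */
        i915_gem_vma_destroy(vma);
        return 0;
}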


# 4257d3ba 08-Aug-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow the user to set bo into the DISPLAY cache domain

This is primarily for the benefit of the create2 ioctl so that the
caller can avoid the later step of rebinding the bo with new PTE bits.
After introducing WT (and possibly GFDT) cacheing for display targets,
not everything in the display is earmarked as UC, and more importantly
what is, is controlled by the kernel.

Note that set_cache_level/get_cache_level for DISPLAY is not necessarily
idempotent; get_cache_level may return UC for architectures that have no
special cache domain for the display engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 651d794f 08-Aug-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use Write-Through cacheing for the display plane on Iris

Haswell GT3e has the unique feature of supporting Write-Through cacheing
of objects within the eLLC/LLC. The purpose of this is to enable the display
plane to remain coherent whilst objects lie resident in the eLLC/LLC - so
that we, in theory, get the best of both worlds, perfect display and fast
access.

However, we still need to be careful as the CPU does not see the WT when
accessing the cache. In particular, this means that we need to flush the
cache lines after writing to an object through the CPU, and on
transitioning from a cached state to WT.

v2: Actually do the clflush on transition to WT, nagging by Ville.
v3: Flush the CPU cache after writes into WT objects.
v4: Rebase onto LLC updates and report WT as "uncached" for
get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f2f4d82f 10-Aug-2013 Jani Nikula <jani.nikula@intel.com>

drm/i915: give more distinctive names to ring hangcheck action enums

The short lowercase names are bound to collide. The default warnings
don't even warn about shadowing.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7ace7ef2 09-Aug-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: WARN_ON failed map_and_fenceable

I just noticed in our code we don't really check the assertion, and
given some of the code I am changing in this area, I feel a WARN is very
nice to have.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: s/&/&&/ to fix typo on the check.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 000433b6 08-Aug-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only do a chipset flush after a clflush

Now that we skip clflushes more often, return a boolean indicating
whether the clflush was actually performed, and only if it was do the
chipset flush. (Though on most of the architectures where the clflush will
be skipped, the chipset flush is a no-op!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
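
At the call sites this becomes the following pattern (a sketch; assumes the
two-argument clflush helper from the neighbouring LLC-rules work):

/* Only pay for the chipset flush when cachelines were actually flushed;
 * "force" is the caller's flush-even-if-coherent flag. */
if (i915_gem_clflush_object(obj, force))
        i915_gem_chipset_flush(dev);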


# 2c22569b 08-Aug-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Update rules for writing through the LLC with the cpu

As mentioned in the previous commit, reads and writes from both the CPU
and GPU go through the LLC. This gives us coherency between the CPU and
GPU irrespective of the attribute settings either device sets. We can
use this to avoid having to clflush even uncached memory.

Except for the scanout.

The scanout resides within another functional block that does not use
the LLC but reads directly from main memory. So in order to maintain
coherency with the scanout, writes to uncached memory must be flushed.
In order to optimize writes elsewhere, we start tracking whether a
framebuffer is attached to an object.

v2: Use pin_display tracking rather than fb_count (to ensure we flush
cursors as well etc) and only force the clflush along explicit writes to
the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).

v3: Force the flush after hitting the slowpath in pwrite, as after
dropping the lock the object's cache domain may be invalidated. (Ville)

Based on a patch by Ville Syrjälä.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# cc98b413 08-Aug-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track when an object is pinned for use by the display engine

The display engine has unique coherency rules such that it requires
special handling to ensure that all writes to cursors, scanouts and
sprites are clflushed. This patch introduces the infrastructure to
simply track when an object is being accessed by the display engine.

v2: Explain the is_pin_display() magic as the sources for obj->pin_count
and their individual rules is not obvious. (Ville)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c76ce038 08-Aug-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Update rules for reading cache lines through the LLC

The LLC is a fun device. The cache is a distinct functional block within
the SA that arbitrates access from both the CPU and GPU cores. As such
all writes to memory land first in the LLC before further action is
taken. For example, an uncached write from either the CPU or GPU will
then proceed to memory and evict the cacheline from the LLC. This means that
a read from the LLC always returns the correct information even if the PTE
bit in the GPU differs from the PAT bit in the CPU. For the older
snooping architecture on non-LLC, the fundamental principle still holds
except that some coordination is required between the CPU and GPU to
explicitly perform the snooping (which is handled by our request
tracking).

The upshot of this is that we know that we can issue a read from either
LLC devices or snoopable memory and trust the contents of the cache -
i.e. we can forgo a clflush before a read in these circumstances.
Writing to memory from the CPU is a little more tricky as we have to
consider that the scanout does not read from the CPU cache at all, but
from main memory. So we have to currently treat all requests to write to
uncached memory as having to be flushed to main memory for coherency
with all consumers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 58e73e15 08-Aug-2013 Dan Carpenter <dan.carpenter@oracle.com>

drm/i915: unbreak i915_gem_object_ggtt_unbind()

There is an extra semi-colon here so we just leak and never unbind
anything.

This regression has been introduced in

commit 07fe0b12800d4752d729d4122c01f41f80a5ba5a
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Jul 31 17:00:10 2013 -0700

drm/i915: plumb VM into bind/unbind code

Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8b9c2b94 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Add vma to list at creation

With the current code there shouldn't be a distinction - however with an
upcoming change we intend to allocate a vma much earlier, before it's
actually bound anywhere.

To do this we have to check node allocation as well for the _bound()
check.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: move list_del(&vma->vma_link) from vma_unbind to vma_destroy,
again fallout from the loss of "rm/i915: Cleanup more of VMA in
destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

fixup for drm/i915: Add vma to list at creation


# ca191b13 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: mm_list is per VMA

formerly: "drm/i915: Create VMAs (part 5) - move mm_list"

The mm_list is used for the active/inactive LRUs. Since those LRUs are
per address space, the link should be per VMA.

Because we'll only ever have 1 VMA before this point, it's not incorrect
to defer this change until this point in the patch series, and doing it
here makes the change much easier to understand.

Shamelessly manipulated out of Daniel:
"active/inactive stuff is used by eviction when we run out of address
space, so needs to be per-vma and per-address space. Bound/unbound otoh
is used by the shrinker which only cares about the amount of memory used
and not one bit about in which address space this memory is all used in.
Of course to actual kick out an object we need to unbind it from every
address space, but for that we have the per-object list of vmas."

v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris)

v3: Moved earlier in the series

v4: Add dropped message from v3

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Frob patch to apply and use vma->node.size directly as
discussed with Ben. Also drop a needless BUG_ON before move_to_inactive,
the function itself has the same check.]
[danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA
in destroy", specifically unlink the vma from the mm_list in
vma_unbind (to keep it symmetric with bind_to_vm) instead of
vma_destroy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5cacaac7 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Fix up map and fenceable for VMA

formerly: "drm/i915: Create VMAs (part 3.5) - map and fenceable
tracking"

The map_and_fenceable tracking is per object. GTT mapping, and fences
only apply to global GTT. As such, object operations which are not
performed on the global GTT should not affect mappable or fenceable
characteristics.

Functionally, this commit could very well be squashed in to a previous
patch which updated object operations to take a VM argument. This
commit is split out because it's a bit tricky (or at least it was for
me).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Drop the bogus hunk in i915_vma_unbind as discussed with
Ben.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9843877d 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: turn bound_ggtt checks to bound_any

In some places, we want to know if an object is bound in any address
space, and not just the global GTT. This often applies when there is a
single global resource (object, pages, etc.)

function | reason
--------------------------------------------------
i915_gem_object_is_inactive | global object
i915_gem_object_put_pages | object's pages
i915_gem_object_unpin | global object
i915_gem_execbuffer_unreserve_object | temporary until we plumb vma
pread/pwrite | see the note below

Note: set_to_gtt_domain in pwrite/pread is abused as a wait_rendering
call - but that once only worked if the object is bound. We really
should replace this with a plain wait_rendering call, which would have
the upside that in pread it would be clearer that we actually only
wait for outstanding gpu writes.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Explain the set_to_gtt_domain in pwrite/pread and volunteer
Ben to replace those with wait_rendering calls.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f6cd1f15 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Use new bind/unbind in eviction code

Eviction code, like the rest of the converted code needs to be aware of
the address space for which it is evicting (or the everything case, all
addresses). With the updated bind/unbind interfaces of the last patch,
we can now safely move the eviction code over.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 07fe0b12 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: plumb VM into bind/unbind code

As alluded to in several patches, and it will be reiterated later... A
VMA is an abstraction for a GEM BO bound into an address space.
Therefore it stands to reason, that the existing bind, and unbind are
the ones which will be the most impacted. This patch implements this,
and updates all callers which weren't already updated in the series
(because it was too messy).

This patch represents the bulk of an earlier, larger patch. I've pulled
out a bunch of things by the request of Daniel. The history is preserved
for posterity with the email convention of ">" One big change from the
original patch aside from a bunch of cropping is I've created an
i915_vma_unbind() function. That is because we always have the VMA
anyway, and doing an extra lookup is useful. There is a caveat, we
retain an i915_gem_object_ggtt_unbind, for the global cases which might
not talk in VMAs.

> drm/i915: plumb VM into object operations
>
> This patch was formerly known as:
> "drm/i915: Create VMAs (part 3) - plumbing"
>
> This patch adds a VM argument, bind/unbind, and the object
> offset/size/color getters/setters. It preserves the old ggtt helper
> functions because things still need, and will continue to need them.
>
> Some code will still need to be ported over after this.
>
> v2: Fix purge to pick an object and unbind all vmas
> This was doable because of the global bound list change.
>
> v3: With the commit to actually pin/unpin pages in place, there is no
> longer a need to check if unbind succeeded before calling put_pages().
> Make put_pages only BUG() after checking pin count.
>
> v4: Rebased on top of the new hangcheck work by Mika
> plumbed eb_destroy also
> Many checkpatch related fixes
>
> v5: Very large rebase
>
> v6:
> Change BUG_ON to WARN_ON (Daniel)
> Rename vm to ggtt in preallocate stolen, since it is always ggtt when
> dealing with stolen memory. (Daniel)
> list_for_each will short-circuit already (Daniel)
> remove superfluous space (Daniel)
> Use per object list of vmas (Daniel)
> Make obj_bound_any() use obj_bound for each vm (Ben)
> s/bind_to_gtt/bind_to_vm/ (Ben)
>
> Fixed up the inactive shrinker. As Daniel noticed the code could
> potentially count the same object multiple times. While it's not
> possible in the current case, since 1 object can only ever be bound into
> 1 address space thus far - we may as well try to get something more
> future proof in place now. With a prep patch before this to switch over
> to using the bound list + inactive check, we're now able to carry that
> forward for every address space an object is bound into.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA
in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 80dcfdbd 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Rework __i915_gem_shrink

In order to do this for all VMs, it's convenient to rework the logic a
bit. This should have no functional impact.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 31e5d7c6 27-Jul-2013 David Herrmann <dh.herrmann@gmail.com>

drm/mm: add "best_match" flag to drm_mm_insert_node()

Add a "best_match" flag similar to the drm_mm_search_*() helpers so we
can convert TTM to use them in follow up patches. We can also inline the
non-generic helpers and move them into the header to allow compile-time
optimizations.

To make calls to drm_mm_{search,insert}_node() more readable, this
converts the boolean argument to a flagset. There are pending patches that
add additional flags for top-down allocators and more.

v2:
- use flag parameter instead of boolean "best_match"
- convert *_search_free() helpers to also use flags argument

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
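
A call site then selects the search strategy via flags rather than a bare
boolean, roughly (sketch; flag names as introduced by this change):

/* default first-fit search */
ret = drm_mm_insert_node(&mm, &node, size, alignment, DRM_MM_SEARCH_DEFAULT);

/* prefer the smallest hole that still fits ("best match") */
ret = drm_mm_insert_node(&mm, &node, size, alignment, DRM_MM_SEARCH_BEST);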


# 43387b37 16-Jul-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/gem: create drm_gem_dumb_destroy

All the gem based kms drivers really want the same function to
destroy a dumb framebuffer backing storage object.

So give it to them and roll it out in all drivers.

This still leaves the option open for kms drivers which don't use GEM
for backing storage, but it does decently simplify matters for gem
drivers.

Acked-by: Inki Dae <inki.dae@samsung.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Cc: Ben Skeggs <skeggsb@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 637efacf 05-Aug-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: eliminate dead domain clearing on reset

The code itself is no longer accurate without updating once we have
multiple address spaces, since clearing the domains of every object
requires scanning the inactive list for all VMs.

"This code is dead. Just remove it rather than port it to vma." - Chris
Wilson

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d1ccbb5d 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: make reset&hangcheck code VM aware

Hangcheck, and some of the recent reset code for guilty batches need to
know which address space the object was in at the time of a hangcheck.
This is because we use offsets in the (PP|G)GTT to determine this
information, and those offsets can differ depending on which VM they are
bound into.

Since we still only have 1 VM ever, this code shouldn't yet have any
impact.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3e123027 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: BUG_ON put_pages later

With multiple VMs, the eviction code benefits from being able to blindly
put pages without needing to know if there are any entities still
holding on to those pages. As such it's preferable to return the -EBUSY
before the BUG.

Eviction code is the only user for now, but overall it makes sense
anyway, IMO.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3089c6f2 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: make caching operate on all address spaces

For now, objects will maintain the same cache levels amongst all address
spaces. This is to limit the risk of bugs, as playing with cacheability
in the different domains can be very error prone.

In the future, it may be optimal to allow setting domains per VMA (ie.
an object bound into an address space).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c37e2204 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Add VM to pin

To verbalize it, one can say, "pin an object into the given address
space." The semantics of pinning remain the same otherwise.

Certain objects will always have to be bound into the global GTT.
Therefore, global GTT is a special case, and keep a special interface
around for it (i915_gem_obj_ggtt_pin).

v2: s/i915_gem_ggtt_pin/i915_gem_obj_ggtt_pin

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
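
Keeping the global GTT convenient means a thin wrapper of roughly this shape
(a sketch; the obj_to_ggtt() helper name is assumed from the series'
conventions):

static inline int
i915_gem_obj_ggtt_pin(struct drm_i915_gem_object *obj,
                      uint32_t alignment,
                      bool map_and_fenceable,
                      bool nonblocking)
{
        /* The global GTT stays a special case: pin explicitly into it. */
        return i915_gem_object_pin(obj, obj_to_ggtt(obj), alignment,
                                   map_and_fenceable, nonblocking);
}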


# fcb4a578 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Use bound list for inactive shrink

Due to the move of the active/inactive lists, it no longer makes sense to use
them for shrinking, since shrinking isn't VM specific (such a need may
also exist, but doesn't yet).

What we can do instead is use the global bound list to find all objects
which aren't active.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a70a3148 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Make proper functions for VMs

Earlier in the conversion sequence we attempted to quickly wedge in the
transitional interface as static inlines.

Now that we're sure these interfaces are sane, for easier debug and to
decrease code size (since many of these functions may be called quite a
bit), make them real functions.

While at it, kill off the set_color interface. We'll always have the
VMA, or easily get to it.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# fc8c067e 31-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Create an init vm

Move all the similar address space (VM) initialization code to one
function. Until we have multiple VMs, there should only ever be 1 VM.
The aliasing ppgtt is a special case without its own VM (since it
doesn't need its own address space management).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c20e8355 24-Jul-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix the racy object accounting

Just use a spinlock to protect them.

v2: Rebase onto the new object create refcount fix patch.

v3: Don't kill dev_priv->mm.object_memory as requested by Chris and
hence just use a spinlock instead of atomic_t.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67287
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
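
The accounting update then becomes an ordinary spinlock-protected critical
section, roughly (sketch; lock and counter names are placeholders following the
description above):

spin_lock(&dev_priv->mm.object_stat_lock);
dev_priv->mm.object_count++;
dev_priv->mm.object_memory += obj->base.size;
spin_unlock(&dev_priv->mm.object_stat_lock);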


# 51335df9 24-Jul-2013 David Herrmann <dh.herrmann@gmail.com>

drm/vma: provide drm_vma_node_unmap() helper

Instead of unmapping the nodes in TTM and GEM users manually, we provide
a generic wrapper which does the correct thing for all vma-nodes.

v2: remove bdev->dev_mapping test in ttm_bo_unmap_virtual_unlocked() as
ttm_mem_io_free_vm() does nothing in that case (io_reserved_vm is 0).
v4: Fix docbook comments
v5: use drm_vma_node_size()

Cc: Dave Airlie <airlied@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>


# 0de23977 24-Jul-2013 David Herrmann <dh.herrmann@gmail.com>

drm/gem: convert to new unified vma manager

Use the new vma manager instead of the old hashtable. Also convert all
drivers to use the new convenience helpers. This drops all the
(map_list.hash.key << PAGE_SHIFT) non-sense.

Locking and access-management is exactly the same as before with an
additional lock inside of the vma-manager, which strictly wouldn't be
needed for gem.

v2:
- rebase on drm-next
- init nodes via drm_vma_node_reset() in drm_gem.c
v3:
- fix tegra
v4:
- remove duplicate if (drm_vma_node_has_offset()) checks
- inline now trivial drm_vma_node_offset_addr() calls
v5:
- skip node-reset on gem-init due to kzalloc()
- do not allow mapping gem-objects with offsets (backwards compat)
- remove unneccessary casts

Cc: Inki Dae <inki.dae@samsung.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>
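
For GEM mmap offsets the conversion boils down to the following before/after at
a typical call site (sketch):

/* before: hashtable-based map_list */
offset = (u64)obj->base.map_list.hash.key << PAGE_SHIFT;

/* after: unified vma offset manager */
offset = drm_vma_node_offset_addr(&obj->base.vma_node);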


# d861e338 24-Jul-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix reference counting in i915_gem_create

This function is called without the dev->struct_mutex held, hence we
need to use the _unlocked unreference variants.

As soon as the object is registered userspace can sneak in here with a
gem_close ioctl call, so the object can (and with my new evil tests
actually does) get the final unreference in this place. The lack of
locking then results in hilarity and some good leakage.

To fix this we simply need to revert

Chris Wilson <chris@chris-wilson.co.uk>

v2: We need to make the trace call _before_ we drop our ref - the
object might very well be gone by then already.

v3: Just revert the original patch as suggested by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Remove the added white line again to tighten the return
block, requested by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
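
The fixed tail of the create path therefore looks roughly like this (a sketch;
ordering per the v2 note that the trace call must precede the final
unreference):

trace_i915_gem_object_create(obj);      /* trace while our reference is live */

ret = drm_gem_handle_create(file, &obj->base, &handle);
/* Userspace may close the new handle at any point, so this unreference can
 * be the final one; use the _unlocked variant as struct_mutex is not held. */
drm_gem_object_unreference_unlocked(&obj->base);
if (ret)
        return ret;

*handle_p = handle;
return 0;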


# bc6bc15b 21-Jul-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix up error cleanup in i915_gem_object_bind_to_gtt

This has been broken in

commit 2f63315692b1d3c055972ad33fc7168ae908b97b
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Jul 17 12:19:03 2013 -0700

drm/i915: Create VMAs

which resulted in an OOPS the first time around we've hit -ENOSPC.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67156
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Tested-by: meng <mengmeng.meng@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0b74b508 18-Jul-2013 Xiong Zhang <xiong.y.zhang@intel.com>

drm/i915: add prefault_disable module option

prefault is still enabled by default, which prevents most pwrite/pread/reloc
calls from taking the slow path. In order to verify these slow paths,
prefaulting needs to be disabled.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
[danvet: Make checkpatch happy and bikeshed the module option help
text a bit.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
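
A module option of this kind is wired up in the usual way (a sketch; the
variable name is illustrative):

bool i915_prefault_disable __read_mostly;
module_param_named(prefault_disable, i915_prefault_disable, bool, 0600);
MODULE_PARM_DESC(prefault_disable,
                 "Disable page prefaulting for pread/pwrite/reloc so the slow"
                 " paths can be exercised (default: false)");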


# 6286ef9b 18-Jul-2013 Dan Carpenter <dan.carpenter@oracle.com>

drm/i915: use after free on error path

i915_gem_vma_destroy() frees its argument so we have to move the
drm_mm_remove_node() call up a few lines.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
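
In other words, the node must be removed while the vma is still alive (sketch):

drm_mm_remove_node(&vma->node);   /* must happen before the vma is freed */
i915_gem_vma_destroy(vma);        /* frees its argument */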


# db473b36 18-Jul-2013 Dan Carpenter <dan.carpenter@oracle.com>

drm/i915: checking for NULL instead of IS_ERR()

i915_gem_vma_create() returns an ERR_PTR() or a valid pointer; it never
returns NULL.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
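
The correct check for such a function is therefore (sketch):

vma = i915_gem_vma_create(obj, vm);
if (IS_ERR(vma))
        return PTR_ERR(vma);    /* not: if (vma == NULL) */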


# 94a335db 17-Jul-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: correctly restore fences with objects attached

To avoid stalls we delay tiling changes and especially hold off
committing the new fence state for as long as possible.
Synchronization points are in the execbuf code and in our gtt fault
handler.

Unfortunately we've missed that tricky detail when adding proper fence
restore code in

commit 19b2dbde5732170a03bd82cc8bd442cf88d856f7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jun 12 10:15:12 2013 +0100

drm/i915: Restore fences after resume and GPU resets

The result was that we've restored fences for objects with no tiling,
since the object<->fence link still existed after resume. Now that
wouldn't have been too bad since any subsequent access would have
fixed things up, but if we've changed from tiled to untiled real havoc
happened:

The tiling stride is stored -1 in the fence register, so a stride of 0
resulted in all 1s in the top 32bits, and so a completely bogus fence
spanning everything from the start of the object to the top of the
GTT. The tell-tale in the register dumps looks like:

FENCE START 2: 0x0214d001
FENCE END 2: 0xfffff3ff

Bit 11 isn't set since the hw doesn't store it, even when writing all
1s (at least on my snb here).

To prevent such a gaffle in the future add a sanity check for fences
with an untiled object attached in i915_gem_write_fence.

v2: Fix the WARN, spotted by Chris.

v3: Trying to reuse get_fences looked ugly and obfuscated the code.
Instead reuse update_fence and to make it really dtrt also move the
fence dirty state clearing into update_fence.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Stéphane Marchesin <marcheu@chromium.org>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=60530
Cc: stable@vger.kernel.org (for 3.10 only)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Matthew Garrett <matthew.garrett@nebula.com>
Tested-by: Björn Bidar <theodorstormgrade@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2f633156 17-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Create VMAs

Formerly: "drm/i915: Create VMAs (part 1)"

In a previous patch, the notion of a VM was introduced. A VMA describes
an area of part of the VM address space. A VMA is similar to the concept
in the linux mm. However, instead of representing regular memory, a VMA
is backed by a GEM BO. There may be many VMAs for a given object, one
for each VM the object is to be used in. This may occur through flink,
dma-buf, or a number of other transient states.

Currently the code depends on only 1 VMA per object, for the global GTT
(and aliasing PPGTT). The following patches will address this and make
the rest of the infrastructure more suited

v2: s/i915_obj/i915_gem_obj (Chris)

v3: Only move an object to the now global unbound list if there are no
more VMAs for the object which are bound into a VM (ie. the list is
empty).

v4: killed obj->gtt_space
some reworks due to rebase

v5: Free vma on error path (Imre)

v6: Another missed vma free in i915_gem_object_bind_to_gtt error path
(Imre)
Fixed vma freeing in stolen preallocation (Imre)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Squash in fixup from Ben to not deref a non-existing vma in
set_cache_level, reported by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
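
A rough sketch of the object/VMA/VM relationship described above; the type
and field names are illustrative placeholders, not the actual i915
definitions:

#include <linux/list.h>
#include <drm/drm_mm.h>

struct sketch_address_space;            /* the global GTT or a (future) PPGTT */

/* One mapping of a GEM BO into one address space. */
struct sketch_vma {
        struct drm_mm_node node;        /* range occupied in this VM */
        struct sketch_gem_object *obj;  /* the GEM BO backing the mapping */
        struct sketch_address_space *vm;
        struct list_head vma_link;      /* entry in obj->vma_list */
};

/* The object keeps one VMA per VM it is bound in. */
struct sketch_gem_object {
        struct list_head vma_list;
        /* backing pages, domain tracking, etc. elided */
};

Binding then becomes "look up or create the VMA for (obj, vm)" instead of
poking a single obj->gtt_space, which is what makes multiple VMs possible.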


# 5cef07e1 16-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Move active/inactive lists to new mm

Shamelessly manipulated out of Daniel :-)
"When moving the lists around explain that the active/inactive stuff is
used by eviction when we run out of address space, so needs to be
per-vma and per-address space. Bound/unbound otoh is used by the
shrinker which only cares about the amount of memory used and not one
bit about in which address space this memory is all used in. Of course
to actually kick out an object we need to unbind it from every address
space, but for that we have the per-object list of vmas."

v2: Leave the bound list as a global one. (Chris, indirectly)

v3: Rebased with no i915_gtt_vm. In most places I added a new *vm local,
since it will eventually be replaced by a vm argument.
Put comment back inline, since it no longer makes sense to do otherwise.

v4: Rebased on hangcheck/error state movement

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 93bd8649 16-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Put the mm in the parent address space

Every address space should support object allocation. It therefore makes
sense to have the allocator be part of the "superclass" which GGTT and
PPGTT will derive.

Since our maximum address space size is only 2GB we're not yet able to
avoid doing allocation/eviction; but we'd hope one day this becomes
almost irrelevant.

v2: Rebased

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 853ba5d2 16-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Move gtt and ppgtt under address space umbrella

The GTT and PPGTT can be thought of more generally as GPU address
spaces. Many of their actions (insert entries), state (LRU lists), and
many of their characteristics (size) can be shared. Do that.

The change itself doesn't actually impact most of the VMA/VM rework
coming up, it just fits in with the grand scheme of abstracting the GPU
VM operations. GGTT will usually be a special case where we know
an object must be in the GGTT (display engine, workarounds, etc.).

The scratch page is left as part of the VM (even though it's currently
shared with the ppgtt code) because in the future when we have Full
PPGTT, I intend to create a separate scratch page for each.

v2: Drop usage of i915_gtt_vm (Daniel)
Make cleanup also part of the parent class (Ben)
Modified commit msg
Rebased

v3: Properly share scratch page (Imre)
Finish commit message (Daniel, Imre)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
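
A hedged sketch of the "address space as superclass" shape described above;
the names and fields are placeholders rather than the real i915 structures:

#include <linux/list.h>
#include <linux/types.h>

/* State and operations shared by the GGTT and the PPGTTs. */
struct sketch_address_space {
        unsigned long start;
        unsigned long total;
        struct page *scratch_page;      /* backs unused PTEs */
        struct list_head active_list;   /* per-VM eviction bookkeeping */
        struct list_head inactive_list;

        void (*insert_entries)(struct sketch_address_space *vm,
                               struct sg_table *pages,
                               unsigned long first_entry);
        void (*clear_range)(struct sketch_address_space *vm,
                            unsigned long first_entry,
                            unsigned long num_entries);
        void (*cleanup)(struct sketch_address_space *vm);
};

/* The GGTT is "just" an address space with a few extra fields ... */
struct sketch_ggtt {
        struct sketch_address_space base;
        void __iomem *gsm;              /* mapping of the global PTEs */
};

/* ... and so is each PPGTT. */
struct sketch_ppgtt {
        struct sketch_address_space base;
        /* page directory, etc. elided */
};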


# 10cd45b6 03-Jul-2013 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: introduce i915_queue_hangcheck

So that hangcheck can be run in the near future.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 59124506 04-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: store eLLC size

The eLLC cannot be determined by PCI ID because, as far as we know, even
machines supporting eLLC may not have it enabled, or may have it fused off.
It's possible this isn't actually true, and at that point we
can switch to a DEV_INFO flag instead.

I've defined everything where the docs are clear, and left the rest as
magic.

But we need it before we set the pte_encode function pointers, which
happens really early, in gtt_init.

The problem with just doing the normal sequence earlier is we don't have
the ability to use forcewake until after the pte functions have been set
up.

Since all solutions are somewhat ugly (barring rewriting all the init
ordering), I've opted to do the detection really early, and the enabling
later - since the register to detect doesn't require forcewake.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Move dev_priv->ellc_size away from the dri1 dungeon to a nice
place right next to the l3 parity stuff. Also squash in the follow-up
commit to read out the eLLC size a bit earlier.]
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 05e21cc4 04-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Define some of the eLLC magic

The EDRAM present register isn't really defined in the docs. It just
says to check whether it's set to 1. So I haven't defined the 1 value,
not knowing what it actually means.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 46a0b638 10-Jul-2013 Chris Wilson <chris@chris-wilson.co.uk>

Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs"

This reverts commit 25ff119 and the follow-on Valleyview commit 2dc8aae.

commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Apr 4 21:31:03 2013 +0100

drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

commit 2dc8aae06d53458dd3624dc0accd4f81100ee631
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed May 22 17:08:06 2013 +0100

drm/i915: Workaround incoherence with fence updates on Valleyview

Jon Bloomfield came up with a plausible explanation and cheap fix
(drm/i915: Fix incoherence with fence updates on Sandybridge+) for the
race condition, so let's run with it.

This is a candidate for stable as the old workaround incurs a
significant cost (calling wbinvd on all CPUs before performing the
register write) for some workloads as noted by Carsten Emde.

Link: http://lists.freedesktop.org/archives/intel-gfx/2013-June/028819.html
References: https://www.osadl.org/?id=1543#c7602
References: https://bugs.freedesktop.org/show_bug.cgi?id=63825
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d18b9619 10-Jul-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix incoherence with fence updates on Sandybridge+

This hopefully fixes the root cause behind the workaround added in

commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Apr 4 21:31:03 2013 +0100

drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

Further investigation by Jon Bloomfield revealed that
the 64-bit register might be broken up by the hardware into two 32-bit
writes (a problem we have encountered elsewhere). This non-atomicity
would then cause an issue where a second thread would see an
intermediate register state (new high dword, old low dword), and this
register would randomly be used in preference to its own thread register.
This would cause the second thread to read from and write into a fairly
random tiled location. Breaking the operation into 3 explicit 32-bit
updates (first disable the fence, poke the upper bits, then poke the lower
bits and enable) ensures that, given proper serialisation between the
32-bit register writes and the memory transfer, the fence value is
always consistent.

Armed with this knowledge, we can explain how the previous workaround
worked. The key to the corruption is that a second thread sees an
erroneous fence register that conflicts and overrides its own. By
serialising the fence update across all CPUs, we have a small window
where no GTT access is occurring and so hide the potential corruption.
This also leads to the conclusion that the earlier workaround was
incomplete.

v2: Be overly paranoid about the order in which fence updates become
visible to the GPU to make really sure that we turn the fence off before
doing the update, and then only switch the fence on afterwards.

Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
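
The three-step update described above can be sketched roughly as follows;
the register offsets and helper names are placeholders, not the actual
i915 fence-writing code:

#include <linux/io.h>
#include <linux/kernel.h>

static void sketch_write_fence64(void __iomem *mmio, u32 fence_reg, u64 val)
{
        /* 1. Turn the fence off so no access can use a half-written value. */
        writel(0, mmio + fence_reg);
        readl(mmio + fence_reg);                /* posting read to serialise */

        /* 2. Poke the upper 32 bits while the fence is still disabled. */
        writel(upper_32_bits(val), mmio + fence_reg + 4);
        readl(mmio + fence_reg + 4);

        /* 3. Poke the lower 32 bits, which also carry the enable bit. */
        writel(lower_32_bits(val), mmio + fence_reg);
        readl(mmio + fence_reg);
}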


# db1b76ca 09-Jul-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't frob mm.suspended when not using ums

In kernel modeset driver mode we're in full control of the chip,
always. So there's no need at all to set mm.suspended in
i915_gem_idle. Hence move that out into the leavevt ioctl. Since
i915_gem_idle doesn't suspend gem any more we can also drop the
re-enabling for KMS in the thaw function.

Also clean up the handling of mm.suspend at driver load by coalescing
all the assignments.

Stumbled over while reading through our resume code for unrelated
reasons.

v2: Shovel mm.suspended into the (newly created) ums dungeon as
suggested by Chris Wilson. The plan is that once we've completely
stopped relying on the register save/restore code we could shovel even
that in there.

v3: Improve the locking for the entervt/leavevt ioctls a bit by moving
the dev->struct_mutex locking outside of i915_gem_idle. Also don't
clear dev_priv->ums.mm_suspended for the kms case, we allocate it with
kzalloc. Both suggested by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 02978ff5 09-Jul-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix write-read race with multiple rings

Daniel noticed a problem where, if we wrote to an object with ring A in
the middle of a very long running batch, then executed a quick batch on
ring B before a batch that reads from the same object, its obj->ring would
now point to ring B, but its last_write_seqno would be still relative to
ring A. This would allow for the user to read from the object before the
GPU had completed the write, as set_domain would only check that ring B
had passed the last_write_seqno.

To fix this simply (and inelegantly), we bump the last_write_seqno when
switching rings so that the last_write_seqno is always relative to the
current obj->ring.

This fixes igt/tests/gem_write_read_ring_switch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
[danvet: Add note about the newly created igt which exercises this
bug.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
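
A minimal sketch of the "bump the write seqno when the ring changes" idea;
the types and fields below are simplified placeholders, not the real i915
structures:

#include <linux/types.h>

struct sketch_ring { int id; };

struct sketch_obj {
        struct sketch_ring *ring;       /* ring of the last GPU access */
        u32 last_read_seqno;
        u32 last_write_seqno;           /* 0 when no write is outstanding */
};

static void sketch_move_to_active(struct sketch_obj *obj,
                                  struct sketch_ring *ring, u32 seqno)
{
        /* If the object migrates to another ring while a write is still
         * outstanding, re-tag the pending write with a seqno on the new
         * ring so a later wait cannot complete before the write does. */
        if (obj->ring != ring && obj->last_write_seqno)
                obj->last_write_seqno = seqno;

        obj->ring = ring;
        obj->last_read_seqno = seqno;
}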


# 06755608 05-Jul-2013 Xiong Zhang <xiong.y.zhang@intel.com>

drm/i915: Correct obj->mm_list link to dev_priv->mm.inactive_list

obj->mm_list link to dev_priv->mm.inactive_list/active_list
obj->global_list link to dev_priv->mm.unbound_list/bound_list

This regression has been introduced in

commit 93927ca52a55c23e0a6a305e7e9082e8411ac9fa
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Jan 10 18:03:00 2013 +0100

drm/i915: Revert shrinker changes from "Track unbound pages"

Cc: stable@vger.kernel.org
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
[danvet: Add regression notice.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c6cfb325 05-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Embed drm_mm_node in i915 gem obj

Embedding the node in the obj is more natural in the transition to VMAs
which will also have embedded nodes. This change also helps transition
away from put_block to remove node.

Though it's quite an uncommon occurrence, it's somewhat convenient to not
fail at bind time because we cannot allocate the node. Though in
practice there are other allocations (like the request structure) which
would probably make this point not terribly useful.

Quoting Daniel:
Note that the only difference between put_block and remove_node is
that the former fills up the preallocation cache. Which we don't need
anyway and hence is just wasted space.

v2: Clean up the stolen preallocation code.
Rebased on the reserve_node patches
renames ggtt_ stuff to gtt_ stuff
WARN_ON if the object is already bound (which doesn't mean it's in the
bound list, tricky)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# edd41a87 05-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Kill obj->gtt_offset

With the getters in place from the previous patch this member serves no
purpose other than saving one spare pointer chase, which will be killed
in the next patch anyway.

Moving to VMAs, this member adds unnecessary confusion since an object
may exist at different offsets in different VMs.

v2: Properly preserve the stolen offset. This code is a bit hacky but it
all goes away when we embed the drm_mm_node and removes the need for the
incorrect patch I submitted previously: "Use gtt_space->start for stolen
reservation"

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f343c5f6 05-Jul-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Getter/setter for object attributes

Soon we want to gut a lot of our existing assumptions about how many address
spaces an object can live in, and in doing so, embed the drm_mm_node in
the object (and later the VMA).

It's possible in the future we'll want to add more getter/setter
methods, but for now this is enough to enable the VMAs.

v2: Reworked commit message (Ben)
Added comments to the main functions (Ben)
sed -i "s/i915_gem_obj_set_color/i915_gem_obj_ggtt_set_color/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_bound/i915_gem_obj_ggtt_bound/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_size/i915_gem_obj_ggtt_size/" drivers/gpu/drm/i915/*.[ch]
sed -i "s/i915_gem_obj_offset/i915_gem_obj_ggtt_offset/" drivers/gpu/drm/i915/*.[ch]
(Daniel)

v3: Rebased on new reserve_node patch
Changed DRM_DEBUG_KMS to actually work (will need fixing later)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d26e3af8 29-Jun-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor the wait_rendering completion into a common routine

Harmonise the completion logic between the non-blocking and normal
wait_rendering paths, and move that logic into a common function.

In the process, we note that the last_write_seqno is by definition the
earlier of the two read/write seqnos and so all successful waits will
have passed the last_write_seqno. Therefore we can unconditionally clear
the write seqno and its domains in the completion logic.

v2: Add the missing ring parameter, because sometimes it is good to have
things compile.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# daa13e1c 28-Jun-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only clear write-domains after a successful wait-seqno

In the introduction of the non-blocking wait, I cut'n'pasted the wait
completion code from the normal locked path. Unfortunately, this neglected
that the normal path returned early if the wait returned early. The
result is that read-only waits may return whilst the GPU is still
writing to the bo.

Fixes regression from
commit 3236f57a0162391f84b93f39fc1882c49a8998c7 [v3.7]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Aug 24 09:35:09 2012 +0100

drm/i915: Use a non-blocking wait for set-to-domain ioctl

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66163
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3765f304 07-Jun-2013 Jani Nikula <jani.nikula@intel.com>

drm/i915: fix build warning on format specifier mismatch

drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_object_bind_to_gtt’:
drivers/gpu/drm/i915/i915_gem.c:3002:3: warning: format ‘%ld’ expects
argument of type ‘long int’, but argument 5 has type ‘size_t’ [-Wformat]

v2: Use %zu instead of %d. Two char patch, and 100% wrong. (Ville)

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1625e7e5 24-Jun-2013 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

drm/i915: make compact dma scatter lists creation work with SWIOTLB backend.

Git commit 90797e6d1ec0dfde6ba62a48b9ee3803887d6ed4
("drm/i915: create compact dma scatter lists for gem objects") makes
certain assumptions about the underlying DMA API that are not always
correct.

On a ThinkPad X230 with an Intel HD 4000 with Xen during the bootup
I see:

[drm:intel_pipe_set_base] *ERROR* pin & fence failed
[drm:intel_crtc_set_config] *ERROR* failed to set mode on [CRTC:3], err = -28

Bit of debugging traced it down to dma_map_sg failing (in
i915_gem_gtt_prepare_object) as some of the SG entries were huge (3MB).

Those unfortunately are sizes that the SWIOTLB is incapable of handling -
the maximum it can handle is an entry of 512KB of virtually contiguous
memory for its bounce buffer. (See IO_TLB_SEGSIZE).

Prior to the above mentioned git commit the SG entries were 4KB each, and
the code introduced by that commit squashed the CPU-contiguous PFNs
into one big virtual address provided to the DMA API.

This patch is a simple semi-revert - where we emulate the old behavior
if we detect that SWIOTLB is online. If it is not online then we continue
on with the new compact scatter gather mechanism.

An alternative solution would be for the '.get_pages' and the
i915_gem_gtt_prepare_object code to retry with a smaller maximum number
of PFNs that can be combined together - but with this issue
discovered during rc7 that might be too risky.

Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Imre Deak <imre.deak@intel.com>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: David Airlie <airlied@linux.ie>
CC: <dri-devel@lists.freedesktop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
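
A sketch of the "coalesce unless SWIOTLB is in use" logic described above,
assuming the sg_table was already allocated with one entry per page; the
swiotlb_in_use flag stands in for whatever detection the driver actually
performs and is not a real kernel API:

#include <linux/mm.h>
#include <linux/scatterlist.h>

static void sketch_build_sg(struct sg_table *st, struct page **pages,
                            unsigned long n_pages, bool swiotlb_in_use)
{
        struct scatterlist *sg = st->sgl;
        unsigned long i;

        sg->length = 0;
        for (i = 0; i < n_pages; i++) {
                if (swiotlb_in_use || !sg->length ||
                    page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1) {
                        if (sg->length)         /* close the previous entry */
                                sg = sg_next(sg);
                        /* One page per entry: the old, SWIOTLB-safe layout. */
                        sg_set_page(sg, pages[i], PAGE_SIZE, 0);
                } else {
                        /* Physically contiguous and no bounce buffering:
                         * grow the current entry instead of starting a new one. */
                        sg->length += PAGE_SIZE;
                }
        }
        sg_mark_end(sg);
}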


# 426729dc 24-Jun-2013 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

drm/i915: make compact dma scatter lists creation work with SWIOTLB backend.

Git commit 90797e6d1ec0dfde6ba62a48b9ee3803887d6ed4
("drm/i915: create compact dma scatter lists for gem objects") makes
certain assumptions about the underlying DMA API that are not always
correct.

On a ThinkPad X230 with an Intel HD 4000 with Xen during the bootup
I see:

[drm:intel_pipe_set_base] *ERROR* pin & fence failed
[drm:intel_crtc_set_config] *ERROR* failed to set mode on [CRTC:3], err = -28

Bit of debugging traced it down to dma_map_sg failing (in
i915_gem_gtt_prepare_object) as some of the SG entries were huge (3MB).

Those unfortunately are sizes that the SWIOTLB is incapable of handling -
the maximum it can handle is an entry of 512KB of virtually contiguous
memory for its bounce buffer. (See IO_TLB_SEGSIZE).

Prior to the above mentioned git commit the SG entries were 4KB each, and
the code introduced by that commit squashed the CPU-contiguous PFNs
into one big virtual address provided to the DMA API.

This patch is a simple semi-revert - where we emulate the old behavior
if we detect that SWIOTLB is online. If it is not online then we continue
on with the new compact scatter gather mechanism.

An alternative solution would be for the '.get_pages' and the
i915_gem_gtt_prepare_object code to retry with a smaller maximum number
of PFNs that can be combined together - but with this issue
discovered during rc7 that might be too risky.

Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Chris Wilson <chris@chris-wilson.co.uk>
CC: Imre Deak <imre.deak@intel.com>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: David Airlie <airlied@linux.ie>
CC: <dri-devel@lists.freedesktop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 19b2dbde 12-Jun-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restore fences after resume and GPU resets

Stéphane Marchesin found that fences for pinned objects (i.e. the
scanout) were not being restored upon resume, leading to corruption on
the display and reference counting issues. This is due to a bug in

commit 312817a39f17dbb4de000165b5b724e3728cd91c [2.6.38]
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Nov 22 11:50:11 2010 +0000

drm/i915: Only save and restore fences for UMS

that zapped the pinned fences even though they were in use.
Fortuitously, whilst we forced a VT switch during suspend and resume,
no fences were ever pinned at the time. However, we now can do
switchless S3 transitions and so the old bug finally surfaces.

Reported-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# aa60c664 12-Jun-2013 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: find guilty batch buffer on ring resets

After the hang check timer has declared the gpu to be hung,
the rings are reset. In the ring reset, when clearing the
request list, do a post mortem analysis to find out
the guilty batch buffer.

Select requests for further analysis by inspecting
the completed sequence number which has been updated
into the HWS page. If a request was completed, it can't
be related to the hang.

For non-completed requests, mark the batch as guilty
if the ring was not waiting and the ring head was
stuck inside the buffer object or in the flush region
right after the batch. For everything else, mark
them as innocent.

v2: Fixed a typo in commit message (Ville Syrjälä)

v3: - more descriptive function parameters (Chris Wilson)
- use masked head address when inspecting if request is in ring
- s/hangcheck.last_action/hangcheck.action
- added comment about unmasked head hitting batch_obj range

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
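
A hedged sketch of the classification rules summarised above; the structure
and helper names are illustrative, not the actual i915 hangcheck code:

#include <linux/types.h>

struct sketch_request {
        u32 seqno;
        u32 batch_start;        /* GTT offset of the batch buffer */
        u32 batch_len;          /* batch plus the flush region behind it */
};

static bool sketch_request_guilty(const struct sketch_request *rq,
                                  u32 completed_seqno, u32 masked_head,
                                  bool ring_was_waiting)
{
        /* Requests already past the completed seqno cannot have hung. */
        if ((s32)(completed_seqno - rq->seqno) >= 0)
                return false;

        /* A ring parked in a wait is not the batch's fault. */
        if (ring_was_waiting)
                return false;

        /* Guilty if the (masked) head is stuck inside the batch or the
         * flush region right after it; everything else is innocent. */
        return masked_head - rq->batch_start < rq->batch_len;
}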


# 7d736f4f 12-Jun-2013 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: add batch bo to i915_add_request()

In order to track down a batch buffer and context which
caused the ring to hang, store a reference to the bo in the request struct.
A request can also cause the gpu to hang after the batch, in the flush section
of the ring. To detect this, add the offset of the start of the flush portion
to the request.

v2: Included comment about request vs batch_obj lifetimes (Chris Wilson)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0025c077 11-Jun-2013 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: change i915_add_request to macro

Only execbuffer needed all the parameters of i915_add_request().
By putting __i915_add_request behind a macro, all current callsites
become cleaner. The following patch will introduce a new parameter
for __i915_add_request. With this patch, only the relevant callsite
will reflect the change, making the commit smaller and easier to understand.

v2: _i915_add_request as function name (Chris Wilson)

v3: change name __i915_add_request and fix ordering of params (Ben Widawsky)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7abb690a 24-May-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Fix spurious -EIO/SIGBUS on wedged gpus

Chris Wilson noticed that since

commit 1f83fee08d625f8d0130f9fe5ef7b17c2e022f3c [v3.9]
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Nov 15 17:17:22 2012 +0100

drm/i915: clear up wedged transitions

X can again get -EIO when it does not expect it. And, even worse, score
a SIGBUS when accessing gtt mmaps. The established ABI is that we
_only_ return an -EIO from execbuf - all other ioctls should just
work. And since the reset code moves all bos out of gpu domains and
clears out all the last_seqno/ring tracking there really shouldn't be
any reason for non-execbuf code to ever touch the hw and see an -EIO.

After some extensive discussions we've noticed that these spurious -EIO
are caused by i915_gem_wait_for_error:

http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg20540.html

That is easy to fix by returning 0 instead of -EIO, since grabbing the
dev->struct_mutex does not yet mean that we actually want to touch the
hw. And so there is no reason at all to fail with -EIO.

But that's not the entire story, since often (at least it's easily
googleable) dmesg indicates that the reset fails and we declare the
gpu wedged. Then, quite a bit later X wakes up with the "Timed out
waiting for the gpu reset to complete" DRM_ERROR message in
wait_for_error and brings down the desktop with an -EIO/SIGBUS.

So clearly we're missing a wakeup somewhere, since the gpu reset just
doesn't take 10 seconds to complete. And indeed we do handle the
terminally wedged state wrong.

Fix this all up.

References: https://bugs.freedesktop.org/show_bug.cgi?id=63921
References: https://bugs.freedesktop.org/show_bug.cgi?id=64073
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 35c20a60 31-May-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Rename the gtt_list to global_list

Since it will be used for the global bound/unbound list with full PPGTT,
this helps clarify things for upcoming code rework.

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 401c29f6 31-May-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: unpin pages at unbind

If we properly keep track of the pages_pin_count, then when we later add
multiple address spaces, the put_pages doesn't need any special checks
to be able to perform its job.

CC: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Rebased on top of the fix for stolen memory pinning.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1d64ae71 31-May-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Unpin stolen pages

The way the stolen handling works is we take a pin on the backing pages,
but we never actually get a reference to the bo. On freeing objects
allocated with stolen memory, the final unref will end up freeing the
object with pinned pages count left. To enable an assertion to catch
bugs in this code path, this patch cleans up that remaining pin.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9a8a2213 28-May-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Vebox ringbuffer init

v2: Add set_seqno which didn't exist before rebase (Haihao)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0a9ae0d7 25-May-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: pre-fixes for checkpatch

Since I'll need to modify i915_gem_object_bind_to_gtt(), fix the errors
now to get checkpatch to not complain.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Resolve conflict with Chris' improved debug output, and
bikeshed the new variable with s/max/gtt_max/ a bit while at it.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2dc8aae0 22-May-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Workaround incoherence with fence updates on Valleyview

In commit 25ff1195f8a0b3724541ae7bbe331b4296de9c06
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Apr 4 21:31:03 2013 +0100

drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

we introduced an empirical workaround for memory corruption when using
fences from multiple CPUs. At the time, we did not have any results for
Valleyview, so the presumption was that it was limited to recent
generations using LLC. Now we have evidence that Valleyview also suffers
incoherence and requires a similar but different workaround. For
Valleyview, the wbinvd instruction is insufficient and we require the
serialising register write per-CPU. Conversely, that serialising
register write is not enough for SNB/IVB/HSW. To compromise and keep the
code relatively clean, employ both serialisation techniques in the same
workaround.

Reported-by: Jon Bloomfield <jon.bloomfield@intel.com>
Tested-by: Jon Bloomfield <jon.bloomfield@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a36689cb 21-May-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Be more informative when reporting "too large for aperture" error

This should help debugging the truly unexpected cases where it occurs -
in particular to see which value is garbage.

References: https://bugzilla.kernel.org/show_bug.cgi?id=58511
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: s/%ld/%zd/ as spotted by Wu Fengguang's autobuilder.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e054cc39 21-May-2013 Imre Deak <imre.deak@intel.com>

drm/i915: avoid premature timeouts in __wait_seqno()

At the moment wait_event_timeout/wait_event_interruptible_timeout may
time out 1 jiffy too early, as the calculated expiry time is 1 less than
needed. Besides timing out too early this also means that the
calculation of the remaining time will be incorrect and we will pass a
non-zero remaining time to user space in case of a time out. This is one
reason for the following bugzilla report:

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64270

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
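
The gist of the fix can be sketched as rounding the expiry up by one extra
jiffy so the wait can never end before the requested interval has fully
elapsed; this is only an illustration, not the actual __wait_seqno() change:

#include <linux/jiffies.h>

static unsigned long sketch_wait_expiry(unsigned int timeout_ms)
{
        /* The conversion rounds up to jiffy granularity, but the tick that
         * is currently in flight is only partially ours, so add one more
         * jiffy to guarantee the full interval (and a sane remainder). */
        return jiffies + msecs_to_jiffies(timeout_ms) + 1;
}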


# 0e50e96b 02-May-2013 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: add context into request struct

Storing context reference into request struct
allows us to inspect context and its associated
objects when requests are retired.

Both ppgtt and arb robustness work will need
this.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4f42f4ef 26-Apr-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always normalize return timeout for wait_timeout_ioctl

As we recompute the remaining timeout after waiting, there is a
potential for that timeout to be less than zero and so to need sanitizing.
The timeout is always returned to userspace and validated, so we should
always perform the sanitization.

v2 [vsyrjala]: Only normalize the timespec if it's invalid
v3: Add a comment to clarify the situation and remove the now
useless WARN_ON() (ickle)

Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 42b5aeab 09-Apr-2013 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: IVB/HSW have 32 fence registers

Increase the number of fence registers to 32 on IVB/HSW. VLV however
only has 16 fence registers according to the docs.

Increasing the number of fences was attempted before [1], but there was
some uncertainty about the maximum CPU fence number for FBC. Since then
BSpec has been updated to state that there are in fact 32 fence registers,
and the CPU fence number field in the SNB_DPFC_CTL_SA register is 5 bits,
and the CPU fence number field in the ILK_DPFC_CONTROL register must be
zero. So now it all makes sense.

[1] http://lists.freedesktop.org/archives/intel-gfx/2011-October/012865.html

v2: Include some background information based on the previous attempt

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b7c36d25 08-Apr-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Allow PPGTT enable to fail

I'm really not happy that we have to support this, but this will be the
simplest way to handle cases where PPGTT init can fail, which I promise
will be coming in the future.

v2: Resolve conflicts due to patch series reordering.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6197349b 08-Apr-2013 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Abstract PPGTT enabling

Since we've already set up a nice vtable to abstract other PPGTT
functions, also abstract the actual register programming to enable
things.

This function will probably need to change a bit as we implement real
processes.

v2: Resolve conflicts due to patch series reordering.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 25ff1195 04-Apr-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Workaround incoherence between fences and LLC across multiple CPUs

In order to fully serialize access to the fenced region and the update
to the fence register we need to take extreme measures on SNB+, and
manually flush writes to memory prior to writing the fence register in
conjunction with the memory barriers placed around the register write.

Fixes i-g-t/gem_fence_thrash

v2: Bring a bigger gun
v3: Switch the bigger gun for heavier bullets (Arjan van de Ven)
v4: Remove changes for working generations.
v5: Reduce to a per-cpu wbinvd() call prior to updating the fences.
v6: Rewrite comments to elide forgotten history.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Tested-by: Jon Bloomfield <jon.bloomfield@intel.com> (v2)
Cc: stable@vger.kernel.org
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 88a2b2a3 05-Apr-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Don't wait for PCH on reset

BIOS should be setting this, but in case it doesn't...

v2: Define the bits we actually want to clear (Jesse)
Make it an RMW op (Jesse)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2db76d7c 26-Mar-2013 Imre Deak <imre.deak@intel.com>

lib/scatterlist: sg_page_iter: support sg lists w/o backing pages

The i915 driver uses sg lists for memory without backing 'struct page'
pages, similarly to other IO memory regions, setting only the DMA
address for these. It does this so that it can program the HW MMU
tables in a uniform way both for sg lists with and without backing pages.

Without a valid page pointer we can't call nth_page to get the current
page in __sg_page_iter_next, so add a helper that relevant users can
call separately. Also add a helper to get the DMA address of the current
page (idea from Daniel).

Convert all places in i915, to use the new API.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f9c513e9 26-Mar-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Always call fence-lost prior to removing the fence

There is a minute window for a race between put-fence removing the fence
and a new transaction by an external party on the GTT mmap. That is,
we must zap the mmap prior to removing the fence and not afterwards.

Fixes regression from
commit 61050808bb019ebea966b7b5bfd357aaf219fb51
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 17 15:31:31 2012 +0100

drm/i915: Refactor put_fence() to use the common fence writing routine

v2: Remember the fence to remove with a local variable (gcc)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 90797e6d 18-Feb-2013 Imre Deak <imre.deak@intel.com>

drm/i915: create compact dma scatter lists for gem objects

So far we created a sparse dma scatter list for gem objects, where each
scatter list entry represented only a single page. In the future we'll
have to handle compact scatter lists too where each entry can consist of
multiple pages, for example for objects imported through PRIME.

The previous patches have already fixed up all other places where the
i915 driver _walked_ these lists. Here we have the corresponding fix to
_create_ compact lists. It's not a performance or memory footprint
improvement, but it helps to better exercise the new logic.

Reference: http://www.spinics.net/lists/dri-devel/msg33917.html
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 67d5a50c 18-Feb-2013 Imre Deak <imre.deak@intel.com>

drm/i915: handle walking compact dma scatter lists

So far the assumption was that each dma scatter list entry contains only
a single page. This might not hold in the future, when we'll introduce
compact scatter lists, so prepare for this everywhere in the i915 code
where we walk such a list.

We'll fix the place _creating_ these lists separately in the next patch
to help the reviewing/bisectability.

Reference: http://www.spinics.net/lists/dri-devel/msg33917.html
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d62b4892 08-Mar-2013 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: allow force wake at init time on VLV v2

We need to set the 'allow force wake' bit to enable forcewake handling
later on.

v2: split from clock gating patch (Jani)
check for allowwakeack (Ville)

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2bb4629a 22-Feb-2013 Ville Syrjälä <ville.syrjala@linux.intel.com>

drm/i915: Add to_user_ptr()

to_user_ptr() simply casts a pointer passed as u64 from user space
to void __user * correctly. Using this lets us get rid of all the
tiresome casts.

The idea came from Chris Wilson <chris@chris-wilson.co.uk>.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
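
The helper amounts to a cast, roughly along these lines (a sketch, not
necessarily the exact definition that landed):

#include <linux/types.h>

static inline void __user *to_user_ptr(u64 address)
{
        /* ioctl structs carry user pointers as u64; going through uintptr_t
         * keeps the conversion well defined on 32-bit kernels too. */
        return (void __user *)(uintptr_t)address;
}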


# 496ad9aa 23-Jan-2013 Al Viro <viro@zeniv.linux.org.uk>

new helper: file_inode(file)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# eb32e458 14-Feb-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Use HAS_L3_GPU_CACHE in i915_gem_l3_remap

Yet another remnant ... this might explain why l3 remapping didn't
really work on HSW.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57441
Spotted-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 769ce464 13-Feb-2013 Imre Deak <imre.deak@intel.com>

drm/i915: don't clflush gem objects in stolen memory

As explained by Chris Wilson gem objects in stolen memory are always
coherent with the GPU so we don't need to ever flush the CPU caches for
these.

This fixes a breakage - at least with the compact sg patches applied -
during the resume/restore gtt mappings path, when we tried to clflush an
FB object in stolen memory, but since stolen objects don't have backing
pages we passed an invalid page pointer to drm_clflush_page().

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4fc7c971 08-Feb-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Extract ring init from hw_init

The ring initialization will differ a bit in upcoming generations, and
this split will prepare the code for what's needed.

This patch also fixes a bug introduced in:
commit 99433931950f33039d9e1a52b4ed9af3f1b58e84
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Jan 22 14:12:17 2013 +0200

drm/i915: use gem_set_seqno() on hardware init

After doing the extraction, the bad error handling became obvious. I
acknowledge that this should be two patches, but it's a pretty
small/trivial patch. If requested, I can certainly do the fix as a
distinct patch.

v2: Should be cleanup blt, not init blt on failure (Chris)

v3: Forgot to git add on v2

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 725a5b54 08-Jan-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only run idle processing from i915_gem_retire_requests_worker

When adding the fb idle detection to mark-inactive, it was forgotten
that userspace can drive the processing of retire-requests. We assumed
that it would be principally driven by the retire requests worker,
running once every second whilst active and so we would get the deferred
timer for free. Instead we spend too many CPU cycles reclocking the LVDS
preventing real work from being done.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-and-tested-by: Alexander Lam <lambchop468@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58843
Cc: stable@vger.kernel.org
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 99433931 22-Jan-2013 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: use gem_set_seqno() on hardware init

When the machine was rebooted or the module was reloaded,
gem_hw_init() set last_seqno to be identical to next_seqno.
This led to a situation where waits for the first ever request
always passed immediately, regardless of whether it was actually
executed.

Use gem_set_seqno() to be consistent with how the hw is
initialized on init, wrap and resume.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f69061be 06-Dec-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: create a race-free reset detection

With the previous patch the state transition handling of the reset
code itself is now (hopefully) race free and solid. But that still
leaves out everyone else - with the various lock-free wait paths
we have there's the possibility that the reset happens between the
point where we read the seqno we should wait on and the actual wait.

And if __wait_seqno then never sees the RESET_IN_PROGRESS state, we'll
happily wait for a seqno which will in all likelihood never signal.

In practice this is not a big problem since the X server gets
constantly interrupted, and can then submit more work (hopefully) to
unblock everyone else: As soon as a new seqno write lands, all waiters
will unblock. But running the i-g-t reset testcase ZZ_hangman can
expose this race, especially on slower hw with fewer cpu cores.

Now looking forward to ARB_robustness and friends that's not the best
possible behaviour, hence this patch adds a reset_counter to be able
to detect any reset, even if a given thread never observed the
in-progress state.

The important part is to correctly order things:
- The write side needs to increment the counter after any seqno gets
reset. Hence we need to do that at the end of the reset work, and
again wake everyone up. We also need to place a barrier in between
any possible seqno changes and the counter increment, since any
unlock operations only guarantee that nothing leaks out, but not
that a later load operation gets moved ahead.
- On the read side we need to ensure that no reset can sneak in and
invalidate the seqno. In all cases we can use the one-sided barrier
that unlock operations guarantee (of the lock protecting the
respective seqno/ring pair) to ensure correct ordering. Hence it is
sufficient to place the atomic read before the mutex/spin_unlock and
no additional barriers are required.

The end-result of all this is that we need to wake up everyone twice
in a reset operation:
- First, before the reset starts, to get any lockholders of the locks,
so that the reset can proceed.
- Second, after the reset is completed, to allow waiters to properly
and reliably detect the reset condition and bail out.

I admit that this entire reset_counter thing smells a bit like
overkill, but I think it's justified since it makes it really explicit
what the bail-out condition is. And we need a reset counter anyway to
implement ARB_robustness, and imo with finer-grained locking on the
horizon this is the most resilient scheme I could think of.

v2: Drop spurious change in the wait_for_error EXIT_COND - we only
need to wait until we leave the reset-in-progress wedged state.

v3: Don't play tricks with barriers in the throttle ioctl, the
spin_unlock is barrier enough.

I've also considered using a little helper to grab the current
reset_counter, but then decided that hiding the atomic_read isn't a
great idea, since having it explicitly show up in the code is a nice
reminder for reviewers to check the memory barriers.

v4: Add a comment to explain why we need to fall through in
__wait_seqno in the end variable assignments.

v5: Review from Damien:
- s/smb/smp/ in a comment
- don't increment the reset counter after we've set it to WEDGED. Now
we (again) properly wedge the gpu when the reset fails.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
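
The read-side ordering described above boils down to sampling the counter
before the lockless wait and bailing out if it changed; a hedged sketch
with placeholder names:

#include <linux/atomic.h>
#include <linux/errno.h>

/* Sample the counter while still holding the lock protecting the seqno,
 * then re-check it inside the wait loop.  Any change means a reset
 * (completed or still in progress) happened and the waiter must bail. */
static int sketch_check_reset(atomic_t *reset_counter, int sampled)
{
        if (atomic_read(reset_counter) != sampled)
                return -EAGAIN;         /* a reset snuck in under us */
        return 0;
}

The write side then only has to increment the counter at the end of the
reset work, after the seqno changes, and wake everyone up a second time.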


# 97c809fd 09-Oct-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only apply the mb() when flushing the GTT domain during a finish

Now that we seem to have brought order to the GTT barriers, the last one
to review is the terminal barrier before we unbind the buffer from the
GTT. This needs to only be performed if the buffer still resides in the
GTT domain, and so we can skip some needless barriers otherwise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d0a57789 09-Oct-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only insert the mb() before updating the fence parameter

With a fence, we only need to insert a memory barrier around the actual
fence alteration for CPU accesses through the GTT. Performing the
barrier in flush-fence was inserting unnecessary and expensive barriers
for never fenced objects.

Note removing the barriers from flush-fence, which was effectively a
barrier before every direct access through the GTT, revealed that we
were missing a barrier before the first access through the GTT. Lack of
that barrier was sufficient to cause GPU hangs.

v2: Add a couple more comments to explain the new barriers

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1f83fee0 15-Nov-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: clear up wedged transitions

We have two important transitions of the wedged state in the current
code:

- 0 -> 1: This means a hang has been detected, and signals to everyone
that they please get of any locks, so that the reset work item can
do its job.

- 1 -> 0: The reset handler has completed.

Now the last transition mixes up two states: "Reset completed and
successful" and "Reset failed". To distinguish these two we do some
tricks with the reset completion, but I simply could not convince
myself that this doesn't race under odd circumstances.

Hence split this up, and add a new terminal state indicating that the
hw is gone for good.

Also add explicit #defines for both states, update comments.

v2: Split out the reset handling bugfix for the throttle ioctl.

v3: s/tmp/wedged/ suggested by Chris Wilson. Also fix up a rebase
error which prevented this patch from actually compiling.

v4: To unify the wedged state with the reset counter, keep the
reset-in-progress state just as a flag. The terminally-wedged state is
now denoted with a big number.

v5: Add a comment to the reset_counter special values explaining that
WEDGED & RESET_IN_PROGRESS needs to be true for the code to be
correct.

v6: Fixup logic errors introduced with the wedged+reset_counter
unification. Since WEDGED implies reset-in-progress (in a way we're
terminally stuck in the dead-but-reset-not-completed state), we need to
ensure that we check for this everywhere. The specific bug was in
wait_for_error, which would simply have timed out.

v7: Extract an inline i915_reset_in_progress helper to make the code
more readable. Also annotate the reset-in-progress case with an
unlikely, to help the compiler optimize the fastpath. Do the same for
the terminally wedged case with i915_terminally_wedged.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
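
A sketch of how both states can live in one counter, as described above;
the flag and sentinel values are illustrative only:

#include <linux/atomic.h>
#include <linux/types.h>

#define SKETCH_RESET_IN_PROGRESS_FLAG   1
#define SKETCH_WEDGED                   0xffffffff      /* "big number", keeps the flag bit set */

static inline bool sketch_reset_in_progress(atomic_t *reset_counter)
{
        return unlikely(atomic_read(reset_counter) & SKETCH_RESET_IN_PROGRESS_FLAG);
}

static inline bool sketch_terminally_wedged(atomic_t *reset_counter)
{
        return unlikely(atomic_read(reset_counter) == SKETCH_WEDGED);
}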


# 308887aa 14-Nov-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix reset handling in the throttle ioctl

While auditing the code I've noticed one place (the throttle ioctl)
which does not yet wait for the reset handler to complete and doesn't
properly decode the wedge state into -EAGAIN/-EIO. Fix this up by
calling the right helpers. This might explain the oddball "my
compositor just died in a successful gpu reset" reports. Or maybe not, since
current mesa doesn't use this ioctl to throttle command submission.

The throttle ioctl doesn't take the struct_mutex, so to avoid busy-looping
with -EAGAIN while a reset is in process, check for errors first and wait
for the handler to complete if a reset is pending by calling
i915_gem_wait_for_error.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 33196ded 14-Nov-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move wedged to the other gpu error handling stuff

And to make Ben Widawsky happier, use the gpu_error instead of
the entire device as the argument in some functions.

Drop the outdated comment on ->wedged for now, a follow-up patch will
change the semantics and add a proper comment again.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 99584db3 14-Nov-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: extract hangcheck/reset/error_state state into substruct

This has been sprinkled all over the place in dev_priv. I think
it'd be good to also move all the code into a separate file like
i915_gem_error.c, but that's for another patch.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 93d18799 17-Jan-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Remove use of gtt_mappable_entries

Mappable_end, i.e. the size, is almost always what you want as opposed to the
number of entries. Since we already have that information, we can scrap
the number of entries and only calculate it when needed.

If gtt_start is !0, this will have slightly different behavior. This
difference can only occur in DRI1, and exists when we try to kick out
the firmware fb. The new code seems like a bugfix to me.

The other case where we've changed the behavior is during init we check
the mappable region against our current known upper and lower limits
(64MB and 512MB). This now matches the comment, and makes things more
convenient after removing gtt_mappable_entries.

Also worth noting is the setting of mappable_end is taken out of setup
because we do it earlier now in the DRI2 case and therefore need to add
that tiny hunk to support the DRI1 IOCTL.

v2: Move up mappable end to before legacy AGP init

v3: Add the dev_priv inclusion here from previous rebase error in patch
5

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> (v2)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: squash in fix for a printk format flag mismatch warning.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5d4545ae 17-Jan-2013 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Create a gtt structure

The purpose of the gtt structure is to help isolate our gtt specific
properties from the rest of the code (in doing so it helps us finish the
isolation from the AGP connection).

The following members are pulled out (and renamed):
gtt_start
gtt_total
gtt_mappable_end
gtt_mappable
gtt_base_addr
gsm

The gtt structure will serve as a nice place to put gen specific gtt
routines in upcoming patches. As far as what else I feel belongs in this
structure: it is meant to encapsulate the GTT's physical properties.
This is why I've not added fields which track various drm_mm properties,
or things like gtt_mtrr (which is itself a pretty transient field).

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
[Ben modified commit messages]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
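
Sketching the container described above, with fields named after the
members listed in the commit message; the exact types are guesses for
illustration, not the real structure:

#include <linux/types.h>

struct sketch_gtt {
        unsigned long start;            /* first usable offset (was gtt_start) */
        size_t total;                   /* total GTT size (was gtt_total) */
        unsigned long mappable_end;     /* end of the CPU-visible aperture */
        struct io_mapping *mappable;    /* CPU mapping of the aperture */
        phys_addr_t base_addr;          /* aperture address (was gtt_base_addr) */
        void __iomem *gsm;              /* mapping of the GTT page table itself */
};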


# 43e28f09 08-Jan-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Bail if we attempt to allocate pages for a purged object

Move the existing checking inside bind_to_gtt() to the more appropriate
layer in order to prevent recreation of the pages after they have been
explicitly truncated.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# dd624afd 14-Jan-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add a debug interface to forcibly evict and shrink our object caches

As a means to investigate some bad system behaviour related to the
purging of the active, inactive and unbound lists, it is useful to be
able to manually control when those lists should be cleared.

v2: use _safe list iterators as we kick objects from the list as we
walk.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a small comment explaining why we don't need to check and
wait for gpu resets, acked by Chris on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0fa87796 07-Jan-2013 Imre Deak <imre.deak@intel.com>

drm/i915: use gtt_get_size() instead of open coding it

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 56c844e5 07-Jan-2013 Imre Deak <imre.deak@intel.com>

drm/i915: merge {i965, sandybridge}_write_fence_reg()

The two functions are rather similar, so merge them.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d865110c 07-Jan-2013 Imre Deak <imre.deak@intel.com>

drm/i915: merge get_gtt_alignment/get_unfenced_gtt_alignment()

The two functions are rather similar, so merge them.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 93927ca5 10-Jan-2013 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Revert shrinker changes from "Track unbound pages"

This partially reverts

commit 6c085a728cf000ac1865d66f8c9b52935558b328
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Aug 20 11:40:46 2012 +0200

drm/i915: Track unbound pages

Closer inspection of that patch revealed a bunch of unrelated changes
in the shrinker:
- The shrinker count is now in pages instead of objects.
- For counting the shrinkable objects the old code only looked at the
inactive list, the new code looks at all bound objects (including
pinned ones). That is obviously in addition to the new unbound list.
- The shrinker count is no longer scaled with
sysctl_vfs_cache_pressure. Note though that with the default tuning
value of vfs_cache_pressure = 100 this doesn't affect the shrinker
behaviour.
- When actually shrinking objects, the old code first dropped
purgeable objects, then normal (inactive) objects. Only then did it,
in a last-ditch effort, idle the gpu and evict everything. The new
code omits the intermediate step of evicting normal inactive
objects.

Save for the first change, which seems benign, and the shrinker count
scaling, which is a bit of a different story, the end result of all these
changes is that the shrinker is _much_ more likely to fall back to the
last-ditch resort of idling the gpu and evicting everything. The old
code could only do that if something else evicted lots of objects
meanwhile (since without any other changes the nr_to_scan will be
smaller than the object count).

Reverting the vfs_cache_pressure behaviour itself is a bit bogus: Only
dentry/inode object caches should scale their shrinker counts with
vfs_cache_pressure. Originally I've had that change reverted, too. But
Chris Wilson insisted that it's too bogus and shouldn't again see the
light of day.

Hence revert all these other changes and restore the old shrinker
behaviour, with the minor adjustment that we now first scan the
unbound list, then the inactive list for each object category
(purgeable or normal).

A similar patch has been tested by a few people affected by the gen4/5
hangs which started to appear in 3.7, which some people bisected to
the "drm/i915: Track unbound pages" commit. But just disabling the
unbound logic alone didn't change things at all.

Note that this patch doesn't fix the referenced bugs, it only hides
the underlying bug(s) well enough to restore pre-3.7 behaviour. The
key to achieving that is to massively reduce the likelihood of going
into a full gpu stall and evicting everything.

v2: Reword commit message a bit, taking Chris Wilson's comment into
account.

v3: On Chris Wilson's insistency, do not reinstate the rather bogus
vfs_cache_pressure change.

Tested-by: Greg KH <gregkh@linuxfoundation.org>
Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
References: https://bugs.freedesktop.org/show_bug.cgi?id=57122
References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
References: https://bugs.freedesktop.org/show_bug.cgi?id=57136
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 93be8788 02-Jan-2013 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only increment the user-pin-count after successfully pinning the bo

As we do not correct the user pin-count along the error path, we may
end up with userspace believing that it has a pinned object at offset 0
(when interrupted by a signal for example).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d7e5008f 18-Dec-2012 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Move even more gtt code to i915_gem_gtt

This really should have been part of the kill agp series.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# da494d7c 20-Dec-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: disable shrinker lock stealing for create_mmap_offset

The mmap offset structure is not part of the drm/i915 code, but
provided by gem helpers. To avoid leaky abstractions (by either
depending upon implementation details of said helper wrt
preallocations, or reimplementing it in our code and so fuzzing
around in internal details of that helper) simply disable
the shrinker lock stealing across calls into the helper functions.

This should fix igt/gem_tiled_swapping.

v2: Fix cleanup path confusion bemoaned by Chris Wilson.

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 677feac2 19-Dec-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: optionally disable shrinker lock stealing

commit 5774506f157a91400c587b85d1ce4de56f0d32f6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Nov 21 13:04:04 2012 +0000

drm/i915: Borrow our struct_mutex for the direct reclaim

added a nice trick to steal the struct_mutex lock in the shrinker if
it's the current task holding it. But this also caused the requirement
that every place which allocates memory needs to be careful about the
gem state of objects, since the shrinker could have pulled the rug out
from under it. We've usually solved this by carefully preallocating
things or ensure that buffers are pinned already.

But the shrinker also reaps mmap offsets, so allocating those needs to
be careful, too. That code has now been factored out into common
helpers, so we either have fragile code depending upon the common
helper not doing something we don't want it to do, or we need to
reimplement the mmap offset creation and so also leak implementation
details into our code.

Since this all results in leaky abstraction, cop out by disabling the
lock borrowing trick while calling down into the helpers. That way our
craziness is nicely confined to files in drm/i915.

v2: Split out the change to create_mmap_offset as request by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
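
A minimal sketch of the pattern in C, assuming a driver-private flag
(the names below are illustrative, not the commit's actual diff): the
flag is raised around the call into the gem helper so the shrinker
refuses to steal struct_mutex while the helper runs.

/* Sketch only: shrinker_no_lock_stealing is an assumed flag name. */
static int create_mmap_offset(struct drm_i915_gem_object *obj)
{
	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
	int ret;

	dev_priv->mm.shrinker_no_lock_stealing = true;	/* assumed flag */
	ret = drm_gem_create_mmap_offset(&obj->base);	/* common gem helper */
	dev_priv->mm.shrinker_no_lock_stealing = false;

	return ret;
}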


# fca26bb4 19-Dec-2012 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Introduce i915_gem_set_seqno()

This function can be used to set the driver's next_seqno
to an arbitrary value.

i915_gem_set_seqno() will idle the gpu, retire outstanding
requests, clear the semaphore mailboxes and set the hardware
status page's seqno index.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ba1a7067 19-Dec-2012 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Always clear semaphore mboxes on seqno wrap

In preparation for setting the seqno to an arbitrary value on init or
through debugfs, we need to always clear the semaphores and set the
hws page seqno index by calling intel_ring_init_seqno().

v2: rewrote the commit message as suggested by Chris Wilson.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f7e98ad4 19-Dec-2012 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Initialize hardware semaphore state on ring init

The hardware status page needs to have a proper seqno set,
as our initial seqno can be arbitrary. If the initial seqno is close
to the wrap boundary on init and i915_seqno_passed() (31-bit space)
refers to a hw status page which contains zero, an erroneous result
will be returned.

v2: clear mboxes and set hws page directly instead of going
through rings. Suggested by Chris Wilson.

v3: hws needs to be updated for all gens. Noticed by Chris
Wilson.

References: https://bugs.freedesktop.org/show_bug.cgi?id=58230
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
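
To illustrate why a zeroed hw status page misbehaves near the wrap
boundary, here is a small standalone sketch of the signed-difference
comparison that seqno ordering is based on (illustrative, not the
driver code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* seqno ordering uses a signed 32-bit difference, so it is only valid
 * when the two values lie within 2^31 of each other.
 */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

int main(void)
{
	uint32_t hws = 0;		/* stale, never-initialized status page */
	uint32_t initial = 0xfffffff0u;	/* initial seqno close to the wrap */

	/* Prints 1: the zeroed status page looks as if it had already
	 * passed the initial seqno, i.e. waits complete erroneously.
	 */
	printf("%d\n", seqno_passed(hws, initial));
	return 0;
}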


# 8782e26c 18-Dec-2012 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: Bug on unsupported swizzled platforms

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7dbf9d6e 18-Dec-2012 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: BUG() if fences are used on unsupported platform

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# dc9dd7a2 07-Dec-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Preallocate the drm_mm_node prior to manipulating the GTT drm_mm manager

As we may reap neighbouring objects in order to free up pages for
allocations, we need to be careful not to allocate in the middle of the
drm_mm manager. To accomplish this, we can simply allocate the
drm_mm_node up front and then use the combined search & insert
drm_mm routines, reducing our code footprint in the process.

Fixes (partially) i-g-t/gem_tiled_swapping

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Again fixup atomic bikeshed.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
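
A sketch of the allocate-first pattern in C (the search-and-insert
helper name below is illustrative; only the ordering matters): the
kmalloc happens before the manager is touched, so any reaping triggered
by the allocation cannot happen in the middle of a drm_mm operation.

struct drm_mm_node *node;
int ret;

/* Allocate the node up front; this may trigger the shrinker. */
node = kzalloc(sizeof(*node), GFP_KERNEL);
if (node == NULL)
	return -ENOMEM;

/* Only now manipulate the GTT manager, with no further allocations. */
ret = drm_mm_search_and_insert(&dev_priv->mm.gtt_space, node,	/* illustrative name */
			       size, alignment);
if (ret) {
	kfree(node);
	return ret;
}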


# eb119bd6 15-Dec-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Access to snooped system memory through the GTT is incoherent

We ignore all the user requests to handle flushing to the GTT domain if
the user requests such on a snoopable bo, and as such access through the
GTT to such pages remains incoherent. The specs even warn that such
behaviour is undefined - a strong reason never to do so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9e8e3687 04-Dec-2012 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Set initial seqno value close to wrap boundary

To gain confidence in the wrap handling, make it happen quite
soon after the boot.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 107f27a5 10-Dec-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Open-code i915_gpu_idle() for handling seqno wrapping

The complication is that during seqno wrapping we must be extremely
careful not to write to any ring as that will require a new seqno, and
so would recurse back into the seqno wrap handler. So we cannot call
i915_gpu_idle() as that does additional work beyond simply retiring the
current set of requests, and instead must do the minimal work ourselves
during seqno wrapping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f72b3435 10-Dec-2012 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Don't emit semaphore wait if wrap happened

If a wrap has just happened we need to prevent emitting waits for
pre-wrap values. Detect this and emit no-ops instead.

v2: Use olr > seqno to detect wrap instead of *seqno == 0
as suggested by Chris Wilson.

v3: Use last used seqno to detect the wraparound. From
Chris Wilson

v4: Fixed unnecessary last_seqno assignment

References: https://bugs.freedesktop.org/show_bug.cgi?id=57967
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# caf49191 10-Dec-2012 Linus Torvalds <torvalds@linux-foundation.org>

Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage

This reverts commits a50915394f1fc02c2861d3b7ce7014788aa5066e and
d7c3b937bdf45f0b844400b7bf6fd3ed50bac604.

This is a revert of a revert of a revert. In addition, it reverts the
even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the
original commits in linux-next.

It turns out that the original patch really was bogus, and that the
original revert was the correct thing to do after all. We thought we
had fixed the problem, and then reverted the revert, but the problem
really is fundamental: waking up kswapd simply isn't the right thing to
do, and direct reclaim sometimes simply _is_ the right thing to do.

When certain allocations fail, we simply should try some direct reclaim,
and if that fails, fail the allocation. That's the right thing to do
for THP allocations, which can easily fail, and the GPU allocations want
to do that too.

So starting kswapd is sometimes simply wrong, and removing the flag that
said "don't start kswapd" was a mistake. Let's hope we never revisit
this mistake again - and certainly not this many times ;)

Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e9b73c67 03-Dec-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce memory pressure during shrinker by preallocating swizzle pages

On a machine with bit17 swizzling, we need to store the bit17 of the
physical page address in put-pages. This requires a memory allocation,
on average less than a page, which may be difficult to satisfy if the
request to put-pages is on behalf of the shrinker. We could allow that
allocation to pull from the reserved memory pools, but it seems much
safer to preallocate the array for tiled objects on affected machines.

v2: Export i915_gem_object_needs_bit17_swizzle() for reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 498d2ac1 04-Dec-2012 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Add intel_ring_handle_seqno wrap

If there are pre-wrap values in semaphore-mbox registers after wrap,
syncing against some after-wrap request will complete immediately.
Fix this by emitting ring commands to set mbox registers to zero
when the wrap happens.

v2: Use __intel_ring_begin to emit ring commands, from
Chris Wilson.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a small comment to handle_seqno_wrap.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1a240d4d 29-Nov-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fixup sparse warnings

- __iomem where there is none (I love how we mix these things up).
- Use gfp_t instead of an other plain type.
- Unconfuse one place about enum pipe vs enum transcoder - for the pch
transcoder we actually use the pipe enum. Fixup the other cases
where we assign the pipe to the cpu transcoder with explicit casts.
- Declare the mch_lock properly in a header.

There is still a decent mess in intel_bios.c about __iomem, but heck,
this is x86 and we're allowed to do that.

Makes-sparse-happy: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Use a space after the cast consistently and fix up the
newly-added cast in i915_irq.c to properly use __iomem.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4239ca77 03-Dec-2012 Damien Lespiau <damien.lespiau@intel.com>

drm/i915: Fix dieing -> dying typo

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a2165e31 03-Dec-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Decouple the object from the unbound list before freeing pages

As we may actually allocate in order to save the physical swizzling bits
during the free, we have to be careful not to trigger the shrinker on
the same object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added a small comment in the code to really drive the
scariness of this patch home.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 42dcedd4 15-Nov-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use a slab for object allocation

The primary purpose of this was to debug some use-after-free memory
corruption that was causing an OOPS inside drm/i915. As it turned out
the corruption was being caused elsewhere and i915.ko as a major user of
many objects was being hit hardest.

Indeed as we do frequent the generic kmalloc caches, dedicating one to
ourselves (or at least naming one for us depending upon the core) aids
debugging our own slab usage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0104fdbb 15-Nov-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce i915_gem_object_create_stolen()

Allow for the creation of GEM objects backed by stolen memory. As these
are not backed by ordinary pages, we create a fake dma mapping and store
the address in the scatterlist rather than obj->pages.

v2: Mark _i915_gem_object_create_stolen() as static, as noticed by Jesse
Barnes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8dcf015e 15-Nov-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: optimize the shmem_pwrite slowpath handling

Since we drop dev->struct_mutex when going through the slowpath, the
object might have been moved out of the cpu domain. Hence we need to
clflush the entire object to ensure that after the ioctl returns,
everything is coherent again (interwoven writes are ill-defined
anyway).

But we only need to do this if we start in the cpu domain and the
object requires flushing for coherency. So don't do the flushing if
the object is coherent anyway or if we've done in-line clflushing
already.

v2: i915_gem_clflush_object already checks whether the object is
coherent and if so, drops the flushing. Hence we don't need to check
that ourselves, simplifying the condition.

v3: Reorder the checks for better clarity (and adjust the comment
accordingly), suggested by Chris Wilson.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a39a6805 15-Nov-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: simplify shmem pwrite/pread slowpath handling

The shmem paths for pwrite/pread used a clever trick to hold onto a
single page when dropping the big dev->struct_mutex for the slowpath.
But this ran the risk of reinstating (or not completely purging) the
backing storage when dropping purgeable objects.

Hence the code needed to keep track of whether it ever dropped the
lock, and if it did, manually check whether it needs to re-purge the
backing storage. But thanks to the pages pin count introduced in

commit a5570178c059cec59e9835be20bc8546377fa7b5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Sep 4 21:02:54 2012 +0100

drm/i915: Pin backing pages whilst exporting through a dmabuf vmap

which allowed us to pin the backing storage and remove that page
reference trick from shmem_pwrite/read in

commit f60d7f0c1d55a935475ab394955cafddefaa6533
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Sep 4 21:02:56 2012 +0100

drm/i915: Pin backing pages for pread

and

commit 755d22184f1e5015b040acee794542d9cf8a16c5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Sep 4 21:02:55 2012 +0100

drm/i915: Pin backing pages for pwrite

we can now abolish this check. The slowpath cleanup completely
disappears from pread, and for pwrite we're only left with the domain
fixup in case someone moved the object out of the cpu domain from
under us. A follow-on patch will optimize that a notch more.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7b01e260 28-Nov-2012 Mika Kuoppala <mika.kuoppala@linux.intel.com>

drm/i915: Set sync_seqno properly after seqno wrap

i915_gem_handle_seqno_wrap() will zero all sync_seqnos but as the
wrap can happen inside ring->sync_to(), the pre-wrap seqno was
carried over and overwrote the zeroed sync_seqno.

When wrap is handled, all outstanding requests will be retired and
objects moved to inactive queue, causing their last_read_seqno to be zero.
Use this to update the sync_seqno correctly.

RING_SYNC registers after wrap will contain pre wrap values which
are >= seqno. So injecting the semaphore wait into ring completes
immediately.

Original idea for using last_read_seqno from Chris Wilson.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3e960501 27-Nov-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rearrange code to only have a single method for waiting upon the ring

Replace the wait for the ring to be clear with the more common wait for
the ring to be idle. The principle advantage is one less exported
intel_ring_wait function, and the removal of a hardcoded value.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b662a066 27-Nov-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Simplify flushing activity on the ring

As we now always preallocate the seqno before writing to the ring, we
can trivially test if we have any pending activity on the ring by
inspecting the olr. This makes it then possible to flush operations that
are not normally associated with a request, like power-management.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9d773091 27-Nov-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Preallocate next seqno before touching the ring

Based on the work by Mika Kuoppala, we realised that we need to handle
seqno wraparound prior to committing our changes to the ring. The most
obvious point then is to grab the seqno inside intel_ring_begin(), and
then to reuse that seqno for all ring operations until the next request.
As intel_ring_begin() can fail, the callers must already be prepared to
handle such failure and so we can safely add further checks.

This patch looks like it should be split up into the interface
changes and the tweaks to move seqno wrapping from the execbuffer into
the core seqno increment. However, I found no easy way to break it into
incremental steps without introducing further broken behaviour.

v2: Mika found a silly mistake and a subtle error in the existing code;
inside i915_gem_retire_requests() we were resetting the sync_seqno of
the target ring based on the seqno from this ring - which are only
related by the order of their allocation, not retirement. Hence we were
applying the optimisation that the rings were synchronised too early,
fortunately the only real casualty there is the handling of seqno
wrapping.

v3: Do not forget to reset the sync_seqno upon module reinitialisation,
ala resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b5d17794 22-Nov-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wait upon the last request seqno, rather than a future seqno

In commit 69c2fc891343cb5217c866d10709343cff190bdc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jul 20 12:41:03 2012 +0100

drm/i915: Remove the per-ring write list

the explicit flush was removed from i915_ring_idle(). However, we
continued to wait upon the next seqno which now did not correspond to
any request (except for the unusual condition of a failure to queue a
request after execbuffer) and so would wait indefinitely.

This has an important side-effect that i915_gpu_idle() does not cause
the seqno to be incremented. This is vital if we are to be able to idle
the GPU to handle seqno wraparound, as in subsequent patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5774506f 21-Nov-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Borrow our struct_mutex for the direct reclaim

If we have hit oom whilst holding our struct_mutex, then currently we
cannot reap our own GPU buffers which likely pin most of memory, making
an outright OOM more likely. So if we are running in direct reclaim and
already hold the mutex, attempt to free buffers knowing that the
original function can not continue until we return.

v2: Add a note explaining that the mutex may be stolen due to
pre-emption, and that is bad.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
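
A sketch of the borrowing trick in C, with an assumed ownership-check
helper (the real check and the pre-emption caveat noted in v2 are not
reproduced here):

bool unlock = true;

if (!mutex_trylock(&dev->struct_mutex)) {
	/* assumed helper: does the current task already own the mutex? */
	if (!mutex_owned_by_current(&dev->struct_mutex))
		return 0;	/* held by someone else: cannot reclaim */
	unlock = false;		/* borrow our own lock for direct reclaim */
}

/* ... reap unbound/inactive buffers here ... */

if (unlock)
	mutex_unlock(&dev->struct_mutex);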


# 8742267a 21-Nov-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defer assignment of obj->gtt_space until after all possible mallocs

As we may invoke the shrinker whilst trying to allocate memory to hold
the gtt_space for this object, we need to be careful not to mark the
drm_mm_node as activated (by assigning it to this object) before we
have finished our sequence of allocations.

Note: We also need to move the binding of the object into the actual
pagetables down a bit. The best way seems to be to move it out into
the callsites.

Reported-by: Imre Deak <imre.deak@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added small note to commit message to summarize review
discussion.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c9839303 20-Nov-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pin the object whilst faulting it in

In order to prevent reaping of the object whilst setting it up to
handle the pagefault, we need to mark it as pinned. This has the nice
side-effect of eliminating some special cases from the pagefault handler
as well!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# fbdda6fb 20-Nov-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Guard pages being reaped by OOM whilst binding-to-GTT

In the circumstances that the shrinker is allowed to steal the mutex
in order to reap pages, we need to be careful to prevent it operating on
the current object and shooting ourselves in the foot.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 26b1ff35 04-Nov-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Move the remaining gtt code

It's pretty much all consolidated now that we've killed AGP. We can move
the one outlier, and defines too.

(Kill some unused defines in the process)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e76e9aeb 04-Nov-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Stop using AGP layer for GEN6+

As a quick hack we make the old intel_gtt structure mutable so we can
fool a bunch of the existing code which depends on elements in that data
structure. We can/should try to remove this in a subsequent patch.

This should preserve the old gtt init behavior which upon writing these
patches seems incorrect. The next patch will fix these things.

The one exception is VLV which doesn't have the preserved flush control
write behavior. Since we want to do that for all GEN6+ stuff, we'll
handle that in a later patch. Mainstream VLV support doesn't actually
exist yet anyway.

v2: Update the comment to remove the "voodoo"
Check that the last pte written matches what we read back

v3: actually kill cache_level_to_agp_type since most of the flags will
disappear in an upcoming patch

v4: v3 was actually not what we wanted (Daniel)
Make the ggtt bind assertions better and stricter (Chris)
Fix some uncaught errors at gtt init (Chris)
Some other random stuff that Chris wanted

v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a
tad more robust by mapping everything != CACHE_NONE to the cached agp
flag - we have a 1:1 uncached mapping, but different modes of
cacheable (at least on later generations). Suggested by Chris Wilson.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a4da4fa4 02-Nov-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: extract l3_parity substruct from dev_priv

Pretty astonishing how far apart these two members landed ... Especially since
I've already removed almost 200 lines in between.

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 74ce6b6c 19-Oct-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle()

If we leave obj->pages set to NULL before attempting to deswizzle them,
then an OOPS is well deserved.

Fixes regression introduced in commit 9da3da660d8c19a54f6e93361d147509be3fff84
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jun 1 15:20:22 2012 +0100

drm/i915: Replace the array of pages with a scatterlist

Reported-and-tested-by: Krzysztof Kolasa
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a7c2e1aa 17-Oct-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: shut up spurious WARN in the gtt fault handler

-ENOSPC can happen if userspace is being simplistic and tries to map a
too big object. To aid further spurious WARN debugging, also print out
the error code.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56017
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# eda2d7f5 10-Oct-2012 Rodrigo Vivi <rodrigo.vivi@gmail.com>

drm/i915: HSW CRW stability magic

This magic brings stability to HSW CRW machines.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# acb868d3 26-Sep-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disallow preallocation of requests

The intention was to allow the caller to avoid a failure to queue a
request having already written commands to the ring. However, this is a
moot point as the i915_add_request() can fail for other reasons than a
mere allocation failure and those failure cases are more likely than
ENOMEM. So the overlay code already had to handle i915_add_request()
failures, and due to

commit 3bb73aba1ed5198a2c1dfaac4f3c95459930d84a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jul 20 12:40:59 2012 +0100

drm/i915: Allow late allocation of request for i915_add_request()

the error handling code in intel_overlay.c was subject to causing
double-frees, as found by coverity.

Rather than further complicate i915_add_request() and callers, realise
the battle is lost and adapt intel_overlay.c to take advantage of the
late allocation of requests.

v2: Handle callers passing in a NULL seqno.
v3: Ditto. This time for sure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bcb45086 05-Oct-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Align the retire_requests worker to the nearest second

By using round_jiffies() we can align the wakeup of our worker to the
nearest second in order to batch wakeups and reduce system load, which
is useful for unimportant coarse tasks like our retire_requests.

v2: round_jiffies_relative() already returns the relative timeout value,
so no need to incorrectly perform the subtraction twice. The timer
interface still leaves the possibility for the value of jiffies to
change before we program the timer.

Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# cecc21fe 05-Oct-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Align the hangcheck wakeup to the nearest second

round_jiffies() aligns the wakeup time to the nearest second in order to
batch wakeups and reduce system load, which is useful for unimportant
coarse timers like our hangcheck.

v2: round_jiffies_relative() returns the relative jiffie value, whereas
we need the absolute value for the timer.

Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
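
A sketch of the distinction drawn in the two commits above (timer and
work-queue fields are assumed names, the 1500 ms period is an assumed
value): round_jiffies() yields an absolute expiry suitable for a timer,
while round_jiffies_relative() yields a relative delay suitable for
delayed work.

/* absolute expiry for the hangcheck timer */
mod_timer(&dev_priv->hangcheck_timer,
	  round_jiffies(jiffies + msecs_to_jiffies(1500)));

/* relative delay for the retire_requests worker */
queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work,
		   round_jiffies_relative(HZ));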


# c77d7162 06-Oct-2012 Willy Tarreau <w@1wt.eu>

drm/i915: remove useless BUG_ON which caused a regression in 3.5.

starting an old X server causes a kernel BUG since commit 1b50247a8d:

------------[ cut here ]------------
kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3661!
invalid opcode: 0000 [#1] SMP
Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss uvcvideo
+videobuf2_core videodev videobuf2_vmalloc videobuf2_memops uhci_hcd ath9k mac80211 snd_hda_codec_realtek ath9k_common microcode
+ath9k_hw psmouse serio_raw sg ath cfg80211 atl1c lpc_ich mfd_core ehci_hcd snd_hda_intel snd_hda_codec snd_hwdep snd_pcm rtc_cmos
+snd_timer snd evdev eeepc_laptop snd_page_alloc sparse_keymap

Pid: 2866, comm: X Not tainted 3.5.6-rc1-eeepc #1 ASUSTeK Computer INC. 1005HA/1005HA
EIP: 0060:[<c12dc291>] EFLAGS: 00013297 CPU: 0
EIP is at i915_gem_entervt_ioctl+0xf1/0x110
EAX: f5941df4 EBX: f5940000 ECX: 00000000 EDX: 00020000
ESI: f5835400 EDI: 00000000 EBP: f51d7e38 ESP: f51d7e20
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: b760e0a0 CR3: 351b6000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process X (pid: 2866, ti=f51d6000 task=f61af8d0 task.ti=f51d6000)
Stack:
00000001 00000000 f5835414 f51d7e84 f5835400 f54f85c0 f51d7f10 c12b530b
00000001 c151b139 c14751b6 c152e030 00000b32 00006459 00000059 0000e200
00000001 00000000 00006459 c159ddd0 c12dc1a0 ffffffea 00000000 00000000
Call Trace:
[<c12b530b>] drm_ioctl+0x2eb/0x440
[<c12dc1a0>] ? i915_gem_init+0xe0/0xe0
[<c1052b2b>] ? enqueue_hrtimer+0x1b/0x50
[<c1053321>] ? __hrtimer_start_range_ns+0x161/0x330
[<c10530b3>] ? lock_hrtimer_base+0x23/0x50
[<c1053163>] ? hrtimer_try_to_cancel+0x33/0x70
[<c12b5020>] ? drm_version+0x90/0x90
[<c10ca171>] vfs_ioctl+0x31/0x50
[<c10ca2e4>] do_vfs_ioctl+0x64/0x510
[<c10535de>] ? hrtimer_nanosleep+0x8e/0x100
[<c1052c20>] ? update_rmtp+0x80/0x80
[<c10ca7c9>] sys_ioctl+0x39/0x60
[<c1433949>] syscall_call+0x7/0xb
Code: 83 c4 0c 5b 5e 5f 5d c3 c7 44 24 04 2c 05 53 c1 c7 04 24 6f ef 47 c1 e8 6e e0 fd ff c7 83 38 1e 00 00 00 00 00 00 e9 3f ff ff
+ff <0f> 0b eb fe 0f 0b eb fe 8d b4 26 00 00 00 00 0f 0b eb fe 8d b6
EIP: [<c12dc291>] i915_gem_entervt_ioctl+0xf1/0x110 SS:ESP 0068:f51d7e20
---[ end trace dd332ec083cbd513 ]---

The crash happens here in i915_gem_entervt_ioctl() :

3659 BUG_ON(!list_empty(&dev_priv->mm.active_list));
3660 BUG_ON(!list_empty(&dev_priv->mm.flushing_list));
-> 3661 BUG_ON(!list_empty(&dev_priv->mm.inactive_list));
3662 mutex_unlock(&dev->struct_mutex);

Quoting Chris :
"That BUG_ON there is silly and can simply be removed. The check is to
verify that no batches were submitted to the kernel whilst the UMS/GEM
client was suspended - to which the BUG_ONs are a crude approximation.
Furthermore, the checks are too late, since it means we attempted to
program the hardware whilst it was in an invalid state, the BUG_ONs are
the least of your concerns at that point."

Note that this regression has been introduced in

commit 1b50247a8ddde4af5aaa0e6bc125615372ce6c16
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 24 15:47:30 2012 +0100

drm/i915: Remove the list of pinned inactive objects

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Willy Tarreau <w@1wt.eu>
[danvet: Added note about the regressing commit and cc: stable.]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4d0f817e 03-Oct-2012 Mika Kuoppala <mika.kuoppala@intel.com>

drm/i915: print warning if vmi915_gem_fault error is not handled

Falling into default case in vmi915_gem_fault is a bug. Be more
verbose about it.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e79e0fe3 03-Oct-2012 Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

drm/i915: EBUSY status handling added to i915_gem_fault().

Subsequent threads returning EBUSY from vm_insert_pfn() were not handled
correctly. As a result, concurrent access from new threads to
mmapped data caused SIGBUS.

Note that this fixes i-g-t/tests/gem_threaded_tiled_access.

Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
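
A sketch of the idea (not the exact diff): -EBUSY from vm_insert_pfn()
means a concurrent fault already installed the PTE, so it should be
treated like success rather than falling through to SIGBUS.

switch (ret) {
case 0:
case -ERESTARTSYS:
case -EINTR:
case -EBUSY:
	/* The PTE is (or will shortly be) present; let the access retry. */
	return VM_FAULT_NOPAGE;
case -ENOMEM:
	return VM_FAULT_OOM;
default:
	WARN_ONCE(1, "unhandled error in fault: %i\n", ret);
	return VM_FAULT_SIGBUS;
}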


# 760285e7 02-Oct-2012 David Howells <dhowells@redhat.com>

UAPI: (Scripted) Convert #include "..." to #include <path/...> in drivers/gpu/

Convert #include "..." to #include <path/...> in drivers/gpu/.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>


# 4126d5d6 02-Oct-2012 David Howells <dhowells@redhat.com>

UAPI: (Scripted) Remove redundant DRM UAPI header #inclusions from drivers/gpu/.

Remove redundant DRM UAPI header #inclusions from drivers/gpu/.

Remove redundant #inclusions of core DRM UAPI headers (drm.h, drm_mode.h and
drm_sarea.h). They are now #included via drmP.h and drm_crtc.h via a preceding
patch.

Without this patch and the patch to make the core headers include the UAPI
headers, after the UAPI split the DRM C sources cannot find these UAPI headers
because the DRM code relies on specific -I flags to make #include "..." work
on headers in include/drm/ - but that does not work after the UAPI split without
adding more -I flags.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>


# 3bc2913e 26-Sep-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Fix set_caching locking

In the EINVAL case we don't release struct_mutex. It should be safe to
grab the lock only after checking the parameters, which also resolves the
issue.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 199adf40 21-Sep-2012 Ben Widawsky <benjamin.widawsky@intel.com>

drm/i915: s/cacheing/caching/

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2f745ad3 04-Sep-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert the dmabuf object to use the new i915_gem_object_ops

By providing a callback for when we need to bind the pages, and then
release them again later, we can shorten the amount of time we hold the
foreign pages mapped and pinned, and importantly the dmabuf objects then
behave as any other normal object with respect to the shrinker and
memory management.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9da3da66 01-Jun-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace the array of pages with a scatterlist

Rather than have multiple data structures for describing our page layout
in conjunction with the array of pages, we can migrate all users over to
a scatterlist.

One major advantage this offers, other than unifying the page tracking
structures, is that we replace the vmalloc'ed array (which can be up to
a megabyte in size) with a chain of individual pages, which helps reduce
memory pressure.

The disadvantage is that we then do not have a simple array to iterate,
or to access randomly. The common case for this is in the relocation
processing, which will typically fit within a single scatterlist page
and so be almost the same cost as the simple array. For iterating over
the array, the extra function call could be optimised away, but in
reality is an insignificant cost of either binding the pages, or
performing the pwrite/pread.

v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
trivial compile error from rebasing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
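
A sketch of what iterating the backing storage looks like once it is
described by a scatterlist rather than a page array (field names are
assumed for illustration):

struct scatterlist *sg;
int i;

/* Walk one backing page at a time via the sg_table. */
for_each_sg(obj->pages->sgl, sg, obj->pages->nents, i) {
	struct page *page = sg_page(sg);

	/* e.g. flush each page, as the old array-based loop did */
	drm_clflush_pages(&page, 1);
}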


# f60d7f0c 04-Sep-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pin backing pages for pread

By using the recently introduced pinning of pages, we can safely drop
the mutex in the knowledge that the pages are not going to disappear
beneath us, and so we can simplify the code for iterating over the pages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 755d2218 04-Sep-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pin backing pages for pwrite

By using the recently introduced pinning of pages, we can safely drop
the mutex in the knowledge that the pages are not going to disappear
beneath us, and so we can simplify the code for iterating over the pages.

Note: The old code had such complicated page refcounting since it used
obj->pages as a micro-optimization if it's there, but that could
(before this patch) disappear when we drop the dev->struct_mutex.
Hence some manual page refcounting was required for the slow path,
complicated by the fact that pages returned by shmem_read_mapping_page
already have a pageref, which needs to be dropped again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to explain the question Ben raised in review.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a5570178 04-Sep-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pin backing pages whilst exporting through a dmabuf vmap

We need to refcount our pages in order to prevent reaping them at
inopportune times, such as when they are currently vmapped or exported to
another driver. However, we also wish to keep the lazy deallocation of
our pages so we need to take a pin/unpinned approach rather than a
simple refcount.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
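
A sketch of the pin/unpin approach versus a plain refcount (field and
helper names are assumptions): a non-zero pin count forbids reaping,
while dropping back to zero keeps the pages allocated, so lazy
deallocation under memory pressure still works.

static inline void gem_object_pin_pages(struct drm_i915_gem_object *obj)
{
	BUG_ON(obj->pages == NULL);
	obj->pages_pin_count++;		/* assumed field */
}

static inline void gem_object_unpin_pages(struct drm_i915_gem_object *obj)
{
	BUG_ON(obj->pages_pin_count == 0);
	obj->pages_pin_count--;
	/* Pages stay allocated; the shrinker may only reap them once the
	 * pin count has returned to zero.
	 */
}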


# 37e680a1 07-Jun-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce drm_i915_gem_object_ops

In order to specialise functions depending upon the type of object, we
can attach vfuncs to each object via a new ->ops pointer.

For instance, this will be used in future patches to only bind pages from
a dma-buf for the duration that the object is used by the GPU - and so
prevent them from pinning those pages for the entire lifetime of the object.

v2: Bonus comments.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7e81a42e 15-Sep-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce a pin-leak BUG into a WARN

Pin-leaks persist and we get the perennial bug reports of machine
lockups to the BUG_ON(pin_count==MAX). If we instead loudly report that
the object cannot be pinned at that time it should prevent the driver from
locking up, and hopefully restore a semblance of working whilst still
leaving us an OOPS to debug.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d7c3b937 27-Aug-2012 Sedat Dilek <sedat.dilek@gmail.com>

drm/i915: Remove __GFP_NO_KSWAPD

When I pulled-in today's drm-intel-next into linux-next (next-20120824)
I saw this build-breakage:

drivers/gpu/drm/i915/i915_gem.c: In function 'i915_gem_object_get_pages_gtt':
drivers/gpu/drm/i915/i915_gem.c:1778:40: error: '__GFP_NO_KSWAPD' undeclared (first use in this function)
drivers/gpu/drm/i915/i915_gem.c:1778:40: note: each undeclared identifier is reported only once for each function it appears in

This is caused by commit ba099ef165f8 ("mm: remove __GFP_NO_KSWAPD")
and commit b6beae2c2014 ("mm: remove __GFP_NO_KSWAPD fixes") in
linux-next (next-20120824).

Fix this by removing __GFP_NO_KSWAPD from drm/i915 driver.

Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3236f57a 24-Aug-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use a non-blocking wait for set-to-domain ioctl

The principal use for set-to-domain is for userspace to serialise
operations with a particular buffer, for example to maintain coherency
with a CPU map or to ratelimit its rendering by waiting on all previous
operations before continuing. As such we tend to hold the struct_mutex
for long periods during the synchronisation and so cause contention
issues with other users of the graphics device, even for independent
operations as memory management. An example is the contention between
compiz and X which causes jitter in the display and a drop in peak
throughput.

The ultimate solution would be a set of fine grained locks and lockless
operations, but an intermediate step is to first attempt the
synchronisation for set-to-domain without holding the mutex. This
introduces a number of race conditions, so we limit its use to the ioctl
periphery where we have no dependent state and can safely complete with
a locked synchronisation afterwards.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
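
A sketch of the non-blocking flavour (helper and field names assumed):
sample the outstanding request under the lock, drop struct_mutex for the
wait itself, then retake the lock and finish with the ordinary locked
path.

ret = i915_mutex_lock_interruptible(dev);
if (ret)
	return ret;

seqno = obj->last_read_seqno;	/* assumed bookkeeping field */
ring = obj->ring;
mutex_unlock(&dev->struct_mutex);

/* assumed helper: wait for the seqno without holding struct_mutex */
ret = wait_seqno_unlocked(ring, seqno);
if (ret)
	return ret;

ret = i915_mutex_lock_interruptible(dev);
if (ret)
	return ret;
/* ... perform the actual set-to-domain under the lock as before ... */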


# b361237b 24-Aug-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Juggle code order to ease flow of the next patch

Move the wait-for-rendering logic around in the file so that we can
group it together with the subsequent variations. The general goal is to
have the lower level routines clustered together and then the higher
level logic building upon those low level routines that came before.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0327d6ba 11-Aug-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Extract general object init routine

As we wish to create specialised object constructions in the near
future that share the same basic GEM object struct, export the default
initializer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4d6294bf 11-Aug-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Protect private gem objects from truncate (such as imported dmabuf)

If the object has no backing shmemfs filp, then we obviously cannot
perform a truncation operation upon it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 86a1ee26 11-Aug-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only pwrite through the GTT if there is space in the aperture

Avoid stalling and waiting for the GPU by checking to see if there is
sufficient inactive space in the aperture for us to bind the buffer
prior to writing through the GTT. If there is inadequate space we will
have to stall waiting for the GPU, and incur overheads moving objects
about. Instead, only incur the clflush overhead on the target object by
writing through shmem.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d8cb5086 11-Aug-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Try harder to allocate an mmap_offset

Given the persistence of an offset for the lifetime of an object, it is
easy to contemplate how the mmap space becomes badly fragmented to the
point that further allocations fail with ENOSPC. Our only recourse at
this point is to try to purge the objects to release some space and
reattempt the allocation.

References: https://bugs.freedesktop.org/show_bug.cgi?id=39552
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c4670ad0 20-Aug-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add some sanity checks to unbound tracking

A pair of universally true checks that just need to be put in the right
place depending on where in the patch sequence you go. Note that
i915_gem_object_put_pages_gtt() already gains the
BUG_ON(obj->gtt_space), but on reflection that needed to migrate to
put_pages().

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6c085a72 20-Aug-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track unbound pages

When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.

To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.

As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.

Note: The patch also contains a few changes to the last-hope
evict_everything logic in i915_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superfluous
and only helps in OOM corner-cases, not fragmented-gtt thrashing
situations).

Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.

v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.

v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 225067ee 20-Aug-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move functions around

Prep work to make Chris Wilson's unbound tracking patch a bit easier
to read. Alas, I'd have preferred that moving the page allocation
retry loop from bind to get_pages would have been a separate patch,
too. But that looks like real work ;-)

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b6c7488d 14-Aug-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915/contexts: fix list corruption

After reset we unconditionally reinitialize lists. If the context switch
hasn't yet completed before the suspend, the default context object will
end up on lists that are going to go away when we resume.

The patch forces the context switch to be synchronous before suspend
assuring that the active/inactive tracking is correct at the time of
resume.

References: https://bugs.freedesktop.org/show_bug.cgi?id=52429
Tested-by: Guang A Yang <guang.a.yang@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b2eadbc8 09-Aug-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Lazily apply the SNB+ seqno w/a

Avoid the forcewake overhead when simply retiring requests, as often the
last seen seqno is good enough to satisfy the retirement process and will
be promptly re-run in any case. Only ensure that we force the coherent
seqno read when we are explicitly waiting upon a completion event to be
sure that none go missing, and also for when we are reporting seqno
values in case of error or debugging.

This greatly reduces the load for userspace using the busy-ioctl to
track active buffers, for instance halving the CPU used by X in pushing
the pixels from a software render (flash). The effect will be even more
magnified with userptr and so providing a zero-copy upload path in that
instance, or in similar instances where X is simply compositing DRI
buffers.

v2: Reverse the polarity of the tachyon stream. Daniel suggested that
'force' was too generic for the parameter name and that 'lazy_coherency'
better encapsulated the semantics of it being an optimization and its
purpose. Also notice that gen6_get_seqno() is only used by gen6/7
chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e6994aee 10-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Export ability of changing cache levels to userspace

By selecting the cache level (essentially whether or not the CPU snoops
any updates to the bo, and on more recent machines whether it resides
inside the CPU's last-level-cache) a userspace driver is able to then
manage all of its memory within buffer objects, if it so desires. This
enables the userspace driver to accelerate uploads and more importantly
downloads from the GPU and to be able to mix CPU and GPU rendering/activity
efficiently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added code comment about where we plan to stuff platform
specific caching control bits in the ioctl struct.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 42d6ab48 26-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Segregate memory domains in the GTT using coloring

Several functions of the GPU have the restriction that differing memory
domains cannot be placed next to each other (as the GPU may prefetch
beyond the end of one domain and hang as it crosses into the other
domain). We use the facility of the drm_mm to mark ranges with a
particular color that corresponds to the cache attributes of those pages
in order to prevent allocating adjacent blocks of differing memory
types.

v2: Rebase ontop of drm_mm coloring v2.
v3: Fix rebinding existing gtt_space and add a verification routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
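
A simplified sketch of a colour-adjust callback (the signature follows
the drm_mm colouring idea; the real routine also checks the node that
follows the hole): when a neighbouring allocation holds pages of a
different cache colour, the candidate range is shrunk by a page so the
two domains never end up adjacent.

static void gtt_color_adjust(struct drm_mm_node *node, unsigned long color,
			     unsigned long *start, unsigned long *end)
{
	/* The hole being considered starts right after 'node'; keep a
	 * one-page guard if its colour (cache attributes) differs.
	 */
	if (node->allocated && node->color != color)
		*start += 4096;

	/* A symmetric check against the node following the hole would
	 * pull *end in by a page as well (omitted in this sketch).
	 */
}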


# f047e395 20-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid concurrent access when marking the device as idle/busy

As suggested by Daniel, rip out the independent timers for device and
crtc busyness and integrate the manual powermanagement of the display
engine into the GEM core and its request tracking. The benefits are that
the code is a lot smaller, fewer moving parts and should fit more neatly
into the overall activity tracking of the driver.

v2: Complete overhaul and removal of the racy timers and workers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a7b9761d 19-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs

By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 26b9c4a5 19-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the explicit flush of the GPU write domain

Rely instead on the insertion of the implicit flush before the seqno
breadcrumb.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 86d5bc37 19-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove explicit flush from i915_gem_object_flush_fence()

As the flush is either performed explicitly immediately after the
execbuffer dispatch, or before the serialisation of last_fenced_seqno, we
can forgo the explicit i915_gem_flush_ring().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 69c2fc89 19-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the per-ring write list

This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 65ce3027 19-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the defunct flushing list

As we guarantee to emit a flush before emitting the breadcrumb or
the next batchbuffer, there is no further need for the flushing list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0201f1ec 19-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace the pending_gpu_write flag with an explicit seqno

As we always flush the GPU cache prior to emitting the breadcrumb, we no
longer have to worry about the deferred flush causing the
pending_gpu_write to be delayed. So we can instead utilize the known
last_write_seqno to hopefully minimise the wait times.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e5f1d962 19-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove assertion over write domain after i915_gem_object_sync()

As we move to lazily clearing the GPU write domain only when the buffer
becomes inactive, this leaves a window of opportunity for
i915_gem_object_pin_to_display_plane() to detect a seemingly
inconsistent value. This function is special as it tries to pipeline the
operation to avoid the stall and so may not retire the buffer, meaning we
may not get the opportunity to clear the write domain. However, we know
all is good, so drop the assertion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3bb73aba 19-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow late allocation of request for i915_add_request()

Request preallocation was added to i915_add_request() in order to
support the overlay. However, not all users care and can quite happily
ignore the failure to allocate the request as they will simply repeat
the request in the future.

By pushing the allocation down into i915_add_request(), we can then
remove some rather ugly error handling in the callers.

v2: Nullify request->file_priv otherwise we chase a garbage pointer
when retiring requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e9808edd 03-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Return a mask of the active rings in the high word of busy_ioctl

The intention is to help select which engine to use for copies with
interoperating clients - such as a GL client making a request to the X
server to perform a SwapBuffers, which may require copying from the
active GL back buffer to the X front buffer.

We choose to report a mask of the active rings to future proof the
interface against any changes which may allow for the object to reside
upon multiple rings.
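
For illustration, a client can split the returned word like this (a
sketch assuming the layout described above; the authoritative bit
definitions live in i915_drm.h):

    #include <stdint.h>

    /* Bit 0: is the object still busy at all?
     * High 16 bits: mask of the rings the object is active on. */
    static inline int bo_is_busy(uint32_t busy)
    {
            return busy & 1;
    }

    static inline uint32_t bo_active_rings(uint32_t busy)
    {
            return busy >> 16;
    }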

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: bikeshed away the write ring mask and add the explanation
Chris sent in a follow-up mail why we decided to use masks.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# eeef9b38 16-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add -EIO to the list of known errors for __wait_seqno

This prevents a WARN introduced with

commit de2b998552c1534e87bfbc51ec5734b02bc89020
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jul 4 22:52:50 2012 +0200

drm/i915: don't return a spurious -EIO from intel_ring_begin

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 67b1b571 05-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Disable the BLT on pre-production SNB hardware

It never quite worked despite the numerous workarounds, yet I still see
people trying to use this hardware and filing bug reports. As we no
longer even try to implement the workarounds, since 6a233c78878
(drm/i915/ringbuffer: kill snb blt workaround), simply disable the ring.

v2: Add a message to inform the user about the limited capabilities of
their pre-production hardware.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6b9d89b4 10-Jul-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm: Add colouring to the range allocator

In order to support snoopable memory on non-LLC architectures (so that
we can bind vgem objects into the i915 GATT for example), we have to
avoid the prefetcher on the GPU from crossing memory domains and so
prevent allocation of a snoopable PTE immediately following an uncached
PTE. To do that, we need to extend the range allocator with support for
tracking and segregating different node colours.

This will be used by i915 to segregate memory domains within the GTT.

v2: Now with more drm_mm helpers and less driver interference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@gmail.com>


# a9340cca 04-Jul-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: properly SIGBUS on I/O errors

... instead of looping endless with no hope of ever serving that
page-fault. We only need to break out of this loop when the gpu died,
to run the reset work (and hopefully resurrect it).

To clarify questions Chris raised on irc: This is about handling I/O
errors not from our own code, but e.g. when the disk died when trying
to swap in a gem bo. So this patch remedies the issue that the current
handling only handles gpu-death-induced cases of -EIO. Admittedly,
dying disks are much rarer than hanging gpus ...
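
The resulting error handling in the fault handler boils down to
something like the following sketch (illustrative, not the literal
i915 code):

    switch (ret) {
    case -EAGAIN:           /* gpu hung: retry the fault after the reset */
    case 0:
    case -ERESTARTSYS:
    case -EINTR:
            return VM_FAULT_NOPAGE;
    case -ENOMEM:
            return VM_FAULT_OOM;
    default:                /* e.g. -EIO because the disk died */
            return VM_FAULT_SIGBUS;
    }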

This seems to have been lost in:

commit d9bc7e9f32716901c617e1f0fb6ce0f74f172686
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Feb 7 13:09:31 2011 +0000

drm/i915: Fix infinite loop regression from 21dd3734

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0a6759c6 04-Jul-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't hang userspace when the gpu reset is stuck

With the gpu reset no longer using a trylock we've increased the
chances of userspace getting stuck quite a bit. To make that
(hopefully) rare case more palatable, time out when waiting for the gpu
reset code to complete and signal this little issue to the caller by
returning -EIO.

This should help userspace to somewhat gracefully fall back and
hopefully allow the user to grab some logs and reboot the machine
(instead of staring at a frozen X screen in agony).

Suggested by Chris Wilson because I've been stubborn about allowing
the gpu reset code not to fail, ever (by removing the trylock).

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d6b2c790 04-Jul-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: non-interruptible sleeps can't handle -EAGAIN

So don't return -EAGAIN, even in the case of a gpu hang. Remap it to
-EIO instead. Note that this isn't really an issue with
interruptability, but more that we have quite a few codepaths (mostly
around kms stuff) that simply can't handle any errors and hence not
even -EAGAIN. Instead of adding proper failure paths so that we could
restart these ioctls we've opted for the cheap way out of sleeping
non-interruptibly. Which works everywhere but when the gpu dies,
which this patch fixes.

So essentially interruptible == false means 'wait for the gpu or die
trying'.
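
A minimal sketch of that rule (illustrative only):

    /* Callers that sleep non-interruptibly cannot restart the syscall,
     * so a hung gpu must surface as -EIO rather than -EAGAIN. */
    if (ret == -EAGAIN && !interruptible)
            ret = -EIO;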

This patch is a bit ugly because intel_ring_begin is all non-interruptible
and hence only returns -EIO. But as the comment in there says,
auditing all the callsites would be a pain.

To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
and intel_wait_ring_buffer. Also use the opportunity to clarify the
different cases in i915_gem_check_wedge a bit with comments.

v2: Don't access dev_priv->mm.interruptible from check_wedge - we
might not hold dev->struct_mutex, making this racy. Instead pass
interruptible in as a parameter. I've noticed this because I've hit a
BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
added in

commit b4aca0106c466b5a0329318203f65bac2d91b682
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Wed Apr 25 20:50:12 2012 -0700

drm/i915: extract some common olr+wedge code

although that commit is missing any justification for this. I guess
it's just copy&paste, because the same commit add the same BUG_ON
check to check_olr, where it indeed makes sense.

But in check_wedge everything we access is protected by other means,
so this is superfluous. And because it now gets in the way (we add a
new caller in __wait_seqno, which can be called without
dev->struct_mutex), let's just remove it.

v3: Group all the i915_gem_check_wedge refactoring into this patch, so
that this patch here is all about not returning -EAGAIN to callsites
that can't handle syscall restarting.

v4: Add clarification of what interruptible == false means in our code,
requested by Ben Widawsky.

v5: Fix EAGAIN misspelling noticed by Chris Wilson.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# cc889e0f 13-Jun-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: disable flushing_list/gpu_write_list

This is just the minimal patch to disable all this code so that we can
do decent amounts of QA before we rip it all out.

The complicating thing is that we need to flush the gpu caches after
the batchbuffer is emitted. Which is past the point of no return where
execbuffer can't fail any more (otherwise we risk submitting the same
batch multiple times).

Hence we need to add a flag to track whether any caches associated
with that ring are dirty. And emit the flush in add_request if that's
the case.

Note that this has quite a few behaviour changes:
- Caches get flushed/invalidated unconditionally.
- Invalidation now happens after potential inter-ring sync.

I've bantered around a bit with Chris on irc whether this fixes
anything, and it might or might not. The only thing clear is that with
these changes it's much easier to reason about correctness.

Also rip out a lone get_next_request_seqno in the execbuffer
retire_commands function. I've dug around and I couldn't figure out
why that is still there, with the outstanding lazy request stuff it
shouldn't be necessary.

v2: Chris Wilson complained that I also invalidate the read caches
when flushing after a batchbuffer. Now optimized.

v3: Added some comments to explain the new flushing behaviour.

Cc: Eric Anholt <eric@anholt.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f2ef6eb1 04-Jun-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: switch to default context on idle

To keep things as sane as possible, switch to the default context before
idling. This should help free context objects, as well as put things in
a more well defined state before suspending.

v2: remove seqno from context switch call (daniel)
return error on failed context switch instead of WARN+continue (daniel)

v3: move idling to i915_gpu idle (from i915_gem_idle) (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>


# 254f965c 04-Jun-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: preliminary context support

Very basic code for context setup/destruction in the driver.

Adds the file i915_gem_context.c This file implements HW context
support. On gen5+ a HW context consists of an opaque GPU object which is
referenced at times of context saves and restores. With RC6 enabled,
the context is also referenced as the GPU enters and exits RC6
(the GPU has its own internal power context, except on gen5). Though
something like a context does exist for the media ring, the code only
supports contexts for the render ring.

In software, there is a distinction between contexts created by the
user, and the default HW context. The default HW context is used by GPU
clients that do not request setup of their own hardware context. The
default context's state is never restored to help prevent programming
errors. This would happen if a client ran and piggy-backed off another
client's GPU state. The default context only exists to give the GPU some
offset to load as the current to invoke a save of the context we
actually care about. In fact, the code could likely be constructed,
albeit in a more complicated fashion, to never use the default context,
though that limits the driver's ability to swap out, and/or destroy
other contexts.

All other contexts are created as a request by the GPU client. These
contexts store GPU state, and thus allow GPU clients to not re-emit
state (and potentially query certain state) at any time. The kernel
driver makes certain that the appropriate commands are inserted.

There are 4 entry points into the contexts, init, fini, open, close.
The names are self-explanatory except that init can be called during
reset, and also during pm thaw/resume. As we expect our context to be
preserved across these events, we do not reinitialize in this case.

As Adam Jackson pointed out, the cutoff of 1MB where a HW context is
considered too big is arbitrary. The reason for this is even though
context sizes are increasing with every generation, they have yet to
eclipse even 32k. If we somehow read back way more than that, it
probably means BIOS has done something strange, or we're running on a
platform that wasn't designed for this.

v2: rename load/unload to init/fini (daniel)
remove ILK support for get_size() (indirectly daniel)
add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
added comments (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>


# 8ecd1a66 07-Jun-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: call intel_enable_gtt

When drm/i915 is in control of the gtt, we need to call
the enable function at all the relevant places ourselves.

Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# dd2757f8 07-Jun-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: stop using dev->agp->base

For that to work we need to export the base address of the gtt
mmio window from intel-gtt. Also replace all other uses of
dev->agp by values we already have at hand.

Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# eac1f14f 05-Jun-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Infinite timeout for wait ioctl

Change the ns_timeout parameter of the wait ioctl to a signed value.
Doing this allows the kernel to provide an infinite wait when a timeout
of less than 0 is provided. This mimics select/poll.

Initially the parameter was meant to match up with the GL spec 1:1, but
after being made aware of how much 2^64 - 1 nanoseconds actually is, I
do not think anyone will ever notice the loss of 1 bit.

The infinite timeout on waiting is similar to the existing i915
userspace interface with the exception that struct_mutex is dropped
while doing the wait in this ioctl.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 30dfebf3 01-Jun-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: extract object active state flushing code

Both busy_ioctl and the new wait_ioctl need to do the same dance (or at
least should). Some slight changes:
- busy_ioctl now unconditionally checks for olr. Before, emitting a
required flush would have prevented the olr check and hence required a
second call to the busy ioctl to really emit the request.
- the timeout wait now also retires requests. Not really required for
abi-reasons, but makes a notch more sense imo.

I've tested this by pimping the i-g-t test some more and also checking
the polling behaviour of the wait_rendering_timeout ioctl versus what
busy_ioctl returns.

v2: Too many people complained about unplug, new color is
flush_active.

v3: Kill the comment about the unplug moniker.

v4: s/un-active/inactive/

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b9524a1e 25-May-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: remap l3 on hw init

If any l3 rows have been previously remapped, we must remap them after
GPU reset/resume too.

v2: Just return (no warn) on remapping init if not IVB (Jesse)
Move the check of schizo userspace to i915_gem_l3_remap (Jesse)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 199b2bc2 24-May-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: s/i915_wait_request/i915_wait_seqno/g

Wait request is poorly named IMO. After working with these functions for
some time, I feel it's much clearer to name the functions more
appropriately.

Of course we must update the callers to use the new name as well.

This leaves room within our namespace for a *real* wait request function
at some point.

Note to maintainer: this patch is optional.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 23ba4fd0 24-May-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: wait render timeout ioctl

This helps implement GL_ARB_sync but stops short of allowing full blown
sync objects. Finally we can use the new timed seqno waiting function
to allow userspace to wait on a buffer object with a timeout. This
implements that interface.

The IOCTL will take as input a buffer object handle, and a timeout in
nanoseconds (flags is currently optional but will likely be used for
permutations of flush operations). Users may specify 0 nanoseconds to
instantly check.

The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any
non-zero timeout parameter the wait ioctl will wait for the given number
of nanoseconds on an object becoming unbusy. Since the wait itself does
so holding struct_mutex the object may become re-busied before this
completes. A similar but shorter race condition exists in the busy
ioctl.
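
Roughly, userspace drives it like this (a sketch assuming the
drm_i915_gem_wait layout from i915_drm.h; error handling trimmed):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /* Wait up to 100 ms for the object to go idle; a timeout of 0 turns
     * this into the instant check described above. */
    static int wait_bo(int drm_fd, uint32_t handle)
    {
            struct drm_i915_gem_wait wait = {
                    .bo_handle = handle,
                    .timeout_ns = 100 * 1000 * 1000,
            };

            return ioctl(drm_fd, DRM_IOCTL_I915_GEM_WAIT, &wait);
    }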

v2: ETIME/ERESTARTSYS instead of changing to EBUSY, and EAGAIN (Chris)
Flush the object from the gpu write domain (Chris + Daniel)
Fix leaked refcount in good case (Chris)
Naturally align ioctl struct (Chris)

v3: Drop lock after getting seqno to avoid ugly dance (Chris)

v4: check for 0 timeout after olr check to allow polling (Chris)

v5: Updated the comment. (Chris)

v6: Return -ETIME instead of -EBUSY when timeout_ns is 0 (Daniel)
Fix the commit message comment to be less ugly (Ben)
Add a warning to check the return timespec (Ben)

v7: Use DRM_AUTH for the ioctl. (Eugeni)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 31d8d651 24-May-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the error message for unbinding pinned buffers

This is now used intentionally to prevent proliferation of is-pinned
checks upon the inactive list following:

commit 1b50247a8ddde4af5aaa0e6bc125615372ce6c16
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Apr 24 15:47:30 2012 +0100

drm/i915: Remove the list of pinned inactive objects

Reported-and-tested-by: guang.a.yang@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50075
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# bed1ea95 24-May-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Limit page allocations to lowmem (dma32) for i965

Broadwater and Crestline share a limitation that prevents them from
relocating general surface state above 4GiB. The only recourse we have
since any buffer object may be used as a relocation target is then to
limit all object allocations on 965g[m] to DMA32.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5c81fe85 24-May-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: timeout parameter for seqno wait

Insert a wait parameter in the code so we can possibly timeout on a
seqno wait if need be. The code should be functionally the same as
before because all the callers will continue to retry if an arbitrary
timeout elapses.

We'd like to have nanosecond granularity, but the only way to do this is
with hrtimer, and that doesn't fit well with the needs of this code.

v2: Fix rebase error (Chris)
Return proper time even in wedged + signal case (Chris + Ben)
Use timespec constructs (Ben)
Didn't take Daniel's advice regarding the Frankenstein-ness of the
function. I did try his advice, but in the end I liked the way the
original code looked, better.

v3: Make wakeups far less frequent for infinite waits (Chris)

v4: Remove dummy_wait variable (Daniel)
Use raw monotonic time instead of jiffies (made the code a bit cleaner) (Ben)
Added a couple of warnings (Ben)

v5: Remove warnings (Daniel)
Use more accurate time diff for default case (Daniel)
Bikeshed for setting the return timespec in timeout case (Daniel)
s/jiffies/time in one of the comments

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1286ff73 10-May-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

i915: add dmabuf/prime buffer sharing support.

This adds handle->fd and fd->handle support to i915; this is to allow
for offloading of rendering in one direction and outputs in the other.
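
From userspace this is reachable through the generic PRIME ioctls; a
minimal export sketch using the libdrm wrapper (error handling
omitted):

    #include <stdint.h>
    #include <xf86drm.h>

    /* Export: GEM handle -> shareable dma-buf file descriptor. */
    static int gem_handle_to_prime_fd(int drm_fd, uint32_t handle)
    {
            int prime_fd = -1;

            drmPrimeHandleToFD(drm_fd, handle, DRM_CLOEXEC, &prime_fd);
            return prime_fd;
    }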

v2 from Daniel Vetter:
- fixup conflicts with the prepare/finish gtt prep work.
- implement ppgtt binding support.

Note that we have squat i-g-t test coverage for any of the lifetime and
access rules dma_buf/prime support brings along. And there are quite a
few intricate situations here.

Also note that the integration with the existing code is a bit
hackish, especially around get_gtt_pages and put_gtt_pages. It imo
would be easier with the prep code from Chris Wilson's unbound series,
but that is for 3.6.

Also note that I didn't bother to put the new prepare/finish gtt hooks
to good use by moving the dma_buf_map/unmap_attachment calls in there
(like we've originally planned for).

Last but not least this patch is only compile-tested, but I've changed
very little compared to Dave Airlie's version. So there's a decent
chance v2 on drm-next works as well as v1 on 3.4-rc.

v3: Right when I've hit sent I've noticed that I've screwed up one
obj->sg_list (for dmar support) and obj->sg_table (for prime support)
distinction. We should be able to merge these 2 paths, but that's
material for another patch.

v4: fix the error reporting bugs pointed out by ickle.

v5: fix another error, and stop non-gtt mmaps on shared objects
stop pread/pwrite on imported objects, add fake kmap

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b4519513 11-May-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce for_each_ring() macro

In many places we wish to iterate over the rings associated with the
GPU, so refactor them to use a common macro.
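
The macro is essentially a guarded loop over the ring array, along
these lines (a sketch; the real definition lives in the i915 headers):

    #define for_each_ring(ring__, dev_priv__, i__)                  \
            for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++)        \
                    if (((ring__) = &(dev_priv__)->ring[(i__)]),    \
                        intel_ring_initialized((ring__)))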

Along the way, there are a few code removals that should be side-effect
free and some rearrangement which should only have a cosmetic impact,
such as error-state.

Note that this slightly changes the semantics in the hangcheck code:
We now always cycle through all enabled rings instead of
short-circuiting the logic.

v2: Pull in a couple of suggestions from Ben and Daniel for
intel_ring_initialized() and not removing the warning (just moving them
to a new home, closer to the error).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Added note to commit message about the small behaviour
change, suggested by Ben Widawsky.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b4aca010 25-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: extract some common olr+wedge code

The new wait_rendering ioctl also needs to check for an outstanding
lazy request, and we already duplicate that logic at three places. So
extract it.

While at it, also extract the code to check the gpu wedging state to
improve code flow.

v2: Don't use seqno as an outparam (Chris)

v3 by danvet: Kill stale comment and pimp commit message

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 53ca26ca 26-Apr-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: disallow physical batchbuffers for KMS

Even the horrible gen3 XvMC code had learned to do this right by the
time xf86-video-intel releases gained kernel modesetting support. So
we can just disallow this.

Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8781342d 02-May-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: create dev_priv->dri1 dragon dungeon^W^W sub-struct

... and shove allow_batchbuffer in there. More dragons will
follow suit.

There's the curious case that we allow this for KMS ...

Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3b88cc0d 26-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: use __wait_seqno for ring throttle

It turns out throttle had an almost identical bit of code to do the
wait. Now we can call the new helper directly. This is just a bonus,
and not needed for the overall series.

v2: remove irq_get/put which is now in __wait_seqno (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4146b08d 26-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: remove polled wait from throttle

It's about to go away anyway. Just here to help bisection.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 604dd3ec 26-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: extract __wait_seqno from i915_wait_request

i915_wait_request is actually a fairly large function encapsulating
quite a few different operations. Because being able to wait on seqnos
in various conditions is useful, extracting that bit of code to a helper
function seems useful.
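
The heart of such a helper is the wrap-safe seqno comparison, along
the lines of the existing i915_seqno_passed():

    /* True once seq1 has reached or passed seq2; correct across the
     * u32 wrap-around because the difference is evaluated as signed. */
    static inline bool i915_seqno_passed(uint32_t seq1, uint32_t seq2)
    {
            return (int32_t)(seq1 - seq2) >= 0;
    }

The extracted __wait_seqno builds the actual sleeping around this test
(plus the irq get/put pulled in by v2 below).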

v2: pull the irq_get/put as well (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c58cf4f1 26-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: drop polled waits from i915_wait_request

The only time irq_get should fail is during unload or suspend. Both of
these points should try to quiesce the GPU before disabling interrupts
and so the atomic polling should never occur.

This was recommended by Chris Wilson as a way of reducing added
complexity to the polled wait which I introduced in an RFC patch.

09:57 < ickle_> it's only there as a fudge for waiting after irqs
after uninstalled during s&r, we aren't actually meant to hit it
09:57 < ickle_> so maybe we should just kill the code there and fix the breakage

v2: return -ENODEV instead of -EBUSY when irq_get fails

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9574b3fe 26-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: kill waiting_seqno

The waiting_seqno is not terribly useful, and as such we can remove it
so that we'll be able to extract lockless code.

v2: Keep the information for error_state (Chris)
Check if ring is initialized in hangcheck (Chris)
Capture the waiting ring (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: add some bikeshed to clarify a comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# be998e2e 26-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: move vbetool invoked ier stuff

This extra bit of interrupt enabling code doesn't belong in the wait
seqno function. If anything we should pull it out to a helper so the
throttle code can also use it. The history is a bit vague, but I am
going to attempt to just dump it, unless someone can argue otherwise.

Removing this allows for a shared lock free wait seqno function. To keep
tabs on this issue though, the IER value is stored on error capture
(recommended by Chris Wilson)

v2: fixed typo EIR->IER (Ben)
Fix some white space (Ben)
Move IER capture to globally instead of per ring (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: ier is a 16 bit reg on gen2!]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b2da9fe5 26-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: remove do_retire from i915_wait_request

This originates from a hack by me to quickly fix a bug in an earlier
patch where we needed control over whether or not waiting on a seqno
actually did any retire list processing. Since the two operations aren't
clearly related, we should pull the parameter out of the wait function,
and make the caller responsible for retiring if the action is desired.

The only function call site which did not get an explicit retire_request call
(on purpose) is i915_gem_inactive_shrink(). That code was already calling
retire_request a second time.

v2: don't modify any behavior except i915_gem_inactive_shrink (Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 50743298 26-Apr-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: use the new masked bit macro some more

I've missed this one.

v2: Chris Wilson noticed another register.
v3: Color choice improvements.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 63ed2cb2 23-Apr-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: rip out GEM drm feature checks

We always set it so there's no point in checking. We could
instead add a bit that tells us whether gem is actually
initialized (i.e. either kms or gem_init_ioctl called), but
that's imho not worth it.

So just rip it out.

There's a little change in the wait_ring timeout, but we've never
run with anything else than the 60 second timeout, even on dri1
userspace.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7bb6fb8d 24-Apr-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: disallow gem ums init ioctl for kms

This ioctl used in a kms driver is only useful to create massive
havoc.

v2: Bail out with -ENODEV as suggested by Chris Wilson.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1070a42b 24-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move GEM initialisation from i915_dma.c to i915_gem.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1488fc08 24-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the deferred-free list

The use of the mm_list by deferred-free breaks the following patches to
extend the range of objects tracked. We can simplify things if we just
make the unbind during free uninterruptible.

Note that unbinding should never fail, because we hold an additional
reference on every active object. Only the ilk vt-d workaround breaks
this, but already takes care of not failing by waiting for the gpu to
quiesce non-interruptibly. But the existence of the deferred free
list cast some doubts on this theory, hence WARN if the unbind fails
and only then retry non-interruptibly.

We can kill this additional code after a release in case the theory is
indeed right and no one has hit that WARN.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1b50247a 24-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the list of pinned inactive objects

Simplify object tracking by removing the inactive but pinned list. The
only place where this was used is for counting the available memory,
which is just as easy performed by checking all objects on the rare
occasions it is required (application startup). For ease of debugging,
we keep the reporting of pinned objects through the error-state and
debugfs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a39d7efc 24-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove i915_gem_evict_inactive()

This was only used by one external caller who would just be as happy
with evict-everything, so perform the replacement and make the function
private.

In the process we note that unbinding the inactive list should not fail,
and make it a warning instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8325a09d 24-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Bump the inactive LRU on set-to-GTT-domain

Currently, we only bump the inactive LRU of an object when we bind
into the GTT for a page-fault. As the object may be used many times
before its mapping is zapped, we do not mark it as active as
frequently as we should. Userspace should be calling set-to-GTT-domain
before each pointer dereference (for synchronous access) and so is a
good place to mark the buffer as active.

Marking the buffer as recently used places it at the end of the
inactive eviction queue, though still before anything with outstanding
rendering. This reduces the likelihood of evicting a buffer that is
going to be used again by the CPU in the near future. This way we can
hopefully avoid kicking out upload buffers right before we use them on
the gpu.

Note that we need to check that the object is not active or pinned,
for otherwise we create havoc on the active/pinned lists, which also
use obj->mm_list.

The active lists are sorted by and evicted in last GPU rendering
order, access by the CPU to a still active buffer therefore does not
affect its eviction ordering. Pinned objects are currently excluded
from eviction, therefore the only list that we need to bump for GTT
access by the CPU is the inactive list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added further explanations to the commit message as discussed
on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6b26c86d 24-Apr-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: create macros to handle masked bits

... and put them to good use.

Note that there's a functional change in the vlv clock gating code: we now
no longer spuriously read back the current value of the bit. According
to Bspec the high bits should always read zero, so ORing this in
should have no effect.
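
For registers whose upper 16 bits act as a per-bit write-enable mask,
the helpers reduce to something like this (names as in i915_reg.h):

    #define _MASKED_BIT_ENABLE(a)   (((a) << 16) | (a))
    #define _MASKED_BIT_DISABLE(a)  ((a) << 16)

Writing _MASKED_BIT_ENABLE(x) touches only bit x of the register, so
no read-modify-write cycle is needed.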

Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5d82e3e6 21-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clarify the semantics of tiling_changed

Rename obj->tiling_changed to obj->fence_dirty so that it is clear that
it flags when the parameters for an active fence (including the
no-fence) register are changed.

Also, do not set this flag when the object does not have a fence
register allocated currently and the gpu does not depend upon the
unfence. This case works exactly like when a tiled object lost its
fence and hence does not need additional handling for the tiling
change in the code.

v2: Use fence_dirty to better express what the flag tracks and add a few
more details to the comments to serve as a reminder of how the GPU also
uses the unfenced register slot.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add some bikeshed to the commit message about the stricter
use of fence_dirty.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4f0c7cfb 16-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: [sparse] __iomem fixes for gem

As with one of the earlier patches in the series, we're forced to cast
for copy_[to|from]_user. Again because of the nature of the GEN x86
exclusivity, this should be safe.

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
[danvet: Added some bikeshed.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 6be5ceb0 20-Apr-2012 Linus Torvalds <torvalds@linux-foundation.org>

VM: add "vm_mmap()" helper function

This continues the theme started with vm_brk() and vm_munmap():
vm_mmap() does the same thing as do_mmap(), but additionally does the
required VM locking.

This uninlines (and rewrites it to be clearer) do_mmap(), which sadly
duplicates it in mm/mmap.c and mm/nommu.c. But that way we don't have
to export our internal do_mmap_pgoff() function.

Some day we hopefully don't have to export do_mmap() either, if all
modular users can become the simpler vm_mmap() instead. We're actually
very close to that already, with the notable exception of the (broken)
use in i810, and a couple of stragglers in binfmt_elf.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 14415745 17-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor get_fence() to use the common fence writing routine

We can also take advantage of the new 'no retire' mode for seqno waiting
to avoid having to take a reference on the old fence object whilst
flushing an existing fence.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ada726c7 17-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor fence clearing to use the common fence writing routine

Now that we have a routine that is able to clear the fences as well as
setup up the register for a tiled object, remove the surplus routines to
clear the fences.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 61050808 17-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor put_fence() to use the common fence writing routine

One clarification that we make is to the existing semantics of
obj->tiling_changed to only mean that we need to update an associated
fence register (including the NO_FENCE when executing an untiled but
fenced GPU command). If we do not have a fence register or pending
fenced GPU access for the object (after put_fence() for example), then
we can clear the tiling_changed flag as any fence will necessarily be
rewritten upon acquisition.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9ce079e4 17-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prepare to consolidate fence writing

Update the existing architecture specific fence writing routines to
either update the fence to point to a tiled object or to clear them in
preparation to remove the other fence writing routes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 18991845 17-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the unsightly "optimisation" from flush_fence()

As i915_wait_request() will first check for an already passed seqno,
doing it also in the caller is a waste of space for a cold path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8fe301ad 17-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Simplify fence finding

As the fences are stored in LRU order, we can simply reuse the oldest if
we do not have an unused register.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1c293ea3 17-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Discard the unused obj->last_fenced_ring

As we now never pipeline a fence update, obj->last_fenced_ring is always
the same as the obj->ring whenever obj->last_fenced_seqno is active, so
remove it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 69963e7c 17-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove unused ring->setup_seqno

As we now no longer track a pipelined fence change, we never use
ring->setup_seqno and can kill it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a360bb1a 17-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove fence pipelining

Step 2 is then to replace the pipelined parameter with NULL and perform
constant folding to remove dead code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 06d98131 17-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the pipelined parameter from get_fence()

We never succeeded in getting pipelined fencing to work (unresolved
spurious GPU hangs), so begin the process of dismantling and removing
the broken code.

Step 1 is the removal of the pipeline parameter to get_fence().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 48ecfa10 11-Apr-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: properly set ppgtt cacheability on snb

For some reason snb has 2 fields to set ppgtt cacheability. This one
here does not exist on gen7.

This might explain why ppgtt wasn't a win on snb like on ivb - not
enough pte caching.

v2: Fixup rebase fail.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# be901a5a 11-Apr-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: set w/a bit for snb pagefaults

Bspec says that we need to set this: vol1c.3 "Blitter Command
Streamer", Section 1.1.2.1 "GAB_CTL_REG - GAB Unit Control Register".

We don't really rely on pagefaults, but who knows what this all
affects.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c07496fa 13-Apr-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't pwrite tiled objects through the gtt

... we will botch up the bit17 swizzling. Furthermore tiled pwrite is
a (now) unused slowpath, so no one really cares.

This fixes the last swizzling issues I have with i-g-t on my bit17
swizzling i915G. No regression, it's been broken since the dawn of
gem, but it's nice for regression tracking when really _all_ i-g-t
tests work.

Actually this is not true, Chris Wilson noticed while reviewing this
patch that the commit

commit d9e86c0ee60f323e890484628f351bf50fa9a15d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Nov 10 16:40:20 2010 +0000

drm/i915: Pipelined fencing [infrastructure]

contained a functional change that broke things.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1500f7ea 11-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: hide (seqno-1) in ringbuffer code

Waiting for seqno-1 in our object synchronization code is an
implementation detail given how we've decided to do the waits within the
rest of our code.

Requested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e3a5a225 11-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: fix for when semaphore updates fail

This fixes a long standing issue where emitting the semaphore updates
may have failed, but we've already updated our internal data structure.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5816d648 11-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: i915_gem_object_sync must handle NULL

When I extracted the synchronization code for implementing semaphorified
pageflips (74f5f6e0), I neglected the non pipelined case which also
calls this code. The modesetting code wants to make sure the object has
finished rendering to the frame before configuring the scanout (ie.
non-pipelined case).

As a result of a follow on discussion on IRC, I've decided to add a
comment about the function itself which received much inspiration from
Chris as well. So really, this patch was ghost-written by Chris :).

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f8413190 10-Apr-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow concurrent read access between CPU and GPU domain

Similar to allowing a buffer to be simultaneously read by the GPU and
through the GTT, we wish to allow readback of the pages through the CPU
domain whilst they are also being read by the GPU. Domain coherency
is managed by allowing multiple readers, but only a single writer.

This is used by mesa for its program cache which it may search for every
new program every frame and then renews should it need to add. During
renewal, mesa copies the program bo currently executing through a CPU
mapping onto the new bo. This patch allows the search and that copy to
proceed without causing a stall on the current batch.

Testcase: i-g-t/tests/gem_cpu_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2911a35b 05-Apr-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: use semaphores for the display plane

In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).

v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)

To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
- I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
- We don't want to enable workarounds for early silicon unless we have
to.
- I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
- This should be how the code already worked, unless I am
misunderstanding your meaning.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 9a5a53b3 22-Mar-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reorganise rules for get_fence/put_fence

By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 15a13bbd 11-Apr-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: clear fencing tracking state when retiring requests

This fixes a resume regression introduced in

commit 7dd4906586274f3945f2aeaaa5a33b451c3b4bba
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Mar 21 10:48:18 2012 +0000

drm/i915: Mark untiled BLT commands as fenced on gen2/3

which fixed fencing tracking for untiled blt commands.

A side effect of that patch was that now also untiled objects have a
non-zero obj->last_fenced_seqno to track when a fence can be set up
after a pipelined tiling change. Unfortunately this was only cleared
by the fence setup and teardown code, resulting in tons of untiled but
inactive objects with non-zero last_fenced_seqno.

Now after resume we completely reset the seqno tracking, both on the
driver side (by setting dev_priv->next_seqno = 1) and on the hw side
(by allocating a new hws page, which contains the seqnos). Hilarity
and indefinite waits ensued from the stale seqnos in
obj->last_fenced_seqno from before the suspend.

The fix is to properly clear the fencing tracking state like we
already do for the normal gpu rendering while moving objects off the
active list.

Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Jiri Slaby <jslaby@suse.cz>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f534bc0b 26-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: disallow gem init ioctl on ilk

Ums is already disabled, but on ilk we can additionally disable gem
initialization when using user mode setting. Upstream never support
ilk without kernel modesetting and not even the RHEL ilk ums backport
needs gem - that driver is based on xf86-video-intel version 2.2,
which is pre-gem.

Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7dd49065 21-Mar-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark untiled BLT commands as fenced on gen2/3

The BLT commands on gen2/3 utilize the fence registers and so we cannot
modify any fences for the object whilst those commands are in flight.
Currently we marked tiled commands as occupying a fence, but forgot to
restrict the untiled commands from preventing a fence being assigned
before they were completed.

One side-effect is that we then have to double-check that a fence was
allocated for a fenced buffer during move-to-active.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 55a254ac 21-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: properly restore the ppgtt page directory on resume

The ppgtt page directory lives in a snatched part of the gtt pte
range. Which naturally gets cleared on hibernate when we pull the
power. Suspend to ram (which is what I've tested) works because
despite the fact that this is a mmio region, it is actually backed by
system ram.

Fix this by moving the page directory setup code to the ppgtt init
code (which gets called on resume).

This fixes hibernate on my ivb and snb.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 23e3f9b3 28-Mar-2012 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: check for disabled interrupts on ValleyView

Haven't seen this yet, but it doesn't hurt.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e7e58eb5 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: mark pwrite/pread slowpaths with unlikely

Beside helping the compiler untangle this maze they double-up as
documentation for which parts of the code aren't performance-critical
but just around to keep old (but already dead-slow) userspace from
breaking.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 23c18c71 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fixup in-line clflushing on bit17 swizzled bos

The issue is that with inline clflushing the clflushing isn't properly
swizzled. Fix this by
- always clflushing entire 128 byte chunks and
- unconditionally flushing before writes when swizzling a given page.
We could be clever and check whether we pwrite a partial 128 byte
chunk instead of a partial cacheline, but I've figured that's not
worth it.

Now the usual approach is to fold this into the original patch series, but
I've opted against this because
- this fixes a corner case only very old userspace relies on and
- I'd like to not invalidate all the testing the pwrite rewrite has gotten.

This fixes the regression noticed by tests/gem_tiled_partial_prite_pread
from i-g-t. Unfortunately it doesn't fix the issues with partial pwrites to
tiled buffers on bit17 swizzling machines. But that is also broken without
the pwrite patches, so likely a different issue (or a problem with the
testcase).

v2: Simplify the patch by dropping the overly clever partial write
logic for swizzled pages.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f56f821f 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

mm: extend prefault helpers to fault in more than PAGE_SIZE

drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

Add new functions in filemap.h to make that possible.
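
Conceptually the writeable variant just touches one byte in every page
of the user range before the copy; a sketch of the idea (not the
verbatim pagemap.h helper):

    static inline int fault_in_pages_sketch(char __user *uaddr, int size)
    {
            char __user *end = uaddr + size - 1;
            int ret = 0;

            if (size == 0)
                    return 0;

            /* Writing a zero byte faults each page in; this is also why
             * callers must already be committed to overwriting the
             * buffer. */
            while (uaddr <= end) {
                    ret = __put_user(0, uaddr);
                    if (ret)
                            return ret;
                    uaddr += PAGE_SIZE;
            }

            /* If the loop ended in the same page as @end without
             * touching it, fault that last page in too. */
            if (((unsigned long)uaddr & PAGE_MASK) ==
                ((unsigned long)end & PAGE_MASK))
                    ret = __put_user(0, end);

            return ret;
    }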

Also kill a copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

v3: Because I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.

v4: Adjust comment to CodingStyle and fix spelling.

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d174bd64 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: extract copy helpers from shmem_pread|pwrite

While moving around things, these two functions slowly grew out of any
sane bounds. So extract a few lines that do the copying and
clflushing. Also add a few comments to explain what's going on.

v2: Again do s/needs_clflush/needs_clflush_after/ in the write paths
as suggested by Chris Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 117babcd 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: use uncached writes in pwrite

It's around 20% faster.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# ffc62976 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fall back to shmem pwrite when the buffer is not accessible

It's too expensive to move it around just for that pwrite, especially
when we're thrashing on the mappable gtt part like crazy.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 58642885 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: implement inline clflush for pwrite

In micro-benchmarking of the usual pwrite use-pattern of alternating
pwrites with gtt domain reads from the gpu, this yields around 30%
improvement of pwrite throughput across all buffers size. The trick is
that we can avoid clflush cachelines that we will overwrite completely
anyway.

Furthermore for partial pwrites it gives a proportional speedup on top
of the 30% because we only clflush back the part of the buffer
we're actually writing.
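
Per page the decision then reduces to roughly this (a sketch; the
variable names mirror the pwrite path but are illustrative):

    /* A flush before the copy is only needed for uncached objects, and
     * even then only when the write does not cover whole cachelines;
     * fully overwritten lines are flushed back after the copy anyway. */
    partial_cacheline_write = needs_clflush_before &&
            ((shmem_page_offset | page_length) &
             (boot_cpu_data.x86_clflush_size - 1));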

v2: Simplify the clflush-before-write logic, as suggested by Chris
Wilson.

v3: Finishing touches suggested by Chris Wilson:
- add comment to needs_clflush_before and only set this if the bo is
uncached.
- s/needs_clflush/needs_clflush_after/ in the write paths to clearly
differentiate it from needs_clflush_before.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 96d79b52 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't clobber userspace memory before committing to the pread

The pagemap.h prefault helpers do the prefaulting by simply writing
some data into every page. Hence we should not prefault when we're not
yet committed to actually writing data to userspace. The problem is
now that
- we can't prefault while holding dev->struct_mutex for we could
deadlock with our own pagefault handler
- we need to grab dev->struct_mutex before copying to sync up with any
outstanding gpu writes.

Therefore only prefault when we're dropping the lock the first time in
the pread slowpath - at that point we're committed to the write, don't
wait on the gpu anymore and hence won't return early (with e.g.
-EINTR).

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 935aaa69 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: drop gtt slowpath

With the proper prefault, it's extremely unlikely that we fall back
to the gtt slowpath.

So just kill it and use the shmem_pwrite path as fallback.

To further clean up the code, move the preparatory gem calls into the
respective pwrite functions. This way the gtt_fast->shmem fallback
is much more obvious.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 692a576b 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't call shmem_read_mapping unnecessarily

This speeds up pwrite and pread from ~120 µs to ~100 µs for
reading/writing 1mb on my snb (if the backing storage pages
are already pinned, of course).

v2: Chris Wilson pointed out a glaring page reference bug - I've
unconditionally dropped the reference. With that fixed (and the
associated reduction of dirt in dmesg) it's now even a notch faster.

v3: Unconditionally grab a page reference when dropping
dev->struct_mutex to simplify the code-flow.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3ae53783 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't use gtt_pwrite on LLC cached objects

~120 µs instead of ~210 µs to write 1mb on my snb. I like this.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a0356fc3 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: kill ranged cpu read domain support

No longer needed.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8489731c 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move clflushing into shmem_pread

This is obviously gonna slow down pread. But for a half-way realistic
micro-benchmark, it doesn't matter: Non-broken userspace reads back
data from the gpu once before the gpu again dirties it.

So all this ranged clflush tracking is just a waste of time.

No pread performance change (neglecting the dumb benchmark of
constantly reading the same data) measured.

As an added bonus, this avoids clflush on read on coherent objects.
Which means that partial preads on snb are now roughly 4x as fast.
This will be useful for e.g. the libva encoder - when I finally get
around to fix that up.

v2: Properly sync with the gpu on LLC machines.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# dbf7bff0 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: merge shmem_pread slow&fast-path

With the previous rewrite, they've become essentially identical.

v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e244a443 25-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: merge shmem_pwrite slow&fast-path

With the previous rewrite, they've become essentially identical.

v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.

Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# dabdfe02 26-Mar-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid using mappable space for relocation processing through the CPU

We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 644ec02b 26-Mar-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: s/i915_gem_do_init/i915_gem_init_global_gtt

... because this is what it actually does now that we have the global
gtt vs. ppgtt split.

Also move it to the other global gtt functions in i915_gem_gtt.c

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a14917ee 24-Feb-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Release the mmap offset when purging a buffer

If we discard a buffer due to memory pressure, also release its allotted
mmap address space. As it may be some time before userspace wakes up
and notices that it has buffers to purge from its cache, we may waste
valuable address space on unusable objects for a period of time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47738
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# eb2c0c81 15-Feb-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: [dinq] shut up two instances of -Wuninitialized

Introduced in commits 8461d226 and 8c59967c

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: s/fix/shut up/ in the commit msg.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 0ebb9829 15-Feb-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: enable lazy global-gtt binding

Now that everything is in place, only bind to the global gtt
when actually required. Patch split-up suggested by Chris Wilson.

Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 74898d7e 15-Feb-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: bind objects to the global gtt only when needed

And track the existence of such a binding similar to the aliasing
ppgtt case. Speeds up binding/unbinding in the common case where we
only need a ppgtt binding (which is accessed in a cpu coherent fashion
by the gpu) and no global gtt binding (which needs uc writes for the
ptes).

This patch just puts the required tracking in place.

v2: Check that global gtt mappings exist in the error_state capture
code (with Chris Wilson's llc reloc patches batchbuffers are no longer
relocated as mappable in all situations, so this matters). Suggested
by Chris Wilson.

v3: Adapted to Chris' latest llc-reloc patches.

v4: Fix a bug in the i915 error state capture code noticed by Chris
Wilson.

Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 74163907 15-Feb-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: split out dma mapping from global gtt bind/unbind functions

Note that there's a functional change buried in this patch wrt the ilk
dmar workaround: We now only idle the gpu while tearing down the dmar
mappings, not while clearing the gtt. Keeping the current semantics
would have made for some really ugly code and afaik the issue is only
with the dmar unmapping that needs a fully idle gpu.

Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c501ae7f 14-Dec-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only clear the GPU domains upon a successful finish

By clearing the GPU read domains before waiting upon the buffer, we run
the risk of the wait being interrupted and the domains prematurely
cleared. The next time we attempt to wait upon the buffer (after
userspace handles the signal), we believe that the buffer is idle and so
skip the wait.

There are a number of bugs across all generations which show signs of an
overly hasty reuse of active buffers.

Such as:

https://bugs.freedesktop.org/show_bug.cgi?id=29046
https://bugs.freedesktop.org/show_bug.cgi?id=35863
https://bugs.freedesktop.org/show_bug.cgi?id=38952
https://bugs.freedesktop.org/show_bug.cgi?id=40282
https://bugs.freedesktop.org/show_bug.cgi?id=41098
https://bugs.freedesktop.org/show_bug.cgi?id=41102
https://bugs.freedesktop.org/show_bug.cgi?id=41284
https://bugs.freedesktop.org/show_bug.cgi?id=42141

A couple of those pre-date i915_gem_object_finish_gpu(), so may be
unrelated (such as a wild write from a userspace command buffer), but
this does look like a convincing cause for most of those bugs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# eadb29a9 22-Feb-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Silence the error message from i915_wait_request()

This error message has since been superseded by the hangcheck, and does
not add any salient information beyond that already printed by hangcheck
discovering the GPU hang that led to i915_wait_request() bombing out in
the first place.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a71d8d94 15-Feb-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Record the tail at each request and use it to estimate the head

By recording the location of every request in the ringbuffer, we know
that in order to retire the request the GPU must have finished reading
it and so the GPU head is now beyond the tail of the request. We can
therefore provide a conservative estimate of where the GPU is reading
from in order to avoid having to read back the ring buffer registers
when polling for space upon starting a new write into the ringbuffer.

A secondary effect is that this allows us to convert
intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
upon the single function to handle the complicated task of waiting upon
the GPU. A necessary precaution is that we need to make that wait
uninterruptible to match the existing conditions as all the callers of
intel_ring_begin() have not been audited to handle ERESTARTSYS
correctly.

By using a conservative estimate for the head, and always processing all
outstanding requests first, we prevent a race condition between using
the estimate and direct reads of I915_RING_HEAD which could result in
the value of the head going backwards, and the tail overflowing once
again. We are also careful to mark any request that we skip over in
order to free space in the ring as consumed, which provides a
self-consistency check.

Given sufficient abuse, such as a set of unthrottled GPU bound
cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
Sandy Bridge (i5-2520m):
firefox-paintball 18927ms -> 15646ms: 1.21x speedup
firefox-fishtank 12563ms -> 11278ms: 1.11x speedup
which is a mild consolation for the performance those traces achieved from
exploiting the buggy autoreported head.

v2: Add a few more comments and make request->tail a conservative
estimate as suggested by Daniel Vetter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: resolve conflicts with retirement defering and the lack of
the autoreport head removal (that will go in through -fixes).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
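
Expressed as a hedged sketch (the function name and the small safety gap
are illustrative): with the tail of the last retired request as a
conservative estimate of the GPU's read position, free space in the
circular ring can be computed without reading I915_RING_HEAD.

/*
 * estimated_head: tail recorded with the last retired request,
 * i.e. a position the GPU is known to have read past already.
 */
static int ring_space_estimate(unsigned int estimated_head,
                               unsigned int tail,
                               unsigned int size)
{
        int space = estimated_head - (tail + 8); /* keep head != tail */

        if (space < 0)
                space += size;

        return space;
}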


# 53d227f2 25-Jan-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fixup seqno allocation logic for lazy_request

Currently we reserve seqnos only when we emit the request to the ring
(by bumping dev_priv->next_seqno), but start using it much earlier for
ring->outstanding_lazy_request. When 2 threads compete for the gpu and
run on two different rings (e.g. ddx on blitter vs. compositor)
hilarity ensued, especially when we get constantly interrupted while
reserving buffers.

Breakage seems to have been introduced in

commit 6f392d548658a17600da7faaf8a5df25ee5f01f6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Aug 7 11:01:22 2010 +0100

drm/i915: Use a common seqno for all rings.

This patch fixes up the seqno reservation logic by moving it into
i915_gem_next_request_seqno. The ring->add_request functions now
superfluously still return the new seqno through a pointer, that will
be refactored in the next patch.

Note that with this change we now unconditionally allocate a seqno,
even when ->add_request might fail because the rings are full and the
gpu died. But this does not open up a new can of worms because we can
already leave behind an outstanding_request_seqno if e.g. the caller
gets interrupted with a signal while stalling for the gpu in the
eviction paths. And with the bugfix we only ever have one seqno
allocated per ring (and only that ring), so there are no ordering
issues with multiple outstanding seqnos on the same ring.

v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it
clear that we only have one seqno counter for all rings. Suggested by
Chris Wilson.

v3: As suggested by Chris Wilson use i915_gem_next_request_seqno
instead of ring->outstanding_lazy_request to make the follow-up
refactoring more clearly correct. Also improve the commit message
with issues discussed on irc.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5391d0cf 25-Jan-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: outstanding_lazy_request is a u32

So don't assign it false, that's just confusing ... No functional
change here.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e21af88d 09-Feb-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: enable ppgtt

We want to unconditionally enable ppgtt for two reasons:
- Windows uses this on snb and later.
- We need the basic hw support to work before we can think about real
per-process address spaces and other cool features we want.

But Chris Wilson was complaining all over irc and intel-gfx that this
will blow up if we don't have a module option to disable it. Hence add
one, to prevent this.

ppgtt support seems to slightly change the timings and make crashy
things slightly more or less crashy. Now in my testing and the testing
this got on troublesome snb machines, it seems to have improved things
only. But on ivb it makes quite a few crashes happen much more often,
see

https://bugs.freedesktop.org/show_bug.cgi?id=41353

Luckily Eugeni Dodonov seems to have a set of workarounds that fix
this issue.

v2: Don't try to enable ppgtt on pre-snb.

v3: Pimp commit message and make Chris Wilson less grumpy by adding a
module option.

v4: New try at making Chris Wilson happy.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 7bddb01f 09-Feb-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: ppgtt binding/unbinding support

This adds support to bind/unbind objects and wires it up. Objects are
only put into the ppgtt when necessary, i.e. at execbuf time.

Objects are still unconditionally put into the global gtt.

v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind
like for the global gtt function. Noticed by Chris Wilson.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 11782b02 31-Jan-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: consolidate swizzling control bit frobbing

On gen5 we also need to correctly set up swizzling in the display
scanout engine, but only there. Consolidate this into the same
function.

This has a small effect on ums setups - the kernel now also sets this
bit in addition to userspace setting it. Given that this code only
runs when userspace either can't (resume, gpu reset) or explicitly
won't (gem_init) touch the hw, this shouldn't have an adverse effect.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# f691e2f4 02-Feb-2012 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: swizzling support for snb/ivb

We have to do this manually. Somebody had a Great Idea.

I've measured speed-ups just a few percent above the noise level
(below 5% for the best case), but no slowdowns. Chris Wilson measured
quite a bit more (10-20% above the usual snb variance) on a more
recent and better tuned version of sna, but also recorded a few
slow-downs on benchmarks known for uglier amounts of snb-induced
variance.

v2: Incorporate Ben Widawsky's preliminary review comments and
elaborate a bit about the performance impact in the changelog.

v3: Add a comment as to why we don't need to check the 3rd memory
channel.

v4: Fixup whitespace.

Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 8461d226 14-Dec-2011 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: rewrite shmem_pread_slow to use copy_to_user

Like for shmem_pwrite_slow. The only difference is that because we
read data, we can leave the fetched cachelines in the cpu: In the case
that the object isn't in the cpu read domain anymore, the clflush for
the next cpu read domain invalidation will simply drop these
cachelines.

slow_shmem_bit17_copy is now unused, so kill it.

With this patch tests/gem_mmap_gtt now actually works.

v2: add __ to copy_to_user_swizzled as suggested by Chris Wilson.

v3: Fixup the swizzling logic, it swizzled the wrong pages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
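
The shape of such a swizzle-aware copy, as a hedged sketch (the driver's
helper differs in detail and the function name is made up): copy in
cacheline-sized chunks and XOR the source offset with 64 so the CPU reads
the bytes from where the swizzled layout actually put them.

#include <linux/kernel.h>
#include <linux/uaccess.h>

/*
 * Only meaningful for pages where the bit17 swizzle applies; the caller
 * is assumed to have checked bit 17 of the page's physical address.
 */
static int copy_to_user_swizzled_sketch(char __user *cpu_vaddr,
                                        const char *gpu_vaddr,
                                        int gpu_offset, int length)
{
        int cpu_offset = 0;

        while (length > 0) {
                int cacheline_end = ALIGN(gpu_offset + 1, 64);
                int this_length = min(cacheline_end - gpu_offset, length);
                int swizzled_offset = gpu_offset ^ 64;  /* undo the swizzle */

                if (__copy_to_user(cpu_vaddr + cpu_offset,
                                   gpu_vaddr + swizzled_offset,
                                   this_length))
                        return -EFAULT;

                cpu_offset += this_length;
                gpu_offset += this_length;
                length -= this_length;
        }

        return 0;
}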


# 8c59967c 14-Dec-2011 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: rewrite shmem_pwrite_slow to use copy_from_user

... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.

To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicity guarantees
of our userspace abi. Implications for racing with other gem ioctls:

- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degradation.
- read/write access to mmaps: already fully racy, no degradation.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.

v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.

v3: Fixup bit17 swizzling, it swizzled the wrong pages.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 5c0480f2 14-Dec-2011 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fall through pwrite_gtt_slow to the shmem slow path

The gtt_pwrite slowpath grabs the userspace memory with
get_user_pages. This will not work for non-page backed memory, like a
gtt mmapped gem object. Hence fall through to the shmem paths if we hit
-EFAULT in the gtt paths.

Now the shmem paths have exactly the same problem, but this way we
only need to rearrange the code in one write path.

v2: v1 accidentally falls back to shmem pwrite for phys objects. Fixed.

v3: Make the codeflow around phys_pwrite clearer as suggested by Chris
Wilson.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 068c6ff1 29-Jan-2012 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the upper limit on the bo size for mapping into the CPU domain

The original intention of comparing the bo against the mappable GTT
limits was to prevent a subsequent faulting of the bo into the GTT from
clearing the entire GTT in vain. However, that was clearly a cut'n'paste
mistake as a CPU mapping never binds the bo into the aperture. Whilst
there may be some merit to limiting the maximum size of the bo to
something that can be utilized by the GPU, that limit itself does not
belong as a safeguard to mmapping the bo, so remove the check entirely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 39965b37 14-Dec-2011 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't trash the gtt when running out of fences

With the fence accounting fixed up in the previous commit, not finding
enough fences is a fatal error and a userspace bug. Trashing the entire
gtt is not gonna turn up that missing fence, so don't do this; instead
return an error other than ENOSPC.

This has the added benefit that it's easier to distinguish fence
accounting errors from gtt space accounting issues.

TTM serves as precedent for the EDEADLK error code - it returns it
when the reservation code needs resources already blocked by the
current reservation.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 1690e1eb 14-Dec-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Separate fence pin counting from normal bind pin counting

In order to correctly account for reserving space in the GTT and fences
for a batch buffer, we need to independently track whether the fence is
pinned due to a fenced GPU access in the batch or whether the buffer is
pinned in the aperture. Currently we count the fenced as pinned if the
buffer has already been seen in the execbuffer. This leads to a false
accounting of available fence registers, causing frequent mass evictions.
Worse, if coupled with the change to make i915_gem_object_get_fence()
report EDEADLK upon fence starvation, the batchbuffer can fail with only
one fence required...

Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Paul Neumann <paul104x@yahoo.de>
[danvet: Resolve the functional conflict with Jesse Barnes sprite
patches, acked by Chris Wilson on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b93f9cf1 25-Jan-2012 Ben Widawsky <ben@bwidawsk.net>

drm/i915: argument to control retiring behavior

Sometimes, when we idle the gpu or wait on something, we don't actually
want to process the retire list. This patch allows callers to choose
the behavior.

Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 3d29b842 17-Jan-2012 Eugeni Dodonov <eugeni.dodonov@intel.com>

drm/i915: add a LLC feature flag in device description

LLC is not SNB/IVB-specific, so we should check for it in a more generic
way.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e959b5db 22-Dec-2011 Eric Anholt <eric@anholt.net>

drm/i915: Make the fallback IRQ wait not sleep.

The waits we do here are generally so short that sleeping is a bad
idea unless we have an IRQ to wake us up. Improves regression test
performance from 18 minutes to 3.5 minutes on gen7, which is now
consistent with the previous generation.

Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 7ea29b13 22-Dec-2011 Eric Anholt <eric@anholt.net>

drm/i915: Do the fallback non-IRQ wait in ring throttle, too.

As a workaround for IRQ synchronization issues in the gen7 BLT ring,
we want to turn the two wait functions into polling loops.

Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>


# ed4a5184 16-Dec-2011 Linus Torvalds <torvalds@linux-foundation.org>

Revert "drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a"

This reverts commit eb1711bb94991e93669c5a1b5f84f11be2d51ea1.

It blows up the i915 seqno tracking, resulting in the

BUG_ON(seqno == 0);

in i915_wait_request() triggering, which will cause lock-ups.

See for example
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/903010
https://lkml.org/lkml/2011/12/14/395

Reported-requested-and-tested-by: Dirk Hohndel <dirk@hohndel.org>
Reported-by: Richard Eames <Richard.Eames@flinders.edu.au>
Reported-by: Rocko Requin <rockorequin@hotmail.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# eb1711bb 05-Dec-2011 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a

The recursion loop goes retire_requests->unbind->gpu_idle->retire_requests.

Every time we go through this we need a
- active object that can be retired
- and there are no other references to that object than the one from
the active list, so that it gets unbound and freed immediately.
Otherwise the recursion stops. So the recursion is only limited by the
number of objects that fit these requirements sitting in the active list
any time retire_request is called.

Issue exercised by tests/gem_unref_active_buffers from i-g-t.

There's been a decent bikeshed discussion whether it wouldn't be
better to pass around a flag, but imo this is o.k. for such a limited
case that only supports a w/a.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42180

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson>
[ickle- we built better bikesheds, but this keeps the rain off for now]
Tested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 457eafce 15-Nov-2011 Rakib Mullick <rakib.mullick@gmail.com>

drm, i915: Fix memory leak in i915_gem_busy_ioctl().

i915_gem_busy_ioctl() makes a call to i915_add_request(). i915_add_request() can fail,
so in its exit path the previously allocated memory needs to be freed.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 14660ccd 01-Nov-2011 Eric Anholt <eric@anholt.net>

drm/i915: Fix object refcount leak on mmappable size limit error path.

I've been seeing memory leaks on my system in the form of large
(300-400MB) GEM objects created by now-dead processes lying around
clogging up memory. I usually notice when it gets to about 1.2GB of
them. Hopefully this clears up the issue, but I just found this bug
by inspection.

Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: stable@kernel.org
Signed-off-by: Keith Packard <keithp@keithp.com>


# 680da876 03-Nov-2011 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: enable cacheable objects on Ivybridge

IVB supports these bits as well.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 4b9de737 09-Oct-2011 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: add constants to size fence arrays and fields

In preparation for supporting 32 fences on Ivybridge.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>


# ff56b0bc 01-Nov-2011 Eric Anholt <eric@anholt.net>

drm/i915: Fix object refcount leak on mmappable size limit error path.

I've been seeing memory leaks on my system in the form of large
(300-400MB) GEM objects created by now-dead processes laying around
clogging up memory. I usually notice when it gets to about 1.2GB of
them. Hopefully this clears up the issue, but I just found this bug
by inspection.

Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: stable@kernel.org
Signed-off-by: Keith Packard <keithp@keithp.com>


# f372b854 17-Oct-2011 Ben Widawsky <ben@bwidawsk.net>

drm/i915: Remove early exit on i915_gpu_idle

[Description from: Daniel Vetter]
I've just discussed this quickly with Chris on irc and it's probably
best to just kill the list_empty early bailout. gpu_idle isn't a
fastpath, so who cares. One candidate where we emit commands to the ring
without adding anything onto these lists is e.g. pageflip. There are
probably more.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 130c2561 17-Sep-2011 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: drop KM_USER0 argument to k(un)map_atomic

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 8ffc0246 14-Sep-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defend against userspace creating a gem object with size==0

We currently only round up the userspace size to the next page. We
assume that userspace hasn't made a mistake and requested a zero-length
gem object and all through our internal code we then presume that every
object is backed by at least a single page. Fix that oversight and
report EINVAL back to userspace if they try to create a zero length
object.

[danvet: This fixes tests/gem_bad_length]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
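
The guard itself is tiny; a hedged sketch of the check described above
(the function name is illustrative):

#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/mm.h>

static int validate_create_size(u64 *size)
{
        if (*size == 0)
                return -EINVAL;         /* zero-length objects are invalid */

        *size = round_up(*size, PAGE_SIZE);
        return 0;
}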


# 6dacfd2f 12-Sep-2011 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: simplify swapin/out swizzle checking a bit

Use the helper function already employed by the pwrite/pread
functions.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 0206e353 16-Aug-2011 Akshay Joshi <me@akshayjoshi.com>

Drivers: i915: Fix all space related issues.

Various issues involved with the space character were generating
warnings in the checkpatch.pl file. This patch removes most of those
warnings.

Signed-off-by: Akshay Joshi <me@akshayjoshi.com>
Signed-off-by: Keith Packard <keithp@keithp.com>


# b464e9a2 10-Aug-2011 Rob Clark <rob@ti.com>

drm/i915: use common functions for mmap offset creation

Signed-off-by: Rob Clark <rob@ti.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# e0e3fb48 29-Jul-2011 Keith Packard <keithp@keithp.com>

drm/i915: Ignore GPU wedged errors while pinning scanout buffers

Failing to pin a scanout buffer will most likely lead to a black
screen, so if the GPU is wedged, then just let the pin happen and hope
that things work out OK.

Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# f0b69efc 19-Jul-2011 Keith Packard <keithp@keithp.com>

drm/i915: Skip GPU wait for scanout pin while wedged

Failing to pin a scanout buffer will most likely lead to a black
screen, so if the GPU is wedged, then just let the pin happen and hope
that things work out OK.

v2: Just ignore any error from i915_gem_object_wait_rendering, as
suggested by Chris Wilson

Signed-off-by: Keith Packard <keithp@keithp.com>


# e28f8711 18-Jul-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix unfenced alignment on pre-G33 hardware

Align unfenced buffers on older hardware to the power-of-two object
size. The docs suggest that it should be possible to align only to a
power-of-two tile height, but using the already computed fence size is
easier and always correct. We also have to make sure that we unbind
misaligned buffers upon tiling changes.

In order to prevent a repetition of this bug, we change the interface
to the alignment computation routines to force the caller to provide
the requested alignment and size of the GTT binding rather than assume
the current values on the object.

Reported-and-tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36326
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 3e0dc6b0 29-Jun-2011 Ben Widawsky <ben@bwidawsk.net>

drm/i915: hangcheck disable parameter

Provide a parameter to disable hangcheck. This is useful mostly for
developers trying to debug known problems, and probably should not be
touched by normal users.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>


# f01c22fd 28-Jun-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use chipset-specific irq installers

Konstantin Belousov pointed out that 4697995b98417 replaced the generic
i915_driver_irq_*install() functions with chipset specific routines
accessible only through driver->irq_*install(). So update the sanity
check in i915_request_wait() to match.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>


# e2377fe0 27-Jun-2011 Hugh Dickins <hughd@google.com>

drm/i915: use shmem_truncate_range

The interface to ->truncate_range is changing very slightly: once "tmpfs:
take control of its truncate_range" has been applied, this can be applied.
For now there is only a slight inefficiency while this remains unapplied,
but it will soon become essential for managing shmem's use of swap.

Change i915_gem_object_truncate() to use shmem_truncate_range() directly:
which should also spare i915 a later change if we switch from
inode_operations->truncate_range to file_operations->fallocate.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5949eac4 27-Jun-2011 Hugh Dickins <hughd@google.com>

drm/i915: use shmem_read_mapping_page

Soon tmpfs will stop supporting ->readpage and read_cache_page_gfp(): once
"tmpfs: add shmem_read_mapping_page_gfp" has been applied, this patch can
be applied to ease the transition.

Make i915_gem_object_get_pages_gtt() use shmem_read_mapping_page_gfp() in
the one place it's needed; elsewhere use shmem_read_mapping_page(), with
the mapping's gfp_mask properly initialized.

Forget about __GFP_COLD: since tmpfs initializes its pages with memset,
asking for a cold page is counter-productive.

Include linux/shmem_fs.h also in drm_gem.c: with shmem_file_setup() now
declared there too, we shall remove the prototype from linux/mm.h later.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
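
The resulting call pattern, as a hedged sketch (the wrapper name is made
up; shmem_read_mapping_page() picks up the gfp_mask that was stored in the
mapping when the object was created):

#include <linux/shmem_fs.h>
#include <linux/pagemap.h>

static struct page *pin_backing_page(struct address_space *mapping,
                                     pgoff_t index)
{
        /* may return ERR_PTR() on failure; caller drops the page ref later */
        return shmem_read_mapping_page(mapping, index);
}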


# b97c3d9c 24-Jun-2011 Keith Packard <keithp@keithp.com>

drm/i915: i915_gem_object_finish_gtt must always release gtt mmap

Even if the object is no longer in the GTT domain, there may still be
a user space mapping which needs to be released.

Without this fix, render-based text (mostly in firefox) would
occasionally get corrupted when the system was under load.

Signed-off-by: Keith Packard <keithp@keithp.com>


# e92d03bf 14-Jun-2011 Eric Anholt <eric@anholt.net>

Revert "drm/i915: Kill GTT mappings when moving from GTT domain"

This reverts commit 4a684a4117abd756291969336af454e8a958802f.
Userland has always been required to set the object's domain to GTT
before using it through a GTT mapping; it's not something that the
kernel is supposed to enforce. (The pagefault support is so that we
can handle multiple mappings without userland having to pin across
them, not so that userland can use GTT after GPU domains without
telling the kernel).

Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
firefox-talos-gfx on my T420 laptop.

Signed-off-by: Keith Packard <keithp@keithp.com>


# b65552f0 12-Jun-2011 Jesper Juhl <jj@chaosbits.net>

drm/i915: Don't leak in i915_gem_shmem_pread_slow()

It seems to me that we are leaking 'user_pages' in
drivers/gpu/drm/i915/i915_gem.c::i915_gem_shmem_pread_slow() if
read_cache_page_gfp() fails.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# a1871112 29-Mar-2011 Eric Anholt <eric@anholt.net>

drm/i915: Use the LLC mode on gen6 for everything but display.

Improves full-screen openarena on my laptop 20.3% +/- 4.0% (n=3)
Improves 800x600 nexuiz on my laptop 12.3% +/- 0.1% (n=3)

We have more room to improve with doing LLC caching for display using
GFDT, and in doing LLC+MLC caching, but this was an easy performance
win and incremental improvement toward those two.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a7ef0640 29-Mar-2011 Eric Anholt <eric@anholt.net>

drm/i915: Use the uncached domain for the display planes

The simplest and common method for ensuring scanout coherency on all
chipsets is to mark the scanout buffers as uncached (and for
userspace to remember to flush the render cache every so often).

We can improve upon this for later generations by marking scanout
objects as GFDT and only flush those cachelines when required. However,
we start simple.

[v2: Move the set to uncached above the clflush. Otherwise, we'd skip
the clflush and try to scan out data that was still sitting in the
cache.]

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 2da3b9b9 14-Apr-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Combine pinning with setting to the display plane

We need to perform a few operations in order to move the object into the
display plane (where it can be accessed coherently by the display
engine); for future safety it is important to forbid these operations
whilst pinned. As a result, we need to perform some of the operations
before pinning,
but some are required once we have been bound into the GTT. So combine
the pinning performed by all the callers with set_to_display_plane(), so
this complication is contained within the single function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# e4ffd173 04-Apr-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add an interface to dynamically change the cache level

[anholt v2: Don't forget that when going from cached to uncached, we
haven't been tracking the write domain from the CPU perspective, since
we haven't needed it for GPU coherency.]

[ickle v3: We also need to make sure we relinquish any fences on older
chipsets and clear the GTT for sane domain tracking.]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# b5ffc9bc 13-Apr-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce i915_gem_object_finish_gtt()

Like its siblings finish_gpu(), this function clears the object from the
GTT domain, forcing it to trigger a domain invalidation should we ever
need to use it via the GTT again.

Note that the most important side-effect of finishing the GTT domain
(aside from clearing the tracking read/write domains) is that it imposes
a memory barrier so that all accesses are complete before it returns,
which is important if you intend to be modifying translation tables
shortly afterwards. The second most important side-effect is that it
tears down the GTT mappings forcing a page-fault and invalidation on
next user access to the object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# a8198eea 13-Apr-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Introduce i915_gem_object_finish_gpu()

... reincarnated from i915_gem_object_flush_gpu(). The semantic
difference is that after calling finish_gpu() the object no longer
resides in any GPU domain, and so will cause the GPU caches to be
invalidated if it is ever used again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# c8ebc2b0 12-May-2011 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix relaxed tiling on gen2: tile height

A tile on gen2 has a size of 2kb, stride of 128 bytes and 16 rows.

Userspace was broken and assumed 8 rows. Chris Wilson noted that the
kernel unfortunately can't reliably check that because libdrm rounds
up the size to the next bucket.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>


# c8cbbb8b 12-May-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: s/addr & ~PAGE_MASK/offset_in_page(addr)/

Convert our open coded offset_in_page() to the common macro.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
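
For illustration, the before/after of the conversion (a trivial sketch,
function name made up):

#include <linux/mm.h>

static unsigned long partial_page_bytes(unsigned long addr)
{
        /* open-coded form:  return addr & ~PAGE_MASK; */
        return offset_in_page(addr);
}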


# 1495f230 24-May-2011 Ying Han <yinghan@google.com>

vmscan: change shrinker API by passing shrink_control struct

Change each shrinker's API by consolidating the existing parameters into
a shrink_control struct. This will simplify adding further features
without touching each shrinker's file.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
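
For reference, a hedged sketch of the callback shape after this change
(the body and return values are illustrative only): the per-call
parameters arrive bundled in struct shrink_control rather than as separate
arguments.

#include <linux/mm.h>

static int example_shrink(struct shrinker *shrinker,
                          struct shrink_control *sc)
{
        if (sc->nr_to_scan == 0)
                return 100;     /* report roughly how many objects are freeable */

        /* ... free up to sc->nr_to_scan objects, honouring sc->gfp_mask ... */

        return 0;               /* objects remaining after the scan */
}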


# 25aebfc3 06-May-2011 Eric Anholt <eric@anholt.net>

drm/i915: Add support for fence registers on Ivybridge.

The registers are the same as on Sandybridge. Fixes scrambled display
in X when it does software drawing to the GTT, and scans the results
out as tiled.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 10ed13e4 06-May-2011 Eric Anholt <eric@anholt.net>

drm/i915: Use existing function instead of open-coding fence reg clear.

This is one less place to miss a new INTEL_INFO(dev)->gen update now.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 9c23f7fc 29-Mar-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not clflush snooped objects

Rely on the GPU snooping into the CPU cache for appropriately bound
objects on MI_FLUSH. Or perhaps one day we will have a cache-coherent
CPU/GPU package...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>


# 93dfb40c 29-Mar-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rename agp_type to cache_level

... to clarify just how we use it inside the driver and remove the
confusion of the poorly matching agp_type names. We still need to
translate through agp_type for the interface into the fake AGP driver.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>


# f6e47884 20-Mar-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid unmapping pages from a NULL address space

Found by gem_stress.

As we perform retirement from a workqueue, it is possible for us to free
and unbind objects after the last close on the device, and so after the
address space has been torn down and reset to NULL:

BUG: unable to handle kernel NULL pointer dereference at 00000054
IP: [<c1295a20>] mutex_lock+0xf/0x27
*pde = 00000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/module/vt/parameters/default_utf8

Pid: 5, comm: kworker/u:0 Not tainted 2.6.38+ #214
EIP: 0060:[<c1295a20>] EFLAGS: 00010206 CPU: 1
EIP is at mutex_lock+0xf/0x27
EAX: 00000054 EBX: 00000054 ECX: 00000000 EDX: 00012fff
ESI: 00000028 EDI: 00000000 EBP: f706fe20 ESP: f706fe18
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kworker/u:0 (pid: 5, ti=f706e000 task=f7060d00 task.ti=f706e000)
Stack:
f5aa3c60 00000000 f706fe74 c107e7df 00000246 dea55380 00000054 f5aa3c60
f706fe44 00000061 f70b4000 c13fff84 00000008 f706fe54 00000000 00000000
00012f00 00012fff 00000028 c109e575 f6b36700 00100000 00000000 f706fe90
Call Trace:
[<c107e7df>] unmap_mapping_range+0x7d/0x1e6
[<c109e575>] ? mntput_no_expire+0x52/0xb6
[<c11c12f6>] i915_gem_release_mmap+0x49/0x58
[<c11c3449>] i915_gem_object_unbind+0x4c/0x125
[<c11c353f>] i915_gem_free_object_tail+0x1d/0xdb
[<c11c55a2>] i915_gem_free_object+0x3d/0x41
[<c11a6be2>] ? drm_gem_object_free+0x0/0x27
[<c11a6c07>] drm_gem_object_free+0x25/0x27
[<c113c3ca>] kref_put+0x39/0x42
[<c11c0a59>] drm_gem_object_unreference+0x16/0x18
[<c11c0b15>] i915_gem_object_move_to_inactive+0xba/0xbe
[<c11c0c87>] i915_gem_retire_requests_ring+0x16e/0x1a5
[<c11c3645>] i915_gem_retire_requests+0x48/0x63
[<c11c36ac>] i915_gem_retire_work_handler+0x4c/0x117
[<c10385d1>] process_one_work+0x140/0x21b
[<c103734c>] ? __need_more_worker+0x13/0x2a
[<c10373b1>] ? need_to_create_worker+0x1c/0x35
[<c11c3660>] ? i915_gem_retire_work_handler+0x0/0x117
[<c1038faf>] worker_thread+0xd4/0x14b
[<c1038edb>] ? worker_thread+0x0/0x14b
[<c103be1b>] kthread+0x68/0x6d
[<c103bdb3>] ? kthread+0x0/0x6d
[<c12970f6>] kernel_thread_helper+0x6/0x10
Code: 00 e8 98 fe ff ff 5d c3 55 89 e5 3e 8d 74 26 00 ba 01 00 00 00 e8
84 fe ff ff 5d c3 55 89 e5 53 8d 64 24 fc 3e 8d 74 26 00 89 c3 <f0> ff
08 79 05 e8 ab ff ff ff 89 e0 25 00 e0 ff ff 89 43 10 58
EIP: [<c1295a20>] mutex_lock+0xf/0x27 SS:ESP 0068:f706fe18
CR2: 0000000000000054

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>


# 26e12f89 20-Mar-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix use after free within tracepoint

Detected by scripts/coccinelle/free/kfree.cocci.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Keith Packard <keithp@keithp.com>


# 36d527de 19-Mar-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Restore missing command flush before interrupt on BLT ring

We always skipped flushing the BLT ring if the request flush did not
include the RENDER domain. However, this neglects that we try to flush
the COMMAND domain after every batch and before the breadcrumb interrupt
(to make sure the batch is indeed completed prior to the interrupt
firing and so ensuring CPU coherency). As a result of the missing flush,
incoherency did indeed creep in, most notably when using lots of command
buffers and so potentially rewriting an active command buffer (i.e.
the GPU was still executing from it even though the following interrupt
had already fired and the request/buffer retired).

As all ring->flush routines now have the same preconditions, de-duplicate
and move those checks up into i915_gem_flush_ring().

Fixes gem_linear_blit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35284
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: mengmeng.meng@intel.com


# ed0291fd 19-Mar-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix computation of pitch for dumb bo creator

Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 29c5a587 17-Mar-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix tiling corruption from pipelined fencing

... even though it was disabled. A mistake in the handling of fence reuse
caused us to skip the vital delay of waiting for the object to finish
rendering before changing the register. This resulted in us changing the
fence register whilst the bo was active and so causing the blits to
complete using the wrong stride or even the wrong tiling. (Visually the
effect is that small blocks of the screen look like they have been
interlaced). The fix is to wait for the GPU to finish using the memory
region pointed to by the fence before changing it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34584
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[Note for 2.6.38-stable, we need to reintroduce the interruptible passing]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Dave Airlie <airlied@linux.ie>


# 09bfa517 17-Mar-2011 Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>

drm/i915: Prevent racy removal of request from client list

When i915_gem_retire_requests_ring calls i915_gem_request_remove_from_client,
the client_list for that request may already be removed in i915_gem_release.
So we may call twice list_del(&request->client_list), resulting in an
oops like this report:

[126167.230394] BUG: unable to handle kernel paging request at 00100104
[126167.230699] IP: [<f8c2ce44>] i915_gem_retire_requests_ring+0xd4/0x240 [i915]
[126167.231042] *pdpt = 00000000314c1001 *pde = 0000000000000000
[126167.231314] Oops: 0002 [#1] SMP
[126167.231471] last sysfs file: /sys/devices/LNXSYSTM:00/device:00/PNP0C0A:00/power_supply/BAT1/current_now
[126167.231901] Modules linked in: snd_seq_dummy nls_utf8 isofs btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs exportfs reiserfs cryptd aes_i586 aes_generic binfmt_misc vboxnetadp vboxnetflt vboxdrv parport_pc ppdev snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep arc4 snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq uvcvideo videodev snd_timer snd_seq_device joydev iwlagn iwlcore mac80211 snd cfg80211 soundcore i915 drm_kms_helper snd_page_alloc psmouse drm serio_raw i2c_algo_bit video lp parport usbhid hid sky2 sdhci_pci ahci sdhci libahci
[126167.232018]
[126167.232018] Pid: 1101, comm: Xorg Not tainted 2.6.38-6-generic-pae #34-Ubuntu Gateway MC7833U /
[126167.232018] EIP: 0060:[<f8c2ce44>] EFLAGS: 00213246 CPU: 0
[126167.232018] EIP is at i915_gem_retire_requests_ring+0xd4/0x240 [i915]
[126167.232018] EAX: 00200200 EBX: f1ac25b0 ECX: 00000040 EDX: 00100100
[126167.232018] ESI: f1a2801c EDI: e87fc060 EBP: ef4d7dd8 ESP: ef4d7db0
[126167.232018] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[126167.232018] Process Xorg (pid: 1101, ti=ef4d6000 task=f1ba6500 task.ti=ef4d6000)
[126167.232018] Stack:
[126167.232018] f1a28000 f1a2809c f1a28094 0058bd97 f1aa2400 f1a2801c 0058bd7b 0058bd85
[126167.232018] f1a2801c f1a28000 ef4d7e38 f8c2e995 ef4d7e30 ef4d7e60 c14d1ebc f6b3a040
[126167.232018] f1522cc0 000000db 00000000 f1ba6500 ffffffa1 00000000 00000001 f1a29214
[126167.232018] Call Trace:

Unfortunately the call trace reported was cut, but looking at the debug
symbols the crash is in __list_del, probably because list_del is called
twice on the same request->client_list, as the dereferenced value is
LIST_POISON1 + 4; looking further at the debug symbols before the
list_del call, it should have been called from
i915_gem_request_remove_from_client.

And as I can see in the code, it seems we indeed have the possibility
to remove a request->client_list twice, which would cause the above,
because we do list_del(&request->client_list) in both
i915_gem_request_remove_from_client and i915_gem_release.

As Chris Wilson pointed out, it's indeed the case:
"(...) I had thought that the actual insertion/deletion was serialised
under the struct mutex and the intention of the spinlock was to protect
the unlocked list traversal during throttling. However, I missed that
i915_gem_release() is also called without struct mutex and so we do need
the double check for i915_gem_request_remove_from_client()."

This change does the required check to avoid the duplicate remove of
request->client_list.

Bugzilla: http://bugs.launchpad.net/bugs/733780
Cc: stable@kernel.org # 2.6.38
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
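
The shape of the fix, as a hedged self-contained sketch (the structure,
field and lock names are illustrative, not the driver's): re-check, under
the client's lock, that the request is still linked before deleting it, so
two removal paths cannot both run list_del().

#include <linux/list.h>
#include <linux/spinlock.h>

struct client {
        spinlock_t lock;
};

struct request_entry {
        struct list_head client_list;
        struct client *file_priv;
};

static void remove_from_client(struct request_entry *req)
{
        struct client *client = req->file_priv;

        if (!client)
                return;

        spin_lock(&client->lock);
        if (req->file_priv) {           /* re-check under the lock */
                list_del(&req->client_list);
                req->file_priv = NULL;
        }
        spin_unlock(&client->lock);
}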


# 467cffba 07-Mar-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rebind the buffer if its alignment constraints changes with tiling

Early gen3 and gen2 chipset do not have the relaxed per-surface tiling
constraints of the later chipsets, so we need to check that the GTT
alignment is correct for the new tiling. If it is not, we need to
rebind.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# ce453d81 21-Feb-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use a device flag for non-interruptible phases

The code paths for modesetting are growing in complexity as we may need
to move the buffers around in order to fit the scanout in the aperture.
Therefore we face a choice as to whether to thread the interruptible status
through the entire pinning and unbinding code paths or to add a flag to
the device when we may not be interrupted by a signal. This does the
latter and so fixes a few instances of modesetting failures under stress.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# c8725226 19-Feb-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Protect against drm_gem_object not being the first member

Dave Airlie spotted that we had a potential bug should we ever rearrange
the drm_i915_gem_object so not the base drm_gem_object was not its first
member. He noticed that we often convert the return of
drm_gem_object_lookup() immediately into drm_i915_gem_object and then
check the result for nullity. This is only valid when the base object is
the first member and so the superobject has the same address. Play safe
instead and use the compiler to convert back to the original return
address for sanity testing.
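
Sketched roughly (using the lookup signature of that era; treat the
details as illustrative rather than verbatim), the safer pattern is to
test the base object for NULL first and only then convert with
container_of():

        struct drm_gem_object *obj;
        struct drm_i915_gem_object *obj_priv;

        obj = drm_gem_object_lookup(dev, file_priv, args->handle);
        if (obj == NULL)
                return -ENOENT;

        /* valid regardless of where the base object sits in the struct */
        obj_priv = container_of(obj, struct drm_i915_gem_object, base);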

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# bed636ab 11-Feb-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: i915_mutex_interruptible() returns -EINTR

... so we handle that for i915_gem_fault() in the same manner as
ERESTARTSYS, or we send a SIGBUS to the faulting application.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 8d7e3de1 07-Feb-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip the no-op domain changes when already in CPU|GTT domains

Removes some superfluous fluff from tracing...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# db53a302 03-Feb-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refine tracepoints

A lot of minor tweaks to fix the tracepoints, improve the outputting for
ftrace, and to generally make the tracepoints useful again. It is a start
and enough to begin identifying performance issues and gaps in our
coverage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# d9bc7e9f 07-Feb-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix infinite loop regression from 21dd3734

By returning EAGAIN upon a wedged GPU before attempting to wait, we
would hit an infinite loop of repeating operation without ever
progressing. Instead this needs to be EIO so that userspace knows that
the GPU is truly wedged and not in the process of error recovery.

Similarly, we need to handle the error recovery during i915_gem_fault.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# ff72145b 06-Feb-2011 Dave Airlie <airlied@redhat.com>

drm: dumb scanout create/mmap for intel/radeon (v3)

This is just an idea that may or may not be a good one:
it basically adds two ioctls, one to create a dumb buffer and one to map
a dumb buffer suitable for scanout. The handle can be passed to the KMS ioctls to create
a framebuffer.

It looks to me like it would be useful in the following cases:
a) in development drivers - we can always provide a shadowfb fallback.
b) libkms users - we can clean up libkms a lot and avoid linking
to libdrm_*.
c) plymouth via libkms is a lot easier.

Userspace bits would be just calls + mmaps. We could probably
mark these handles somehow as not being suitable for acceleration
so as to stop people who are dumber than dumb.

Signed-off-by: Dave Airlie <airlied@redhat.com>


# 21dd3734 26-Jan-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defer reporting EIO until we try to use the GPU

Instead of reporting EIO upfront in the entrance of an ioctl that may or
may not attempt to use the GPU, defer the actual detection of an invalid
ioctl to when we issue a GPU instruction. This allows us to continue to
use bo in video memory (via pread/pwrite and mmap) after the GPU has hung.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# e110e8d6 26-Jan-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Check wedged status before throttling

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 29ee3991 24-Jan-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Silence a few -Wunused-but-set-variable

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# bee4a186 21-Jan-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915,agp/intel: Do not clear stolen entries

We can only utilize the stolen portion of the GTT if we are in sole
charge of the hardware. This is only true if using GEM and KMS,
otherwise VESA continues to access stolen memory.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 076e2c0e 21-Jan-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix use of invalid array size for ring->sync_seqno

There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we
were clearing I915_NUM_RINGS of them. Oops.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Tested-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 809b6334 10-Jan-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: If we hit OOM when allocating GTT pages, clear the aperture

Rather than evicting an object at random, which is unlikely to alleviate
the memory pressure sufficiently to allow us to continue, zap the entire
aperture. That should give the system long enough to recover and reap
some pages from the evicted objects, forestalling the allocation error
for the new object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 0a58705b 09-Jan-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Periodically flush the active lists and requests

In order to retire active buffers whilst no client is active, we need to
insert our own flush requests onto the ring.

This is useful for servers that queue up some rendering and then go to
sleep, as it allows us to complete the processing of those requests,
potentially making that memory available again much earlier.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 88241785 07-Jan-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate error from flushing the ring

... in order to avoid a BUG() and potential unbounded waits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b72f3acb 04-Jan-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Handle ringbuffer stalls when flushing

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 63256ec5 04-Jan-2011 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Enforce write ordering through the GTT

We need to ensure that writes through the GTT land before any
modification to the MMIO registers and so must impose a mandatory write
barrier when flushing the GTT domain. This was revealed by relaxing the
write ordering by experimentally mapping the registers and the GATT as
write-combining.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 72bfa19c 19-Dec-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow the application to choose the constant addressing mode

The relative-to-general state default is useless as it means having to
rewrite the streaming kernels for each batch. Relative-to-surface is
more useful, as that stream usually needs to be rewritten for each
batch. And absolute addressing mode, vital if you start streaming
state, is also only available by adjusting the register...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b5ba177d 13-Dec-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Poll for seqno completion if IRQ is disabled

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32288
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b13c2b96 13-Dec-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/ringbuffer: Make IRQ refcnting atomic

In order to enforce the correct memory barriers for irq get/put, we need
to perform the actual counting using atomic operations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 7a194876 07-Dec-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Emit a request to clear a flushed and idle ring for unbusy bo

In order for bos to retire eventually, a request must be sent down the
ring. This is expected, for example, by occlusion queries, which mesa
will wait upon (whilst running glean) before issuing more batches; the
normal activity upon the ring is therefore suspended and we need to emit a
request to clear the idle ring.

Reported-by: Jinjin, Wang <jinjin.wang@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30380
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 0be73284 06-Dec-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wait for the bo if a display flip is pipelined on the other ring

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 0ac74c6b 06-Dec-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only emit a flush if there is an outstanding gpu write

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 6bda10d1 05-Dec-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Completely disable fence pipelining.

I'm still seeing tiling corruption of PutImage and CopyArea (I think)
under mutter on pnv, so obviously the pipelining logic is deeply flawed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 1ec14ad3 04-Dec-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Implement GPU semaphores for inter-ring synchronisation on SNB

The bulk of the change is to convert the growing list of rings into an
array so that the relationship between the rings and the semaphore sync
registers can be easily computed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 60de2ba5 12-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Kill the get_fence tracepoint

As the tracepoint is now decoupled from when the actual register is
assigned and was never complemented by detailing when the object lost
its fence, it has outlived its limited usefulness. Profiling the actual
stalls is a far more profitable venture anyway.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# c6748e09 12-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove inactive LRU tracking from set_domain_ioctl

As the userspace mappings are torn down on every GPU write, we prefer to
track when the buffer is activated (via a fresh i915_gem_fault). This
makes the LRU conceptually simpler. With coherent mappings, the
remaining use-case for set_domain_ioctl is GPU synchronisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# d9e86c0e 10-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pipelined fencing [infrastructure]

With this change, every batchbuffer can use all available fences (save
pinned and scanout, of course) without ever stalling the gpu!

In theory. Currently the actual pipelined update of the register is
disabled due to some stability issues. However, just the deferred update
is a significant win.

Based on a series of patches by Daniel Vetter.

The premise is that before every access to a buffer through the GTT we
have to declare whether we need a register or not. If the access is by
the GPU, a pipelined update to the register is made via the ringbuffer,
and we track the last seqno of the batches that access it. If by the
CPU, we wait for the last GPU access and update the register (either
to clear or to set it for the current buffer).

One advantage of being able to pipeline changes is that we can defer the
actual updating of the fence register until we first need to access the
object through the GTT, i.e. we can eliminate the stall on set_tiling.
This is important as the userspace bo cache does not track the tiling
status of active buffers which generate frequent stalls on gen3 when
enabling tiling for an already bound buffer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 87ca9c8a 02-Dec-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent stalling for a GTT read back from a read-only GPU target

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 7d2cb39c 27-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Release fenced GTT mapping on suspend

... so that upon first use after resume we will reacquire the fence reg.

Reported-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# de18a29e 27-Nov-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix regression due to ba3d8d749b01548b9

We don't track gpu flush requests in any special way. So even with
obj->write_domain == 0, a gpu flush might be outstanding but not
yet executed. Even worse, the latest request might use the object
only for reading. So an unconditional call to object_wait_rendering
is needed for !pipelined.

Hence revert that patch fully and untangle the flushing from the
synchronization again.

Reported-by: Keith Packard <keithp@keithp.com>
Tested-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 432e58ed 25-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid allocation for execbuffer object list

Besides the minimal improvement in reducing the execbuffer overhead, the
real benefit is clarifying a few routines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 54cf91dc 25-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Split i915_gem_execbuffer into its own file.

A number of dragons have been seen lurking within the execbuffer code.
The first step is then to isolate them from the rest and begin to
scrutinise them in depth. Suggested by Daniel Vetter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 6299f992 23-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Defer accounting until read from debugfs

Simply remove our accounting of objects inside the aperture, keeping
only track of what is in the aperture and its current usage. This
removes the over-complication of BUGs that were attempting to keep the
accounting correct and also removes the overhead of the accounting on
the hot-paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 2021746e 23-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Mark a few functions as __must_check

... to benefit from the compiler checking that we remember to handle
and propagate errors.
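
A minimal sketch of the annotation (the prototype below is illustrative):
any caller that silently discards the return value now earns a compiler
warning:

        int __must_check
        i915_gem_object_set_to_gtt_domain(struct drm_gem_object *obj, int write);

        i915_gem_object_set_to_gtt_domain(obj, 1);        /* warns: result ignored */
        ret = i915_gem_object_set_to_gtt_domain(obj, 1);  /* ok, provided ret is checked */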

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 312817a3 22-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only save and restore fences for UMS

With KMS, we can simply relinquish the fence when we idle the GPU and
reassign it upon first use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# c6642782 12-Nov-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Add a mechanism for pipelining fence register updates

Not employed just yet...

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# caea7476 12-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: More accurately track last fence usage by the GPU

Based on a patch by Daniel Vetter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# a7a09aeb 12-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rework execbuffer pinning

Avoid evicting buffers that will be used later in the batch in order to
make room for the initial buffers by pinning all bound buffers in a
single pass before binding (and evicting for) fresh buffers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 919926ae 12-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Thread the pipelining ring through the callers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# dddbc0e5 12-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove a defunct BUG_ON

This used to check the precondition that all fences were to be located
in a mappable area, redundant now as those two parameters are combined
into one.

After pinning, we assert that the buffer is bound into the desired
region.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b6913e4b 12-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the implementation details of PIPE_CONTROL to the ringbuffer

The pipe control object is allocated by the device for the sole use of the
render ringbuffer. Move this detail from the general code to the render
ring buffer initialisation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 92b88aeb 09-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Not all mappable regions require GTT fence regions

Combining map_and_fenceable revealed a bug in
i915_gem_object_gtt_size() in that it always computed the appropriate
fence size for the object regardless of tiling state which caused us to
over-allocate linear buffers when binding to the GTT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 05394f39 08-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use drm_i915_gem_object as the preferred type

A glorified s/obj_priv/obj/ with a net reduction of over 100 lines and
many characters!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 7c2e6fdf 06-Nov-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move gtt handling to i915_gem_gtt.c

No more drm_*_agp in i915_gem.c!

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 93a37f20 05-Nov-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: track objects in the gtt

This is required to restore gtt mappings on resume when agp is gone.

The right way to do this would be to make struct drm_mm_node embeddable
and use the allocation list maintained by the drm memory manager. But
that's a bigger project. Getting rid of the per bo agp_mem will save
more memory than this wastes, anyway.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 40ce6575 05-Nov-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915/gtt: call chipset flush directly

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 23ed992a 05-Nov-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915|intel-gtt: consolidate intel-gtt.h headers

... and a few other defines.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# bcf50e27 21-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Handle pagefaults in execbuffer user relocations

Currently if we hit a pagefault when applying a user relocation for the
execbuffer, we bail and return EFAULT to the application. Instead, we
need to unwind, drop the dev->struct_mutex, copy all the relocation
entries to a vmalloc array (to avoid any potential circular deadlocks
when resolving the pagefault), retake the mutex and then apply the
relocations. Afterwards, we need to again drop the lock and copy the
vmalloc array back to userspace.
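
A heavily condensed sketch of that slow path (hypothetical helper;
apply_relocations() stands in for the real relocation loop, and count is
assumed to have been validated already):

static int
relocate_slow(struct drm_device *dev,
              struct drm_i915_gem_relocation_entry __user *user_relocs,
              int count)
{
        struct drm_i915_gem_relocation_entry *reloc;
        int ret;

        reloc = vmalloc(count * sizeof(*reloc));
        if (reloc == NULL)
                return -ENOMEM;

        /* copy with no locks held, so the pagefault cannot deadlock
         * against our own fault handler */
        if (copy_from_user(reloc, user_relocs, count * sizeof(*reloc))) {
                ret = -EFAULT;
                goto out;
        }

        ret = mutex_lock_interruptible(&dev->struct_mutex);
        if (ret)
                goto out;
        ret = apply_relocations(dev, reloc, count);
        mutex_unlock(&dev->struct_mutex);

        /* hand the (possibly updated) entries back to userspace */
        if (ret == 0 &&
            copy_to_user(user_relocs, reloc, count * sizeof(*reloc)))
                ret = -EFAULT;
out:
        vfree(reloc);
        return ret;
}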

v2: Incorporate feedback from Daniel Vetter.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# d1d78830 21-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent integer overflow when validating the execbuffer

Commit 2549d6c2 removed the vmalloc used for temporary storage of the
relocation lists used during execbuffer. However, our use of vmalloc was
being protected by an integer overflow check which we do want to
preserve!
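
The preserved check is essentially of this shape (a sketch; the exact
limit used is illustrative): refuse any relocation_count for which the
multiplication below could overflow before it is used to size a copy:

        /* relocation_count comes from userspace */
        if (exec[i].relocation_count >
            INT_MAX / sizeof(struct drm_i915_gem_relocation_entry))
                return -EINVAL;

        length = exec[i].relocation_count *
                 sizeof(struct drm_i915_gem_relocation_entry);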

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 51311d0a 17-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not hold mutex when faulting in user addresses

Linus Torvalds found that it was rather trivial to trigger a system
freeze:

In fact, with lockdep, I don't even need to do the sysrq-d thing: it
shows the bug as it happens. It's the X server taking the same lock
recursively.

Here's the problem:

=============================================
[ INFO: possible recursive locking detected ]
2.6.37-rc2-00012-gbdbd01a #7
---------------------------------------------
Xorg/2816 is trying to acquire lock:
(&dev->struct_mutex){+.+.+.}, at: [<ffffffff812c626c>] i915_gem_fault+0x50/0x17e

but task is already holding lock:
(&dev->struct_mutex){+.+.+.}, at: [<ffffffff812c403b>] i915_mutex_lock_interruptible+0x28/0x4a

other info that might help us debug this:
2 locks held by Xorg/2816:
#0: (&dev->struct_mutex){+.+.+.}, at: [<ffffffff812c403b>] i915_mutex_lock_interruptible+0x28/0x4a
#1: (&mm->mmap_sem){++++++}, at: [<ffffffff81022d4f>] page_fault+0x156/0x37b

This recursion was introduced by rearranging the locking to avoid the
double locking on the fast path (4f27b5d and fbd5a26d) and the
introduction of the prefault to encourage the fast paths (b5e4f2b). In
order to undo the problem, we rearrange the code to perform the access
validation upfront, attempt to prefault and then fight for control of the
mutex. In the best-case scenario, where the mutex is uncontended, the
prefaulting is not wasted.
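
In rough outline (the helpers shown follow that era's names and the
sketch is illustrative, not the literal diff): validate and prefault with
no locks held, then take the mutex last:

        char __user *user_data = (char __user *)(uintptr_t)args->data_ptr;
        int ret;

        /* 1. check the user range without any locks held */
        if (!access_ok(VERIFY_READ, user_data, args->size))
                return -EFAULT;

        /* 2. prefault so the copy under the mutex is unlikely to fault */
        if (fault_in_pages_readable(user_data, args->size))
                return -EFAULT;

        /* 3. only now fight for the lock */
        ret = i915_mutex_lock_interruptible(dev);
        if (ret)
                return ret;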

Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 5e783301 14-Nov-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix relaxed tiling for gen <= 3 && !g33

g33/pineview doesn't have any alignment constraints for unfenced tiled
buffers. But older chips do. Fix this.

Problem introduced in a00b10c360b35d6431a94cbf130a4e162870d661.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 85345517 13-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Retire any pending operations on the old scanout when switching

An old and oft-reported bug is that of the GPU hanging on a
MI_WAIT_FOR_EVENT following a mode switch. The cause is that the GPU is
waiting on a scanline counter on an inactive pipe, and so waits for a
very long time until eventually the user reboots his machine.

We can prevent this either by moving the WAIT into the kernel and
thereby incurring considerable cost on every swapbuffers, or by waiting
for the GPU to retire the last batch that accesses the framebuffer
before installing a new one. As mode switches are much rarer than swap
buffers, this looks like an easy choice.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28964
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29252
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org


# 5d97eb69 10-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only add the lazy request if we end up waiting for it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# fce7d61b 30-Oct-2010 Joe Perches <joe@perches.com>

drivers/gpu/drm: Update WARN uses

Coalesce long formats.
Align arguments.
Add missing newlines.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# b47b30cc 07-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid might_fault during pwrite whilst holding our mutex

... and so prevent a potential circular reference:

[ INFO: possible circular locking dependency detected ]
2.6.37-rc1-uwe1+ #4
-------------------------------------------------------
Xorg/1401 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: [<c01e4ddb>] might_fault+0x4b/0xa0

but task is already holding lock:
(&dev->struct_mutex){+.+.+.}, at: [<f869c3ac>]
i915_mutex_lock_interruptible+0x3c/0x60 [i915]

which lock already depends on the new lock.

When the locking around the pwrite ioctl was simplified, I did not spot
that the phys path never took any locks and so we introduced this
potential circular reference.

Reported-by: Uwe Helm <uwe.helm@googlemail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 045e769a 07-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Handle GPU hangs during fault gracefully.

Instead of killing the process, just return no page found and reschedule
the process giving the GPU some time to (hopefully) recover.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 75e9e915 04-Nov-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: kill mappable/fenceable distinction

a00b10c360b35d6431a "Only enforce fence limits inside the GTT" also
added a fenceable/mappable distinction when binding/pinning buffers.
This only complicates the code with no practical gain:

- In execbuffer this matters only for g33/pineview, as this is the only
chip that needs fences and has an unmappable gtt area. But fences
are only possible in the mappable part of the gtt, so need_fence
implies need_mappable. And need_mappable is only set independently
with relocations, which implies (for sane userspace) that the buffer
is untiled.

- The overlay code is only really used on i8xx, which doesn't have
unmappable gtt. And it doesn't support tiled buffers, currently.

- For all other buffers it's a bug to pass in a tiled bo.

In short, this distinction doesn't have any practical gain.

I've also reverted mapping the overlay and context pages as possibly
unmappable. It's not worth being overly clever here; all the big
gains from unmappable are for execbuf bos.

Also add a comment for a clever optimization that confused me
while reading the original patch by Chris Wilson.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 085ce264 03-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Ensure that if we ever try to pin+fence it is mappable.

When merging Daniel's full-gtt patches I had a set of tweaks which I
thought I had undone. I was half right...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31286
Reported-by: jinjin.wang@intel.com
Reported-by: Alexey Fisher <bug-track@fisher-privat.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# c6afd658 01-Nov-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply big hammer to serialise buffer access between rings

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org


# 0f8c6d7c 31-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the invalidate|flush information out of the device struct

... and into a local structure scoped for the single function in which
it is used.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 13b29289 31-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply big hammer to serialise buffer access between rings

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 5eac3ab4 31-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Evict just the purgeable GTT entries on the first pass

Take two passes to evict everything whilst searching for sufficient free
space to bind the batchbuffer. After searching for sufficient free space
using LRU eviction, evict everything that is purgeable and try again.
Only then if there is insufficient free space (or the GTT is too badly
fragmented) evict everything from the aperture and try one last time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# ff75b9bc 30-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix typo from e5281ccd in i915_gem_attach_phys_object()

Accessing the uninitialised obj->pages instead of the local page led to
an oops.

Reported-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 872d860c 28-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the duplicate domain-change tracepoint for GPU flush

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# a00b10c3 24-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only enforce fence limits inside the GTT.

So long as we adhere to the fence register rules for alignment and no
overlaps (including with unfenced accesses to linear memory) and account
for the tiled access in our size allocation, we do not have to allocate
the full fenced region for the object. This allows us to fight the bloat
tiling imposes on pre-i965 chipsets and frees up RAM for real use. [Inside
the GTT we still suffer the additional alignment constraints, so it doesn't
magically allow us to render larger scenes without stalls -- we need the
expanded GTT and fence pipelining to overcome those...]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 7465378f 29-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert BUG_ON(pin_count) from an impossible condition

Also spotted by Dan Carpenter.

obj->pin_count is unsigned so the BUG_ON(obj->pin_count<0) will never
trigger.
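
The gist, in miniature (illustrative snippet): with an unsigned counter
the old assertion can never fire, so the useful check is against
underflow before the decrement:

        unsigned int pin_count;

        BUG_ON(pin_count < 0);          /* always false for an unsigned type */

        BUG_ON(pin_count == 0);         /* the meaningful assertion */
        pin_count--;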

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# bbe2e11a 28-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not return -1 from shrinker when nr_to_scan == 0

The error code is only expected during the actual pruning and not during
the first measurement (nr_to_scan == 0) pass.
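
In sketch form (the shrinker callback signature has changed repeatedly
across kernel versions, and the count/prune helpers named below are
hypothetical): a zero nr_to_scan is only a request for a count, so report
the number of reclaimable objects rather than an error:

static int i915_gem_shrink(struct shrinker *shrinker, int nr_to_scan, gfp_t gfp)
{
        struct drm_i915_private *dev_priv =
                container_of(shrinker, struct drm_i915_private,
                             mm.inactive_shrinker);

        if (nr_to_scan == 0)
                return count_inactive_objects(dev_priv);   /* measurement pass */

        if (!mutex_trylock(&dev_priv->dev->struct_mutex))
                return -1;              /* genuinely unable to prune right now */

        prune_inactive_objects(dev_priv, nr_to_scan);      /* hypothetical */
        mutex_unlock(&dev_priv->dev->struct_mutex);

        return count_inactive_objects(dev_priv);
}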

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 395b70be 28-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Flush read-only buffers from the active list upon idle as well

It is possible for the active list to only contain a read-only buffer so
that the ring->gpu_write_list remains empty. This leads to an
inconsistency between i915_gpu_is_active() and i915_gpu_idle(), causing
an infinite spin in the shrinker and an assertion failure that
i915_gpu_idle() does indeed flush all buffers from the active lists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 4a684a41 28-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Kill GTT mappings when moving from GTT domain

In order to force a page-fault on a GTT mapping after we start using it
from the GPU and so enforce correct CPU/GPU synchronisation, we need to
invalidate the mapping.

Pointed out by Owain G. Ainsworth.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# e5281ccd 28-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Eliminate nested get/put pages

By using read_cache_page() for individual pages during pwrite/pread we
can eliminate an unnecessarily large allocation (and immediate free) of
obj->pages. Also this eliminates any potential nesting of get/put pages,
simplifying the code and preparing the path for greater things.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 39a01d1f 28-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove mmap_offset

Since we rarely use the mmap_offset and it is easily computable from the
obj->map_list.hash, remove it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 17250b71 27-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make the inactive object shrinker per-device

Eliminate the racy device unload by embedding a shrinker into each
device. Smaller, simpler code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# da761a6e 27-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Bail early if we try to mmap an object too large to be mapped.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# fb7d516a 01-Oct-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: add accounting for mappable objects in gtt v2

More precisely: For those that _need_ to be mappable. Also add two
BUG_ONs in fault and pin to check the consistency of the mappable
flag.

Changes in v2:
- Add tracking of gtt mappable space (to notice mappable/unmappable
balancing issues).
- Improve the mappable working set tracking by tracking fault and pin
separately.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# ec57d260 30-Sep-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: add mappable to gem_object_bind tracepoint

This way we can make some more educated guesses as to why exactly
we can't use 2G apertures to their full potential ;)

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 53984635 22-Sep-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: use the complete gtt

At least the part that's currently enabled by the BIOS.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 16e809ac 16-Sep-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: unbind unmappable objects on fault/pin

In i915_gem_object_pin obviously unbind only if mappable is true.

This is the last part to enable gtt_mappable_end != gtt_size, which
the next patch will do.

v2: Fences on g33/pineview only work in the mappable part of the
gtt.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 920afa77 16-Sep-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: range-restricted bind_to_gtt

Like before add a parameter mappable (also to gem_object_pin) and
set it depending upon the context. Only bos that are brought into
the gtt due to an execbuffer call can be put into the unmappable
part of the gtt, everything else (especially pinned objects) need
to be put into the mappable part of the gtt.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# a6e0aa42 16-Sep-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: range-restricted eviction support

Add a mappable parameter to i915_gem_evict_something to distinguish
the two cases (non-restricted vs. mappable gtt allocations). No
functional changes because the mappable limit is set to the end of
the gtt currently.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 3cce469c 27-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate error from failing to queue a request

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b2223497 27-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the confusing global waiting/irq seqno

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 7e318e18 27-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move object to GPU domains after dispatching execbuffer

In the event that we fail to dispatch the execbuffer, for example if
there is insufficient space on the ring, we were leaving the objects in
an inconsistent state. Notably they were marked as being in the GPU
write domain, but were not added to the ring or any list. This would
lead to inevitable oops:

[ 1010.522940] [drm:i915_gem_do_execbuffer] *ERROR* dispatch failed -16
[ 1010.523055] BUG: unable to handle kernel NULL pointer dereference at
0000000000000088
[ 1010.523097] IP: [<ffffffff8122d006>] i915_gem_flush_ring+0x26/0x140
[ 1010.523120] PGD 14cf2f067 PUD 14ce04067 PMD 0
[ 1010.523140] Oops: 0000 [#1] SMP
[ 1010.523154] last sysfs file: /sys/devices/virtual/vc/vcsa2/uevent
[ 1010.523173] CPU 0
[ 1010.523183] Pid: 716, comm: X Not tainted 2.6.36+ #34 LosLunas
CRB/SandyBridge Platform
[ 1010.523206] RIP: 0010:[<ffffffff8122d006>] [<ffffffff8122d006>]
i915_gem_flush_ring+0x26/0x140
[ 1010.523233] RSP: 0018:ffff88014bf97cd8 EFLAGS: 00010296
[ 1010.523249] RAX: ffff88014e2d1808 RBX: 0000000000000000 RCX: 0000000000000000
[ 1010.523270] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
[ 1010.523290] RBP: ffff88014e2d1000 R08: 0000000000000002 R09: 00000000400c645f
[ 1010.523311] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
[ 1010.523331] R13: ffff88014e29a000 R14: 00000000000000c8 R15: ffffffff8162eb28
[ 1010.523352] FS: 00007fc62379d700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[ 1010.523375] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1010.523392] CR2: 0000000000000088 CR3: 000000014bf87000 CR4: 00000000000406f0
[ 1010.523412] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1010.523433] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1010.523454] Process X (pid: 716, threadinfo ffff88014bf96000, task ffff88014cc1ee40)
[ 1010.523475] Stack:
[ 1010.523483] ffff88014d5199c0 0000000000000200 0000000000000000 ffff88014bcc6400
[ 1010.523509] <0> 0000000000000000 0000000000000001 ffff88014e29a000 ffff88014bcc6400
[ 1010.523537] <0> ffffffff8162eb28 ffffffff8122faa8 ffff88014e29a000 ffff88014bcc6400
[ 1010.523568] Call Trace:
[ 1010.523578] [<ffffffff8122faa8>] ? i915_gem_object_flush_gpu_write_domain+0x48/0x80
[ 1010.523601] [<ffffffff8122fb8e>] ? i915_gem_object_set_to_gtt_domain+0x2e/0xb0
[ 1010.523623] [<ffffffff8123113b>] ? i915_gem_set_domain_ioctl+0xdb/0x1f0
[ 1010.523644] [<ffffffff8120a3f1>] ? drm_ioctl+0x3d1/0x460
[ 1010.523660] [<ffffffff81231060>] ? i915_gem_set_domain_ioctl+0x0/0x1f0
[ 1010.523682] [<ffffffff81092618>] ? vma_prio_tree_insert+0x28/0x120
[ 1010.523701] [<ffffffff8109f379>] ? vma_link+0x99/0xf0
[ 1010.523717] [<ffffffff810a111d>] ? mmap_region+0x1ed/0x4f0
[ 1010.523734] [<ffffffff810c306f>] ? do_vfs_ioctl+0x9f/0x580
[ 1010.523750] [<ffffffff810c3599>] ? sys_ioctl+0x49/0x80
[ 1010.523767] [<ffffffff810022eb>] ? system_call_fastpath+0x16/0x1b
[ 1010.523785] Code: 00 00 00 00 00 41 57 89 ce 41 56 41 55 41 54 45 89 c4 55 48 89 fd 53 48 89 d3 44 89 c2 48 89 df 4c 8d b3 c8 00 00 00 48 83 ec 18 <ff> 93 88 00 00 00 48 8b 83 c8 00 00 00 4c 8b bd 30 03 00 00 48
[ 1010.523946] RIP [<ffffffff8122d006>] i915_gem_flush_ring+0x26/0x140
[ 1010.523966] RSP <ffff88014bf97cd8>
[ 1010.523977] CR2: 0000000000000088

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# e1f99ce6 26-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate errors from writing to ringbuffer

Preparing the ringbuffer for adding new commands can fail (a timeout
whilst waiting for the GPU to catch up and free some space). So check
for any potential error before overwriting HEAD with new commands, and
propagate that error back to the user where possible.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 78501eac 26-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/ringbuffer: Drop the redundant dev from the vfunc interface

The ringbuffer keeps a pointer to the parent device, so we can use that
instead of passing around the pointer on the stack.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 3e4d3af5 26-Oct-2010 Peter Zijlstra <a.p.zijlstra@chello.nl>

mm: stack based kmap_atomic()

Keep the current interface but ignore the KM_type and use a stack based
approach.

The advantage is that we get rid of crappy code like:

#define __KM_PTE \
(in_nmi() ? KM_NMI_PTE : \
in_irq() ? KM_IRQ_PTE : \
KM_PTE0)

and in general can stop worrying about what context we're in and what kmap
slots might be appropriate for that.

The downside is that FRV kmap_atomic() gets more expensive.

For now we use a CPP trick suggested by Andrew:

#define kmap_atomic(page, args...) __kmap_atomic(page)

to avoid having to touch all kmap_atomic() users in a single patch.

[ not compiled on:
- mn10300: the arch doesn't actually build with highmem to begin with ]

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c]
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 64193406 23-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move gpu_write_list to per-ring

... to prevent flush processing of an idle (or even absent) ring.

This fixes a regression during suspend from 87acb0a5.

Reported-and-tested-by: Alexey Fisher <bug-track@fisher-privat.net>
Tested-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b6651458 23-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Invalidate the to-ring, flush the old-ring when updating domains

When the object has been written to by the gpu it remains on the ring
until its flush has been retired. However, when the object is moving to
the ring and the associated cache needs to be invalidated, we need to
perform the flush on the target ring, not the one it came from (which is
NULL in the reported case and so the flush was entirely absent).

Reported-by: Peter Clifton <pcjc2@cam.ac.uk>
Reported-and-tested-by: Alexey Fisher <bug-track@fisher-privat.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 878a3c37 22-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix flushing regression from 9af90d19f

Whilst moving the code around in 9af90d19f, I dropped the or'ing in of
new write domains, which would zero out the write domain for a render
target if it was later reused as a source in the batch. This meant that
we might drop a required flush before reading from the render target.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31043
Reported-by: xunx.fang@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 549f7365 19-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Enable SandyBridge blitter ring

Based on an original patch by Zhenyu Wang, this initializes the BLT ring for
SandyBridge and enables support for user execbuffers.

Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b5dc608c 20-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Copy the updated reloc->presumed_offset back to the user

If the userspace driver is using a constant relocation array with a
static buffer, they will pass the same relocation array back to the
kernel. So we *do* need to update the presumed offset value in those
relocations to reflect the current object so that they remain correct
with future batchbuffers and we avoid the necessity of having to suspend
execution and perform redundant relocations.

Fixes the regression introduced by 12f889c for applications using
absolute addressing on trees of buffers (i.e. the current consumers of
libdrm_intel.so).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30996
Reported-by: Wang, Jinjin <jinjin.wang@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 69dc4987 19-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track objects in global active list (as well as per-ring)

To handle retirements, we need per-ring tracking of active objects.
To handle evictions, we need global tracking of active objects.

As we enable more rings, rebuilding the global list from the individual
per-ring lists quickly grows tiresome and overly complicated. Tracking the
active objects in two lists is the lesser of two evils.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 87acb0a5 19-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Simplify most HAS_BSD() checks

... by always initialising the empty ringbuffer, it is then always safe
to check whether it is active.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 9af90d19 17-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: cache the last object lookup during pin_and_relocate()

The most frequent relocation within a batchbuffer is a contiguous sequence
of vertex buffer relocations, for which we can virtually eliminate the
drm_gem_object_lookup() overhead by caching the last handle to object
translation.

In doing so we refactor the pin and relocate retry loop out of
do_execbuffer into its own helper function and so improve the error
paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 1d7cfea1 17-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do interruptible mutex lock first to avoid locking for unreference

One of the primary consumers of the i915 driver is X, a large
signal-driven application. Frequently when writing into the buffers, there
is a pending signal which causes us not to take the interruptible lock, but
then we need to take that same lock around the object unreference. By
rearranging the code to do the interruptible lock as the first check, we
can avoid the frequent additional locking around the unreference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 4f27b75d 14-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: rearrange mutex acquisition for pread

... to avoid the double acquisition along fast[er] paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# fbd5a26d 14-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rearrange acquisition of mutex during pwrite

... to avoid reacquiring it to drop the object reference count on
exit. Note we have to make sure we now drop (and reacquire) the lock
around acquiring the mm semaphore on the slow paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b5e4feb6 14-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Attempt to prefault user pages for pread/pwrite

... in the hope that it makes the atomic fast paths more likely.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 202f2fef 14-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid taking the mutex for dropping the refcnt upon creation

After allocating a handle for the fresh object, we know that we can
safely drop the refcnt without triggering a free, so we do not need the
mutex. Strangely, this mutex acquisition is the one that appears on
driver profiles.
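
Sketched with the GEM helpers of that period (illustrative, not
verbatim): once the handle holds its own reference, the allocation
reference can be dropped via the unlocked variant, so no struct_mutex is
needed on this path:

        ret = drm_gem_handle_create(file_priv, obj, &handle);
        /* the handle now owns a reference of its own, so drop ours
         * without taking dev->struct_mutex */
        drm_gem_object_unreference_unlocked(obj);
        if (ret)
                return ret;

        args->handle = handle;
        return 0;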

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# f0c43d9b 13-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Perform relocations in CPU domain [if in CPU domain]

Avoid an early eviction of the batch buffer into the uncached GTT
domain, and so do the relocation fixup in cacheable memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 2549d6c2 13-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid vmallocing a buffer for the relocations

... perform an access validation check up front instead and copy them in
on-demand, during i915_gem_object_pin_and_relocate(). As around 20% of
the CPU overhead may be spent inside vmalloc for the relocation entries
when submitting an execbuffer [for x11perf -aa10text], the savings are
considerable and result in around a 10% throughput increase [for glyphs].

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# e59f2bac 07-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wait for pending flips on the GPU

Currently, if a batch buffer refers to an object with a pending flip,
then we sleep until that pending flip is completed (unpinned and
signalled). This is so that a flip can be queued and the user can
continue rendering to the backbuffer oblivious to whether the buffer is
still pinned as the scan out. (The kernel arbitrating at the last moment
to stall the batch and wait until the buffer is unpinned and replaced as
the front buffer.)

As we only have a queue depth of 1, we can simply wait for the current
pending flip to complete and continue rendering. We can achieve this
with a single WAIT_FOR_EVENT command inserted into the ring buffer prior
to executing the batch, *without* stalling the client.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 35b62a89 26-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip pread/pwrite if size to copy is 0.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 7dcd2499 26-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rephrase pwrite bounds checking to avoid any potential overflow

... and do the same for pread.
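
The rephrased check is roughly of this shape (field names approximate):
never add two user-controlled values together, compare against the
remaining size instead:

        /* unsafe: offset + size can wrap and slip past the check */
        if (args->offset + args->size > obj->size)
                return -EINVAL;

        /* overflow-proof equivalent */
        if (args->offset > obj->size ||
            args->size > obj->size - args->offset)
                return -EINVAL;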

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org


# ce9d419d 26-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Sanity check pread/pwrite

Move the access control up from the fast paths, which are no longer
universally taken first, into the caller. This then duplicates some
sanity checking along the slow paths, but is much simpler.
Tracked as CVE-2010-2962.

Reported-by: Kees Cook <kees@ubuntu.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org


# 929f49bf 02-Oct-2010 Julia Lawall <julia@diku.dk>

drivers/gpu/drm/i915/i915_gem.c: Add missing error handling code

Extend the error handling code with operations found in other nearby error
handling code.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
@r@
statement S1,S2,S3;
constant C1,C2,C3;
@@

*if (...)
{... S1 return -C1;}
...
*if (...)
{... when != S1
return -C2;}
...
*if (...)
{... S1 return -C3;}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org


# 1cdf7fef 02-Oct-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't mask the return code whilst relocating.

The return from move_to_gtt_domain() may indicate a pending signal which
needs to be handled as opposed to an actual error, for instance, so report
the original return value rather than forcing an EINVAL.
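
Roughly (illustrative): keep whatever the domain change returned instead
of squashing it:

        /* before: every failure was reported as a bogus -EINVAL */
        if (i915_gem_object_set_to_gtt_domain(obj, 1))
                return -EINVAL;

        /* after: propagate the original code, e.g. -ERESTARTSYS for a
         * pending signal that merely needs to be serviced */
        ret = i915_gem_object_set_to_gtt_domain(obj, 1);
        if (ret != 0)
                return ret;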

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 069efc1d 30-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clear fence registers on GPU reset

When the GPU is reset, the fence registers are invalidated, so release
the objects and clear them out.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 812ed492 30-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Force the domain to CPU on unbinding whilst wedged.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30083
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 73aa808f 30-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm: Move the GTT accounting to i915

Only drm/i915 does the bookkeeping that makes the information useful,
and the information maintained is driver specific, so move it out of the
core and into its single user.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>


# 29d08b3e 27-Sep-2010 Dave Airlie <airlied@redhat.com>

drm/gem: handlecount isn't really a kref so don't make it one.

There were lots of places being inconsistent since handle count
looked like a kref but it really wasn't.

Fix this by just making handle count an atomic on the object,
and having it increase the normal object kref.

Now i915/radeon/nouveau drivers can drop the normal reference on
userspace object creation, and have the handle hold it.

This patch fixes a memory leak or corruption on unload, because
the driver had no way of knowing if a handle had been actually
added for this object, and the fbcon object needed to know this
to clean itself up properly.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# f394940b 29-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove redundant deletion of obj->gpu_write_list

At that point, as the object is no longer in any GPU write domain, it must
not be on the list, so the list_del() is redundant.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 5cdf5881 27-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make get/put pages static

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 23bc5982 29-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/debug: Convert i915_verify_active() to scan all lists

... and check more regularly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 891b48cf 28-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid blocking the kworker thread on a stuck mutex

Just reschedule the retire-requests work if the device is currently
busy. The request list will be pruned along other paths, so it will never
grow unbounded and we can afford to miss the occasional pruning.
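
A sketch of the shape of the handler (field and helper names follow the
driver of that era but are illustrative): if struct_mutex is contended,
requeue the delayed work rather than sleeping in the worker:

static void i915_gem_retire_work_handler(struct work_struct *work)
{
        struct drm_i915_private *dev_priv =
                container_of(work, struct drm_i915_private,
                             mm.retire_work.work);
        struct drm_device *dev = dev_priv->dev;

        /* do not block the kworker thread on a stuck mutex */
        if (!mutex_trylock(&dev->struct_mutex)) {
                queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, HZ);
                return;
        }

        i915_gem_retire_requests(dev);
        queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, HZ);
        mutex_unlock(&dev->struct_mutex);
}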

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 3d2a812a 29-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/debug: Remove default WATCH_BUF

Replaced by tracepoints.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 97d1ebaf 29-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915/debug: Remove defunct WATCH_LRU

This has bitrotted through disuse and been superseded by tracing and debugfs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# a56ba56c 28-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

Revert "drm/i915: Drop ring->lazy_request"

With multiple rings generating requests independently, the outstanding
requests must also be tracked independently.

Reported-by: Wang Jinjin <jinjin.wang@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30380
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# ced270fa 26-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Ensure that the mode change flushing is currently uninterruptible

Introduced by 48b956c5, I had thought I had already fixed this. Oh well.

Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 1c25595f 26-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Convert the file mutex into a spinlock

Daniel Vetter pointed out that in this case it would be clearer and
cleaner to use a spinlock instead of a mutex to protect the per-file
request list manipulation. Make it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 76c1dec1 25-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Make the mutex_lock interruptible on ioctl paths

... and combine it with the wedged completion handler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 30dbf0c0 25-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Adjust hangcheck EIO semantics

Owain Ainsworth reported an issue with the interaction between the
hangcheck and userspace immediately (and permanently) falling back to
s/w rasterisation. In order to break the mutex and begin resetting the
GPU, we must abort the current operation (usually within the wait) and
climb sufficiently far back up the call chain to drop the mutex. In his
implementation, Owain has a loop within the ioctl handler to detect the
hang and then sleep until the error handler has run. I've chosen to
return to userspace and report an EAGAIN which should trigger the
userspace ioctl handler to repeat the call (simply because it felt less
invasive...). Before hitting a wedged GPU, we then wait upon completion
of the error handler.

Reported-by: Owain G. Ainsworth <zerooa@googlemail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# f787a5f5 24-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only hold a process-local lock whilst throttling.

Avoid causing latencies in other clients by not taking the global struct
mutex and instead moving the per-client request manipulation to a local per-client
mutex. For example, this allows a compositor to schedule a page-flip
(through X) whilst an OpenGL application is monopolising the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# e6c3a2a6 23-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use an uninterruptible wait for page-flips during modeset

We need to drain the pending flips prior to disabling the pipe during
modeset, and these need to be done in an uninterruptible fashion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 20f0cd55 23-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the broken flush_ring from page-flip

This is already performed with the pipelined flush, so by the time we
schedule the flush in the page-flip, the ring is NULL and we oops
instead.

Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 9b74f734 22-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix 945GM regression in e259befd

A minor typo caused a single fence register to be incorrectly
programmed, resulting in occasional tiling corruption.

Reported-and-tested-by: Hans de Bruin <bruinjm@xs4all.nl>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=18962
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org


# 5c12a07e 22-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop ring->lazy_request

We are not currently using it as intended, so remove the complication.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# dfaae392 22-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clear the gpu_write_list on resetting write_domain upon hang

Otherwise we will hit a list handling assertion when moving the object
to the inactive list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 9e0ae534 21-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Don't overwrite the returned error-code

During i915_gem_create_mmap_offset() if the subsystem reports an error
code, use it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# f13d3f73 20-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track pinned objects

Keep a list of pinned objects and display it via debugfs. Now all
objects that exist in the GTT are always tracked on one of the
active, flushing, inactive or pinned lists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 265db958 20-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drain any pending flips on the fb prior to unpinning

If we have queued a page flip on the current fb and then request a mode
change, wait until the page flip completes before performing the new
request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# c78ec30b 19-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Merge ring flushing and lazy requests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 53640e1d 20-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track gpu fence usage

Track if the gpu requires the fence for the execution of a batch buffer
and so only wait upon the retirement of the object's last rendering
seqno if the fence is in use by the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# c7f9f9a8 19-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use ring->flush() instead of MI_FLUSH

Use the ring abstraction to hide the details of having to choose the
appropriate flushing method.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 5c1143bb 15-Sep-2010 Xiang, Haihao <haihao.xiang@intel.com>

drm/i915: do not export the instances of struct intel_ring_buffer

Introduce intel_init_render_ring_buffer(), intel_init_bsd_ring_buffer
for ring initialization.

Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 77f01230 18-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clear GPU read domains on reset

Clear the GPU read domain for the inactive objects on a reset so that
they are correctly invalidated on reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 9375e446 18-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clear flushing lists on GPU reset

Owain Ainsworth noticed that the reset code failed to clear the flushing
list leaving the driver in an inconsistent state following a hung GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 9220434a 18-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only emit a flush request on the active ring.

When flushing the GPU domains, we emit a flush on *both* rings, even
though they share a unified cache. Only emit the flush on the currently
active ring.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b84d5f0c 17-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Inline i915_gem_ring_retire_request()

Change the semantics to retire any buffer older than the current seqno
rather than repeatedly calling the function to retire the
buffer at the head of the list matching the request seqno.

Whilst this should have no semantic impact on the implementation, Daniel
was wondering if there was a bug where we might miss a retirement and so
end up with a continually growing active list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# a6c45cf0 16-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: INTEL_INFO->gen supersedes i8xx, i9xx, i965g

Avoid confusion between i965g meaning broadwater and the gen4+ chipset
families.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# e259befd 16-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix Sandybridge fence registers

With 5 places to update when adding handling for fence registers, it is
easy to overlook one or two. Correct that oversight, but fence
management should be improved before a new set of registers is added.

Bugzilla: https://bugs.freedesktop.org/show_bug?id=30199
Original patch by: Yuanhan Liu <yuanhan.liu@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org


# 2b6efaa4 14-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove unused intel_ringbuffer->ring_flag

This can always be re-added should somebody find a use...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 2cf34d7b 14-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Allow get_fence_reg() to be uninterruptible

As we currently may need to acquire a fence register during a modeset,
we need to be able to do so in an uninterruptible manner. So expose that
parameter to the callers of the fence management code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 48b956c5 13-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Push pipelining of display plane flushes to the caller

This ensures that we do wait upon the flushes to complete if necessary
and avoid the visual tears, whilst enabling pipelined page-flips.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 0bc23aad 14-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix regression in ba3d8d749b

I pulled the wrong version of the patch from Daniel Vetter which was
missing the read barriers -- and the one that was causing all the trouble
was from i915_gem_object_put_fence_reg(), leading to GPU hangs on gen3.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 7213342d 13-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Consolidate flushing the display plane

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# b3b079db 13-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reduce hangcheck frequency

By reducing the hangcheck frequency we check less often, conserving
resources, and still detect a lock up quickly. On a fast machine with a
slow GPU (like a Core2 paired with a 945G) it is easy for the hangcheck to
misfire as we check too fast.

Also, once hung, if we fail to completely reset the chip, we have a
nasty habit of proclaiming a hang many times a second and generating a
strobe-like display.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 995b6762 20-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Quieten sparse warnings for missing prototypes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# de227ef0 03-Jul-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Kill the active list spinlock

This spinlock only served debugging purposes in a time when we could not
be sure of the mutex ever being released upon a GPU hang. As we now
should be able to rely on hangcheck to do the job for us (and that error
reporting should not itself require the struct mutex) we can kill the
incomplete attempt at protection.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 8dc5d147 11-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Preallocate requests

By allocating the request prior to writing to the ringbuffer, we can
abort the operation without leaving the GPU in an inconsistent state.
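
Very schematically, the pattern is the following (helper names and argument
lists are invented placeholders, not the actual i915 interfaces):

        /* Reserve the request object before touching the ring... */
        request = kzalloc(sizeof(*request), GFP_KERNEL);
        if (request == NULL)
                return -ENOMEM;         /* nothing emitted, nothing to undo */

        ret = emit_ring_commands(dev);  /* placeholder for the real emission */
        if (ret) {
                kfree(request);         /* the GPU never saw anything */
                return ret;
        }

        /* ...and only hand it over once the commands are in flight. */
        i915_add_request(dev, file_priv, request);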

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>


# 4fc6ee76 11-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: drop i915_add_request right in front of i915_wait_request

... take advantage of the new implicit request issuing of
i915_wait_request.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# ba3d8d74 11-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move the wait_rendering call into flush_gpu_write_domain

One caller (for the pageflip support) wants a purely pipelined flush.
Distinguish this case by a new parameter. This will also be useful
later on for pipelined fencing.

v2: Simplify the code by depending upon the implicit request emitting
of i915_wait_request.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[ickle: And drop the non-interruptible support in the process.]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 617dbe27 11-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: drop seqno argument from i915_gem_object_move_to_active

By moving one i915_add_request we can solely depend on the new
auto-seqno-numbering behaviour.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 86394c66 02-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: kill a no longer necessary BUG_ON

i915_gem_object_move_to_active can handle zero seqno for us now.
And not emitting a request is not fatal here - we'll try to emit
a new one if we have to wait for some rendering to complete.

In case this assumption ever gets accidentally broken, there's already
a BUG_ON to catch it in i915_do_wait_request.

So just silently ignore ENOMEM here instead of screwing up the whole
drm.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 8a1a49f9 11-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move flushing list processing to i915_retire_commands

... instead of threading flush_domains through the execbuf code to
i915_add_request.

With this change 2 small cleanups are possible (likewise the majority
of the patch):

- The flush_domains parameter of i915_add_request is always 0. Drop it
and the corresponding logic.
- Ditto for the seqno param of i915_gem_process_flushing_list.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# a6910434 02-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: only one interrupt per batchbuffer is not enough!

Previously I thought that one interrupt per batchbuffer should be
enough. Now tedious benchmarking showed this to be wrong.

Therefore track whether any commands have been issued with a future
seqno (like pipelined fencing changes or flushes). If this is the case,
emit a request before issuing the batchbuffer.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 8bff917c 11-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move flushing list processing to i915_gem_flush

Now that we can move objects to the active list without already having
emitted a request, move the flushing list handling into i915_gem_flush.
This makes more sense and allows us to drop a few i915_add_request calls
that are not strictly necessary.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# e35a41de 11-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: allow lazy emitting of requests

Sometimes (like when flushing in preparation of batchbuffer execution)
we know that we'll emit a request but haven't yet done so. Allow this
case by simply taking the next seqno by default. Ensure that a request
is eventually emitted before waiting for a request by issuing it
in i915_wait_request iff this is not yet done.

Also replace one open-coded version of i915_gem_object_wait_rendering,
to prevent future code-diversion.

Chris Wilson asked me to explain and clarify what this patch does and why.
Here it goes:

Old way of moving objects onto the active list and associating them with a
request:

1. i915_add_request + store the returned seqno somewhere
2. i915_gem_object_move_to_active (with the stored seqno as parameter)

For the current users, this is all fine. But I'd like to associate objects
(and fence regs) with the batchbuffer request deep down in the execbuf
call-chain. I thought about three ways of implementing this.

a) Don't care, just emit a request when we need a new seqno. When heavily
pipelining fence reg changes, this would have caused tons of superfluous
requests (and corresponding irqs).

b) Thread all changed fences, objects, whatever through the execbuf-maze,
so that when we emit a request, we can store the new seqno at all the right
places.

c) Kill that seqno-threading-around business by simply storing the next
seqno, i.e. allow 2. to be done before 1. in the above sequence.

I've decided to implement c) (in this patch). The following patches are
just fall-out that resulted from this small conceptual change.

* We can handle the flushing list processing where we actually emit a flush
(i915_gem_flush and i915_retire_commands) instead of in i915_add_request.
The code makes IMHO more sense this way (and i915_add_request loses the
flush_domains parameter, obviously).

* We can avoid emitting unnecessary requests. IMHO there's no point in
emitting more than one request per batchbuffer (with or without a
corresponding irq).

* By enforcing 2. before 1. ordering in the above sequence the seqno
argument of i915_gem_object_move_to_active is redundant and can be
dropped.

v2: Now i915_wait_request issues request if it is not yet emitted.
Also introduce i915_gem_next_request_seqno(dev) just in case we ever
need to do some prep work before using a new seqno.
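
In pseudo-C, the reordering described above amounts to this (schematic only,
using the function names already mentioned in the text, with argument lists
abbreviated):

        /* Old order: the request, and hence its seqno, had to exist first. */
        seqno = i915_add_request(dev, file_priv, flush_domains);
        i915_gem_object_move_to_active(obj, seqno);

        /* New order: borrow the next seqno up front; the request itself is
         * emitted later, or lazily by i915_wait_request() if nobody else
         * got around to it. */
        i915_gem_object_move_to_active(obj, i915_gem_next_request_seqno(dev));
        /* ... more objects, fence regs, etc. ... */
        i915_add_request(dev, file_priv, 0);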

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[ickle: Keep i915_gem_object_set_to_display_plane() uninterruptible.]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 75ef9da2 20-Aug-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: unload: fix retire_work races

ums-gem code correctly cancels the retire work (at lastclose time),
kms does not do so. Fix this by canceling the work right after idling
the gpu.

While staring at the code I noticed that the work function is not
static. Fix this, too.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# bc0c7f14 20-Aug-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: unload: fix error_work races

This is the first patch to clean up module unload races due to
outstanding timers/work. Preparatory step: Thou shalt not destroy
the workqueue when new work might still get enqueued.

Now error_work gets queued by the hangcheck timer and only (atomically)
reads the chip wedged status. So cancel it right after the hangcheck
timer is killed. But the hangcheck is armed by interrupts, so move
everything after irqs are disabled.

Also change a del_timer to a del_timer_sync in the ums gem code, the
hangcheck timer is self-rearming.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# f8f235e5 26-Aug-2010 Zhenyu Wang <zhenyuw@linux.intel.com>

agp/intel: Fix cache control for Sandybridge

Sandybridge GTT has new cache control bits in the PTE, which control
graphics page cache in LLC or LLC/MLC, so we need to extend the mask
function to respect the new bits.

And set cache control to always LLC only by default on Gen6.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# c877cdce 23-Jun-2010 Dan Carpenter <error27@gmail.com>

i915: return -EFAULT if copy_to_user fails

copy_to_user() returns the number of bytes remaining to be copied and
I'm pretty sure we want to return a negative error code here.
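
For reference, the usual kernel idiom looks like this (a sketch, not the
exact hunk from this patch):

#include <linux/errno.h>
#include <linux/types.h>
#include <linux/uaccess.h>

/* copy_to_user() returns the number of bytes it could NOT copy, so any
 * non-zero result has to be translated into -EFAULT for the ioctl. */
static int report_value(u32 __user *dst, u32 value)
{
        if (copy_to_user(dst, &value, sizeof(value)))
                return -EFAULT;
        return 0;
}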

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org


# 1dfd9754 06-Sep-2010 Chris Wilson <chris@chris-wilson.co.uk>

Revert "drm/i915: Unreference object not handle on creation"

This reverts commit 86f100b136626e91f4f66f3776303475e2e58998.

The kref API requires the handlecount to be initialised to one on object
creation (so that kref_get() doesn't complain upon first use) so the
dalliance in the drivers is required in order to sink the initial
floating reference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org


# 156dadc1 15-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove the conflicting BUG_ON()

We now attempt to free "active" objects following a GPU hang as either
the GPU will be reset or the hang is permanent. In either case, the GPU
writes will not be flushed to main memory and it should be safe to
return that memory back to the system.

The BUG_ON(active) is thus overkill and can erroneously fire after an
EIO.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# bf79cb91 04-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm: Use ENOENT consistently for the error return for an unmatched handle.

This is consistent with trying to access a filename that does not exist
within a directory, which is a good analogy here. The main reason for the
change is that it is easy to confuse the error code EBADF as meaning we
performed an ioctl on an invalid file descriptor (rather than on an
unknown object).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 6eeefaf3 07-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Apply i830 errata for cursor alignment

i830 requires 32bpp cursors to be aligned to 16KB, so we have to expose
the alignment parameter to i915_gem_attach_phys_object().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# ae9fed6b 07-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Truncate the shmem backing pages on purge

shmfs doesn't actually implement i_ops->truncate() so we were not
immediately releasing the backing pages when shrinking the gfx cache
under OOM. Instead use a combination of truncate_inode_pages() and
i_ops->truncate_range() as is used by shmem_delete_inode().
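
Loosely, the described combination looks like the following (a sketch based
on the shmem/VFS interfaces of that era; i_op->truncate_range has since been
removed from mainline):

#include <linux/fs.h>
#include <linux/mm.h>

/* Sketch: drop the shmem backing store of a file-backed object outright,
 * mirroring what shmem_delete_inode() does internally. */
static void truncate_backing_store(struct inode *inode)
{
        truncate_inode_pages(inode->i_mapping, 0);
        if (inode->i_op->truncate_range)
                inode->i_op->truncate_range(inode, 0, (loff_t)-1);
}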

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 7d1c4804 07-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Maintain LRU order of inactive objects upon access by CPU (v2)

In order to reduce the penalty of fallbacks under memory pressure and to
avoid a potential immediate ping-pong of evicting a mmaped buffer, we
move the object to the tail of the inactive list when a page is freshly
faulted or the object is moved into the CPU domain.

We choose not to protect the CPU objects from casual eviction,
preferring to keep the GPU active for as long as possible.

v2: Daniel Vetter found a bug where I forgot that pinned objects are
kept off the inactive list.
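
In list terms the change boils down to the standard LRU idiom, sketched here
with invented structure names:

#include <linux/list.h>

struct lru_item {
        struct list_head link;
};

/* Touching an item makes it most-recently-used, so move it to the tail of
 * the inactive LRU; pinned objects never appear on this list at all. */
static void lru_touch(struct lru_item *item, struct list_head *inactive_lru)
{
        list_move_tail(&item->link, inactive_lru);
}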

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# b47eb4a2 07-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Move the eviction logic to its own file.

The eviction code is the gnarly underbelly of memory management, and is
clearer if kept separated from the normal domain management in GEM.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 6f392d54 07-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use a common seqno for all rings.

This will be used by the eviction logic to maintain fairness between the
rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 0108a3ed 07-Aug-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: prepare for fair lru eviction

This does two little changes:

- Add an alignment parameter for evict_something. It's not really great to
whack a carefully sized hole into the gtt with the wrong alignment.
Especially since the fallback path is a full evict.

- With the inactive scan stuff we need to evict more that one object, so
move the unbind call into the helper function that scans for the object
to be evicted, too. And adjust its name.

No functional changes in this patch, just preparation.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# bf1a1092 07-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Append the object onto the inactive list on binding.

In order to properly track bound objects, they need to exist on one of
the inactive/active lists or be pinned. As this is a requirement, do the
work inside i915_gem_bind_to_gtt() rather than dotted around the
callsites.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# ae7d49d8 03-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Emit a backtrace if we attempt to rebind a pinned buffer

This debugging trace was useful for finding the fbcon regression on
i965, and it may prove useful again in future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 0be555b6 04-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: report all active objects as busy

Incorporates a similar patch by Daniel Vetter, the alteration being to
report the current busy state after retiring.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 88f356b7 04-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only emit flushes on active rings.

This avoids the excess flush and requests on idle rings (and spamming
the debug log ;-)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# fca3ec01 04-Aug-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm,io-mapping: Specify slot to use for atomic mappings

This is required should we ever attempt to use an io-mapping where
KM_USER0 is verboten, such as inside an IRQ context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 86f100b1 24-Jul-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unreference object not handle on creation

When creating an object, we create the handle by which it is known to
the process and which owns the reference to the object. That reference to
the new handle is what we want to transfer to the process, not the lost
reference to the object; so free the local object reference *not* the
process's handle reference.

This brings i915_gem_object_create_ioctl() into line with
drm_gem_open_ioctl().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 8dc1775d 23-Jul-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Attempt to uncouple object after catastrophic failure in unbind

If we fail to flush outstanding GPU writes but return the memory to the
system, we risk corrupting memory should the GPU recovery and complete
those writes. On the other hand, if we bail early and free the object
then we have a definite use-after-free and real memory corruption.
Choose the lesser of two evils: since in order to recover from the hung
GPU we need to completely reset it, those pending writes should
never happen.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# be72615b 23-Jul-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Repeat unbinding during free if interrupted (v6)

If during the freeing of an object the unbind is interrupted by a system
call, which is quite possible if we have outstanding GPU writes that
must be flushed, the unbind is silently aborted. This still leaves the
AGP region and backing pages allocated, and perhaps more importantly,
the object remains upon the various lists exposing us to memory
corruption.

I think this is the cause behind the use-after-free, such as

Bug 15664 - Graphics hang and kernel backtrace when starting Azureus
with Compiz enabled
https://bugzilla.kernel.org/show_bug.cgi?id=15664

v2: Daniel Vetter reminded me that kernel space programming is never easy.
We cannot simply spin to clear the pending signal and so must defer
the freeing of the object until later.
v3: Run from the top level retire requests.
v4: Tested with P(return -ERESTARTSYS)=.5 from i915_gem_do_wait_request()
v5: Rebase against Eric's for-linus tree.
v6: Refactor, split and add a comment about avoiding unbounded recursion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# b09a1fec 23-Jul-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor i915_gem_retire_requests()

Combine the iteration over active render rings into a common function.
This is in preparation for reusing the idle function to also retire
deferred free requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 2dafb1e0 07-Jun-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate error from i915_gem_object_flush_gpu_write_domain()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 5f35308b 07-Jun-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate error from drm_install_irq() during EnterVT

Simple fix for error propagation along the old UMS path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 43b27f40 02-Jul-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Explosion following OOM in do_execbuffer.

Oops, when merging the extra details following an OOM, I missed that
driver_private is now NULL and the correct way to convert from the
drm_gem_object into the drm_i915_gem_object is to use to_intel_bo().

BUG: unable to handle kernel NULL pointer dereference at 00000069
IP: [<c11a4a02>] i915_gem_do_execbuffer+0x71f/0xbb6
*pde = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/vc/vcsa3/uevent

Pid: 10993, comm: X Not tainted 2.6.35-rc2+ #67 /
EIP: 0060:[<c11a4a02>] EFLAGS: 00213202 CPU: 0
EIP is at i915_gem_do_execbuffer+0x71f/0xbb6
EAX: f647e8a8 EBX: 00000000 ECX: 00000003 EDX: 00000000
ESI: 00424000 EDI: 00000000 EBP: f6508e48 ESP: f6508dd4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process X (pid: 10993, ti=f6508000 task=f6432880 task.ti=f6508000)
Stack:
f6508de0 f7130000 00000001 00000000 00000000 f647e8a8 00000000 f64f8480
<0> f7974414 00000000 00000006 00000000 00000000 f6578000 00000008 00000006
<0> f6797880 00400000 00000000 ffffffe4 f7974400 000000d0 000000d0 000001c0
Call Trace:
[<c11a4f3a>] ? i915_gem_execbuffer2+0xa1/0xe7
[<c118ab96>] ? drm_ioctl+0x22c/0x2fa
[<c11a4e99>] ? i915_gem_execbuffer2+0x0/0xe7
[<c107e88c>] ? do_sync_read+0x8f/0xca
[<c1088cbd>] ? vfs_ioctl+0x2c/0x96
[<c118a96a>] ? drm_ioctl+0x0/0x2fa
[<c10891f4>] ? do_vfs_ioctl+0x429/0x45a
[<c107e5c9>] ? fsnotify_access+0x54/0x5f
[<c107ee1c>] ? vfs_read+0x9a/0xae
[<c1089258>] ? sys_ioctl+0x33/0x4d
[<c1002610>] ? sysenter_do_call+0x12/0x26
Code: d0 89 4d c4 31 c9 89 45 d8 eb 44 8b 45 cc 8b 14 88 8b 42 50 89 45
bc 8b 45 a0 8b 52 38 89 55 d0 31 d2 f6 40 20 01 74 0d 8b 55 bc <f6> 42
69 30 0f 95 c2 0f b6 d2 8b 45 d0 c7 45 d4 00 00 00 00 89
EIP: [<c11a4a02>] i915_gem_do_execbuffer+0x71f/0xbb6 SS:ESP 0068:f6508dd4
CR2: 0000000000000069
---[ end trace 3f1d514b34d39381 ]---

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 94400120 19-Jul-2010 Dave Airlie <airlied@redhat.com>

drm/i915: enable low power render writes on GEN3 hardware.

A lot of 945GMs have had stability issues for a long time; this manifested as X hangs, blitter engine hangs, and lots of crashes.

one such report is at:
https://bugs.freedesktop.org/show_bug.cgi?id=20560

along with numerous distro bugzillas.

This only took a week of digging and hair ripping to figure out.

Tracked down and tested on a 945GM Lenovo T60,
previously running
x11perf -copypixwin500
or
x11perf -copywinpix500
repeatedly would cause the GPU to wedge within 4 or 5 tries, with random busy bits set.

After this patch no hangs were observed.

cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 7f8275d0 18-Jul-2010 Dave Chinner <dchinner@redhat.com>

mm: add context argument to shrinker callback

The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
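
A schematic example of the pattern this enables (the cache structure and
callback body are invented; the callback signature is approximately the one
introduced by this change):

#include <linux/kernel.h>
#include <linux/mm.h>

struct my_cache {
        struct shrinker shrinker;
        int nr_cached;                  /* per-instance state, no globals */
};

static int my_cache_shrink(struct shrinker *shrinker, int nr_to_scan,
                           gfp_t gfp_mask)
{
        /* Recover the embedding cache from the shrinker pointer. */
        struct my_cache *cache = container_of(shrinker, struct my_cache,
                                              shrinker);

        if (nr_to_scan)
                cache->nr_cached = 0;   /* stand-in for real reclaim */
        return cache->nr_cached;
}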

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# cd9f040d 18-Jul-2010 Linus Torvalds <torvalds@linux-foundation.org>

drm/i915: add 'reclaimable' to i915 self-reclaimable page allocations

The hibernate issues that got fixed in commit 985b823b9192 ("drm/i915:
fix hibernation since i915 self-reclaim fixes") turn out to have been
incomplete. Vefa Bicakci tested lots of hibernate cycles, and without
the __GFP_RECLAIMABLE flag the system eventually fails to resume.

With the flag added, Vefa can apparently hibernate forever (or until he
gets bored running his automated scripts, whichever comes first).

The reclaimable flag was there originally, and was one of the flags that
were dropped (unintentionally) by commit 4bdadb978569 ("drm/i915:
Selectively enable self-reclaim") that introduced all these problems,
but I didn't want to just blindly add back all the flags in commit
985b823b9192, and it looked like __GFP_RECLAIM wasn't necessary. It
clearly was.

I still suspect that there is some subtle reason we're missing that
causes the problems, but __GFP_RECLAIMABLE is certainly not wrong to use
in this context, and is what the code historically used. And we have no
idea what causes the corruption without it.

Reported-and-tested-by: M. Vefa Bicakci <bicave@superonline.com>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# db3307a9 02-Jul-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm: kill drm_mm_node->private

Only ever assigned, never used.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[glisse: I will re-add if needed for range-restricted allocations]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 6f772d7e 02-Jul-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Explosion following OOM in do_execbuffer.

Oops, when merging the extra details following an OOM, I missed that
driver_private is now NULL and the correct way to convert from the
drm_gem_object into the drm_i915_gem_object is to use to_intel_bo().

BUG: unable to handle kernel NULL pointer dereference at 00000069
IP: [<c11a4a02>] i915_gem_do_execbuffer+0x71f/0xbb6
*pde = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/vc/vcsa3/uevent

Pid: 10993, comm: X Not tainted 2.6.35-rc2+ #67 /
EIP: 0060:[<c11a4a02>] EFLAGS: 00213202 CPU: 0
EIP is at i915_gem_do_execbuffer+0x71f/0xbb6
EAX: f647e8a8 EBX: 00000000 ECX: 00000003 EDX: 00000000
ESI: 00424000 EDI: 00000000 EBP: f6508e48 ESP: f6508dd4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process X (pid: 10993, ti=f6508000 task=f6432880 task.ti=f6508000)
Stack:
f6508de0 f7130000 00000001 00000000 00000000 f647e8a8 00000000 f64f8480
<0> f7974414 00000000 00000006 00000000 00000000 f6578000 00000008 00000006
<0> f6797880 00400000 00000000 ffffffe4 f7974400 000000d0 000000d0 000001c0
Call Trace:
[<c11a4f3a>] ? i915_gem_execbuffer2+0xa1/0xe7
[<c118ab96>] ? drm_ioctl+0x22c/0x2fa
[<c11a4e99>] ? i915_gem_execbuffer2+0x0/0xe7
[<c107e88c>] ? do_sync_read+0x8f/0xca
[<c1088cbd>] ? vfs_ioctl+0x2c/0x96
[<c118a96a>] ? drm_ioctl+0x0/0x2fa
[<c10891f4>] ? do_vfs_ioctl+0x429/0x45a
[<c107e5c9>] ? fsnotify_access+0x54/0x5f
[<c107ee1c>] ? vfs_read+0x9a/0xae
[<c1089258>] ? sys_ioctl+0x33/0x4d
[<c1002610>] ? sysenter_do_call+0x12/0x26
Code: d0 89 4d c4 31 c9 89 45 d8 eb 44 8b 45 cc 8b 14 88 8b 42 50 89 45
bc 8b 45 a0 8b 52 38 89 55 d0 31 d2 f6 40 20 01 74 0d 8b 55 bc <f6> 42
69 30 0f 95 c2 0f b6 d2 8b 45 d0 c7 45 d4 00 00 00 00 89
EIP: [<c11a4a02>] i915_gem_do_execbuffer+0x71f/0xbb6 SS:ESP 0068:f6508dd4
CR2: 0000000000000069
---[ end trace 3f1d514b34d39381 ]---

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 985b823b 01-Jul-2010 Linus Torvalds <torvalds@linux-foundation.org>

drm/i915: fix hibernation since i915 self-reclaim fixes

Since commit 4bdadb9785696439c6e2b3efe34aa76df1149c83 ("drm/i915:
Selectively enable self-reclaim"), we've been passing GFP_MOVABLE to the
i915 page allocator where we weren't before due to some over-eager
removal of the page mapping gfp_flags games the code used to play.

This caused hibernate on Intel hardware to result in a lot of memory
corruptions on resume. See for example

http://bugzilla.kernel.org/show_bug.cgi?id=13811

Reported-by: Evengi Golov (in bugzilla)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tested-by: M. Vefa Bicakci <bicave@superonline.com>
Cc: stable@kernel.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ab34c226 27-May-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix up address spaces in slow_kernel_write()

Since we now get_user_pages() outside of the mutex prior to performing
the copy, we kmap() the page inside the copy routine and so need to
perform an ordinary memcpy() and not copy_from_user().
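
A simplified sketch of the resulting copy step (invented names, not the
literal slow_kernel_write() body):

#include <linux/highmem.h>
#include <linux/string.h>

/* The source page was already pinned with get_user_pages() outside the
 * mutex, so here we only kmap() it and do a plain memcpy(), not
 * copy_from_user(). */
static void copy_from_pinned_page(void *dst, struct page *src_page,
                                  unsigned int offset, unsigned int len)
{
        char *vaddr = kmap(src_page);

        memcpy(dst, vaddr + offset, len);
        kunmap(src_page);
}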

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 99a03df5 27-May-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Use non-atomic kmap for slow copy paths

As we do not have a requirement to be atomic and avoid sleeping whilst
performing the slow copy for shmem based pread and pwrite, we can use
kmap instead, thus simplifying the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 9b8c4a0b 27-May-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid moving from CPU domain during pwrite

We can avoid an early clflush when pwriting if we use the current CPU
write domain rather than moving the object to the GTT domain for the
purposes of the pwrite. This has the advantage of not flushing the
presumably hot data that we want to upload into the bo, and of ascribing
the clflush to the execution when profiling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 68f95ba9 27-May-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Cleanup after failed initialization of ringbuffers

The callers expect us to cleanup any partially initialised structures
before reporting the error.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 654fc607 27-May-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Reject bind_to_gtt() early if object > aperture

If the object is bigger than the entire aperture, reject it early
before evicting everything in a vain attempt to find space.

v2: Use E2BIG as suggested by Owain G. Ainsworth.
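
Roughly, the early rejection amounts to this (a sketch with invented names):

#include <linux/errno.h>
#include <linux/types.h>

/* Bail out before doing any eviction work if the object can never fit in
 * the aperture at all. */
static int check_fits_aperture(size_t obj_size, size_t aperture_size)
{
        if (obj_size > aperture_size)
                return -E2BIG;
        return 0;
}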

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>


# 3d1cc470 27-May-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove spurious warning "Failure to install fence"

This particular warning is harmless as we emit it during the normal
pinning process when the batch buffer requires more fences than are
available without eviction. Only if we fail to evict enough fences does
this become a problem, so include the requested number of fences in the
ultimate *error* message.

v2: Remember to compile test even trial patches to remove warnings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# ac0c6b5a 27-May-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Rebind bo if currently bound with incorrect alignment.

Whilst pinning the buffer, check that its current alignment
matches the requested alignment. If it does not, rebind.

This should clear up any final render errors whilst resuming,
for reference:

Bug 27070 - [i915] Page table errors with empty ringbuffer
https://bugs.freedesktop.org/show_bug.cgi?id=27070

Bug 15502 - render error detected, EIR: 0x00000010
https://bugzilla.kernel.org/show_bug.cgi?id=15502

Bug 13844 - i915 error: "render error detected"
https://bugzilla.kernel.org/show_bug.cgi?id=13844

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>


# 808b24d6 27-May-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Propagate error from unbinding an unfenceable object.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>


# b118c1e3 27-May-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid nesting of domain changes when setting display plane

Nesting domain changes will cause confusion when trying to interpret the
tracepoints describing the sequence of changes for the object, as well
as obscuring the order of operations for the reader of the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 778c3544 13-May-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: combine all small integers into one single bitfield

This saves a whopping 7 dwords. Zero functional changes. Because
some of the refcounts are rather tightly calculated, I've put
BUG_ONs in the code to check for overflows.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# d1b851fc 20-May-2010 Zou Nan hai <nanhai.zou@intel.com>

drm/i915: implement BSD ring buffer V2

The BSD (bit stream decoder) ring is used for accessing the BSD engine
which decodes video bitstream for H.264 and VC1 on G45+. It is
asynchronous with the render ring and has access to separate parts of
the GPU from it, though the render cache is coherent between the two.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 852835f3 20-May-2010 Zou Nan hai <nanhai.zou@intel.com>

drm/i915: convert some gem structures to per-ring V2

The active list and request list move into the ringbuffer structure,
so each can track its active objects in the order they are in that
ring. The flushing list does not, as it doesn't matter which ring
caused data to end up in the render cache. Objects gain a pointer to
the ring they are active on (if any).

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 8187a2b7 20-May-2010 Zou Nan hai <nanhai.zou@intel.com>

drm/i915: introduce intel_ring_buffer structure (V2)

Introduces a more complete intel_ring_buffer structure with callbacks
for setup and management of a particular ringbuffer, and converts the
render ring buffer consumers to use it.
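
Schematically, the structure bundles per-ring state with setup and
management hooks along these lines (field and hook names are illustrative,
not the exact layout of the patch):

#include <linux/list.h>
#include <linux/types.h>

struct drm_device;                      /* opaque here */

struct ring_buffer_sketch {
        void __iomem    *virtual_start; /* mapped ring pages */
        u32             head, tail, space;
        struct list_head active_list;   /* objects in flight on this ring */
        struct list_head request_list;

        /* per-ring callbacks supplied at init time */
        int  (*init)(struct drm_device *dev, struct ring_buffer_sketch *ring);
        u32  (*get_seqno)(struct drm_device *dev,
                          struct ring_buffer_sketch *ring);
        void (*flush)(struct drm_device *dev, struct ring_buffer_sketch *ring,
                      u32 invalidate_domains, u32 flush_domains);
};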

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Xiang Hai hao <haihao.xiang@intel.com>
[anholt: Fixed up whitespace fail and rebased against prep patches]
Signed-off-by: Eric Anholt <eric@anholt.net>


# d3301d86 21-May-2010 Eric Anholt <eric@anholt.net>

drm/i915: Rename dev_priv->ring to dev_priv->render_ring.

With the advent of the BSD ring, be clear about which ring this is.
The docs are pretty consistent with calling this the Render engine at
this point.


# 62fdfeaf 21-May-2010 Eric Anholt <eric@anholt.net>

drm/i915: Move ringbuffer-related code to intel_ringbuffer.c.

This is preparation for supporting multiple ringbuffers on Ironlake.
The non-copy-and-paste changes are:
- de-staticing functions
- I915_GEM_GPU_DOMAINS moving to i915_drv.h to be used by both files.
- i915_gem_add_request had only half its implementation
copy-and-pasted out of the middle of it.


# 007cc8ac 28-Apr-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move fence lru to struct drm_i915_fence_reg

This lru tracks fences, not objects, so move it to where it belongs.
As a side effect, this nicely shrinks drm_i915_gem_object by two
pointers.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 1637ef41 20-Apr-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Wait for the GPU whilst shrinking, if truly desperate.

By idling the GPU and discarding everything we can when under extreme
memory pressure, the number of OOM-killer events is dramatically
reduced. For instance, this makes it possible to run
firefox-planet-gnome.trace again on my swapless 512MiB i915.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 1918ad77 23-Apr-2010 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: fix non-Ironlake 965 class crashes

My PIPE_CONTROL fix (just sent via Eric's tree) was buggy; I was
testing a whole set of patches together and missed a conversion to the
new HAS_PIPE_CONTROL macro, which will cause breakage on non-Ironlake
965 class chips. Fortunately, the fix is trivial and has been tested.

Be sure to use the HAS_PIPE_CONTROL macro in i915_get_gem_seqno, or
we'll end up reading the wrong graphics memory, likely causing hangs,
crashes, or worse.

Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e552eb70 21-Apr-2010 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: use PIPE_CONTROL instruction on Ironlake and Sandy Bridge

Since 965, the hardware has supported the PIPE_CONTROL command, which
provides fine grained GPU cache flushing control. On recent chipsets,
this instruction is required for reliable interrupt and sequence number
reporting in the driver.

So add support for this instruction, including workarounds, on Ironlake
and Sandy Bridge hardware.

https://bugs.freedesktop.org/show_bug.cgi?id=27108

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# a8089e84 09-Apr-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: drop pointer to drm_gem_object

Luckily the change is quite a bit less invasive than I had
feared.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 62b8b215 09-Apr-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't use ->driver_private anymore

Thanks to the to_intel_bo helper, this change is rather trivial.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# c397b908 09-Apr-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: embed the gem object into drm_i915_gem_object

Just embed it and adjust the pointers. No other changes (that's
for later patches).
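
Conceptually the change is just this kind of embedding (a reduced sketch;
the real structures carry many more fields):

#include <linux/kernel.h>               /* container_of() */
#include <linux/types.h>

struct gem_object_sketch {              /* stand-in for drm_gem_object */
        size_t size;
};

struct i915_gem_object_sketch {         /* stand-in for drm_i915_gem_object */
        struct gem_object_sketch base;  /* embedded, no longer a pointer */
        int pin_count;
};

/* With the core object embedded, converting back is pure pointer math. */
static inline struct i915_gem_object_sketch *
to_i915_sketch(struct gem_object_sketch *obj)
{
        return container_of(obj, struct i915_gem_object_sketch, base);
}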

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# ac52bc56 09-Apr-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: introduce i915_gem_alloc_object

Just preparation, no functional change.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# fd632aa3 09-Apr-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm: free core gem object from driver callbacks

When drivers embed the core gem object into their own structures,
they'll have to do this. Temporarily this results in an ugly

kfree(gem_obj);

in every gem driver.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# c36a2a6d 17-Apr-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix tiling limits for i915 class hw v2

Current code is definitely crap: Largest pitch allowed spills into
the TILING_Y bit of the fence registers ... :(

I've rewritten the limits check under the assumption that 3rd gen hw
has a 3d pitch limit of 8kb (like 2nd gen). This is supported by an
otherwise totally misleading XXX comment.

This bug mostly resulted in tiling-corrupted pixmaps because the kernel
allowed too wide buffers to be tiled. Bug brought to the light by the
xf86-video-intel 2.11 release because that unconditionally enabled
tiling for pixmaps, relying on the kernel to check things. Tiling for
the framebuffer was not affected because the ddx does some additional
checks there to ensure the buffer is within hw limits.

v2: Instead of computing the value that would be written into the
hw fence registers and then checking the limits simply check whether
the stride is above the 8kb limit. To better document the hw, add
some WARN_ONs in i915_write_fence_reg like I've done for the i830
case (using the right limits).

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27449
Tested-by: Alexander Lam <lambchop468@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there, i.e. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# 23010e43 08-Mar-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: introduce to_intel_bo helper

This is a purely cosmetic change to make changes in this area easier.
And hey, it's not only clearer and typechecked, but actually shorter,
too!

[anholt: To clarify, this is a change to let us later make
drm_i915_gem_object subclass drm_gem_object, instead of having
drm_gem_object have a pointer to i915's private data]
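
At this stage the helper was presumably little more than a typed accessor
for the private pointer; a hypothetical reconstruction (the real definition
may differ, and it later becomes a container_of() once the object is
embedded, as sketched under the embedding commit above):

/* Hypothetical shape of the helper before the embedding change. */
#define to_intel_bo(obj) \
        ((struct drm_i915_gem_object *)(obj)->driver_private)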

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Dave Airlie <airlied@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 1f2b1013 12-Mar-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Avoid NULL deref in get_pages() unwind after error.

Fixes:
http://bugzilla.kernel.org/show_bug.cgi?id=15527
NULL pointer dereference in i915_gem_object_save_bit_17_swizzle

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<f82b5d2b>] i915_gem_object_save_bit_17_swizzle+0x5b/0xc0 [i915]
Call Trace:
[<f82aea55>] ? i915_gem_object_put_pages+0x125/0x150 [i915]
[<f82aeb71>] ? i915_gem_object_get_pages+0xf1/0x110 [i915]
[<f82b0de8>] ? i915_gem_object_bind_to_gtt+0xb8/0x2a0 [i915]
[<c02db74d>] ? drm_mm_get_block_generic+0x4d/0x180
[<f82b11cd>] ? i915_gem_mmap_gtt_ioctl+0x16d/0x240 [i915]
[<f82ae786>] ? i915_gem_madvise_ioctl+0x86/0x120 [i915]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: maciej.rutecki@gmail.com
Cc: stable@kernel.org
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 71cf39b1 09-Mar-2010 Eric Anholt <eric@anholt.net>

drm/i915: Enable VS timer dispatch.

This could resolve HW deadlocks where a unit downstream of the VS is
waiting for more input, the VS has one vertex queued up but not
dispatched because it hopes to get one more vertex for 2x4 dispatch,
and software isn't handing more vertices down because it's waiting for
rendering to complete. The B-Spec says you should always have this
bit set.

Signed-off-by: Eric Anholt <eric@anholt.net>


# 5d939162 02-Mar-2010 Owain G. Ainsworth <zerooa@googlemail.com>

drm/i915: remove an unnecessary wait_request()

The continue just after this call will loop around and wait for the
request just added just fine. This leads to slightly more compact code.

Signed-Off-by: Owain G. Ainsworth <oga@openbsd.org>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 16edd550 19-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: check for multiple write domains in pin_and_relocate

The assumption that an object has only ever one write domain is deeply
threaded into gem (it's even encoded in the singular of the variable
name). Don't let userspace screw us over.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 922a2efc 19-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: clean-up i915_gem_flush_gpu_write_domain

Now that we have an exact gpu write domain tracking, we don't need
to move objects to the active list ourself. i915_add_request will
take care of that under all circumstances.

Idea stolen from a patch by Chris Wilson <chris@chris-wilson.co.uk>.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 4df2faf4 19-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: reuse i915_gpu_idle helper

We have it, so use it. This required moving the function to avoid
a forward declaration.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 63560396 19-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: ensure lru ordering of fence_list

The fence_list should be lru ordered, because otherwise we might try
to steal a fence reg from an active object even though there are
fences from inactive objects available. lru ordering was obeyed
for gpu access everywhere save when moving dirty objects from
flushing_list to active_list.

Fixing this caused the code to indent way too much, so I've extracted
the flushing_list processing logic into its own function.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# ae3db24a 19-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: extract fence stealing code

The spaghetti logic in there tripped up my brain's code parser for a
few secs. Prevent this from happening again by extracting the fence
stealing code into a separate function. IMHO this slightly clears up
the code flow.

v2: Beautified according to ickle's comments.
v3: ickle forgot to flush his comment queue ... Now there's also a
we-are-paranoid BUG_ON in there.
v4: I've forgotten to switch on my brain when doing v3. Now the BUG_ON
actually checks something useful.
v5: Clean up a stale comment as noted by Eric Anholt.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 4a87b8ca 19-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fixup active list locking in object_unbind

All other accesses take this spinlock, so do this here, too.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 798750e3 19-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: reuse i915_gem_object_put_fence_reg for fence stealing code

This has a few functional changes against the old code:

* a few more unnecessary loads and stores to the drm_i915_fence_reg
objects. Also an unnecessary store to the hw fence register.

* zaps any userspace mappings before doing other flushes. Only changes
anything when userspace does racy stuff against itself.

* also flush GTT domain. This is a noop, but still try to keep the
bookkeeping correct.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# f6e450a6 02-Nov-2009 Eric Anholt <eric@anholt.net>

drm/i915: Fix sandybridge status page setup.

The register's moved to the same location as the one for the BCS, it seems.

Signed-off-by: Eric Anholt <eric@anholt.net>


# 4e901fdc 26-Oct-2009 Eric Anholt <eric@anholt.net>

drm/i915: Set up fence registers on sandybridge.

Signed-off-by: Eric Anholt <eric@anholt.net>


# bad720ff 22-Oct-2009 Eric Anholt <eric@anholt.net>

drm/i915: Add initial bits for VGA modesetting bringup on Sandybridge.

Signed-off-by: Eric Anholt <eric@anholt.net>


# f590d279 18-Feb-2010 Owain Ainsworth <zerooa@googlemail.com>

drm/i915: reduce some of the duplication of tiling checking

i915_gem_object_fenceable was mostly just a repeat of the
i915_gem_object_fence_offset_ok, but also checking the size (which was
checked when we allowed that BO to be tiled in the first place). So
instead, export the latter function and use it in place.

Signed-Off-By: Owain G. Ainsworth <oga@openbsd.org>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 10ae9bd2 01-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: blow away userspace mappings before fence change

This aligns it with the other user of i915_gem_clear_fence_reg,
which blows away the mapping before changing the fence reg.

Only affects userspace if it races against itself when changing
tiling parameters, i.e. behaviour is undefined, anyway.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 4a726612 01-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move a gtt flush to the correct place

No functional change, because gtt flushing is a no-op. Still, try
to keep the bookkeeping accurate. The if is still slightly wrong
because with execbuf2 even i915-class hw doesn't always need a fence
reg for gpu access. But that's for somewhen later on.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# b397c836 26-Jan-2010 Eric Anholt <eric@anholt.net>

drm/i915: Don't reserve compatibility fence regs in KMS mode.

The fence start is for compatibility with UMS X Servers before fence
management. KMS X Servers only started doing tiling after fence
management appeared.

Signed-off-by: Eric Anholt <eric@anholt.net>


# 29105ccc 07-Jan-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Replace open-coded eviction in i915_gem_idle()

With the introduction of the hang-check, we can safely expect that
i915_wait_request() will always return even when the GPU hangs, and so
do not need to open code the wait in order to manually check for the
hang. Also we do not need to always evict all buffers, so only flush
the GPU (and wait for it to idle) for KMS, but continue to evict for UMS.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# bc9025bd 08-Feb-2010 Luca Barbieri <luca@luca-barbieri.com>

Use drm_gem_object_[handle_]unreference_unlocked where possible

Mostly obvious simplifications.

The i915 pread/pwrite ioctls, intel_overlay_put_image and
nouveau_gem_new were incorrectly using the locked versions
without locking: this is also fixed in this patch.

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# a40e8d31 09-Feb-2010 Owain Ainsworth <zerooa@googlemail.com>

drm/i915: Correctly return -ENOMEM on allocation failure in cmdbuf ioctls.

Signed-off-by: Owain G. Ainsworth <oga@openbsd.org>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 99fcb766 07-Feb-2010 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Update write_domains on active list after flush.

Before changing the status of a buffer with a pending write we will await
upon a new flush for that buffer. So we can take advantage of any flushes
posted whilst the buffer is active and pending processing by the GPU, by
clearing its write_domain and updating its last_rendering_seqno -- thus
saving a potential flush in deep queues and improving flushing behaviour
upon eviction for both GTT space and fences.

In order to reduce the time spent searching the active list for matching
write_domains, we move those to a separate list whose elements are
the buffers belonging to the active/flushing list with pending writes.

Original patch by Chris Wilson <chris@chris-wilson.co.uk>, forward-ported
by me.

In addition to better performance, this also fixes a real bug. Before
this change, i915_gem_evict_everything didn't work as advertised. When
the gpu was actually busy and processing requests, the flush and subsequent
wait would not move active and dirty buffers to the inactive list, but
just to the flushing list, which triggered the BUG_ON at the end of this
function. With the tighter dirty buffer tracking, all currently busy and
dirty buffers get moved to the inactive list by one i915_gem_flush operation.

I've left the BUG_ON I've used to prove this in there.

References:
Bug 25911 - 2.10.0 causes kernel oops and system hangs
http://bugs.freedesktop.org/show_bug.cgi?id=25911

Bug 26101 - [i915] xf86-video-intel 2.10.0 (and git) triggers kernel oops
within seconds after login
http://bugs.freedesktop.org/show_bug.cgi?id=26101

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Adam Lantos <hege@playma.org>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>


# 93533c29 31-Jan-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix leak of relocs along do_execbuffer error path

Following a gpu hang, we would leak the relocation buffer. So simply
rearrange the error path to always free the relocation buffer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 4bdadb97 27-Jan-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Selectively enable self-reclaim

Having missed the ENOMEM return via i915_gem_fault(), there are probably
other paths that I also missed. By not enabling NORETRY by default these
paths can run the shrinker and take memory from the system (but not from
our own inactive lists because our shrinker can not run whilst we hold
the struct mutex) and this may allow the system to survive a little longer
whilst our drivers consume all available memory.

References:
OOM killer unexpectedly called with kernel 2.6.32
http://bugzilla.kernel.org/show_bug.cgi?id=14933

v2: Pass gfp into page mapping.
v3: Use new read_cache_page_gfp() instead of open-coding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0ce907f8 23-Jan-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Prevent use of uninitialized pointers along error path.

X.org hang with [drm:i915_gem_do_execbuffer] *ERROR* in dmesg
http://bugzilla.kernel.org/show_bug.cgi?id=15114

Matej found he was hitting an error path within i915_gem_do_execbuffer()
that led to the attempt to dereference an uninitialised pointer during
cleanup. This path used to be safe as we used to calloc the object
lists, but this was changed in c8e0f93. Daniel Vetter had also spotted
this error and proposed a similar patch.

[ 6379.732892] [drm:i915_gem_do_execbuffer] *ERROR* Object ffff880098cd6540 appears more than once in object list
[ 6379.740976] [drm:i915_gem_do_execbuffer] *ERROR* Object ffff880098cd6540 appears more than once in object list
[ 6379.740995] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
[ 6379.740998] IP: [<ffffffff8122ddb5>] i915_gem_do_execbuffer+0xba5/0x1260
[ 6379.741006] PGD babab067 PUD bb435067 PMD 0
[ 6379.741010] Oops: 0002 [#1] PREEMPT SMP
[ 6379.741014] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.2/0000:06:00.0/ieee80211/phy0/rfkill0/state
[ 6379.741017] CPU 1
[ 6379.741021] Pid: 2186, comm: X Not tainted 2.6.33-rc4-00399-g24bc734 #142 M11D/ESPRIMO Mobile M9400
[ 6379.741023] RIP: 0010:[<ffffffff8122ddb5>] [<ffffffff8122ddb5>] i915_gem_do_execbuffer+0xba5/0x1260
[ 6379.741027] RSP: 0018:ffff8800b9047b78 EFLAGS: 00213206
[ 6379.741029] RAX: 0000000000000000 RBX: 000000000000004f RCX: ffff880098cac800
[ 6379.741032] RDX: ffff880098caca78 RSI: ffff8800b9047c98 RDI: ffff880098cd6540
[ 6379.741034] RBP: ffff8800b9047c78 R08: ffffffff814b96b5 R09: 0000000000000006
[ 6379.741036] R10: 0000000000000000 R11: 0000000000000003 R12: 000000000000004e
[ 6379.741038] R13: 00000000fffffff7 R14: 0000000000000000 R15: 0000000000000001
[ 6379.741041] FS: 0000000000000000(0000) GS:ffff880001900000(0063) knlGS:00000000f72636c0
[ 6379.741043] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 6379.741041] FS: 0000000000000000(0000) GS:ffff880001900000(0063) knlGS:00000000f72636c0
[ 6379.741043] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 6379.741045] CR2: 00000000000000a0 CR3: 00000000b9000000 CR4: 00000000000006e0
[ 6379.741048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6379.741050] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6379.741052] Process X (pid: 2186, threadinfo ffff8800b9046000, task ffff8800bb5d8000)
[ 6379.741054] Stack:
[ 6379.741055] ffffc90023f57000 ffffc90023f56fff ffffc90023f56fff ffffc90023f55000
[ 6379.741059] <0> ffff8800b9047c98 ffff8800bb43c840 ffff8800bf1de800 ffff8800bf1de820
[ 6379.741063] <0> ffff8800b9047bd8 ffff880098cac800 0000000000000000 0000000000000002
[ 6379.741068] Call Trace:
[ 6379.741072] [<ffffffff8122e6cb>] ? i915_gem_execbuffer+0x6b/0x370
[ 6379.741077] [<ffffffff810a5f52>] ? __vmalloc_node+0xa2/0xb0
[ 6379.741080] [<ffffffff8122e6cb>] ? i915_gem_execbuffer+0x6b/0x370
[ 6379.741083] [<ffffffff8122e816>] i915_gem_execbuffer+0x1b6/0x370
[ 6379.741086] [<ffffffff8120cd55>] drm_ioctl+0x1d5/0x460
[ 6379.741089] [<ffffffff8122e660>] ? i915_gem_execbuffer+0x0/0x370
[ 6379.741093] [<ffffffff81248c35>] i915_compat_ioctl+0x45/0x50
[ 6379.741097] [<ffffffff810f1659>] compat_sys_ioctl+0xa9/0x1570
[ 6379.741102] [<ffffffff810b1d5c>] ? vfs_read+0x13c/0x1a0
[ 6379.741106] [<ffffffff81028424>] sysenter_dispatch+0x7/0x2b
[ 6379.741108] Code: 08 85 c0 74 52 31 db 0f 1f 80 00 00 00 00 48 63 c3 48 8b
8d 68 ff ff ff 48 8d 14 c1 48 8b 02 48 85 c0 74 25 48 8b 80 80 00 00 00 <c7> 80
a0 00 00 00 00 00 00 00 48 8b 3a 48 85 ff 74 0c 48 c7 c6
[ 6379.741142] RIP [<ffffffff8122ddb5>] i915_gem_do_execbuffer+0xba5/0x1260
[ 6379.741145] RSP <ffff8800b9047b78>
[ 6379.741147] CR2: 00000000000000a0
[ 6379.741159] ---[ end trace 0598809afa4c31db ]---

Reported-by: Matej Laitl <strohel@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 6036ae7e 15-Jan-2010 Eric Anholt <eric@anholt.net>

drm/i915: Remove chatty execbuf failure message.

Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> (in principle)
Signed-off-by: Eric Anholt <eric@anholt.net>


# b9241ea3 24-Nov-2009 Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915: Don't wait interruptible for possible plane buffer flush

When we set up a buffer for the display plane, we check for any pending
required GPU flush and possibly make an interruptible wait for the flush
to complete. But that wait is very likely to fail when the X process
receives a signal, which then fails the modeset and leaves the display
engine in an inconsistent state. The result could be a blank screen or a
CPU hang, and the DDX driver will always turn on the outputs' DPMS whether
or not the modeset failed.

So this creates a new helper for setting up the display plane buffer and,
when a flush is needed, uses an uninterruptible wait for it.

This should fix bugs like https://bugs.freedesktop.org/show_bug.cgi?id=24009,
and also fixes the mode switch stress test on Ironlake.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# e6be8d9d 04-Jan-2010 Zhenyu Wang <zhenyu.z.wang@intel.com>

drm: remove address mask param for drm_pci_alloc()

drm_pci_alloc() takes an address mask argument for setting the pci dma
mask on the device, which should instead be set up properly by the drm
driver. Leaving it as a param for drm_pci_alloc() causes confusion, and a
mistake there would corrupt the correct dma mask setting, as seen on
intel hw which set the wrong dma mask for the hw status page. So remove
it from the drm_pci_alloc() function.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# e3d8affb 04-Jan-2010 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Permit pinning whilst the device is 'suspended'

As pinning (allocating and binding GTT memory) does not actually invoke
GPU commands, it is safe, and indeed is attempted, during resumption
from suspension:

[drm:intel_init_clock_gating] *ERROR* failed to pin power context: -16

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>


# 76446cac 17-Dec-2009 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: execbuf2 support

This patch adds a new execbuf ioctl, execbuf2, for use by clients that
want to control fence register allocation more finely. The buffer
passed in to the new ioctl includes a new relocation type to indicate
whether a given object needs a fence register assigned for the command
buffer in question.

Compatibility with the existing execbuf ioctl is implemented in terms
of the new code, preserving the assumption that fence registers are
required for pre-965 rendering commands.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[ickle: Remove pre-emptive clear_fence_reg()]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
[anholt: Removed dmesg spam]
Signed-off-by: Eric Anholt <eric@anholt.net>


# 96b47b65 15-Dec-2009 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fix order of fence release wrt flushing

i915_gem_object_unbind had the ordering wrong. The other user,
i915_gem_object_put_fence_reg already has the correct ordering.

Results was usually corrupted pixmaps, especially garbled font glyphs
after a suspend/resume (because this evicts everything).

I'm still waiting for the feedback from the bug-reporters, but
because this obviously fixes a bug (at least for me) I'm already
submitting it.

Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=25406
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
CC: stable@kernel.org


# 5618ca6a 02-Dec-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Set the error code after failing to insert new offset into mm ht.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Eric Anholt <eric@anholt.net>


# f2b115e6 03-Dec-2009 Adam Jackson <ajax@redhat.com>

drm/i915: Fix product names and #defines

IGD* isn't a useful name. Replace with the codenames, as sourced from
pci.ids.

Signed-off-by: Adam Jackson <ajax@redhat.com>
[anholt: Fixed up for merge with pineview/ironlake changes]
Signed-off-by: Eric Anholt <eric@anholt.net>


# ffb47280 07-Dec-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Drop some common DRM_ERROR()s

These are handled by the error return being propagated to user-space and
do not add any information to the original error, so they are useless.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# af901ca1 14-Nov-2009 André Goddard Rosa <andre.goddard@gmail.com>

tree-wide: fix assorted typos all over the place

That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>


# 6b95a207 18-Nov-2009 Kristian Høgsberg <krh@bitplanet.net>

drm/i915: Add intel implementation of the pageflip ioctl

Acked-by: Jakob Bornecrantz <jakob@vmware.com>
Acked-by: Thomas Hellström <thomas@shipmail.org>
Review-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse "Orange Smoothie" Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Eric Anholt <eric@anholt.net>


# c8e0f93a 21-Nov-2009 Eric Anholt <eric@anholt.net>

drm/i915: Replace a calloc followed by copying data over it with malloc.

Execbufs involve quite a bit of payload, to the extent that cache misses
show up in the profiles here, and a suspicion that some of those cachelines
may get evicted and then reloaded in the subsequent copy.

This is still abstracted like drm_calloc_large since we want to check for
size overflow, and because we want to choose between kmalloc and vmalloc
on the fly. cairo's interface for malloc-with-calloc's-args was used as
the model.

Signed-off-by: Eric Anholt <eric@anholt.net>
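
For illustration, the allocation pattern described above (check for size
overflow, then pick kmalloc or vmalloc on the fly) can be sketched roughly
like this; the names are invented, not the actual drm helper code:

    #include <linux/slab.h>
    #include <linux/vmalloc.h>
    #include <linux/mm.h>
    #include <linux/limits.h>

    /* Illustrative large-allocation helper: reject nmemb * size overflow,
     * use kmalloc for small requests and fall back to vmalloc otherwise. */
    static void *example_malloc_large(size_t nmemb, size_t size)
    {
            if (size != 0 && nmemb > SIZE_MAX / size)
                    return NULL;                    /* would overflow */

            if (nmemb * size <= PAGE_SIZE)
                    return kmalloc(nmemb * size, GFP_KERNEL);

            return vmalloc(nmemb * size);
    }

    static void example_free_large(void *ptr)
    {
            if (is_vmalloc_addr(ptr))
                    vfree(ptr);
            else
                    kfree(ptr);
    }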


# 44d98a61 08-Oct-2009 Zhao Yakui <yakui.zhao@intel.com>

drm/i915: Replace DRM_DEBUG with DRM_DEBUG_DRIVER

Replace the DRM_DEBUG with DRM_DEBUG_DRIVER in generic i915 driver.
Then the debug info can be obtained by adding the boot option of
"drm.debug=0x02".

At the same time the debug info in increase/decrease clock is also
printed by using DRM_DEBUG_DRIVER instead of DRM_DEBUG_KMS.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 1df4b35b 15-Sep-2009 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: kill i915_lp_ring_sync

It's not needed anymore.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 5a5a0c64 15-Sep-2009 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: implement fastpath for overlay flip waiting

As long as the gpu can keep up, neither the cpu (waiting for gpu)
nor the gpu (waiting for vblank to do an overlay flip) stalls.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 48764bf4 15-Sep-2009 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: add i915_lp_ring_sync helper

This just waits until the hw passed the current ring position with
cmd execution. This slightly changes the existing i915_wait_request
function to make uninterruptible waiting possible - no point in
returning to userspace while mucking around with the overlay, that
piece of hw is just too fragile.

Also replace a magic 0 with the symbolic constant (and kill the then
superfluous comment) while I was looking at the code.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 9d34e5db 23-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Enable irq to trace batch buffer completion.

If we trigger a tracepoint for batch buffer submission, it is a reasonable
assumption that we wish to also trace the batch buffer completion. So in
order to capture the completion events, we need to enable irqs... However,
we cannot rely on the completion event to disable the irq later, so we
defer the irq disable to the retire request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 8f0dc5bf 23-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: batch submit seqno off-by-one.

We increment the seqno number between submitting the batch buffer and
the flush/interrupt that demarcates its end, so the tracepoint needs to
reference the incremented value to match the completion event.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# c715089f 22-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Handle ERESTARTSYS during page fault

During a page fault and rebinding the buffer there exists a window for a
signal to arrive during the i915_wait_request() and trigger a
ERESTARTSYS. This used to be handled by returning SIGBUS and thereby
killing the application. Try 'cairo-perf-trace & cairo-test-suite' and
watch X go boom!

The solution as suggested by H. Peter Anvin is to simply return NOPAGE and
leave the higher layers to spot we did not fill the page and resubmit
the page fault.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
[anholt: Mostly squash it with another commit]
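
A minimal sketch of that error translation in a fault handler, written
against the current fault-handler prototype with a hypothetical helper
(the real i915_gem_fault() does considerably more):

    #include <linux/mm.h>
    #include <linux/errno.h>

    int example_bind_and_map(struct vm_fault *vmf);  /* hypothetical */

    static vm_fault_t example_gem_fault(struct vm_fault *vmf)
    {
            int ret = example_bind_and_map(vmf);

            switch (ret) {
            case 0:
            case -ERESTARTSYS:
                    /* The PTE was installed, or a signal interrupted the
                     * wait: report "no page filled" so the fault is simply
                     * retried instead of killing the task with SIGBUS. */
                    return VM_FAULT_NOPAGE;
            case -ENOMEM:
                    return VM_FAULT_OOM;
            default:
                    return VM_FAULT_SIGBUS;
            }
    }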


# ab18282d 22-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Warn before mmaping a purgeable buffer.

Only allow the user to mmap buffers that have not been marked as
purgeable.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# bb6baf76 22-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Track purged state.

In order to correctly prevent the invalid reuse of a purged buffer, we
need to track such events and warn the user before something bad
happens.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 9731129c 20-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove eviction debug spam

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 2d7ef395 20-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Immediately discard any backing storage for unneeded objects

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 963b4836 20-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Do not mis-classify clean objects as purgeable

Whilst cleaning up the patches for submission, I mis-classified non-dirty
objects as purgeable. This was causing the backing pages for those
objects to be evicted under memory-pressure, discarding valid and
irreplaceable texture data.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 13a05fd9 20-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Whitespace correction for madv

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# a32808c0 20-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: BUG_ON page refleak during unbind

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 9a1e2582 20-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Search harder for a reusable object

As evict_something() is called by routines that do not repeatedly search
again, try harder in the initial search to find an object that matches
the request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# ab5ee576 20-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clean up evict from list.

First the routine attempted to unlock a mutex it did not own along the
error path.

Secondly the routine should never be called on any list but the inactive
one, since we attempt to unbind those objects, so fix the calling semantics.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 1c5d22f7 25-Aug-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add tracepoints

By adding tracepoint equivalents for WATCH_BUF/EXEC we are able to monitor
the lifetimes of objects, requests and significant events. These events can
then be probed using the tracing frameworks, such as systemtap and, in
particular, perf.

For example to record the stack trace for every GPU stall during a run, use

$ perf record -e i915:i915_gem_request_wait_begin -c 1 -g

And

$ perf report

to view the results.

[Updated to fix the compilation issues this caused.]
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ben Gamari <bgamari@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


# 8542a0bb 09-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Skip the sanity checks if the current relocation is valid

If the presumed_offset as fed to userspace and returned to the kernel
from a previous execbuffer is still valid, then we do not need to rewrite
the relocation entry and may skip the offset sanity checks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# cd0b9fb4 15-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Check that the relocation points to within the target

Eric noted a potential concern with the low bits not being strictly used
as part of the absolute offset (instead part of the command stream to the
GPU), but in practice that should not be an issue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Andy Whitcroft <apw@canonical.com>
Cc: Eric Anholt <eric@anholt.net>
CC: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 07f73f69 14-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Improve behaviour under memory pressure

Due to the necessity of having to take the struct_mutex, the i915
shrinker can not free the inactive lists if we fail to allocate memory
whilst processing a batch buffer, triggering an OOM and an ENOMEM that
is reported back to userspace. In order to fare better under such
circumstances we need to manually retry a failed allocation after
evicting inactive buffers.

To do so involves 3 steps:
1. Marking the backing shm pages as NORETRY.
2. Updating the get_pages() callers to evict something on failure and then
retry.
3. Revamping the evict something logic to be smarter about the required
buffer size and prefer to use volatile or clean inactive pages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
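
A bare-bones sketch of step 2 above, the evict-then-retry loop around a
failed page allocation; both helpers are hypothetical stand-ins for the
driver's own get_pages and eviction paths:

    #include <linux/errno.h>

    int example_object_get_pages(void *obj);        /* hypothetical */
    int example_evict_something(void *dev);         /* hypothetical */

    static int example_get_pages_or_evict(void *obj, void *dev)
    {
            int ret;

            for (;;) {
                    ret = example_object_get_pages(obj);
                    if (ret != -ENOMEM)
                            return ret;             /* success or hard error */

                    /* Allocation failed: free an inactive buffer and retry. */
                    ret = example_evict_something(dev);
                    if (ret)
                            return ret;             /* nothing left to evict */
            }
    }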


# 3ef94daa 14-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add ioctl to set 'purgeability' of objects

Similar to the madvise() concept, the application may wish to mark some
data as volatile. That is in the event of memory pressure the kernel is
free to discard such buffers safe in the knowledge that the application
can recreate them on demand, and is simply using these as a cache.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 31169714 14-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Register a shrinker to free inactive lists under memory pressure

This should help GEM handle memory pressure situations more gracefully.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# e67b8ce1 14-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Remove stored gtt_alignment

There is no need to store the gtt_alignment as it is either explicitly
set according to the hardware requirements (e.g. scanout) or the
minimum alignment is computed on demand.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 4960aaca 14-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Add buffer to inactive list immediately during fault

If we failed to set the domain, the buffer was no longer being tracked
on any list.

Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# ba1234d1 14-Sep-2009 Ben Gamari <bgamari.foss@gmail.com>

drm/i915: Make dev_priv->mm.wedged an atomic_t

There is a very real possibility that multiple CPUs will notice that the
GPU is wedged. This introduces all sorts of potential race conditions.
Make the wedged flag atomic to mitigate this risk.

Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
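
The mechanics are just the atomic_t accessors; a trivial illustration with
invented names:

    #include <linux/atomic.h>
    #include <linux/types.h>

    static atomic_t example_wedged = ATOMIC_INIT(0);

    /* Any CPU noticing the hang can set the flag without racing others. */
    static void example_mark_wedged(void)
    {
            atomic_set(&example_wedged, 1);
    }

    static bool example_is_wedged(void)
    {
            return atomic_read(&example_wedged) != 0;
    }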


# f65d9421 14-Sep-2009 Ben Gamari <bgamari.foss@gmail.com>

drm/i915: Add hangcheck timer

We set a periodic timer to check on the GPU, resetting it every time a
batch is completed. If the timer elapses, we check acthd. If acthd
hasn't changed in two timer periods, we assume the chip is wedged.

This is implemented in such a way that it leaves the option open to
employ adaptive timer intervals in the future. One could wait until
several timer periods have elapsed before declaring the chip dead. If
the chip comes back after several periods but before the "dead"
threshold, the timer interval or dead threshold could be raised.

It is important to note that while checking for active requests, we need
to account for the fact that requests are removed from the list (i.e.
retired) in a deferred work queue handler. This means that merely
checking for an empty request_list is insufficient; the list could be
non-empty yet the GPU still idle, causing the hangcheck timer to
incorrectly mark the GPU as wedged (it took me a while to figure that
out---sigh...)

Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
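
A stripped-down sketch of the scheme, written against the current timer API
with invented names; the real driver also re-arms the timer from the request
path as described above, and samples acthd from the hardware:

    #include <linux/timer.h>
    #include <linux/jiffies.h>
    #include <linux/types.h>

    #define EXAMPLE_HANGCHECK_PERIOD (HZ + HZ / 2)  /* illustrative period */

    struct example_hangcheck {
            struct timer_list timer;
            u32 last_acthd;
            int stall_count;
    };

    u32 example_read_acthd(void);                   /* hypothetical MMIO read */
    bool example_requests_pending(void);            /* hypothetical */
    void example_declare_wedged(void);              /* hypothetical */

    static void example_hangcheck_elapsed(struct timer_list *t)
    {
            struct example_hangcheck *hc = from_timer(hc, t, timer);
            u32 acthd = example_read_acthd();

            if (!example_requests_pending())
                    return;                         /* idle, nothing to check */

            if (acthd == hc->last_acthd && ++hc->stall_count >= 2)
                    example_declare_wedged();       /* no progress: wedged */
            else if (acthd != hc->last_acthd)
                    hc->stall_count = 0;

            hc->last_acthd = acthd;
            mod_timer(&hc->timer, jiffies + EXAMPLE_HANGCHECK_PERIOD);
    }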


# 22be1724 14-Sep-2009 Ben Gamari <bgamari.foss@gmail.com>

drm/i915: make i915_seqno_passed non-static

We'll need it in i915_irq.c for checking whether there are outstanding
requests. Also, the function really ought to return a bool, not an int.

Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
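
The comparison itself is the usual wrap-safe sequence-number idiom; roughly:

    #include <linux/types.h>

    /* True if seq1 is at or after seq2, even across 32-bit wrap-around,
     * because the difference is interpreted as a signed quantity. */
    static inline bool example_seqno_passed(u32 seq1, u32 seq2)
    {
            return (s32)(seq1 - seq2) >= 0;
    }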


# ffed1d09 14-Sep-2009 Ben Gamari <bgamari.foss@gmail.com>

drm/i915: Check whether chip is wedged in i915_wait_request()

i915_wait_request() only checks mm.wedged after it interacts with the
hardware, generally causing the driver to lock up waiting for a wedged
chip. Make sure we check mm.wedged as the first thing we do.

Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 7e616158 10-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Only destroy a constructed mmap offset

drm_ht_remove_item() does not handle removing an absent item and the hlist
in particular is incorrectly initialised. The easy remedy is simply to skip
calling i915_gem_free_mmap_offset() unless we have actually created the
offset and associated ht entry.

This also fixes the mishandling of a partially constructed offset which
leaves pointers initialized after freeing them along the
i915_gem_create_mmap_offset() error paths.

In particular this should fix the oops found here:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/415357/comments/8

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Cc: stable@kernel.org


# e517a5e9 10-Sep-2009 Eric Anholt <eric@anholt.net>

agp/intel: Fix the pre-9xx chipset flush.

Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had
serious stability issues. Back in May a wbinvd was added to the DRM to
work around much of the problem. Some failure remained -- easily visible
by dragging a window around on an X -retro desktop, or by looking at bugzilla.

The chipset flush was on the right track -- hitting the right amount of
memory, and it appears to be the only way to flush on these chipsets, but the
flush page was mapped uncached. As a result, the writes trying to clear the
writeback cache ended up bypassing the cache, and not flushing anything! The
wbinvd would flush out other writeback data and often cause the data we wanted
to get flushed, but not always. By removing the setting of the page to UC
and instead just clflushing the data we write to try to flush it, we get the
desired behavior with no wbinvd.

This exports clflush_cache_range(), which was laying around and happened to
basically match the code I was otherwise going to copy from the DRM.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Cc: stable@kernel.org


# 5323fd04 09-Sep-2009 Eric Anholt <eric@anholt.net>

drm/i915: Zap mmaps of objects before unbinding them from the GTT.

Otherwise, some other userland writing into its buffer may race to land
writes either after the CPU thinks it's got a coherent view, or after its
GTT entries have been redirected to point at the scratch page. Either
result is unpleasant.

Signed-off-by: Eric Anholt <eric@anholt.net>


# e6890f6f 08-Sep-2009 Linus Torvalds <torvalds@linux-foundation.org>

i915: disable interrupts before tearing down GEM state

Reinette Chatre reports a frozen system (with blinking keyboard LEDs)
when switching from graphics mode to the text console, or when
suspending (which does the same thing). With netconsole, the oops
turned out to be

BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]

and it's due to the i915_gem.c code doing drm_irq_uninstall() after
having done i915_gem_idle(). And the i915_gem_idle() path will do

i915_gem_idle() ->
i915_gem_cleanup_ringbuffer() ->
i915_gem_cleanup_hws() ->
dev_priv->hw_status_page = NULL;

but if an i915 interrupt comes in after this stage, it may want to
access that hw_status_page, and gets the above NULL pointer dereference.

And since the NULL pointer dereference happens from within an interrupt,
and with the screen still in graphics mode, the common end result is
simply a silently hung machine.

Fix it by simply uninstalling the irq handler before idling rather than
after. Fixes

http://bugzilla.kernel.org/show_bug.cgi?id=13819

Reported-and-tested-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0ef82af7 05-Sep-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Pad ringbuffer with NOOPs before wrapping

According to the docs, the ringbuffer is not allowed to wrap in the middle
of an instruction.

G45 PRM, Vol 1b, p101:
While the “free space” wrap may allow commands to be wrapped around the
end of the Ring Buffer, the wrap should only occur between commands.
Padding (with NOP) may be required to follow this restriction.

Do as commanded.

[Having seen bug reports where there is evidence of split commands, but
apparently the GPU has continued on merrily before a bizarre and untimely
death, this may or may not fix a few random hangs.]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
CC: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
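
In outline, the padding amounts to filling the space left before the wrap
with MI_NOOP (which encodes as zero) and restarting at offset 0; a
simplified sketch with invented bookkeeping, ignoring the head/free-space
accounting the real ring code also has to do:

    #include <linux/types.h>

    #define EXAMPLE_MI_NOOP 0                       /* hardware no-op dword */

    struct example_ring {
            u32 *virt;                              /* CPU view of the ring */
            u32 tail;                               /* next write, in bytes */
            u32 size;                               /* ring size, in bytes */
    };

    static void example_wrap_if_needed(struct example_ring *ring, u32 bytes)
    {
            /* A command must not straddle the end of the ring, so pad the
             * remainder with NOOPs and begin the command at offset 0. */
            if (ring->tail + bytes > ring->size) {
                    while (ring->tail < ring->size) {
                            ring->virt[ring->tail / 4] = EXAMPLE_MI_NOOP;
                            ring->tail += 4;
                    }
                    ring->tail = 0;
            }
    }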


# 652c393a 17-Aug-2009 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: add dynamic clock frequency control

There are several sources of unnecessary power consumption on Intel
graphics systems. The first is the LVDS clock. TFTs don't suffer from
persistence issues like CRTs, and so we can reduce the LVDS refresh rate
when the screen is idle. It will be automatically upclocked when
userspace triggers graphical activity. Beyond that, we can enable memory
self refresh. This allows the memory to go into a lower power state when
the graphics are idle. Finally, we can drop some clocks on the gpu
itself. All of these things can be reenabled between frames when GPU
activity is triggered, and so there should be no user visible graphical
changes.

Signed-off-by: Jesse Barnes <jesse.barnes@intel.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 58c2fb64 31-Aug-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unref old_obj on get_fence_reg() error path

Remember to release the local reference if we fail to wait on
the rendering.

(Also whilst in the vicinity add some whitespace so that the phasing of
the operations is clearer.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# a09ba7fa 29-Aug-2009 Eric Anholt <eric@anholt.net>

drm/i915: Fix CPU-spinning hangs related to fence usage by using an LRU.

The lack of a proper LRU was partially worked around by taking the fence
from the object containing the oldest seqno. But if there are multiple
objects inactive, then they don't have seqnos and the first fence reg
among them would be chosen. If you were trying to copy data between two
mappings, this could result in each page fault stealing the fence from
the other argument, and your application hanging.

https://bugs.freedesktop.org/show_bug.cgi?id=23566
https://bugs.freedesktop.org/show_bug.cgi?id=23220
https://bugs.freedesktop.org/show_bug.cgi?id=23253
https://bugs.freedesktop.org/show_bug.cgi?id=23366

Cc: Stable Team <stable@kernel.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>


# a1a2d1d3 22-Aug-2009 Pekka Paalanen <pq@iki.fi>

drm: GEM handles are u32, not int

Several functions in the GEM kernel API used int as handle type, but
user API has it __u32 which is also the intended type.

Replace int with u32.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 9c9fe1f8 03-Aug-2009 Eric Anholt <eric@anholt.net>

drm/i915: Use our own workqueue to avoid wedging the system along with the GPU.

Signed-off-by: Eric Anholt <eric@anholt.net>


# d05ca301 10-Jul-2009 Eric Anholt <eric@anholt.net>

drm/i915: Zap the GTT mapping when transitioning from untiled to tiled.

As of 52dc7d32b88156248167864f77a9026abe27b432, we could leave an old
linear GTT mapping in place, so that apps trying to write tiled data
through a GTT mapping wouldn't get the fence added, and garbage would get
displayed.

Signed-off-by: Eric Anholt <eric@anholt.net>


# 901782b2 10-Jul-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Refactor calls to unmap_mapping_range

As we call unmap_mapping_range() twice in identical fashion, refactor
and attempt to explain why we need to call unmap_mapping_range().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# b5aa8a0f 23-Jun-2009 Grégoire Henry <Gregoire.Henry@pps.jussieu.fr>

drm/i915: initialize fence registers to zero when loading GEM

Uninitialized fence registers could lead to a corrupted display. Problem
encountered on MacBooks (revision 1 and 2), directly booting from EFI
or through BIOS emulation.

(bug #21710 at freedesktop.org)

Signed-off-by: Grégoire Henry <henry@pps.jussieu.fr>
Signed-off-by: Eric Anholt <eric@anholt.net>


# cfd43c02 19-Jun-2009 Krzysztof Halasa <khc@pm.waw.pl>

drm/i915: Fix size_t handling in off-by-default debug printfs

Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 9a298b2a 24-Mar-2009 Eric Anholt <eric@anholt.net>

drm: Remove memory debugging infrastructure.

It hasn't been used in ages, and having the user tell you how much
memory is being freed at free time is a recipe for disaster even if it
was ever used.

Signed-off-by: Eric Anholt <eric@anholt.net>


# 52dc7d32 06-Jun-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Clear fence register on tiling stride change.

The fence register value also depends upon the stride of the object, so we
need to clear the fence if that is changed as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[anholt: Added 8xx and 965 paths, and renamed the confusing
i915_gem_object_tiling_ok function to i915_gem_object_fence_offset_ok]
Signed-off-by: Eric Anholt <eric@anholt.net>


# 8c4b8c3f 17-Jun-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Install fence register for tiled scanout on i915

With the work by Jesse Barnes to eliminate allocation of fences during
execbuffer, it becomes possible to write to the scan-out buffer with it
never acquiring a fence (simply by only ever writing to the object using
tiled GPU commands and never writing to it via the GTT). So for pre-i965
chipsets which require fenced access for tiled scan-out buffers, we need
to obtain a fence register.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# d78b47b9 17-Jun-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: detach/attach get/put pages symmetry

After performing an operation over the page list for a buffer retrieved by
i915_gem_object_get_pages() the pages need to be returned with
i915_gem_object_put_pages(). This was not being observed for the phys
objects which were thus leaking references to their backing pages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
CC: Dave Airlie <airlied@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 2939e1f5 06-Jun-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: NOMEM->NOSPC

To differentiate between encountering an out-of-memory error and running
out of space in the aperture, use ENOSPC for the latter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 21d509e3 06-Jun-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: use I915_GEM_GPU_DOMAINS

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# b1ce786c 06-Jun-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: no need to hold mutex for object lookup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 5f26a2c7 06-Jun-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: OR in the COMMAND read domain for the batch buffer.

The batch buffer may be shared with another read buffer, so we should not
ignore any previously set domains, but just OR in the command domain (and
check that the buffer is not writable).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 83d60795 06-Jun-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Sanity check execbuffer arguments before touching state.

By sending a broken execbuffer (its length was not suitably aligned) I
triggered an operation upon a freed object. The invalid alignment was
discovered after updating the write_domain on the object but before the
object was placed on the active queue. So during the unwind process
following the error, the now freed object attempts to flush its
non-existent, but outstanding, GPU writes causing this use-after-free.

[drm:i915_dispatch_gem_execbuffer] *ERROR* alignment
[drm:i915_gem_execbuffer] *ERROR* dispatch failed -22
WARNING: at lib/kref.c:43 warn_slowpath_null+0x10/0x15()
Modules linked in:
Pid: 4552, comm: lt-csi-drm Not tainted 2.6.30-rc6 #423
Call Trace:
[<c0119ef3>] warn_slowpath_fmt+0x57/0x6d
[<c014de24>] ? get_pageblock_migratetype+0x18/0x1e
[<c014e8fd>] ? free_hot_page+0xa/0xc
[<c014e915>] ? __free_pages+0x16/0x1f
[<c0153ebf>] ? shmem_truncate_range+0x63e/0x656
[<c015fb2f>] ? slob_page_alloc+0x146/0x1c8
[<c0119f19>] warn_slowpath_null+0x10/0x15
[<c01f55f2>] kref_get+0x1b/0x21
[<c02605db>] i915_gem_object_move_to_active+0x1f/0x56
[<c0261302>] i915_add_request+0x156/0x19a
[<c026136e>] i915_gem_object_flush_gpu_write_domain+0x28/0x3f
[<c0261eca>] i915_gem_object_unbind+0x4a/0x124
[<c0261fd7>] i915_gem_free_object+0x33/0x9b
[<c0250d6b>] drm_gem_object_free+0x28/0x4a
[<c0250d43>] ? drm_gem_object_free+0x0/0x4a
[<c01f55ce>] kref_put+0x38/0x41
[<c0250cbf>] drm_gem_object_unreference+0x11/0x13
[<c0250d06>] drm_gem_object_handle_unreference+0x1e/0x21
[<c0250d13>] drm_gem_object_release_handle+0xa/0xe
[<c01f3e6b>] idr_for_each+0x5f/0x98
[<c0250d09>] ? drm_gem_object_release_handle+0x0/0xe
[<c0250daf>] drm_gem_release+0x22/0x34
[<c025046f>] drm_release+0x1e8/0x3c4
[<c0162d25>] __fput+0xaf/0x146
[<c0162dce>] fput+0x12/0x14
[<c01605ef>] filp_close+0x48/0x52
[<c011b182>] put_files_struct+0x57/0x9b
[<c011b1e4>] exit_files+0x1e/0x20
[<c011c6b6>] do_exit+0x16d/0x511
[<c03704ab>] ? __schedule+0x3d4/0x3e5
[<c0103f0d>] ? handle_irq+0xd/0x69
[<c011caa7>] do_group_exit+0x4d/0x73
[<c011cae0>] sys_exit_group+0x13/0x17
[<c010268c>] sysenter_do_call+0x12/0x2b

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 036a4a7d 08-Jun-2009 Zhenyu Wang <zhenyuw@linux.intel.com>

drm/i915: handle interrupt on new chipset

Update interrupt handling methods for IGDNG with new registers
for display and graphics interrupt functions. As we won't use
irq-based vblank sync in dri2, so display interrupt on new chip
will be used for hotplug only in future.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# b962442e 03-Jun-2009 Eric Anholt <eric@anholt.net>

drm/i915: Change GEM throttling to be 20ms like the comment says.

keithp didn't like the original 20ms plan because a cooperative client could
be starved by an uncooperative client. There may even have been problems
with cooperative clients versus cooperative clients. So keithp changed
throttle to just wait for the second to last seqno emitted by that client.
It worked well, until we started getting more round-trips to the server
due to DRI2 -- the server throttles in BlockHandler, and so if you did more
than one round trip after finishing your frame, you'd end up unintentionally
syncing to the swap.

Fix this by keeping track of the client's requests, so the client can wait
when it has an outstanding request over 20ms old. This should have
non-starving behavior, good behavior in the presence of restarts, and less
waiting. Improves high-settings openarena performance on my GM45 by 50%.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 0e7ddf7e 04-Jun-2009 Eric Anholt <eric@anholt.net>

drm/i915: Remove a bad BUG_ON in the fence management code.

This could be triggered by a gtt mapping fault on 965 that decides to
remove the fence from another object that happens to be active currently.
Since the other object doesn't rely on the fence reg for its execution, we
don't wait for it to finish. We'll soon be not waiting on 915 most of the
time as well, so just drop the BUG_ON.

Signed-off-by: Eric Anholt <eric@anholt.net>


# 07f4f3e8 27-May-2009 Kristian Høgsberg <krh@redhat.com>

i915: Set object to gtt domain when faulting it back in

When a GEM object is evicted from the GTT we set it to the CPU domain,
as it might get swapped in and out or even mmapped regularly. If the
object is mmapped through the GTT it can still get evicted in this way
by other objects requiring GTT space. When the GTT mapping is touched
again we fault it back into the GTT, but fail to set it back to the
GTT domain. This means we fail to flush any cached CPU writes to the
pages backing the object which will then happen "eventually", typically
after we write to the page through the uncached GTT mapping.

[anholt: Note that userland does do a set_domain(GTT, GTT) when starting
to access the GTT mapping. That covers getting the existing mapping of the
object synchronized if it's bound to the GTT. But set_domain(GTT, GTT)
doesn't do anything if the object is currently unbound. This fix covers the
transition to being bound for GTT mapping.]

Fixes glyph and other pixmap corruption during swapping. fd.o bug #21790

Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# cfa16a0d 26-May-2009 Eric Anholt <eric@anholt.net>

drm/i915: Apply a big hammer to 865 GEM object CPU cache flushing.

On the 865, but not the 855, the clflush we do appears to not actually make
it out to the hardware all the time. An easy way to safely reproduce was
X -retro, which would show that some of the blits involved in drawing the
lovely root weave didn't make it out to the hardware. Those blits are 32
bytes each, and 1-2 would be missing at various points around the screen.
Other experimentation (doing more clflush, doing more AGP chipset flush,
poking at some more device registers to maybe trigger more flushing) didn't
help. krh came up with the wbinvd as a way to successfully get all those
blits to appear.

Signed-off-by: Eric Anholt <eric@anholt.net>


# e76a16de 26-May-2009 Eric Anholt <eric@anholt.net>

drm/i915: Fix tiling pitch handling on 8xx.

The pitch field is an exponent on pre-965, so we were rejecting buffers
on 8xx that we shouldn't have. 915 got lucky in that the largest legal
value happened to match (8KB / 512 = 0x10), but 8xx has a smaller tile width.
Additionally, we programmed that bad value into the register on 8xx, so the
only pitch that would work correctly was 4096 (512-1023 pixels), while others
would probably give bad rendering or hangs.

Signed-off-by: Eric Anholt <eric@anholt.net>

fd.o bug #20473.


# 14b60391 20-May-2009 Jesse Barnes <jbarnes@virtuousgeek.org>

i915: support 8xx desktop cursors

For some reason we never added 8xx desktop cursor support to the
kernel. This patch fixes that.

[krh: Also set the size on pre-i915 hw.]
Tested-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 8e7d2b2c 08-May-2009 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: allocate large pointer arrays with vmalloc

For a while now, many of the GEM code paths have allocated page or
object arrays with the slab allocator. This is nice and fast, but
won't work well if memory is fragmented, since the slab allocator works
with physically contiguous memory (i.e. order > 2 allocations are
likely to fail fairly early after booting and doing some work).

This patch works around the issue by falling back to vmalloc for
>PAGE_SIZE allocations. This is ugly, but much less work than chaining
a bunch of pages together by hand (surprisingly there's not a bunch of
generic kernel helpers for this yet afaik). vmalloc space is somewhat
precious on 32 bit kernels, but our allocations shouldn't be big enough
to cause problems, though they're routinely more than a page.

Note that this patch doesn't address the unchecked
alloc-based-on-ioctl-args in GEM; that needs to be fixed in a separate
patch.

Also, I've deliberately ignored the DRM's "area" junk. I don't think
anyone actually uses it anymore and I'm hoping it gets ripped out soon.

[Updated: removed size arg to new free function. We could unify the
free functions as well once the DRM mem tracking is ripped out.]

fd.o bug #20152 (part 1/3)

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 802c7eb6 05-May-2009 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: sanity check IER at wait_request time

We might sleep here anyway so I hope an extra uncached read is ok to
add.

In #20896 we found that vbetool clobbers the IER. In KMS mode this is
particularly bad since we don't set the interrupt regs late (in
EnterVT), so we'd fail to get *any* interrupts at all after X started
(since some distros have scripts that call vbetool at X startup
apparently).

So this patch checks IER at wait_request time, and re-enables
interrupts if it's been clobbered. In a proper config this check
should never be triggered.

This is really a distro issue, but having a sanity check is nice, as
long as it doesn't have a real performance hit.

Tested-by: Mateusz Kaduk <mateusz.kaduk@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[anholt: Moved the check inside of the sleeping case to avoid perf cost]
Signed-off-by: Eric Anholt <eric@anholt.net>


# d816f6ac 17-Apr-2009 Wu Fengguang <fengguang.wu@intel.com>

drm/i915: fix unpaired i915 device mutex on entervt failure.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 68c84342 07-Apr-2009 Shaohua Li <shaohua.li@intel.com>

drm/i915: fix scheduling while holding the new active list spinlock

regression caused by commit 5e118f4139feafe97e913df67b1f7c1e5083e535:
i915_gem_object_move_to_inactive() should be called in task context,
as it calls fput();

Signed-off-by: Shaohua Li<shaohua.li@intel.com>
[anholt: Add more detail to the comment about the lock break that's added]
Signed-off-by: Eric Anholt <eric@anholt.net>


# 280b713b 12-Mar-2009 Eric Anholt <eric@anholt.net>

drm/i915: Allow tiling of objects with bit 17 swizzling by the CPU.

Save the bit 17 state of the pages when freeing the page list, and
reswizzle them if necessary when rebinding the pages (in case they were
swapped out). Since we have userland with expectations that the swizzle
enums let it pread and pwrite contents accurately, we can't expose a new
swizzle enum for bit 17 (which it would have to GTT map to handle), so we
handle it down in pread and pwrite by swizzling the copy when bit 17 of the
page address is set.

Signed-off-by: Eric Anholt <eric@anholt.net>
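
The bookkeeping half of this — remembering which backing pages had physical
address bit 17 set when the page list was released — can be sketched as
below; the helper name and the bitmap parameter are invented, and the
actual swizzle fixup on rebind is omitted:

    #include <linux/bitops.h>
    #include <linux/io.h>
    #include <linux/mm.h>

    static void example_save_bit_17(struct page **pages, int page_count,
                                    unsigned long *bit_17_bitmap)
    {
            int i;

            for (i = 0; i < page_count; i++) {
                    /* Record bit 17 of each page's physical address so a
                     * later rebind can tell whether the swizzle changed. */
                    if (page_to_phys(pages[i]) & (1 << 17))
                            __set_bit(i, bit_17_bitmap);
                    else
                            __clear_bit(i, bit_17_bitmap);
            }
    }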


# e5e9ecde 07-Apr-2009 Eric Anholt <eric@anholt.net>

drm/i915: Correctly set the write flag for get_user_pages in pread.

Otherwise, the results of our read didn't show up when we were faulting in
the page being read into (as happened with a testcase reading into a big
stack area). Likely accounts for some conformance test failures.

Signed-off-by: Eric Anholt <eric@anholt.net>


# 2bc43b5c 06-Apr-2009 Florian Mickler <florian@mickler.org>

drm/i915: Fix use of uninitialized var in 40a5f0de

i915_gem_put_relocs_to_user returned an uninitialized value which
got returned to userspace. This caused libdrm in my setup to never
get out of a do{}while() loop retrying i915_gem_execbuffer.

The result was hanging X, an overheating cpu and 2-3gb of logfile-spam.

This patch addresses the issue by
1. initializing vars in this file where necessary
2. correcting wrongly interpreted return values of copy_[from/to]_user

Signed-off-by: Florian Mickler <florian@mickler.org>
[anholt: cleanups of unnecessary changes, consistency in APIs]
Signed-off-by: Eric Anholt <eric@anholt.net>
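
The second point is worth spelling out: copy_to_user()/copy_from_user()
return the number of bytes left uncopied, not a negative errno, so
returning that value directly hands garbage to the caller. The idiomatic
check looks roughly like:

    #include <linux/uaccess.h>
    #include <linux/errno.h>

    static int example_copy_out(void __user *dst, const void *src, size_t len)
    {
            /* Non-zero means some bytes could not be copied: report -EFAULT
             * rather than the raw remaining-byte count. */
            if (copy_to_user(dst, src, len))
                    return -EFAULT;
            return 0;
    }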


# 6911a9b8 02-Apr-2009 Ben Gamari <bgamari@gmail.com>

drm/i915: Implement batch and ring buffer dumping

We create a debugfs node (i915_ringbuffer_data) to expose a hex dump
of the ring buffer itself. We also expose another debugfs node
(i915_ringbuffer_info) with information on the state (i.e. head, tail
addresses) of the ringbuffer.

For batchbuffer dumping, we look at the device's active_list, dumping
each object which has I915_GEM_DOMAIN_COMMAND in its read
domains. This is all exposed through the dri/i915_batchbuffers debugfs
file with a header for each object (giving the objects gtt_offset so
that it can be matched against the offset given in the
BATCH_BUFFER_START command.

Signed-off-by: Ben Gamari <bgamari@gmail.com>
Signed-off-by: Carl Worth <cworth@cworth.org>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 5e118f41 20-Mar-2009 Carl Worth <cworth@cworth.org>

drm/i915: Add a spinlock to protect the active_list

This is a baby-step in the direction of having finer-grained
locking than the struct_mutex. Specifically, this will enable
new debugging code to read the active list for printing out
GPU state when the GPU is wedged, (while the struct_mutex is
held, of course).

Signed-off-by: Carl Worth <cworth@cworth.org>
[anholt: indentation fix]
Signed-off-by: Eric Anholt <eric@anholt.net>
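
The locking rule amounts to taking the new spinlock around every traversal
or update of the active list; schematically, with invented names:

    #include <linux/spinlock.h>
    #include <linux/list.h>

    static DEFINE_SPINLOCK(example_active_lock);
    static LIST_HEAD(example_active_list);

    struct example_obj {
            struct list_head link;
    };

    static void example_move_to_active(struct example_obj *obj)
    {
            spin_lock(&example_active_lock);
            list_move_tail(&obj->link, &example_active_list);
            spin_unlock(&example_active_lock);
    }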


# 959b887c 20-Mar-2009 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: check for -EINVAL from vm_insert_pfn

This indicates something is wrong with the mapping, and apparently triggers
in current kernels.

Signed-off-by: Jesse Barnes <jbarnes@virtuosugeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 8d7773a3 29-Mar-2009 Daniel Vetter <daniel@biene.ffwll.ch>

drm/i915: fix up tiling/fence reg setup on i8xx class hw

This fixes all the tiling problems with the 2d ddx. glxgears still doesn't work.
Changes:

- fix a copy&paste error in i8xx fence reg setup. It resulted in at most a
512KB offset of the fence reg window, so was only visible sometimes.
- add tests for stride and object size constraints (also for i915 and i965 class
hw). Userspace seems to have an off-by-one bug there, which changes the fence
size by at most 512KB due to an overflow.
- because i8xx hw is quite old (and therefore not as well-tested) I left 2 debug
WARN_ONs in the i8xx fence reg setup code to hopefully catch any further
overflows in the bit-fields. Lastly there's one small change to make the
alignment checks more consistent.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=20289
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>


# d0088775 28-Mar-2009 Dave Airlie <airlied@redhat.com>

drm/i915: check the return value from the copy from user

This produced a warning on my build, not sure why super-warning-man didn't
notice this one, it's much worse than the %z one.

Signed-off-by: Dave Airlie <airlied@redhat.com>


# ad086c83 20-Feb-2009 Owain G. Ainsworth <oga@openbsd.org>

i915/drm: Remove two redundant agp_chipset_flushes

agp_chipset_flush() is for flushing the intel GMCH write cache via the
IFP, these two uses are for when we're getting the object into the cpu
READ domain, and thus should not be needed. This confused me when I was
getting my head around the code.

With thanks to airlied for helping me check my mental picture of how the
flushes and clflushes are supposed to be used.

Signed-off-by: Owain G. Ainsworth <oga@openbsd.org>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 40a5f0de 12-Mar-2009 Eric Anholt <eric@anholt.net>

drm/i915: Fix lock order reversal in GEM relocation entry copying.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>


# 201361a5 11-Mar-2009 Eric Anholt <eric@anholt.net>

drm/i915: Fix lock order reversal with cliprects and cmdbuf in non-DRI2 paths.

This introduces allocation in the batch submission path that wasn't there
previously, but these are compatibility paths so we care about simplicity
more than performance.

kernel.org bug #12419.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# eb01459f 10-Mar-2009 Eric Anholt <eric@anholt.net>

drm/i915: Fix lock order reversal in shmem pread path.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 40123c1f 09-Mar-2009 Eric Anholt <eric@anholt.net>

drm/i915: Fix lock order reversal in shmem pwrite path.

Like the GTT pwrite path fix, this uses an optimistic path and a
fallback to get_user_pages. Note that this means we have to stop using
vfs_write and roll it ourselves.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 856fa198 19-Mar-2009 Eric Anholt <eric@anholt.net>

drm/i915: Make GEM object's page lists refcounted instead of get/free.

We've wanted this for a few consumers that touch the pages directly (such as
the following commit), which have been doing the refcounting outside of
get/put pages.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 3de09aa3 09-Mar-2009 Eric Anholt <eric@anholt.net>

drm/i915: Fix lock order reversal in GTT pwrite path.

Since the pagefault path determines that the lock order we use has to be
mmap_sem -> struct_mutex, we can't allow page faults to occur while the
struct_mutex is held. To fix this in pwrite, we first try optimistically to
see if we can copy from user without faulting. If it fails, fall back to
using get_user_pages to pin the user's memory, and map those pages
atomically when copying them to the GPU.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
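
A hedged sketch of the two-step scheme described above, with illustrative
names (not the patch's actual code): the fast path copies with page faults
disabled while struct_mutex is held, and only on failure do we drop the lock,
pin the user pages, and redo the copy from pages that can no longer fault.

    /* Fast path: no fault may be taken while struct_mutex is held. */
    pagefault_disable();
    unwritten = __copy_from_user_inatomic(vaddr, user_data, len);
    pagefault_enable();

    if (unwritten) {
            /* Slow path: drop the lock (lock order is mmap_sem ->
             * struct_mutex), pin the user pages, then retake it. */
            mutex_unlock(&dev->struct_mutex);
            pinned = get_user_pages_fast((unsigned long)user_data,
                                         num_pages, 0, user_pages);
            mutex_lock(&dev->struct_mutex);
            /* then copy from user_pages[] via kmap_atomic(), fault-free */
    }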


# 995e37ca 20-Feb-2009 Owain G. Ainsworth <oga@openbsd.org>

i915/drm: Remove two redundant agp_chipset_flushes

agp_chipset_flush() is for flushing the Intel GMCH write cache via the
IFP; these two uses are for when we're getting the object into the CPU
READ domain, and thus should not be needed. This confused me when I was
getting my head around the code.

With thanks to airlied for helping me check my mental picture of how the
flushes and clflushes are supposed to be used.

Signed-off-by: Owain G. Ainsworth <oga@openbsd.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# f77d390c 01-Feb-2009 Benjamin Herrenschmidt <benh@kernel.crashing.org>

drm: Split drm_map and drm_local_map

Once upon a time, the DRM made the distinction between the drm_map
data structure exchanged with user space and the drm_local_map used
in the kernel.

For some reason, while the BSD port still has that "feature", the
Linux part abused drm_map for kernel-internal usage, as the local
map only existed as a typedef of struct drm_map.

This patch fixes it by declaring struct drm_local_map separately
(though its content is currently identical to the userspace variant),
and changing the kernel code to only use that, except when it's a
user<->kernel interface (ie. ioctl).

This allows subsequent changes to the in-kernel format.

I've also replaced the use of drm_local_map_t with struct drm_local_map
in a couple of places, mostly by accident, but they are the same (the
former is a typedef of the latter), and I have some remote plans and a
half-finished patch to completely kill the drm_local_map_t typedef,
so I left those bits in.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# dc529a4f 10-Mar-2009 Eric Anholt <eric@anholt.net>

drm/i915: fix 945 fence register writes for fence 8 and above.

The last 8 fence registers sit at a different offset, so when we went to set
fence number 8 in the lower offset, we instead set PGETBL_CTL, and the GPU
got all sorts of angry at us.

fd.o bug #20567. Easily reproducible by running glxgears and killing it about
6 times.

Signed-off-by: Eric Anholt <eric@anholt.net>


# d7619c4b 11-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Protect active fences on i915

The i915 also uses the fence registers for GPU access to tiled buffers, so
we cannot reallocate one whilst it is on the active list. By performing an
LRU scan of the fenced buffers we also avoid the possibility of waiting
on a pinned, or otherwise unusable, buffer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# fc7170ba 11-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Check to see if we've pinned all available fences

We need to check and report if there are no available fences - or else we
spin endlessly waiting for a buffer to magically unpin itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 22c344e9 11-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Check fence status on every pin.

As we may steal the fence register of an unpinned buffer for another,
every time we repin the buffer we need to recheck whether it needs to be
allocated a fence.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 9b2412f9 11-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: First recheck for an empty fence register.

If we wait upon a request and successfully unbind a buffer occupying a
fence register, then that slot will be freed and cause a NULL dereference
upon rescanning.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>


# 0fce81e3 28-Feb-2009 Kyle McMartin <kyle@redhat.com>

i915: add newline to i915_gem_object_pin failure msg

Prevents formatting nastiness as below:

[drm:i915_gem_object_pin] *ERROR* Failure to bind: -12<3>[drm:i915_gem_evict_something] *ERROR* inactive empty 1 request empty 1 flushing empty 1

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>


# b70d11da 03-Mar-2009 Kristian Høgsberg <krh@redhat.com>

drm: Return EINVAL on duplicate objects in execbuffer object list

If userspace passes an object list with the same object appearing more
than once, we end up hitting the BUG_ON() in
i915_gem_object_set_to_gpu_domain() as it gets called a second time
for the same object.

Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
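
The check itself can be as simple as scanning the handle array for repeats
before any object is looked up; a minimal sketch of that idea (not necessarily
how the patch implements it):

    /* Reject an execbuffer whose object list names the same GEM handle
     * twice, before we start resolving handles to objects. */
    for (i = 0; i < args->buffer_count; i++) {
            for (j = i + 1; j < args->buffer_count; j++) {
                    if (exec_list[i].handle == exec_list[j].handle)
                            return -EINVAL;
            }
    }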


# f1800536 02-Mar-2009 Ingo Molnar <mingo@elte.hu>

x86, mm: dont use non-temporal stores in pagecache accesses

Impact: standardize IO on cached ops

On modern CPUs it is almost always a bad idea to use non-temporal stores,
as the regression in this commit has shown:

30d697f: x86: fix performance regression in write() syscall

The kernel simply has no good information about whether using non-temporal
stores is a good idea or not - and trying to add heuristics only increases
complexity and inserts fragility.

The regression on cached write()s took very long to be found - over two
years. So don't take any chances and let the hardware decide how it makes
use of its caches.

The only exception is drivers/gpu/drm/i915/i915_gem.c: there we are
absolutely sure that another entity (the GPU) will pick up the dirty
data immediately and that the CPU will not touch that data before the
GPU will.

Also, keep the _nocache() primitives to make it easier for people to
experiment with these details. There may be more clear-cut cases where
non-cached copies can be used, outside of filemap.c.

Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 3255aa2e 25-Feb-2009 Ingo Molnar <mingo@elte.hu>

x86, mm: pass in 'total' to __copy_from_user_*nocache()

Impact: cleanup, enable future change

Add a 'total bytes copied' parameter to __copy_from_user_*nocache(),
and update all the callsites.

The parameter is not used yet - architecture code can use it to
more intelligently decide whether the copy should be cached or
non-temporal.

Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# e08fb4f6 24-Feb-2009 Dave Airlie <airlied@linux.ie>

drm/i915: convert DRM_ERROR to DRM_DEBUG in phys object pwrite path

This snuck in when I wrote phys object support.

Signed-off-by: Dave Airlie <airlied@redhat.com>


# 6c0594a3 23-Feb-2009 Karsten Wiese <fzu@wemgehoertderstaat.de>

Fix an oops in i915_gem_retire_requests()

dev_priv->hw_status_page can be NULL, if i915_gem_retire_requests()
is called from i915_gem_busy_ioctl().

Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bab2d1f6 20-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Fix regression in 95ca9d

The object is dereferenced before the NULL check. Oops.

Fixes http://bugs.freedesktop.org/show_bug.cgi?id=20235

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# f21289b3 18-Feb-2009 Eric Anholt <eric@anholt.net>

drm/i915: Retire requests from i915_gem_busy_ioctl.

This ensures that the user gets the latest information from the hardware
on whether the buffer is busy, potentially reducing the working set of objects
that the user chooses.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 5669fcac 17-Feb-2009 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: suspend/resume GEM when KMS is active

In the KMS case, we need to suspend/resume GEM as well. So on suspend, make
sure we idle GEM and stop any new rendering from coming in, and on resume,
re-init the framebuffer and clear the suspended flag.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# efbeed96 19-Feb-2009 Eric Anholt <eric@anholt.net>

drm/i915: Don't let a device flush to prepare buffers clear new write_domains.

The problem was that object_set_to_gpu_domain would set the new write_domains
that are getting set by this batchbuffer, then the accumulated flushes required
for all the objects in preparation for this batchbuffer were posted, and the
brand new write domain would get cleared by the flush being posted. Instead,
hang on to the new (or old if we're not changing it) value and set it after
the flush is queued.

Results from this noticeably included conformance test failures from reads
shortly after writes (where the new write domain had been lost and thus not
flushed and waited on), but it is also a suspected cause of hangs in some apps
when a write domain is lost on a buffer that gets reused for instruction or
command state.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 8b0e378a 19-Feb-2009 Eric Anholt <eric@anholt.net>

drm/i915: Cut two args to set_to_gpu_domain that confused this tricky path.

While not strictly required, it helped while thinking about the following
change. This change should be invariant.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# ab00b3e5 11-Feb-2009 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: Keep refs on the object over the lifetime of vmas for GTT mmap.

This fixes a potential fault at fault time if the object was unreferenced
while the mapping still existed. Now, while the mmap_offset only lives
for the lifetime of the object, the object also stays alive while a vma
exists that needs it.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 85a7bb98 11-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Cleanup the hws on ringbuffer construction failure.

If we fail to create the ringbuffer, then we need to cleanup the allocated
hws.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# 3eb2ee77 11-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unpin the hws if we fail to kmap.

A missing unpin on the error path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# 47ed185a 11-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unpin the ringbuffer if we fail to ioremap it.

A missing unpin on the error path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# 491152b8 11-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: unpin for an invalid memory domain.

A missing unreference and unpin after rejecting the relocation for an
invalid memory domain.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# 13af1062 11-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Release and unlock on mmap_gtt error path.

We failed to unlock the mutex after failing to create the mmap offset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# a35f2e2b 06-Feb-2009 Roland Dreier <rdreier@cisco.com>

drm/i915: Fix potential AB-BA deadlock in i915_gem_execbuffer()

Lockdep warns that i915_gem_execbuffer() can trigger a page fault (which
takes mmap_sem) while holding dev->struct_mutex, while drm_vm_open()
(which is called with mmap_sem already held) takes dev->struct_mutex.
So this is a potential AB-BA deadlock.

The way that i915_gem_execbuffer() triggers a page fault is by doing
copy_to_user() when returning new buffer offsets back to userspace;
however there is no reason to hold the struct_mutex when doing this
copy, since what is being copied is the contents of an array private to
i915_gem_execbuffer() anyway. So we can fix the potential deadlock (and
get rid of the lockdep warning) by simply moving the copy_to_user()
outside of where struct_mutex is held.

This fixes <http://bugzilla.kernel.org/show_bug.cgi?id=12491>.

Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
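
Reduced to its essentials the fix looks like the sketch below (names
illustrative): the offsets are computed into a kernel-private array under
struct_mutex, the lock is dropped, and only then is the potentially-faulting
copy_to_user() issued.

    mutex_lock(&dev->struct_mutex);
    /* relocate and dispatch the batch; exec_list[] is private to this
     * function and now holds the updated buffer offsets */
    mutex_unlock(&dev->struct_mutex);

    /* The copy may fault and take mmap_sem, but struct_mutex is no
     * longer held, so the AB-BA inversion with drm_vm_open() is gone. */
    if (copy_to_user(user_exec_list, exec_list,
                     sizeof(*exec_list) * args->buffer_count))
            ret = -EFAULT;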


# 96dec61d 08-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: refleak along pin() error path.

A missing unreference if the user calls pin() a second time on a pinned
buffer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# a198bc80 06-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Cleanup trivial leak on execbuffer error path.

Also spotted by Owain Ainsworth.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# f06da264 09-Feb-2009 Linus Torvalds <torvalds@linux-foundation.org>

i915: Fix more size_t format string warnings

The DRI people seem to have a hard time getting these right (see also
commit aeb565dfc3ac4c8b47c5049085b4c7bfb2c7d5d7).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7d8d58b2 04-Feb-2009 Chris Wilson <chris@chris-wilson.co.uk>

drm/i915: Unlock mutex on i915_gem_fault() error path

If we failed to allocate a new fence register we would return
VM_FAULT_SIGBUS without relinquishing the lock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>
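
In the fault handler the bug is just an early return that skips the unlock;
the corrected error path has roughly this shape (sketch only, with the
surrounding mapping code omitted):

    ret = i915_gem_object_get_fence_reg(obj);
    if (ret) {
            mutex_unlock(&dev->struct_mutex);       /* previously missing */
            return VM_FAULT_SIGBUS;
    }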


# 0f973f27 26-Jan-2009 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: add fence register management to execbuf

Adds code to set up fence registers at execbuf time on pre-965 chips as
necessary. Also fixes up a few bugs in the pre-965 tile register support
(get_order != ffs). The number of fences available to the kernel defaults
to the hw limit minus 3 (for legacy X front/back/depth), but a new parameter
allows userspace to override that as needed.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# d9ddcb96 27-Jan-2009 Eric Anholt <eric@anholt.net>

drm/i915: Return error from i915_gem_object_get_fence_reg() when failing.

Previously, the caller would continue along without knowing that the
function failed, resulting in potential mis-rendering. Right now vm_fault
just returns SIGBUS in that case, and we may need to disable signal handling
to avoid that happening.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# ab657db1 23-Jan-2009 Eric Anholt <eric@anholt.net>

drm/i915: Set up an MTRR covering the GTT at driver load.

We'd love to just be using PAT, but even on chips with PAT it gets disabled
sometimes due to an errata. It would probably be better to have pat_enabled
exported and only bother with this when !pat_enabled.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# e806b495 22-Jan-2009 Eric Anholt <eric@anholt.net>

drm/i915: Suppress GEM teardown on X Server exit in KMS mode.

Fixes hangs when starting X for the second time.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# aeb565df 26-Jan-2009 Linus Torvalds <torvalds@linux-foundation.org>

Fix annoying DRM_ERROR() string warning

Use '%zu' to print out a size_t variable, not '%d'. Another case of the
"let's keep at least Linus' defconfig compile warningless" rule.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
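
For reference, the warning comes from handing a size_t to a '%d' conversion;
the message text below is made up, but the format change is the whole fix:

    size_t size = obj->size;

    /* size_t takes the 'z' length modifier; plain %d warns on 64-bit. */
    DRM_ERROR("Attempt to map %zu bytes failed\n", size);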


# 260883c8 22-Jan-2009 Dave Airlie <airlied@redhat.com>

i915: fix freeing path for gem phys objects.

This off-by-one was pointed out by Jesse Barnes.

Signed-off-by: Dave Airlie <airlied@redhat.com>


# 71acb5eb 30-Dec-2008 Dave Airlie <airlied@linux.ie>

drm/i915: add support for physical memory objects

This is an initial patch to add support for objects which need physically
contiguous main RAM: cursors and overlay registers on older chipsets.

These objects are bound in the cursor bin, like pinning, and we copy
the data between the backing-store object and the real one on attach/detach.

Notes:
- possibly over the top in the attach/detach operations.
- no overlay support yet.

Signed-off-by: Dave Airlie <airlied@redhat.com>


# 9b4778f6 07-Jan-2009 Harvey Harrison <harvey.harrison@gmail.com>

trivial: replace last usages of __FUNCTION__ in kernel

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
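
The substitution is purely mechanical; wherever the gcc extension appeared,
the C99 predefined identifier goes in its place (message text illustrative):

    /* Before: gcc-specific */
    DRM_DEBUG("%s: object not bound\n", __FUNCTION__);

    /* After: standard C99 */
    DRM_DEBUG("%s: object not bound\n", __func__);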


# 9bb2d6f9 23-Dec-2008 Eric Anholt <eric@anholt.net>

drm/i915: Don't allow objects to get bound while VT switched.

This avoids a BUG_ON in the enter_vt path due to objects being in the GTT
when we shouldn't have ever let them be (as we're not supposed to touch the
device during that time).

This was triggered by a change in the 2D driver to use the GTT mapping of
objects after pinning them to improve software fallback performance.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# aad87dff 21-Dec-2008 Julia Lawall <julia@diku.dk>

drm/i915: Remove redundant test in error path.

The error path for the object list being NULL is in the second goto target.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# f1acec93 19-Dec-2008 Eric Anholt <eric@anholt.net>

drm/i915: Don't print to dmesg when taking signal during object_pin.

This showed up in logs where people had a hung chip, so pinning was blocked
on the chip unpinning other buffers, and the X Server took its scheduler
signal during that time.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# b1177636 10-Dec-2008 Eric Anholt <eric@anholt.net>

drm/i915: Don't double-unpin buffers if we take a signal in evict_everything().

We haven't seen this in practice, but it was visible when looking at a bug
report from when i915_gem_evict_everything() was broken and would always
return an error.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# 79e53945 07-Nov-2008 Jesse Barnes <jbarnes@virtuousgeek.org>

DRM: i915: add mode setting support

This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.

Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.

Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.

A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console, which is useful for testing.

Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# de151cf6 12-Nov-2008 Jesse Barnes <jbarnes@virtuousgeek.org>

drm/i915: add GEM GTT mapping support

Use the new core GEM object mapping code to allow GTT mapping of GEM
objects on i915. The fault handler will make sure a fence register is
allocated too, if the object in question is tiled.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# c4de0a5d 14-Dec-2008 Eric Anholt <eric@anholt.net>

drm/i915: Don't return busy for buffers left on the flushing list.

These buffers don't have active rendering still occurring to them; they just
need either a flush to be emitted or a retire_requests to occur so that we
notice they're done. Return unbusy so that one of the two occurs. The two
expected consumers of this interface (OpenGL and libdrm_intel BO cache) both
want this behavior.

Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 15c35334 06-Dec-2008 Owain Ainsworth <zerooa@googlemail.com>

drm/i915: Don't return error in evict_everything when we get to the end.

Returning -ENOMEM errored all the way out of execbuf, so the rendering never
occurred.

Signed-off-by: Dave Airlie <airlied@redhat.com>


# 02354392 26-Nov-2008 Eric Anholt <eric@anholt.net>

drm/i915: Return error in i915_gem_set_to_gtt_domain if we're not in the GTT.

It's only for flushing caches appropriately for GTT access, not for actually
getting it there. Prevents potential smashing of CPU read/write domains on
unbound objects.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# ac94a962 21-Nov-2008 Keith Packard <keithp@keithp.com>

drm/i915: Retry execbuffer pinning after clearing the GTT

If we fail to pin all of the buffers in an execbuffer request, go through
and clear the GTT and try again to see if it's just a matter of fragmentation.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
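
The retry described above amounts to a second pinning pass after an eviction;
a hedged sketch, where pin_all_objects() is a hypothetical stand-in for the
driver's real pin loop:

    ret = pin_all_objects(dev, object_list, args->buffer_count);
    if (ret == -ENOSPC) {
            /* Maybe it is just fragmentation: clear the GTT and retry. */
            ret = i915_gem_evict_everything(dev);
            if (ret == 0)
                    ret = pin_all_objects(dev, object_list,
                                          args->buffer_count);
    }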


# 646f0f6e 21-Nov-2008 Keith Packard <keithp@keithp.com>

drm/i915: Move the execbuffer domain computations together

This eliminates the dev_set_domain function and just inlines it
where it's used, with the goal of moving the manipulation and use of
invalidate_domains and flush_domains closer together. This also
avoids calling add_request unless some domain has been flushed.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# c0d90829 21-Nov-2008 Keith Packard <keithp@keithp.com>

drm/i915: Rename object_set_domain to object_set_to_gpu_domain

Now that the CPU and GTT domain operations are isolated to their own
functions, the previously general-purpose set_domain function is now used
only to set GPU domains. It also has no failure cases, which is important as
this eliminates any possible interruption of the computation of new object
domains and subsequent emission of the flushing instructions into the ring.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# e47c68e9 14-Nov-2008 Eric Anholt <eric@anholt.net>

drm/i915: Make a single set-to-cpu-domain path and use it wherever needed.

This fixes several domain management bugs, including potential lack of cache
invalidation for pread, potential failure to wait for set_domain(CPU, 0),
and more, along with producing more intelligible code.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 2ef7eeaa 10-Nov-2008 Eric Anholt <eric@anholt.net>

drm/i915: Make a single set-to-gtt-domain path.

This fixes failure to flush caches in the relocation update path, and
failure to wait in the set_domain ioctl, each of which could lead to incorrect
rendering.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# b670d815 14-Nov-2008 Eric Anholt <eric@anholt.net>

drm/i915: If interrupted while setting object domains, still emit the flush.

Otherwise, we would leave the objects in an inconsistent state, such as
write_domain == 0 but on the flushing list.

Signed-off-by: Dave Airlie <airlied@redhat.com>


# ce44b0ea 06-Nov-2008 Eric Anholt <eric@anholt.net>

drm/i915: Move flushing list cleanup from flush request retire to request emit.

obj_priv->write_domain is "write domain if the GPU went idle now", not
"write domain at this moment." By postponing the clear, we confused the
concept, required more storage, and potentially emitted more flushes than
are required.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 151903d5 30-Nov-2008 Eric Anholt <eric@anholt.net>

drm/i915: Fix copy'n'pasteo that broke VT switch if flushing was non-empty.

Introduced in the "Avoid BUG_ONs on VT switch" commit.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 6133047a 21-Nov-2008 Keith Packard <keithp@keithp.com>

drm/i915: execbuffer pins objects, no need to ensure they're still in the GTT

Before we had the notion of pinning objects, we had a kludge to make
sure all of the objects were still resident in the GTT before we committed
to executing a batch buffer. We don't need this any longer, and it sticks an
error return in the middle of object domain computations that must be
associated with a subsequent flush/invalidate emission into the ring.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 2678d9d6 20-Nov-2008 Keith Packard <keithp@keithp.com>

drm/i915: Subtract total pinned bytes from available aperture size

The old code was wandering through the active list looking for pinned
buffers; there may be other pinned buffers around. Fortunately, we keep a
count of the total amount of pinned memory and can use that instead.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 28dfe52a 13-Nov-2008 Eric Anholt <eric@anholt.net>

drm/i915: Avoid BUG_ONs on VT switch with a wedged chipset.

Instead, just warn that bad things are happening and do our best to clean up
the mess without the GPU's help.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 6a47baa6 03-Nov-2008 Owen Taylor <otaylor@redhat.com>

i915: Don't attempt to short-circuit object_wait_rendering by checking domains.

This could return early when reading after writing a buffer, if somebody
had already put it on the flushing list (write domains are 0, but still
active), leading to glReadPixels failure.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@linux.ie>


# 5a125c3c 22-Oct-2008 Eric Anholt <eric@anholt.net>

i915: Add GEM ioctl to get available aperture size.

This will let userland know when to submit its batchbuffers, before they get
too big to fit in the aperture.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
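
From userspace this is a read-only query; a minimal sketch of how a client
might call it (header paths and error handling kept to the bare minimum;
struct and ioctl names as exported by the i915 uapi header):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /* Returns how much aperture space is currently available to GEM,
     * or 0 if the query fails. */
    static uint64_t gem_available_aperture(int drm_fd)
    {
            struct drm_i915_gem_get_aperture aperture = {0};

            if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_GET_APERTURE, &aperture))
                    return 0;

            return aperture.aper_available_size;
    }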


# 0839ccb8 30-Oct-2008 Keith Packard <keithp@keithp.com>

i915: use io-mapping interfaces instead of a variety of mapping kludges

Impact: optimize/clean-up the IO mapping implementation of the i915 DRM driver

Switch the i915 device aperture mapping to the io-mapping interface, taking
advantage of the cleaner API to extend it across all of the mapping uses,
including both pwrite and relocation updates.

This dramatically improves performance on 64-bit kernels which were using
the same slow path as 32-bit non-HIGHMEM kernels prior to this patch.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
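
The io-mapping interface mentioned here pairs one io_mapping_create_wc() call
for the whole aperture with cheap per-page atomic mappings at access time; a
hedged sketch of the usage pattern (variable names and offsets illustrative):

    #include <linux/io-mapping.h>

    struct io_mapping *gtt_mapping;
    void *vaddr;

    /* One-time setup: describe the whole aperture as write-combining. */
    gtt_mapping = io_mapping_create_wc(aperture_base, aperture_size);

    /* Per access: map only the page being written, copy, unmap. */
    vaddr = io_mapping_map_atomic_wc(gtt_mapping, page_base);
    memcpy(vaddr + page_offset, src, len);
    io_mapping_unmap_atomic(vaddr);

    /* Teardown at driver unload. */
    io_mapping_free(gtt_mapping);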


# 9e44af79 16-Oct-2008 Keith Packard <keithp@keithp.com>

drm/i915: hold dev->struct_mutex and DRM lock during vblank ring operations

To synchronize clip lists with the X server, the DRM lock must be held while
looking at drawable clip lists. To synchronize with other ring access, the
ring mutex must be held while inserting commands into the ring. Failure to
do the first resulted in easy visual corruption when moving windows, and the
second could have corrupted the ring with DRI2.

Grabbing the DRM lock involves using the DRM tasklet mechanism, grabbing the
ring mutex means potentially sleeping. Deal with both of these by always
running the tasklet from a work handler.

Also, protect against clip list changes since the vblank request was queued by
making sure the window has at least one rectangle while looking inside,
preventing oopses.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# e8848a17 20-Oct-2008 Thomas Gleixner <tglx@linutronix.de>

fix CONFIG_HIGHMEM compile error in drivers/gpu/drm/i915/i915_gem.c

commit 9b7530cc329eb036cfa589930c270e85031f554c ("i915: cleanup coding
horrors in i915_gem_gtt_pwrite()")

broke the i386 build for CONFIG_HIGHMEM=y.

Caught by automatic testing http://www.tglx.de/autoqa-logs/000137-0006-0001.log

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ My bad. It's the same patch I sent out earlier, nobody noticed then either.. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9b7530cc 20-Oct-2008 Linus Torvalds <torvalds@linux-foundation.org>

i915: cleanup coding horrors in i915_gem_gtt_pwrite()

Yes, this will probably be switched over to a cleaner model anyway, but
in the meantime I don't want to see the 'unused variable' warnings that
come from the disgusting #ifdef code. Make the special case be a nice
inline function of its own, clean up the code, and make the warning go
away.

I wish people didn't write code that gets (valid) warnings from the
compiler, but I'll limit my fixes to code that I actually care about (in
this case just because I see the warning and it annoys me).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6dbe2772 14-Oct-2008 Keith Packard <keithp@keithp.com>

i915: Don't run retire work handler while suspended

At leavevt and lastclose time, cancel any pending retire work handler
invocation, and keep the retire work handler from requeuing itself if it is
currently running.

This patch restructures i915_gem_idle to perform all of these tasks instead
of having both leavevt and lastclose call a sequence of functions.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# ba1eb1d8 14-Oct-2008 Keith Packard <keithp@keithp.com>

i915: Map status page cached for chips with GTT-based HWS location.

This should improve performance by avoiding uncached reads by the CPU (the
point of having a status page), and may improve stability. This patch only
affects G33, GM45 and G45 chips as those are the only ones using GTT-based
HWS mappings.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 50aa253d 14-Oct-2008 Keith Packard <keithp@keithp.com>

i915: Fix up ring initialization to cover G45 oddities

G45 appears quite sensitive to ring initialization register writes,
sometimes leaving the HEAD register with the START register contents. Check
to make sure HEAD is reset correctly when START is written, and fix it up,
screaming loudly.

Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# e7d22bc3 06-Oct-2008 Dave Airlie <airlied@redhat.com>

i915: add missing return in error path.

Signed-off-by: Dave Airlie <airlied@redhat.com>


# 3043c60c 02-Oct-2008 Eric Anholt <eric@anholt.net>

drm: Clean up many sparse warnings in i915.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# bd88ee4c 23-Sep-2008 Eric Anholt <eric@anholt.net>

drm: Use ioremap_wc in i915_driver instead of ioremap, since we always want WC.

Fixes failure to map the ringbuffer when PAT tells us we don't get to do
uncached on something that's already mapped WC, or something along those lines.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 4f481ed2 10-Sep-2008 Eric Anholt <eric@anholt.net>

drm: Avoid oops in GEM execbuffers with bad arguments.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# dbb19d30 20-Aug-2008 Kristian Høgsberg <krh@redhat.com>

i915 gem: install and uninstall irq handler in entervt and leavevt ioctls.

Signed-off-by: Kristian Høgsberg <krh@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 546b0974 01-Sep-2008 Eric Anholt <eric@anholt.net>

i915: Use struct_mutex to protect ring in GEM mode.

In the conversion for GEM, we had stopped using the hardware lock to protect
ring usage, since it was all internal to the DRM now. However, some paths
weren't converted to using struct_mutex to prevent multiple threads from
concurrently working on the ring, in particular between the vblank swap handler
and ioctls.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>


# 673a394b 30-Jul-2008 Eric Anholt <eric@anholt.net>

drm: Add GEM ("graphics execution manager") to i915 driver.

GEM allows the creation of persistent buffer objects accessible by the
graphics device through new ioctls for managing execution of commands on the
device. The userland API is almost entirely driver-specific to ensure that
any driver building on this model can easily map the interface to individual
driver requirements.

GEM is used by the 2d driver for managing its internal state allocations and
will be used for pixmap storage to reduce memory consumption and enable
zero-copy GLX_EXT_texture_from_pixmap, and in the 3d driver is used to enable
GL_EXT_framebuffer_object and GL_ARB_pixel_buffer_object.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>