History log of /linux-master/drivers/gpu/host1x/job.c
Revision Date Author Comments
# c24973ed 19-Jan-2023 Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Implement job tracking using DMA fences

In anticipation of the removal of the intr API, implement job tracking
using DMA fences instead. The two main changes are that cdma_update
must now schedule its work, since fence completion can be signalled
from interrupt context, and that some care is needed to ensure the
callback is not running when we free the fence.
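
As a rough sketch of that scheme (hypothetical names, not the driver's
actual structures), the callback only schedules work, and removal is
checked before the fence is freed:

#include <linux/dma-fence.h>
#include <linux/workqueue.h>

struct job_track {                      /* hypothetical tracking state */
        struct dma_fence_cb cb;
        struct work_struct work;        /* runs the CDMA update */
};

/* Fence callbacks may fire in interrupt context, so only schedule. */
static void job_done_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
        struct job_track *t = container_of(cb, struct job_track, cb);

        schedule_work(&t->work);
}

/* Before freeing: if the callback can no longer be removed, it has
 * already fired (or is firing), so wait for the work to finish. */
static void job_track_fini(struct job_track *t, struct dma_fence *f)
{
        if (!dma_fence_remove_callback(f, &t->cb))
                flush_work(&t->work);
        dma_fence_put(f);
}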

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 3e9c4584 24-Mar-2022 Thierry Reding <treding@nvidia.com>

gpu: host1x: Do not use mapping cache for job submissions

Buffer mappings used in job submissions are usually small and not
rapidly reused, as opposed to framebuffers (which are usually large and
rapidly reused, for example when page-flipping between double-buffered
framebuffers). Avoid going through the mapping cache for these buffers,
since the cache would also lead to leaks if nobody ever releases the
cache's last reference. For DRM/KMS these last references are dropped
when the framebuffers are removed and therefore no longer needed.

While at it, also add a note about the need to explicitly remove the
final reference to the mapping in the cache.
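
Concretely, skipping the cache amounts to pinning without one at
submit time (sketch; it assumes host1x_bo_pin() takes an optional
cache argument, with NULL meaning "do not cache"):

/* Job submission path (sketch): bypass the mapping cache. */
map = host1x_bo_pin(dev, bo, DMA_TO_DEVICE, NULL /* no cache */);
if (IS_ERR(map))
        return PTR_ERR(map);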

Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 1f39b1df 07-Feb-2020 Thierry Reding <treding@nvidia.com>

drm/tegra: Implement buffer object cache

This cache is used to avoid mapping and unmapping buffer objects
unnecessarily. Mappings are cached per client and stay hot until
the buffer object is destroyed.

Signed-off-by: Thierry Reding <treding@nvidia.com>


# c6aeaf56 09-Sep-2021 Thierry Reding <treding@nvidia.com>

drm/tegra: Implement correct DMA-BUF semantics

DMA-BUF requires that each device that accesses a DMA-BUF attaches to
it separately. To support this, the host1x_bo_pin() and
host1x_bo_unpin() functions need to be reimplemented so that they can
return a mapping, which represents either an attachment or a map of
the driver's own GEM object.
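
The returned mapping looks roughly like this (reduced, illustrative
sketch; only fields relevant to the attachment-versus-own-GEM
distinction are shown):

struct host1x_bo_mapping {
        struct host1x_bo *bo;              /* underlying buffer object */
        struct dma_buf_attachment *attach; /* set when mapped via DMA-BUF */
        struct sg_table *sgt;              /* pages and DMA addresses */
        enum dma_data_direction direction;
        dma_addr_t phys;                   /* DMA address for this device */
        size_t size;
};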

Signed-off-by: Thierry Reding <treding@nvidia.com>


# 0fddaa85 10-Jun-2021 Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Add option to skip firewall for a job

The new UAPI will have its own firewall, and we don't want to run the
firewall in the Host1x driver for those jobs. As such, add a parameter
to host1x_job_alloc() to specify whether the firewall in the Host1x
driver should be skipped.
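
A sketch of the extended prototype (parameter name approximate):

struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
                                    u32 num_cmdbufs, u32 num_relocs,
                                    bool skip_firewall);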

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# e902585f 10-Jun-2021 Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer

Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be done in the HOST1X class, while gathers
submitted by the application execute in the engine's class.

Support is added by converting the job's gather list into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
onto the CDMA pushbuffer.

Also supported are waits relative to the start of the job,
which are useful for jobs doing multiple things with an engine
that doesn't natively support pipelining.

While at it, use 32-bit waits on chips that support them.
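
The command list can be pictured as a tagged union of gathers and
waits (illustrative sketch; field names approximate):

struct host1x_job_cmd {
        bool is_wait;
        union {
                struct host1x_job_gather gather;
                struct {
                        u32 id;         /* syncpoint to wait on */
                        u32 threshold;  /* value to wait for */
                        bool relative;  /* relative to job start */
                } wait;
        };
};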

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 17a298e9 10-Jun-2021 Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Add job release callback

Add a callback field to the job structure, to be called just before
the job is freed. This allows the job's submitter to clean up any of
its own state, like decrementing runtime PM refcounts.
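
In sketch form, the new field and its call site (names approximate):

/* New field in struct host1x_job (sketch): */
void (*release)(struct host1x_job *job);

/* Invoked in the teardown path, just before the job is freed: */
if (job->release)
        job->release(job);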

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# c78f837a 10-Jun-2021 Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Add no-recovery mode

Add a new property for jobs to enable or disable recovery, i.e. CPU
increments of syncpoints to the max value on job timeout. This allows
for a more solid model for hung jobs, where userspace doesn't need to
guess whether a syncpoint increment happened because the job completed
or because the job timeout was triggered.

On job timeout, we stop the channel, NOP all future jobs on the
channel using the same syncpoint, mark the syncpoint as locked
and resume the channel from the next job, if any.

The future jobs are NOPed because, without the CPU increments, the
value of the syncpoint is no longer synchronized, and any waiters
would become confused if a future job incremented the syncpoint. The
syncpoint is marked locked to ensure that future jobs cannot increment
it either, until the application has recognized the situation and
reallocated the syncpoint.
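
A rough sketch of the resulting state (field names approximate):

/* Per-job flag (sketch): when false, a timeout does not fake
 * completion by incrementing the syncpoint from the CPU. */
bool syncpt_recovery;

/* Timeout handling in no-recovery mode (sketch): */
if (!job->syncpt_recovery)
        job->syncpt->locked = true;     /* future users get NOPed */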

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 2aed4f5a 29-Mar-2021 Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Cleanup and refcounting for syncpoints

Add reference counting for allocated syncpoints to allow keeping
them allocated while jobs are referencing them. Additionally,
clean up various places using syncpoint IDs to use host1x_syncpt
pointers instead.
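
The resulting reference counting follows the usual get/put pattern
(sketch of the prototypes):

struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp);
void host1x_syncpt_put(struct host1x_syncpt *sp);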

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 67ed9f9d 28-Apr-2020 Marek Szyprowski <m.szyprowski@samsung.com>

drm: host1x: fix common struct sg_table related issues

The Documentation/DMA-API-HOWTO.txt file states that the dma_map_sg()
function returns the number of entries created in the DMA address
space. However, subsequent calls to dma_sync_sg_for_{device,cpu}() and
dma_unmap_sg() must be made with the original number of entries passed
to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous
memory buffer, used commonly in the DRM and graphics subsystems. It
consists of a scatterlist with memory pages and DMA addresses (sgl entry),
as well as the number of scatterlist entries: CPU pages (orig_nents entry)
and DMA mapped pages (nents entry).

It turned out that it was a common mistake to misuse nents and orig_nents
entries, calling DMA-mapping functions with a wrong number of entries or
ignoring the number of mapped entries returned by the dma_map_sg()
function.

To avoid such issues, let's use the common dma-mapping wrappers
operating directly on struct sg_table objects and use scatterlist page
iterators where possible. This almost always hides references to the
nents and orig_nents entries, making the code robust, easier to follow
and copy/paste safe.
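
For example, with the wrappers the entry counts are handled
internally, so a caller can no longer pass the wrong one:

/* Map, use and unmap a whole sg_table; nents and orig_nents are
 * managed inside the helpers. */
err = dma_map_sgtable(dev, sgt, DMA_TO_DEVICE, 0);
if (err)
        return err;

/* ... hardware access through sgt ... */

dma_unmap_sgtable(dev, sgt, DMA_TO_DEVICE, 0);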

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>


# fd323e9e 28-Jun-2020 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: Put gather's BO on pinning error

This patch fixes the gather BO's refcounting on a pinning error: the
gather's BO is no longer leaked if something goes wrong.
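
The fix boils down to dropping the reference in the error path
(sketch; pin_gather() is a hypothetical stand-in for the pinning
step):

err = pin_gather(job, g);               /* hypothetical helper */
if (err) {
        host1x_bo_put(g->bo);           /* drop the reference taken above */
        goto unpin;
}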

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 26c8de5e 28-Jun-2020 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: Optimize BOs usage when firewall is enabled

We don't need to hold and pin the gathers' original BOs when the
firewall is enabled, because in that case the gather's contents are
copied and the copy is what the executed job uses.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 98ae41ad 04-Feb-2020 Thierry Reding <treding@nvidia.com>

gpu: host1x: Set DMA direction only for DMA-mapped buffer objects

The DMA direction is only used by the DMA API, so there is no use in
setting it when a buffer object isn't mapped with the DMA API.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>


# 273da5a0 04-Feb-2020 Thierry Reding <treding@nvidia.com>

drm/tegra: Reuse IOVA mapping where possible

This partially reverts the DMA API support that was recently merged
because it was causing performance regressions on older Tegra devices.
Unfortunately, the cache maintenance performed by dma_map_sg() and
dma_unmap_sg() causes performance to drop by a factor of 10.

The right solution for this would be to cache mappings for buffers per
consumer device, but that's a bit involved. Instead, we simply revert to
the old behaviour of sharing IOVA mappings when we know that devices can
do so (i.e. they share the same IOMMU domain).
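
The domain comparison is conceptually simple (sketch; bo->iova is an
illustrative field holding the already-mapped address):

struct iommu_domain *client = iommu_get_domain_for_dev(dev);
struct iommu_domain *host = iommu_get_domain_for_dev(host1x->dev);

/* Same IOMMU domain: reuse the existing IOVA mapping and skip the
 * costly cache maintenance done by dma_map_sg(). */
if (client && client == host)
        return bo->iova;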

Cc: <stable@vger.kernel.org> # v5.5
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>


# 7a8139c5 18-Nov-2019 Daniel Vetter <daniel.vetter@ffwll.ch>

drm/tegra: Map cmdbuf once for reloc processing

A few reasons to drop kmap:

- For native objects all we do is look at obj->vaddr anyway, so we
might as well not call functions for every page.

- Reloc-processing on dma-buf is ... questionable.

- Plus most dma-bufs that bother with kernel CPU mmaps give you at
least vmap, much less kmap. And all the ones relevant for ARM SoCs
are again doing an obj->vaddr game anyway; there's no real kmap
going on on ARM, it seems.

Plus this seems to be the only real in-tree user of dma_buf_kmap, and
I'd like to get rid of that.
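
In host1x terms, this means mapping the whole command buffer once
(sketch using the host1x BO mmap helpers, which boil down to
obj->vaddr for native objects):

void *vaddr = host1x_bo_mmap(bo);

if (!vaddr)
        return -ENOMEM;

/* ... patch all relocations through vaddr ... */

host1x_bo_munmap(bo, vaddr);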

Reviewed-by: Thierry Reding <treding@nvidia.com>
Tested-by: Thierry Reding <treding@nvidia.com>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: linux-tegra@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20191118103536.17675-2-daniel.vetter@ffwll.ch


# af1cbfb9 28-Oct-2019 Thierry Reding <treding@nvidia.com>

gpu: host1x: Support DMA mapping of buffers

If host1x_bo_pin() returns an SG table, create a DMA mapping for the
buffer. For buffers that the host1x client has already mapped itself,
host1x_bo_pin() returns NULL and the existing DMA address is used.
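
The call-site contract then looks like this (sketch; it uses the
overhauled host1x_bo_pin() prototype described further down the log
and the modern sgtable helper):

sgt = host1x_bo_pin(dev, bo, &phys);
if (IS_ERR(sgt))
        return PTR_ERR(sgt);

if (sgt) {
        /* not yet mapped for this device: create a DMA mapping */
        err = dma_map_sgtable(dev, sgt, DMA_TO_DEVICE, 0);
        if (err)
                return err;
} else {
        /* already mapped by the client: phys holds the DMA address */
}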

Signed-off-by: Thierry Reding <treding@nvidia.com>


# b78e70c0 28-Oct-2019 Thierry Reding <treding@nvidia.com>

gpu: host1x: Allocate gather copy for host1x

Currently when the gather buffers are copied, they are copied to a
buffer that is allocated for the host1x client that wants to execute the
command streams in the buffers. However, the gather buffers will be read
by the host1x device, which causes SMMU faults if the DMA API is backed
by an IOMMU.

Fix this by allocating the gather buffer copy for the host1x device,
which makes sure that it will be mapped into the host1x's IOVA space if
the DMA API is backed by an IOMMU.
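
The gist of the fix is to pass the host1x device, not the client's,
to the allocator (sketch; field names approximate):

/* Allocate the gather copy for the device that will read it. */
job->gather_copy_mapped = dma_alloc_wc(host->dev, size,
                                       &job->gather_copy, GFP_KERNEL);
if (!job->gather_copy_mapped)
        return -ENOMEM;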

Signed-off-by: Thierry Reding <treding@nvidia.com>


# 80327ce3 28-Oct-2019 Thierry Reding <treding@nvidia.com>

gpu: host1x: Overhaul host1x_bo_{pin,unpin}() API

The host1x_bo_pin() and host1x_bo_unpin() APIs are used to pin and unpin
buffers during host1x job submission. Pinning currently returns the SG
table and the DMA address (an IOVA if an IOMMU is used or a physical
address if no IOMMU is used) of the buffer. The DMA address is only used
for buffers that are relocated, whereas the host1x driver will map
gather buffers into its own IOVA space so that they can be processed by
the CDMA engine.

This approach has a couple of issues. On one hand it's not very useful
to return a DMA address for the buffer if host1x doesn't need it. On the
other hand, returning the SG table of the buffer is suboptimal because
a single SG table cannot be shared across multiple mappings: the DMA
address is stored within the SG table, and that address may differ
between devices.

Subsequent patches will move the host1x driver over to the DMA API which
doesn't work with a single shared SG table. Fix this by returning a new
SG table each time a buffer is pinned. This allows the buffer to be
referenced by multiple jobs for different engines.

Change the prototypes of host1x_bo_pin() and host1x_bo_unpin() to take a
struct device *, specifying the device for which the buffer should be
pinned. This is required in order to be able to properly construct the
SG table. While at it, make host1x_bo_pin() return the SG table because
that allows us to return an ERR_PTR()-encoded error code if we need to,
or return NULL to signal that we don't need the SG table to be remapped
and can simply use the DMA address as-is. At the same time, returning
the DMA address is made optional because in the example of command
buffers, host1x doesn't need to know the DMA address since it will have
to create its own mapping anyway.
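
The overhauled prototypes and their contract, in sketch form:

struct sg_table *host1x_bo_pin(struct device *dev, struct host1x_bo *bo,
                               dma_addr_t *phys);
void host1x_bo_unpin(struct device *dev, struct host1x_bo *bo,
                     struct sg_table *sgt);

/*
 * host1x_bo_pin() return values (sketch):
 *   ERR_PTR(-errno) - pinning failed
 *   NULL            - no remapping needed; *phys is valid as-is
 *   sg_table        - a fresh table the caller maps for its device
 */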

Signed-off-by: Thierry Reding <treding@nvidia.com>


# 9952f691 28-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 228 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528171438.107155473@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# ec589232 06-Jul-2018 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: Check whether size of unpin isn't 0

Only gather pins are mapped by the Host1x driver; regular BO
relocations are not. Check whether the size of an unpin isn't 0,
otherwise an IOVA allocation at 0x0 could be erroneously released.
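
The guard is a small addition to the unpin loop (sketch; field names
approximate):

/* Only gathers were IOVA-mapped by host1x; a zero size means
 * there is nothing to release. */
if (unpin->size && host->domain) {
        iommu_unmap(host->domain, unpin->phys, unpin->size);
        free_iova(&host->iova, iova_pfn(&host->iova, unpin->phys));
}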

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 326bbd79 16-May-2018 Thierry Reding <treding@nvidia.com>

gpu: host1x: Use not explicitly sized types

The number of words and the offset in a gather don't need to be
explicitly sized, so make them unsigned int instead.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 06490bb9 16-May-2018 Thierry Reding <treding@nvidia.com>

gpu: host1x: Rename relocarray -> relocs for consistency

All other array variables use a plural, and this is the only one using
the *array suffix. This is confusing, so rename it for consistency.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# d4ad3ad9 23-Mar-2018 Thierry Reding <treding@nvidia.com>

gpu: host1x: Cleanup loop variable usage

Use unsigned int where possible and don't unnecessarily initialize the
loop variable.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 24c94e16 05-May-2018 Thierry Reding <treding@nvidia.com>

gpu: host1x: Remove wait check support

The job submission userspace ABI doesn't support this and there are no
plans to implement it, so all of this code is dead and can be removed.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 18b3f5ac 01-Aug-2017 Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Don't fail on NULL bo physical address

Pinning a Host1x BO currently cannot fail, and zero is a valid address
for a BO when the IOMMU is enabled. To avoid false errors, remove the
checks for NULL BO physical addresses.

Fixes: 404bfb78daf3 ("gpu: host1x: Add IOMMU support")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 43240bbd 14-Jun-2017 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: At first try a non-blocking allocation for the gather copy

The blocking gather copy allocation is a major performance downside of
the Host1x firewall; it may take hundreds of milliseconds, which is
unacceptable for real-time graphics operations. Let's try a
non-blocking allocation first as the least invasive solution; it makes
opentegra (the Xorg driver) performance indistinguishable with and
without the firewall.
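
The least invasive change is an opportunistic allocation with a
blocking fallback (sketch; field names approximate):

/* Try a cheap non-blocking allocation first and only fall back to
 * a potentially slow blocking one if it fails. */
job->gather_copy_mapped = dma_alloc_wc(dev, size, &job->gather_copy,
                                       GFP_NOWAIT);
if (!job->gather_copy_mapped)
        job->gather_copy_mapped = dma_alloc_wc(dev, size,
                                               &job->gather_copy,
                                               GFP_KERNEL);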

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# a47ac10e 14-Jun-2017 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: Check waits in the firewall

Check waits in the firewall the same way it is done for relocations.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 0f563a4b 14-Jun-2017 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: Forbid unrelated SETCLASS opcode in the firewall

Several channels could be made to write the same unit concurrently via
the SETCLASS opcode, so trusting userspace is a bad idea. It should be
possible to drop the per-client channel reservation and add per-unit
locking by inserting MLOCKs into the command stream to re-allow the
SETCLASS opcode, but that would be much more work. Let's forbid the
unit-unrelated class changes for now.
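
In firewall terms the restriction is roughly the following (sketch;
the actual per-client class validation is more involved):

/* Only the job's own class, or HOST1X itself, may be selected by
 * a SETCLASS opcode in the stream. */
if (class != fw->job->class && class != HOST1X_CLASS_HOST1X)
        return -EINVAL;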

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# ef816249 14-Jun-2017 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: Forbid RESTART opcode in the firewall

The RESTART opcode terminates the gather and restarts the CDMA
fetching from a specified word << 2 relative to the CDMA start address.
Userspace shouldn't be allowed to do that.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 571cbf70 14-Jun-2017 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: Forbid relocation address shifting in the firewall

An incorrectly shifted relocation address will cause low memory
corruption, and likely a hang on a write or a read of arbitrary data,
in the absence of an IOMMU. As of now there is no known use for the
address shifting, and adding proper shift/size validation is much more
work. Let's forbid shifts in the firewall until proper validation is
implemented.
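
The firewall check itself is minimal (sketch; check_reloc() returns a
boolean validity):

/* Reject any relocation that requests a shifted target address. */
if (reloc->shift)
        return false;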

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 47f89c10 14-Jun-2017 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: Do not leak BO's phys address to userspace

Perform the copying of gathers before patching them, so that the
original gathers are left untouched. That's not as bad as leaking
kernel addresses, but it still doesn't feel right.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# e5855aa3 14-Jun-2017 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: Correct host1x_job_pin() error handling

In case of a relocation/waitcheck patching failure, the job's pins
stay referenced until the DRM file gets closed, wasting memory. Add
the missing unpinning.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 3833d16f 14-Jun-2017 Dmitry Osipenko <digetx@gmail.com>

gpu: host1x: Initialize firewall class to the job's one

The command stream is prepended with the job's class on CDMA
submission, so explicitly setting a module class in the command stream
isn't necessary. However, the firewall initializes its class to 0, so
a command stream that doesn't explicitly specify the class effectively
bypasses the firewall.
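
The fix is a one-line initialization in the firewall setup (sketch):

/* Start validation in the class the job will be submitted with,
 * rather than class 0. */
fw.class = job->class;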

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 404bfb78 14-Dec-2016 Mikko Perttunen <mperttunen@nvidia.com>

gpu: host1x: Add IOMMU support

Add support for the Host1x unit to be located behind
an IOMMU. This is required when gather buffers may be
allocated non-contiguously in physical memory, as can
be the case when TegraDRM is also using the IOMMU.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# f08ef2d1 08-Nov-2016 Arto Merilainen <amerilainen@nvidia.com>

gpu: host1x: Store device address to all bufs

Currently job pinning is optimized to handle only the first buffer
using a certain host1x_bo object; all subsequent buffers using the
same host1x_bo are considered done.

In most cases this is correct. However, when the same host1x_bo is
used in multiple gathers inside the same job, we also skip storing
the device address (physical or IOVA) for this buffer.

This patch reworks host1x_job_pin() to store the device address for
all gathers.

Signed-off-by: Andrew Chew <achew@nvidia.com>
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 0b8070d1 23-Jun-2016 Thierry Reding <treding@nvidia.com>

gpu: host1x: Whitespace cleanup for readability

Insert a number of blank lines in places where they increase readability
of the code. Also collapse various variable declarations to shorten some
functions and finally rewrite some code for readability.

Signed-off-by: Thierry Reding <treding@nvidia.com>


# 6df633d0 23-Jun-2016 Thierry Reding <treding@nvidia.com>

gpu: host1x: Fix a couple of checkpatch warnings

Fix a couple of occurrences where no blank line was used to separate
variable declarations from code or where block comments were wrongly
formatted.

Signed-off-by: Thierry Reding <treding@nvidia.com>


# 5c0d8d38 23-Jun-2016 Thierry Reding <treding@nvidia.com>

gpu: host1x: Use unsigned int consistently for IDs

IDs can never be negative so use unsigned int. In some instances an
explicitly sized type (such as u32) was used for no particular reason,
so turn those into unsigned int as well for consistency.

Signed-off-by: Thierry Reding <treding@nvidia.com>


# 341917fe 18-Dec-2015 Markus Elfring <elfring@users.sourceforge.net>

gpu: host1x: Use a signed return type for do_relocs()

The return type "unsigned int" was used by the do_relocs() function
despite the fact that it will eventually return a negative error code.
Use a signed integer instead to accommodate the error codes.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# f6e45661 22-Jan-2016 Luis R. Rodriguez <mcgrof@suse.com>

dma, mm/pat: Rename dma_*_writecombine() to dma_*_wc()

Rename dma_*_writecombine() to dma_*_wc(), so that the naming
is coherent across the various write-combining APIs. Keep the
old names for compatibility for a while; these can be removed
at a later time. A guard is left to enable backporting of the
rename and, later, removal of the old mapping defines seamlessly.

Build tested successfully with allmodconfig.

The following Coccinelle SmPL patch was used for this simple
transformation:

@ rename_dma_alloc_writecombine @
expression dev, size, dma_addr, gfp;
@@

-dma_alloc_writecombine(dev, size, dma_addr, gfp)
+dma_alloc_wc(dev, size, dma_addr, gfp)

@ rename_dma_free_writecombine @
expression dev, size, cpu_addr, dma_addr;
@@

-dma_free_writecombine(dev, size, cpu_addr, dma_addr)
+dma_free_wc(dev, size, cpu_addr, dma_addr)

@ rename_dma_mmap_writecombine @
expression dev, vma, cpu_addr, dma_addr, size;
@@

-dma_mmap_writecombine(dev, vma, cpu_addr, dma_addr, size)
+dma_mmap_wc(dev, vma, cpu_addr, dma_addr, size)

We also keep the old names as compatibility helpers, and
guard against their definition to make backporting easier.

Generated-by: Coccinelle SmPL
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bhelgaas@google.com
Cc: bp@suse.de
Cc: dan.j.williams@intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dhowells@redhat.com
Cc: julia.lawall@lip6.fr
Cc: konrad.wilk@oracle.com
Cc: linux-fbdev@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: luto@amacapital.net
Cc: mst@redhat.com
Cc: tomi.valkeinen@ti.com
Cc: toshi.kani@hp.com
Cc: vinod.koul@intel.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1453516462-4844-1-git-send-email-mcgrof@do-not-panic.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 961e3bea 10-Jun-2014 Thierry Reding <treding@nvidia.com>

drm/tegra: Make job submission 64-bit safe

Job submission currently relies on the fact that struct drm_tegra_reloc
and struct host1x_reloc are the same size and uses a simple call to the
copy_from_user() function to copy them to kernel space. This causes the
handle to be stored in the buffer object field, which then needs a cast
to a 32 bit integer to resolve it to a proper buffer object pointer and
store it back in the buffer object field.

On 64-bit architectures that will no longer work, since pointers are 64
bits wide whereas handles will remain 32 bits. This causes the sizes of
both structures to become different, and copying will no longer work.

Fix this by adding a new function, host1x_reloc_get_user(), that copies
the structures field by field.

While at it, use substructures for the command and target buffers in
struct host1x_reloc for better readability. Also use unsized types to
make it more obvious that this isn't part of userspace ABI.
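
A field-by-field copy along these lines avoids any layout assumptions
(sketch; handle resolution and some fields elided):

static int host1x_reloc_get_user(struct drm_tegra_reloc __user *src,
                                 struct host1x_reloc *dest)
{
        u32 cmdbuf, target;
        int err;

        err = get_user(cmdbuf, &src->cmdbuf.handle);
        if (err < 0)
                return err;

        err = get_user(dest->cmdbuf.offset, &src->cmdbuf.offset);
        if (err < 0)
                return err;

        err = get_user(target, &src->target.handle);
        if (err < 0)
                return err;

        err = get_user(dest->target.offset, &src->target.offset);
        if (err < 0)
                return err;

        /* resolve the 32-bit handles (cmdbuf, target) to BO pointers
         * here, e.g. via a DRM GEM lookup (elided in this sketch) */
        return 0;
}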

Signed-off-by: Thierry Reding <treding@nvidia.com>


# 89e6e8c8 07-Jan-2014 Erik Faye-Lund <kusmabite@gmail.com>

gpu: host1x: do not check previously handled gathers

When patching gathers, we don't need to check against
gathers with lower indices than the current one, as
they are guaranteed to already have been handled.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Acked-By: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# fae798a1 08-Nov-2013 Thierry Reding <treding@nvidia.com>

gpu: host1x: Export public API

Make the public API symbols visible so that depending drivers can be
built as a module.

Signed-off-by: Thierry Reding <treding@nvidia.com>


# 35d747a8 24-Sep-2013 Thierry Reding <treding@nvidia.com>

gpu: host1x: Expose syncpt and channel functionality

Expose the buffer objects, syncpoint and channel functionality in the
public header so that drivers can use them.

Signed-off-by: Thierry Reding <treding@nvidia.com>


# d77563ff 10-Oct-2013 Thierry Reding <treding@nvidia.com>

gpu: host1x: firewall: Refactor register check

The same code sequence is used in various places to validate a register
access in the command stream. This can be refactored into a separate
function.

Signed-off-by: Thierry Reding <treding@nvidia.com>


# d7fbcf47 10-Oct-2013 Thierry Reding <treding@nvidia.com>

gpu: host1x: firewall: Rename cmdbuf_id -> cmdbuf

The value stored in this field is a pointer to a command buffer, not an
ID. Avoid some confusion by reflecting that in the field's name.

Signed-off-by: Thierry Reding <treding@nvidia.com>


# 37857cd2 10-Oct-2013 Thierry Reding <treding@nvidia.com>

gpu: host1x: Fix alignment of function arguments

Arguments on subsequent lines should be aligned with the first argument.
This one occurrence went unnoticed during code review.

Signed-off-by: Thierry Reding <treding@nvidia.com>


# a9ff9995 04-Oct-2013 Erik Faye-Lund <kusmabite@gmail.com>

gpu: host1x: check relocs after all gathers are consumed

The num_relocs count is passed to the kernel per job, not per gather.

For multi-gather jobs, we would previously fail if there were relocs in
other gathers aside from the first one.

Fix this by simply moving the check until all gathers have been
consumed.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Acked-By: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 745cecc0 23-Aug-2013 Dan Carpenter <dan.carpenter@oracle.com>

gpu: host1x: returning success instead of -ENOMEM

There is a mistake here: the function returns PTR_ERR(NULL), which is
success, instead of -ENOMEM.
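
The pattern at fault, for reference (sketch):

/* Broken: ptr is NULL here and PTR_ERR(NULL) evaluates to 0,
 * i.e. success. */
if (!ptr)
        return PTR_ERR(ptr);

/* Fixed: */
if (!ptr)
        return -ENOMEM;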

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# f5fda676 23-Aug-2013 Dan Carpenter <dan.carpenter@oracle.com>

gpu: host1x: fix an integer overflow check

Tegra is a 32-bit arch. On 32-bit systems size_t is 32 bits, so
"total" will never be higher than UINT_MAX because of integer
overflow. We need to cast to u64 first before doing the math.

Also the addition earlier:

unsigned int num_unpins = num_cmdbufs + num_relocs;

That can overflow as well, but I think it's still safe because we check
both "num_cmdbufs" and "num_relocs" again in this test.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>


# 3364cd28 29-May-2013 Arto Merilainen <amerilainen@nvidia.com>

gpu: host1x: Copy gathers before verification

The firewall verified gather buffers before copying them. This
allowed a malicious application to rewrite the buffer content by
timing the rewrite carefully.

This patch makes the buffer validation occur after copying the
buffers.

Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>


# afac0e43 29-May-2013 Terje Bergstrom <tbergstrom@nvidia.com>

gpu: host1x: Don't reset firewall between gathers

The firewall was reinitialised for each gather. Because the filter
was reinitialised, it did not track the class across gather
boundaries. This allowed a user application to set the host1x class
to one class in one gather and use that class in another gather
without the firewall having any knowledge of it.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>


# 5060d8ec 29-May-2013 Arto Merilainen <amerilainen@nvidia.com>

gpu: host1x: Check reloc table before usage

The firewall assumed that userspace always delivers a relocation
table when it is accessing address registers. If userspace did not
deliver a relocation table and tried to access the address registers,
the code performed bad memory accesses.

This patch modifies the firewall to check correctly that the firewall
table is available before accessing it. In addition, check_reloc() is
converted to use boolean return value (true when the reloc is valid,
false when invalid).

Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Acked-By: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>


# 64c173d3 29-May-2013 Terje Bergstrom <tbergstrom@nvidia.com>

gpu: host1x: Check INCR opcode correctly

The firewall code used a wrong loop condition (a pointer to a
structure) while checking the INCR opcode. This patch fixes the code
to use the correct loop condition (the number of words remaining).

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>


# 6579324a 22-Mar-2013 Terje Bergstrom <tbergstrom@nvidia.com>

gpu: host1x: Add channel support

Add support for host1x client modules, and host1x channels to submit
work to the clients.

Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Thierry Reding <thierry.reding@avionic-design.de>
Tested-by: Thierry Reding <thierry.reding@avionic-design.de>
Tested-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>